Aviva Prins, University of Maryland, College Park
Human-in-the-Loop Resource Allocation in Restless Multi-Armed Bandits and Their Application to Public Health Interventions
We investigate Collapsing Bandits, a new restless multi-armed bandit (RMAB) setting in which each arm follows a partially-observable binary-state Markovian process. We demonstrate that state-of-the-art algorithms, which aim to keep as many arms in the 'good' state as possible, concentrate their limited per-round budget of actions on a small subset of arms while neglecting the rest. Our main contribution is a human-in-the-loop model that guarantees the enforcement of safety constraints on agent behavior and encourages explainable policies. We compare simulations against real-world data in which a health worker must monitor patients and deliver interventions to maximize adherence to a tuberculosis treatment plan.
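The "collapsing" dynamics described above can be illustrated with a short sketch of a single arm's belief state: when the planner acts on an arm, the patient's true adherence state is observed and the belief collapses to 0 or 1; otherwise the belief drifts under the arm's transition matrix. The transition probabilities and function names here are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

# Illustrative two-state transition matrix for one arm (assumed values):
# state 1 = 'good' (adhering), state 0 = 'bad' (not adhering).
# P[s, s'] is the probability of moving from state s to state s'.
P = np.array([[0.9, 0.1],   # from bad: mostly stays bad
              [0.2, 0.8]])  # from good: mostly stays good

def belief_update(b, acted, observed_state=None):
    """Update the belief b = Pr(arm is in the 'good' state).

    If the arm is acted on, its state is observed and the belief
    'collapses' to 0 or 1; if passive, the belief drifts under P.
    """
    if acted:
        return float(observed_state)  # collapse to the observed state
    # Passive arm: propagate the belief through the transition matrix.
    return b * P[1, 1] + (1 - b) * P[0, 1]

# Example: an unobserved arm's belief about adherence decays over time.
b = 1.0
for _ in range(5):
    b = belief_update(b, acted=False)
```

This sketch shows why a budget-limited planner faces a trade-off: acting on an arm both delivers an intervention and restores certainty about its state, while neglected arms accumulate uncertainty.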
Milind Tambe & Rediet Abebe