Restless and Uncertain: Robust Policies for Restless Bandits via Deep Multi-Agent Reinforcement Learning, in Uncertainty in Artificial Intelligence (UAI 2022); forthcoming. arXiv.
Efficient Algorithms for Finite Horizon and Streaming Restless Multi-Armed Bandit Problems, in Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2022); 2022. arXiv.
Q-Learning Lagrange Policies for Multi-Action Restless Bandits, in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2021); 2021. arXiv.
Learn to Intervene: An Adaptive Learning Policy for Restless Bandits in Application to Preventive Healthcare, in Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI 2021); 2021: 4036-4049. Publisher's Version.
Automatically Learning Compact Quality-aware Surrogates for Optimization Problems, in NeurIPS (Spotlight). Vancouver, Canada; 2020. Abstract:
Solving optimization problems with unknown parameters often requires learning a predictive model to predict the values of the unknown parameters and then solving the problem using these values. Recent work has shown that including the optimization problem as a layer in the model training pipeline results in predictions of the unobserved parameters that lead to higher decision quality. Unfortunately, this process comes at a large computational cost because the optimization problem must be solved and differentiated through in each training iteration; furthermore, it may also sometimes fail to improve solution quality due to non-smoothness issues that arise when training through a complex optimization layer. To address these shortcomings, we learn a low-dimensional surrogate model of a large optimization problem by representing the feasible space in terms of meta-variables, each of which is a linear combination of the original variables. By training a low-dimensional surrogate model end-to-end, and jointly with the predictive model, we achieve: i) a large reduction in training and inference time; and ii) improved performance by focusing attention on the more important variables in the optimization and learning in a smoother space. Empirically, we demonstrate these improvements on a non-convex adversary modeling task, a submodular recommendation task, and a convex portfolio optimization task.
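The abstract describes the approach only at a high level; the following is a minimal, illustrative sketch of the core idea, not the paper's implementation. It assumes PyTorch and substitutes a toy unconstrained quadratic objective so that the inner surrogate argmax has a closed form (the paper handles general feasible spaces and solvers); all names (predict, surrogate_solve, n_meta, lam) and the synthetic data are hypothetical.

```python
import torch

torch.manual_seed(0)
n_feat, n_var, n_meta, lam = 10, 100, 5, 1.0

# Predictive model: features -> predicted problem parameters theta_hat.
predict = torch.nn.Linear(n_feat, n_var)

# Learned reparameterization: each meta-variable is a linear combination of
# the original variables, i.e., decisions x = P @ y with P of shape (n_var, n_meta).
P = torch.nn.Parameter(0.1 * torch.randn(n_var, n_meta))

def surrogate_solve(theta_hat):
    # Low-dimensional inner problem over meta-variables y:
    #   maximize_y  theta_hat^T (P y) - (lam / 2) * ||P y||^2,
    # whose closed form y* = (lam * P^T P)^{-1} P^T theta_hat lets gradients
    # flow through the solve to both the predictive model and P.
    A = lam * (P.T @ P) + 1e-6 * torch.eye(n_meta)  # small ridge for invertibility
    y = torch.linalg.solve(A, (theta_hat @ P).T).T  # solve for each sample in the batch
    return y @ P.T  # map meta-decisions back to the original decision space

def decision_quality(x, theta_true):
    # Objective value of decision x under the *true* parameters.
    return (theta_true * x).sum(-1) - 0.5 * lam * (x * x).sum(-1)

# Toy data: true parameters depend linearly on observed features, plus noise.
W_true = torch.randn(n_feat, n_var)
feats = torch.randn(256, n_feat)
theta_true = feats @ W_true + 0.1 * torch.randn(256, n_var)

opt = torch.optim.Adam(list(predict.parameters()) + [P], lr=1e-2)
for step in range(200):
    x = surrogate_solve(predict(feats))             # predict, then solve the small surrogate
    loss = -decision_quality(x, theta_true).mean()  # train on realized decision quality
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The sketch is meant to capture the two claimed benefits in miniature: the inner solve runs over n_meta = 5 meta-variables rather than 100 original variables, so each training iteration is cheap, and the learned map P can concentrate on the combinations of variables that matter most for decision quality.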