We propose a multi-armed bandit setting in which each arm corresponds to a subpopulation, and pulling an arm is equivalent to granting an opportunity to that subpopulation. In this setting, the decision-maker’s fairness policy governs the number of opportunities each subpopulation should receive, which typically depends on the (unknown) reward from granting an opportunity to that subpopulation. The decision-maker can decide whether to provide these opportunities or pay a predefined monetary cost for every withheld opportunity. The decision-maker’s objective is to maximize her utility, which is the sum of rewards minus the cost of withheld opportunities. We provide a no-regret algorithm that maximizes the decision-maker’s utility and complement our analysis with an almost-tight lower bound. The full version of the paper is available at https://tinyurl.com/y7s9avud.
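The grant-or-pay trade-off described above can be illustrated with a minimal simulation sketch. The round-robin fairness policy, Gaussian rewards, and UCB-style optimistic grant rule below are illustrative assumptions for exposition, not the paper's actual algorithm or guarantees:

```python
import math
import random

def run_bandit(mu, cost, horizon, seed=0):
    """Illustrative grant-or-pay loop (a sketch, not the paper's algorithm).

    mu: true (unknown) mean reward of each subpopulation; may be negative.
    cost: predefined monetary cost paid per withheld opportunity.
    Each round, a fairness policy designates one subpopulation (here a
    simple round-robin, an assumption); the decision-maker grants the
    opportunity if the arm's optimistic reward estimate is at least -cost
    (granting looks no worse than paying), and otherwise withholds and pays.
    Returns (utility, pulls): total utility and grants per subpopulation.
    """
    rng = random.Random(seed)
    k = len(mu)
    pulls = [0] * k          # opportunities actually granted per arm
    sums = [0.0] * k         # accumulated observed rewards per arm
    utility = 0.0            # sum of rewards minus withholding costs
    for t in range(horizon):
        arm = t % k          # this round's designated subpopulation
        if pulls[arm] == 0:
            ucb = float("inf")  # force one exploratory grant per arm
        else:
            mean = sums[arm] / pulls[arm]
            bonus = math.sqrt(2.0 * math.log(t + 1) / pulls[arm])
            ucb = mean + bonus   # optimism in the face of uncertainty
        if ucb >= -cost:
            r = rng.gauss(mu[arm], 1.0)  # grant: observe a noisy reward
            pulls[arm] += 1
            sums[arm] += r
            utility += r
        else:
            utility -= cost  # withhold: pay the predefined cost instead
    return utility, pulls
```

For example, with one beneficial subpopulation and one strongly harmful one, the loop learns to keep granting to the first while withholding (and paying) for the second after a few exploratory grants.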