Trade-Offs between Fairness and Interpretability in Machine Learning

Citation:

Agarwal S. Trade-Offs between Fairness and Interpretability in Machine Learning. In: IJCAI 2021 Workshop on AI for Social Good; 2021.

Abstract:

In this work, we study settings in which a classifier is required to be both fair and interpretable, and find that trade-offs between these two properties are necessary. We present theoretical results that demonstrate this tension. More specifically, we consider a formal framework in which simple classifiers serve as a means of attaining interpretability, and show that simple classifiers are strictly improvable: every simple classifier can be replaced by a more complex classifier that strictly improves both fairness and accuracy.
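
To make the notion of "strictly improvable" concrete, the following is a minimal toy sketch, not the paper's framework or proof. It makes several assumptions that do not come from the abstract: the data are invented, "simple" is taken to mean "uses only one feature", and fairness is measured by the demographic parity gap (the absolute difference in positive-prediction rates between two groups). The names `DATA`, `simple_clf`, `complex_clf`, `accuracy`, and `parity_gap` are hypothetical.

```python
# Toy records: (group, x1, x2, label). All values are invented for illustration.
# In group B, some qualified individuals are only identifiable through x2.
DATA = (
    [("A", 1, 0, 1)] * 4 + [("A", 0, 0, 0)] * 4
    + [("B", 1, 0, 1)] * 2 + [("B", 0, 1, 1)] * 2 + [("B", 0, 0, 0)] * 4
)


def simple_clf(x1, x2):
    """'Simple' classifier (assumption: simple = looks at one feature only)."""
    return x1


def complex_clf(x1, x2):
    """More complex classifier that also consults the second feature."""
    return int(x1 == 1 or x2 == 1)


def accuracy(clf, data):
    """Fraction of records whose prediction matches the label."""
    return sum(clf(x1, x2) == y for _, x1, x2, y in data) / len(data)


def parity_gap(clf, data):
    """Demographic parity gap: |P(pred=1 | A) - P(pred=1 | B)| (assumed metric)."""
    rates = []
    for group in ("A", "B"):
        rows = [(x1, x2) for g, x1, x2, _ in data if g == group]
        rates.append(sum(clf(x1, x2) for x1, x2 in rows) / len(rows))
    return abs(rates[0] - rates[1])


if __name__ == "__main__":
    for name, clf in (("simple", simple_clf), ("complex", complex_clf)):
        print(name, accuracy(clf, DATA), parity_gap(clf, DATA))
```

On this invented data, the one-feature classifier misclassifies the group B individuals who are only identifiable through `x2`, so the two-feature classifier has both higher accuracy and a smaller parity gap. This illustrates a single instance only; the abstract's claim is that such an improving classifier exists for every simple classifier within the paper's framework.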