Melanie Pradier: Bayesian Nonparametric Models for Data Exploration


Monday, March 5, 2018, 11:30am to 1:00pm


Maxwell Dworkin 119

Making sense of data is one of the biggest challenges of our time. As more data is gathered and ML systems become ubiquitous, our society can benefit from better predictions and enhanced data-driven decision systems. Yet understanding data remains challenging in many application domains, such as personalized medicine. Many of the most relevant questions, e.g., what the underlying mechanisms of cancer are, cannot be stated as well-defined supervised problems, and might benefit enormously from interpretable models and rigorous exploratory data analyses involving multidisciplinary research.

This talk focuses on Bayesian nonparametric (BNP) models, flexible generative approaches that allow for a potentially infinite number of parameters. BNP methods are well suited to data exploration: they can provide easy-to-interpret insights into the data via latent variables while adjusting their complexity to the number of observations, thereby avoiding expensive model selection steps.
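As a rough illustration of complexity adapting to the number of observations (not taken from the talk), the sketch below simulates a Chinese restaurant process, a standard BNP prior over partitions. The function name `crp_table_counts` and the concentration value `alpha = 1.0` are assumptions chosen for the example.

```python
import random


def crp_table_counts(n_customers, alpha, rng):
    """Seat customers one by one under a Chinese restaurant process.

    Returns a list with the number of customers at each table (cluster).
    `alpha` is the concentration parameter (hypothetical value in the usage below).
    """
    counts = []
    for n in range(n_customers):
        # With n customers already seated, open a new table with probability
        # alpha / (n + alpha); join table k with probability counts[k] / (n + alpha).
        r = rng.random() * (n + alpha)
        if r < alpha:
            counts.append(1)
            continue
        r -= alpha
        for k, c in enumerate(counts):
            if r < c:
                counts[k] += 1
                break
            r -= c
        else:
            # Guard against floating-point round-off at the upper boundary.
            counts[-1] += 1
    return counts


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        tables = crp_table_counts(n, alpha=1.0, rng=random.Random(0))
        print(n, "observations ->", len(tables), "clusters")
```

The number of clusters is not fixed in advance: under this prior it grows slowly, roughly like `alpha * log(n)` in expectation, so no separate model selection step is needed to choose it.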

In today’s talk, we will first discuss how BNP models can be used to find hidden patterns in data. BNP methods yield latent variable models that, following De Finetti’s theorem, seek latent variables conditional on which the observed data are independent. By analyzing these latent variable models, we can gain knowledge about the system that generated the data, draw actionable conclusions, and propose tests to verify the connections found.

Second, we will illustrate the usefulness of BNP models in different use cases within sports science, cancer research, and economics. The developed latent variable models yield not only intuitive, perhaps unsurprising findings that align with expert knowledge, but also relevant information and novel data-driven hypotheses that might not be evident to the naked eye.

Melanie F. Pradier is currently a CRCS Postdoctoral Fellow co-sponsored by the Data Science Initiative at Harvard, working on interpretable machine learning, probabilistic models, and biomedical applications. She received her Ph.D. from Universidad Carlos III in Madrid, funded by a Marie Curie ITN Fellowship from the European Union. In 2014-2015, she spent one year as a research visitor at the Memorial Sloan Kettering Cancer Center in New York, and completed a 3-month internship at the Innovation Technology Center of Roche Diagnostics. Melanie studied Telecommunication Engineering at the Technical University of Madrid (UPM), and obtained her MSc in Information Technology at the University of Stuttgart in 2011. Just before her Ph.D., she spent two years working in industry at Sony Research Center in Stuttgart and Sony Corporation R&D in Tokyo. As a side project, Melanie is also a User Advocacy advisor for the tech start-up. Her current research interests include probabilistic graphical models, approximate inference techniques (MCMC and variational techniques), dependent random measures, clustering and topic modeling, biomedical applications, and information theory. A detailed CV and list of publications or patents can be found at