Rising Stars Speaker Series

Date: 

Tuesday, April 20, 2021, 12:00pm to 1:30pm

Series dates: Tuesday, March 30; Thursday, April 8; Tuesday, April 20; and Thursday, April 29, 2021.

Location: 

Virtual (See registration links below)

This year we are excited to host a virtual CRCS Rising Stars Speaker Series! Each session in the series will feature 12-minute presentations from four PhD students and postdoctoral researchers who were nominated by experts in the field for demonstrating exemplary research on topics related to AI for social good. Talks will be followed by a panel discussion with the speakers. Please find more information below, and send any questions via email to crcs_workshop@g.harvard.edu.

Organizers: Jackson Killian, Aditya Mate, Lily Xu, and Milind Tambe

We are excited to announce our final list of nominated CRCS Rising Stars in AI! Please find their names and bios underneath the sessions during which they will give presentations.

All events will take place 12–1:30pm ET

View videos from the 2021 Rising Stars series here.

 

Public Health: Tuesday, March 30

Registration link: https://harvard.zoom.us/meeting/register/tJ0qdemqrz0jHder11L3ESP6qBZIEIYpjb7x

Sonali Parbhoo (Harvard University)

LinkedIn: sonaliparbhoo


Talk Title:  Robust Machine Learning Methods for Targeted Healthcare

Abstract: Across several fields in science and engineering, we are increasingly turning to machine learning solutions for making decisions that can affect our lives in profound ways. Unlike many of these success stories, machine learning has had limited success in healthcare. Yet the vast volumes of medical data currently recorded are far beyond what medical experts can analyse. In this talk, I will discuss the importance of building robust tools that can communicate their decisions and limitations to human decision-makers. I will demonstrate how building small, inspectable models that humans can understand can help us manage hypotension in the ICU, and show how incorporating human input into off-policy evaluation can help us find better strategies for managing illnesses such as HIV. Throughout the talk I will highlight several interesting questions that could have a profound impact on healthcare.

Bio: Sonali is a postdoctoral research fellow at Harvard, working with Prof Finale Doshi-Velez. Her research focuses on decision-making under uncertainty, causal inference, and building interpretable models to improve clinical care and deepen our understanding of human health, with applications in areas such as HIV and critical care. Her work has been published at a number of machine learning conferences (NeurIPS, AAAI, ICML, AISTATS) and medical journals (Nature Medicine, Nature Communications, AMIA, PLoS One, JAIDS). Sonali received her PhD (summa cum laude) in July 2019 from the University of Basel, Switzerland, where she built intelligent models for understanding the interplay between host and virus in the fight against HIV. She was also a recipient of the Swiss National Science Foundation (SNSF) Mobility Fellowship for her research at Harvard. Prior to this, Sonali received her B.Sc. and M.Sc. in Johannesburg, South Africa, where she specialised in Molecular Biology, Computer Science and Mathematics. Apart from her research, Sonali is also passionate about encouraging more discussion about the role of ethics in developing machine learning technologies to improve society.

 

Paidamoyo Chapfuwa (Duke University)

@chapfuwa


Talk Title:  Counterfactual Survival Analysis with Balanced Representations

Abstract: Survival analysis or time-to-event studies focus on modeling the time of a future event, such as death or failure, and investigate its relationship with covariates or predictors of interest. Specifically, we may be interested in the causal effect of a given intervention or treatment on survival time. A typical question may be: will a given therapy increase the chances of survival of an individual or population? Such causal inquiries on survival outcomes are common in the fields of epidemiology and medicine. In this talk, I will introduce our recently proposed counterfactual inference framework for survival analysis, which adjusts for bias from two sources, namely, confounding (from covariates influencing both the treatment assignment and the outcome) and censoring (informative or non-informative). I will then present extensive results on challenging datasets, such as the Framingham Heart Study and the AIDS clinical trials group (ACTG).

Bio: Paidamoyo Chapfuwa received B.S.E. with distinction, M.S., and Ph.D. degrees in electrical and computer engineering from Duke University, Durham, NC, USA, in 2013, 2018, and 2021 (expected), respectively. Paidamoyo has been advised throughout her Ph.D. by Drs. Lawrence Carin and Ricardo Henao. Her research focuses on developing modern machine learning approaches, i.e., representation and deep learning, to characterize individualized survival (event times) from clinical data such as electronic health records and more recently, immunomics. Her work incorporates statistical techniques from causal inference, generative modeling, and Bayesian nonparametrics. Her work has culminated in publications at prestigious venues such as IEEE, ACM, ACL, and ICML. See https://paidamoyo.github.io for more information.

 

Irene Chen (MIT)

@irenetrampoline


Talk Title:  Beyond Bias Audits: Building an Ethical Machine Learning for Health Pipeline 

Abstract:  Machine learning has demonstrated the potential to fundamentally improve healthcare because of its ability to find latent patterns in large observational datasets and scale insights rapidly. However, the use of ML in healthcare also raises numerous ethical concerns, often analyzed through bias audits. How can we address algorithmic inequities once bias has been detected? In this talk, we consider the pipeline for ethical machine learning in health and focus on two case studies. First, cost-based metrics of discrimination in supervised learning can decompose into bias, variance, and noise terms with actionable steps for estimating and reducing each term. Second, deep generative models can address left-censorship from unequal access to care in disease phenotyping. The talk will conclude with a discussion of directions for further research along the entire model development pipeline including problem selection and data collection.

Bio: I’m a Ph.D. student in computer science at MIT, advised by David Sontag in the Clinical Machine Learning group. I work on machine learning methods to advance understanding of health and reduce inequality. Prior to MIT, I completed a joint AB/SM degree at Harvard. I also worked at Dropbox as a data scientist, machine learning engineer, and chief of staff.

 

Charles C Onu (Mila)

@onucharlesc


Talk Title:  Robust algorithms for the analysis of infant cry sounds to detect pathologies

Abstract: My research is inspired by the goal of developing accurate and robust algorithms for the analysis of infant cry sounds to detect pathologies in the real world. I will discuss our work on learning in the small-data setting, model compression, and task-invariant representations of cry sounds. I will also describe our ongoing effort, in collaboration with clinicians across 3 countries, to collect a large database of newborn cry sounds that are fully annotated with clinical indications. Such a database will facilitate the development and validation of effective models for pathology detection.

Bio: I conduct my research at the intersection of artificial intelligence and healthcare at Mila and the Reasoning and Learning (RL) lab, McGill University. My supervisor is Prof. Doina Precup, co-director of RL lab and director of the DeepMind lab in Montreal. The overarching theme guiding my work is advancing machine learning to positively impact healthcare. Specific areas I work on include classical ML, deep learning, speech, physiological signal processing and tensor decomposition techniques. I hold a Vanier Canada Graduate Scholarship.

I founded and lead AI Research at Ubenwa. The Ubenwa project is aimed at developing cry-based, low-cost tools for early diagnosis of conditions that affect the central and autonomic nervous systems in newborns. Our work is funded by generous grants from Mila, Ministère de l’Économie et d’Innovation (MEI) du Québec, District 3 Innovation Centre, and MIT Solve.



Conservation: Thursday, April 8

Registration link: https://harvard.zoom.us/meeting/register/tJwlcuGhqDMoGtUQNV_zqRFQhmVg0fHMY8H_

Esther Rolf (UC Berkeley)

Website


Talk Title: A Generalizable and Accessible Approach to Machine Learning with Global Satellite Imagery

Abstract: Combining satellite imagery with machine learning (SIML) has the potential to address global challenges by remotely estimating socioeconomic and environmental conditions in data-poor regions, yet the resource requirements of SIML limit its accessibility and use. We show that a single encoding of satellite imagery can generalize across diverse prediction tasks (e.g. forest cover, house price, road length). Our method achieves accuracy competitive with deep neural networks at orders of magnitude lower computational cost, scales globally, delivers label super-resolution predictions, and facilitates characterizations of uncertainty. Since image encodings are shared across tasks, they can be centrally computed and distributed to unlimited researchers, who need only fit a linear regression to their own ground truth data in order to achieve state-of-the-art SIML performance.

Bio: Esther Rolf is a 5th year PhD candidate in the Computer Science department at UC Berkeley, where she is advised by Mike Jordan and Ben Recht. Esther studies how data acquisition processes and downstream use cases influence the efficacy and applicability of machine learning systems, with emphasis on problems with the potential for positive social impact. Her projects span developing algorithms and infrastructure for reliable environmental monitoring using machine learning and understanding social outcomes of decisions influenced by machine learning systems.

Esther is also a member of the Berkeley AI Research (BAIR) Lab and is a fellow in the Global Policy Lab in the Goldman School of Public Policy at UC Berkeley. During her PhD she has received support from an NSF Graduate Research Fellowship and a Google PhD Fellowship.

 

Elizabeth Bondi (Harvard University)

@BondiElizabeth


Talk title: Imagery and Strategic Reasoning: Making Decisions in Conservation with Imperfect Data

Abstract: In conservation, it is often the case that we have "imperfect" data: noisy, limited, difficult to collect or label, etc. Yet, we may be using these data to inform important decisions, for example, in deploying limited resources to protect animals from illegal poaching. It is therefore imperative to consider these "imperfect" characteristics throughout the process of collecting data, designing algorithms to interpret data and make decisions, and deploying such algorithms, not only during data collection and interpretation. We illustrate this point with conservation drones, including the noisy, real-time data they provide and the decisions we need to make to collect and respond to these data.

Bio: Elizabeth Bondi is a PhD candidate studying Computer Science at Harvard University, with an M.S. in Computer Science from the University of Southern California (USC) and a B.S. in Imaging Science from Rochester Institute of Technology (RIT). At Harvard, she is advised by Prof. Milind Tambe. Her research interests include computer vision and deep learning, remote sensing, and multi-agent systems, especially applied to conservation and sustainability. 

She has received a Best Application Demo Award at AAMAS 2019 and a Best Paper Award at the Autonomous Air and Ground Sensing Systems for Agricultural Optimization and Phenotyping Conference at SPIE DCS 2016, in addition to an Honorable Mention for the NSF Graduate Research Fellowship Program in 2017, and a Barry Goldwater Scholarship in 2015.

 

 

Sasha Luccioni (Mila)

@SashaMTL


Talk title: Visualizing the Future Impacts of Climate Change with GANs

Abstract: Climate change is a major threat to humanity, and the actions required to prevent its catastrophic consequences include changes in both policy-making and individual behaviour. However, taking action requires understanding the effects of climate change, even though they may seem abstract and distant. Projecting the potential consequences of extreme climate events such as flooding in familiar places can help make the abstract impacts of climate change more concrete and encourage action. My team has developed a generative model that leverages both simulated and real data for unsupervised domain adaptation and conditional image generation. In my presentation, I will describe the details of the ClimateGAN framework and the components of our architecture, and demonstrate that our model is capable of robustly generating photo-realistic climate effects. I will also present "This Climate Does Not Exist", the interactive user experience that will allow people to explore the potential impacts of climate change while learning about its effects on our planet and society.

Bio: Sasha Luccioni is a postdoctoral researcher working on Artificial Intelligence for Humanity initiatives at Mila Institute, where she leads projects at the nexus of machine learning and social issues such as climate change, education and healthcare. Sasha got her PhD in Cognitive Computing from UQAM in 2018 and later spent two years working in applied machine learning research. Since joining Mila in early 2019, she has organized and led many AI for social good initiatives, conferences and workshops. She is also highly involved in her community, volunteering for initiatives such as Women in Machine Learning, Climate Change AI and Kids Code Jeunesse.

 

Shiva R Iyer (New York University)

@shivariyer


Talk title: Experiences from fine-grained air quality monitoring in Delhi using low-cost sensors

Abstract: Delhi is one of the most polluted cities in India and the world, and despite the installation of over 30 high-end air quality monitors throughout the city by various public bodies, we do not have sufficiently fine-grained information on local pollution levels to reason about air quality in specific localities. This public network, while quantifying air quality in an extremely detailed manner in terms of measuring a large number of components in the air, only provides a macro-level picture of air quality in the city. On the other hand, tiny localized pollution hotspots within a city, each created from localized sources (such as open waste burning and vehicle exhaust), add up over space and time to result in the poor air quality that is observed at the macro scale. We partnered with a company that produces air quality filters and monitoring equipment, produced 28 custom-designed low-cost monitors, and placed them in various locations in the city, with a heavy concentration in South Delhi. We also formalize the notion of a "hotspot", define various types of hotspots, and apply the model to our data. We find that the monitoring network augmented with our low-cost sensors can significantly enhance our understanding of localized levels of air pollution to which citizens are subject on a daily basis. For instance, we uncover locations with better or worse air quality that are not reported in the official government reports. Finally, we design statistical and machine learning models that learn a spatiotemporal "field", which can be used in interpolation and forecasting. Our message-passing neural network model, combined with a state-of-the-art spatiotemporal hierarchical model and a spline correction step, is able to predict PM2.5 values with a mean absolute percentage error of only 10% across all our locations.

Bio: Shiva is a sixth year PhD student in the CS Department, part of the Courant Institute of Mathematical Sciences, at New York University, advised by Prof Lakshminarayanan Subramanian. His research interests in computer science lie in the areas of both networked mobile systems and data science. On the mobile systems front, he has explored methods for improving transport-layer performance in next-generation wireless and mobile communication technologies such as millimeter wave (mmWave). On the data science front, he focuses on algorithms for spatiotemporal forecasting and predictive analytics in urban sensing applications. In the end, he wishes to bring these two disciplines together to design smart systems for urban spatiotemporal sensing applications. His larger goals beyond PhD are to have such systems deployed in the real world for measurable policy impact. He is also a member of the Open Networks and Big Data Lab, NYU Systems and NYU WIRELESS, and was formerly a member of EPoD at Harvard. He is a recipient of the GSAS Dean's Dissertation Fellowship at NYU, awarded to selected PhD students in their final year of dissertation writing, the Nokia Bell Labs Innovation Project Award for his work during his time as an intern at Nokia Bell Labs in the summer of 2019, and the Henning-Biermann Prize, awarded to students for outstanding service in the CS department at NYU.

 


Fairness: Tuesday, April 20

Registration link: https://harvard.zoom.us/meeting/register/tJYofumpqTkiGNf6it8O9IHL3J0HEcrazV8E

Lily Hu (Harvard University)

@uhlily


Talk Title:  Fair Classification and Social Welfare

Abstract:  In this talk, I will connect work in fair classification with some concepts in welfare economics. I take the connection to be rather natural: Now that machine learning algorithms lie at the center of many important resource allocation pipelines, computer scientists have been unwittingly cast as partial social planners. Given this state of affairs, important questions follow. How do leading notions of fairness as defined by computer scientists map onto longer-standing notions of social welfare? Our main findings on the relationship between fairness criteria and welfare center on sensitivity analyses of fairness-constrained empirical risk minimization programs. Most notably, we find that always preferring "more fair" classifiers is in tension with a commitment to the Pareto Principle—a fundamental axiom of social choice theory and welfare economics, which states that we should always prefer allocations that make everyone weakly better-off. Recent work in machine learning has rallied around these notions of fairness as critical to ensuring that algorithmic systems do not have disparate negative impact on disadvantaged social groups. By showing that these constraints often fail to translate into improved outcomes for these groups, we cast doubt on their effectiveness as a means to ensure justice.

Bio: Lily Hu is a PhD candidate in Applied Mathematics and Philosophy at Harvard University. She works in philosophy of (social) science and political and social philosophy. Her dissertation project concerns causal theorizing about the social world, in particular reasoning about the “causal effect” of social categories such as race and sex, and the relationship between this kind of causal theorizing and normative theorizing about core ethical notions such as discrimination and fairness. She has also worked on topics in machine learning theory and algorithmic fairness.

 

Angela Zhou (Cornell University)

@angelamczhou


Talk Title:  Fairness, Welfare and Equity in Personalized Pricing

Abstract:  We study the interplay of fairness, welfare, and equity considerations in personalized pricing based on customer features. Sellers are increasingly able to conduct price personalization based on predictive modeling of demand conditional on covariates: setting customized interest rates, targeted discounts of consumer goods, and personalized subsidies of scarce resources with positive externalities like vaccines and bed nets. These different application areas may lead to different concerns around fairness, welfare, and equity on different objectives: price burdens on consumers, price envy, firm revenue, access to a good, equal access, and distributional consequences when the good in question further impacts downstream outcomes of interest. We conduct a comprehensive literature review in order to disentangle these different normative considerations and propose a taxonomy of different objectives with mathematical definitions. We focus on observational metrics that do not assume access to an underlying valuation distribution which is either unobserved due to binary feedback or ill-defined due to overriding behavioral concerns regarding interpreting revealed preferences. In the setting of personalized pricing for the provision of goods with positive benefits, we discuss how price optimization may provide unambiguous benefit by achieving a "triple bottom line": personalized pricing enables expanding access, which in turn may lead to gains in welfare due to heterogeneous utility, and improve revenue or budget utilization. We empirically demonstrate the potential benefits of personalized pricing in two settings: pricing subsidies for an elective vaccine, and the effects of personalized interest rates on downstream outcomes in microcredit.

Bio: Angela Zhou is a fifth-year PhD candidate at Cornell University/Cornell Tech in Operations Research and Information Engineering. She works at the intersection of statistical machine learning and operations research in order to inform reliable data-driven decision-making in view of fundamental practical challenges that arise from realistic information environments. In particular, her research has focused on robust causal inference for decision-making, and credible performance evaluation for algorithmic fairness and disparity assessment.

 

Ana-Andreea Stoica (Columbia University)

@astoica73


Talk Title:  Diversity and inequality in social networks

Abstract: Online social networks often mirror inequality in real-world networks, stemming from historical prejudice and economic or social factors. Such disparities are often picked up and amplified by algorithms that leverage social data for the purpose of providing recommendations, diffusing information, or forming groups. In this talk, I give an overview of my research on explanations for algorithmic bias in social networks, briefly describing my work on recommendation algorithms and information diffusion. Using network models that reproduce inequality seen in online networks, we'll characterize the relationship between pre-existing bias and algorithms in creating inequality, discussing different algorithmic solutions for mitigating bias.

Bio: Ana-Andreea Stoica is a Ph.D. candidate at Columbia University. Her work focuses on mathematical models, data analysis, and inequality in social networks. From recommendation algorithms to the way information spreads in networks, Ana is particularly interested in studying the effect of algorithms on people's sense of privacy, community, and access to information and opportunities. She strives to integrate tools from mathematical models—from graph theory to opinion dynamics—with sociology to gain a deeper understanding of the ethics and implications of technology in our everyday lives. Ana grew up in Bucharest, Romania, and moved to the US for college, where she graduated from Princeton in 2016 with a bachelor's degree in Mathematics. Since 2019, she has been co-organizing the Mechanism Design for Social Good initiative.

 

Paul Gölz (CMU)

@paulgoelz


Talk Title:  Fair Algorithms for Selecting Citizens' Assemblies

Abstract: Globally, there has been a recent surge in citizens’ assemblies, a form of civic participation in which a panel of randomly-selected citizens weighs in on policy questions. Citizens would ideally be selected to serve on this panel with equal probability. In practice, however, this is impossible due to demographic quotas, which are imposed to ensure that panels are representative despite unequal participation rates across subpopulations. The selection algorithms currently used for choosing panels have received little attention, and we find that the one we examine as a benchmark tends to give pool members highly unequal selection probabilities. Here, we develop selection algorithms that satisfy quotas while choosing pool members with probabilities as close to equal as mathematically possible, for a range of fairness metrics that quantify “closeness to equality”. We have also implemented one such algorithm, which has been adopted by a number of organizations around the world. We demonstrate on multiple real-world datasets that our algorithm is substantially fairer than the benchmark. By contributing a fairer, more principled, and deployable algorithm, our work puts the practice of sortition on firmer foundations. Our algorithm is also one of the first applications of ideas from the field of fair division in practice.
Joint work with Bailey Flanigan, Anupam Gupta, Brett Hennig, and Ariel Procaccia.

Bio: Paul Gölz is a PhD student in the Computer Science Department at CMU and is advised by Ariel Procaccia (now at Harvard). Paul’s research applies tools from AI, algorithms, and game theory to help society make better decisions. A specific interest of his are emerging forms of democratic participation and how these processes can be supported by axiomatic and algorithmic analysis.


Tech + Society: Thursday, April 29

Registration link: https://harvard.zoom.us/meeting/register/tJcsfuqsqj0jH9dhDTdADiB_cXC4XlZS5ILv

Amber M. Hamilton (University of Minnesota)

@ayeemach


Talk Title:  Platforming Race: Examining racial ideologies on social media platforms

Abstract: Though social media platforms such as Facebook and Twitter draw billions of daily users, it has been difficult to explore the relationship between the services that platforms provide and their corporate beliefs about race and racism. Indeed, rather than talking about race and racism, these companies prefer to gesture to vague ideals of diversity and inclusion. However, the killing of George Floyd by Minneapolis Police on May 25, 2020 led to an outpouring of racial justice solidarity statements, policy changes, and racial justice donations from social media companies. Though it's unclear why these companies felt compelled to respond to this moment of racial upheaval, the cascade of statements provides a unique opportunity to explore the beliefs about race as expressed by social media companies (beliefs that are enacted on their platforms) and provides important context for the bias within technological systems that we've seen in recent years. In this talk, I describe my analysis of these public statements and how these companies can work to design more equitable and fair AI systems.

Bio: Amber M. Hamilton is a doctoral candidate in Sociology at the University of Minnesota, Twin Cities. Her research focuses on the intersection of race, racism, and technology. Her dissertation, titled "Doing Race Online: An Exploration of Race-Making on Social Media Platforms," explores the meaning-making around race and racism that occurs on digital platforms. Amber has worked as a PhD Research Intern at IBM Research, Microsoft Research, and the Berkman Klein Center for Internet and Society.

 

Brooklyne Gipson (University of Illinois Urbana-Champaign)

@Brooklyne


Talk Title: TBA

Abstract: TBA

Bio: Brooklyne Gipson is an Illinois ACLS/DRIVE Distinguished Postdoctoral Fellow in the Digital Humanities at the University of Illinois, Urbana-Champaign. Dr. Gipson is an interdisciplinary communication scholar whose areas of research include digital and social media environments, Black feminist digital/technology studies, and the intersection of race, gender, social media, and power. Her work examines how social media platforms facilitate civic engagement within Black communities. Her current research takes an intersectional approach to analyzing how anti-Black discourses manifest themselves in everyday discursive exchanges within Black social media spaces.

 

Randi Williams (MIT)

@randi_c1


Talk Title:  How to Train Your Robot: Envisioning the future of education

Abstract: Today, artificial intelligence (AI) permeates almost every area of society, impacting how many of us learn, seek employment, and relate to one another. As such, we must consider how the automation revolution will impact individuals’ abilities to participate in the future depending on their level of technological literacy. AI education, particularly in primary through secondary school, presents an opportunity to prepare a diverse citizenry to flourish in an AI-powered society. In the Personal Robots Group at the MIT Media Lab, I create curricula and tools to teach primary and middle school students about AI. Through lessons that introduce AI systems as fundamentally sociotechnical, students develop the skills to imagine and implement personally meaningful AI projects. Along the way, they also build 21st-century skills, including digital literacy, critical thinking, and applying knowledge. In this talk, I will share the motivations, design principles, and outcomes of K-12 AI education projects I have been involved with.

Bio: Randi Williams is a 3rd year Ph.D. student in the Personal Robots Group at the MIT Media Lab. She received her Master of Science in Media Arts and Sciences from MIT in 2018 and her Bachelor of Science in Computer Engineering from UMBC in 2016. Her research intersects human-robot interaction and education with a particular focus on engaging students from underrepresented communities in tech. In her current project, How to Train Your Robot, she is working with educators to design AI curricula that teach AI concepts through hands-on projects related to current issues. You can learn more about her past and ongoing projects at https://www.media.mit.edu/people/randiw12/overview/.

 

Dora Demszky (Stanford University)

@ddemszky


Talk Title:  Content Analysis of Textbooks via Natural Language Processing: Findings on Gender, Race, and Ethnicity in Texas U.S. History Textbooks

Abstract:  Cutting-edge data science techniques can shed new light on fundamental questions in educational research. We apply techniques from natural language processing (lexicons, word embeddings, topic models) to 15 U.S. history textbooks widely used in Texas between 2015 and 2017, studying their depiction of historically marginalized groups. We find that Latinx people are rarely discussed, and the most common famous figures are nearly all White men. Lexicon-based approaches show that Black people are described as performing actions associated with low agency and power. Word embeddings reveal that women tend to be discussed in the contexts of work and the home. Topic modeling highlights the higher prominence of political topics compared with social ones. We also find that more conservative counties tend to purchase textbooks with less representation of women and Black people. Building on a rich tradition of textbook analysis, we release our computational toolkit to support new research directions.

Bio: Dora is a 4th year PhD student in Linguistics at Stanford, advised by Dan Jurafsky. Her research focuses on developing natural language processing methods to support student-centered education. Her recent publications focus on analyzing the representation of historically marginalized groups in US history textbooks and on measuring teachers' uptake of student ideas in classroom discourse. She is currently leading a project studying the effectiveness of providing linguistic feedback to teachers using NLP.

 

Click here to revisit last year's 2020 Spring Rising Stars Workshop.