India accounts for 11% of maternal deaths globally; a woman there dies in childbirth every fifteen minutes. Lack of access to preventive care information is a significant problem contributing to high maternal morbidity and mortality, especially in low-income households. We work with ARMMAN, a non-profit based in India, to further the use of call-based information programs by identifying early on the women who might not engage with these programs, which are proven to affect health parameters positively. We analyzed anonymized call records of over 300,000 women registered in an awareness program created by ARMMAN that uses cellphone calls to regularly disseminate health-related information. We built robust deep-learning-based models to predict short-term and long-term dropout risk from call logs and beneficiaries’ demographic information. Our model performs 13% better than competitive baselines for short-term forecasting and 7% better for long-term forecasting. We also discuss the applicability of this method in the real world through a pilot validation that uses our method to perform targeted interventions.
Much of the computational social science research focusing on issues faced in developing nations concentrates on web content written in a world language, often ignoring a significant chunk of a corpus written in a poorly resourced yet highly prevalent first language of the region concerned. Such omissions are common and convenient due to the sheer mismatch between the linguistic resources offered in a world language and its low-resource counterpart. However, the path to analyzing English content generated in linguistically diverse regions, such as the Indian subcontinent, is not straightforward either. Social science/AI for social good research focusing on Indian subcontinental issues faces two major Natural Language Processing (NLP) challenges: (1) how to extract a (reasonably clean) monolingual English corpus, and (2) how to extend resources and analyses to its low-resource counterpart. In this paper, we share NLP methods and lessons learnt from our multiple projects, and outline future focus areas that could be useful in tackling these two challenges. The discussed results are critical to two important domains: (1) detecting peace-seeking, hostility-diffusing hope speech in the context of the 2019 India-Pakistan conflict, and (2) detecting user-generated web content encouraging COVID-19 health
Gerrymandering is the process of drawing electoral district maps in order to manipulate the outcomes of elections. Increasingly, computers are involved both in drawing biased districts and in attempts to measure and regulate this practice. The most high-profile proposals to measure partisan gerrymandering use past voting data to classify a map as gerrymandered (or not). Prior work studies the ability of these metrics to detect gerrymandering, but does not explore how the metrics could affect voter behavior or be circumvented via strategic voting. We show that using past voting data for this classification can affect strategyproofness by introducing a game that models the iterative sequence of voting and redrawing districts under regulation that bans outlier maps. In experiments, we show that a heuristic can find strategies for this game, including on real North Carolina maps and voting data. Finally, we address questions from a recent US Supreme Court case that relate to our model. This is a summary of “Meddling Metrics: the Effects of Measuring and Constraining Partisan Gerrymandering on Voter Incentives”, appearing in EC 2020.
Rapid damage assessment after natural disasters is crucial for effective planning of relief efforts. Satellites with Very High Resolution (VHR) sensors can provide a detailed aerial image of the affected area, but current damage detection systems are fully or semi-manual, which can delay the delivery of emergency care. In this paper, we apply recent advancements in segmentation and change detection to detect damage given pre- and post-disaster VHR images of an affected area. Moreover, we demonstrate that segmentation models trained for this task rely on shadows by showing that (i) shadows influence false positive detections by the model, and (ii) removing shadows leads to poorer performance. Through this analysis, we aim to inspire future work to improve damage detection.
Social media has become an increasingly important political domain in recent years, especially for campaign advertising. In this work, we develop a linear model of advertising influence maximization in two-candidate elections from the viewpoint of a fully informed social network platform, using several variations on classical DeGroot dynamics to model different features of electoral opinion formation. We consider two types of candidate objectives, margin of victory (maximizing total votes earned) and probability of victory (maximizing the probability of earning the majority), and show key theoretical differences in the corresponding games, including advertising strategies for arbitrarily large networks and the existence of pure Nash equilibria. Finally, we contribute efficient algorithms for computing mixed equilibria in the margin of victory case as well as influence-maximizing best-response algorithms in both cases, and show that in practice, as implemented on the Adolescent Health Dataset, they contribute to campaign equality by minimizing the advantage of the higher-spending candidate.
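As an illustration of the classical DeGroot dynamics mentioned above, the following is a minimal sketch in which every agent repeatedly replaces its opinion with a weighted average of the opinions it trusts. It covers only the classical update, not the paper's variations, advertising model, or equilibrium algorithms. The trust matrix `W`, the initial opinions `x0`, and the function names are hypothetical and chosen purely for illustration.

```python
def degroot_step(weights, opinions):
    """One classical DeGroot update: each agent's new opinion is the
    weighted average of all opinions, using that agent's row of weights."""
    return [sum(w * x for w, x in zip(row, opinions)) for row in weights]


def degroot_run(weights, opinions, steps):
    """Apply the DeGroot update a fixed number of times."""
    for _ in range(steps):
        opinions = degroot_step(weights, opinions)
    return opinions


# Hypothetical 3-agent trust matrix (assumption); each row sums to 1.
W = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]

# Hypothetical initial opinions: only agent 0 starts out supporting the candidate.
x0 = [1.0, 0.0, 0.0]

# For this particular matrix the opinions converge to a consensus of 0.25.
final_opinions = degroot_run(W, x0, steps=200)
print(final_opinions)
```

In this toy network the consensus value is the initial opinions weighted by each agent's long-run influence, which is why a single supporter with influence 0.25 pulls everyone to 0.25.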
Since the start of the pandemic, the proliferation of fake news and misinformation has been a constant battle for health officials and policy makers as they work to curb the spread of COVID-19. In areas within the Global South, it can be difficult for officials to keep track of the growth of such false information and even harder to address the real concerns their communities have. In this paper, we present some techniques the AI community can offer to help address this issue. While the topics presented within this paper are not a complete solution, we believe they could complement the work government officials, healthcare workers, and NGOs are currently doing on the ground in Sub-Saharan Africa.
As the COVID-19 pandemic continues, formulating targeted policy interventions supported by differential SARS-CoV-2 transmission dynamics will be of vital importance to national and regional governments. We develop an individual-level model for SARS-CoV-2 transmission that accounts for location-dependent distributions of age, household structure, and comorbidities. We use these distributions together with age-stratified contact matrices to instantiate specific models for Hubei, China; Lombardy, Italy; and New York, United States. We then develop a Bayesian inference framework that leverages data on reported deaths to obtain a posterior distribution over unknown parameters and infer differences in the progression of the epidemic in the three locations. These findings highlight the role of between-population variation in formulating policy interventions.
Applications of artificial intelligence for wildlife protection have focused on learning models of poacher behavior based on historical patterns. However, poachers’ behaviors are described not only by their historical preferences, but also by their reaction to ranger patrols. Past work applying machine learning and game theory to combat poaching has hypothesized that ranger patrols deter poachers, but has been unable to find evidence to identify how or even if deterrence occurs. Here, for the first time, we demonstrate a measurable deterrence effect on real-world poaching data. We show that increased patrols in one region deter poaching in the next timestep, but poachers then move to neighboring regions. Our findings offer guidance on how adversaries should be modeled in realistic game-theoretic settings.
An ongoing challenge in machine learning is to improve the transparency of learning models, helping end users to build trust and defend fairness and equality while protecting individual privacy and information assets. Transparency is a timely topic given the increasing application of machine learning techniques in the real world, and yet much more progress is needed in addressing transparency issues. We propose critical research questions on transparency-aware machine learning on two fronts: know-how and know-that. Know-how is concerned with searching for a set of decision objects (e.g. functions, rules, lists, and graphs) that are cognitively fluent for humans to apply and consistent with the original complex model, while know-that is concerned with gaining a more in-depth understanding of the internal justification of the decisions through external constraints on accuracy, consistency, privacy, reliability, and fairness.
During the COVID-19 pandemic, committees have been appointed to make ethically difficult triage decisions, which are complicated by the diversity of stakeholder interests involved. We propose a disciplined, automated approach to support such difficult collective decision-making. Our system aims to recommend a policy to the group that strikes a compromise between potentially conflicting individual preferences. To identify a policy that best aggregates individual preferences, our system first elicits individual stakeholder value judgments by asking a moderate number of strategically selected queries, each taking the form of a pairwise comparison posed to a specific stakeholder. We propose a novel formulation of this problem that selects which queries to ask of which individuals to best inform the downstream recommendation problem. Modeling this as a multi-stage robust optimization problem, we show that we can equivalently reformulate it as a mixed-integer linear program that can be solved with off-the-shelf solvers. We evaluate the performance of our approach on the problem of recommending policies for allocating critical care beds to patients with COVID-19. We show that asking questions intelligently allows the system to recommend a policy with a much lower regret than asking questions randomly. The lower regret suggests that the system is suited to help a committee reach a better decision by suggesting a policy that aligns with stakeholder value judgments.
The Google Trends data of some keywords correlate strongly with COVID-19 hospitalizations. We attempt to exploit these correlations and present an experimental procedure that uses a simple LSTM model to nowcast hospitalization peaks from Google Trends data. Experiments are done on French regions and on Belgium. This is preliminary work that would need to be tested during a (hopefully non-existent) second peak.
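To make the notion of correlation between search interest and hospitalizations concrete, here is a minimal sketch that computes the Pearson correlation between two series, optionally shifting one against the other. It does not reproduce the paper's LSTM model or its data. The `lag` parameter, the function names, and all numbers below are assumptions made for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(trends, hospitalizations, lag=0):
    """Correlate trends[t] with hospitalizations[t + lag].

    The lag is a hypothetical knob (assumption): it lets search interest
    lead hospital admissions by a fixed number of time steps.
    """
    if len(trends) != len(hospitalizations):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(trends) - 1:
        raise ValueError("lag must leave at least two overlapping points")
    if lag == 0:
        return pearson(trends, hospitalizations)
    return pearson(trends[:-lag], hospitalizations[lag:])


# Synthetic, made-up series: hospitalizations echo search interest two steps later.
trends = [1, 3, 7, 4, 2, 1]
hospitalizations = [5, 4, 2, 6, 14, 8]

best_lag = max(range(0, 4), key=lambda k: lagged_correlation(trends, hospitalizations, k))
print(best_lag, lagged_correlation(trends, hospitalizations, best_lag))
```

Scanning a few lags like this is one simple way to check whether a keyword leads the hospitalization curve before feeding it to a forecasting model.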
Social media has quickly grown into an essential tool for people to communicate and express their needs during crisis events. Prior work in analyzing social media data for crisis management has focused primarily on automatically identifying actionable (or, informative) crisis-related messages. In this work, we show that recent advances in Deep Learning and Natural Language Processing outperform prior approaches for the task of classifying informativeness and encourage the field to adopt them for their research or even deployment. We also extend these methods to two sub-tasks of informativeness and find that the Deep Learning methods are effective here as well.
In health care organizations, a patient’s privacy is threatened by the misuse of their electronic health record (EHR). To monitor privacy intrusions, logging systems are often deployed to trigger alerts whenever a suspicious access is detected. However, such mechanisms are insufficient in the face of small budgets, strategic attackers, and large false positive rates. In an attempt to resolve these problems, EHR systems are increasingly incorporating signaling, so that whenever a suspicious access request occurs, the system can, in real time, warn the user that the access may be audited. This gives rise to an online problem in which one needs to determine 1) whether a warning should be triggered and 2) the likelihood that the data request will be audited later. In this paper, we formalize this auditing problem as a Signaling Audit Game (SAG). A series of experiments with 10 million real access events (containing over 26K alerts) from Vanderbilt University Medical Center (VUMC) demonstrate that a strategic presentation of warnings adds value in that SAGs realize significantly higher utility for the auditor than systems without signaling.
Discharge summaries are essential for the transition of patients’ care but often lack sufficient information. We present an attention-based model to generate discharge summaries to support communication during the transition of care from intensive care units (ICU) to community care. We trained and evaluated our approach on over 500,000 clinical progress notes. The summaries automatically generated by our model achieve a ROUGE-L of 0.83 when compared with discharge summaries written by health professionals. We attribute the high performance to our three-step pipeline that incorporates disease and specialist contexts to enrich the summaries with relevant information based on the context of the hospital stay. Additionally, we present a novel visualization of ICU flow of care using MIMIC-III. Our promising results have the potential to improve the pipeline of hospital discharge and continuous health care.
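For readers unfamiliar with the metric, ROUGE-L scores a generated summary by the longest common subsequence (LCS) of tokens it shares with a reference summary. The sketch below computes a sentence-level ROUGE-L F1 score. The abstract does not state which ROUGE-L variant or tokenizer was used, so the balanced F1 formulation, lowercasing, and whitespace tokenization here are assumptions, and the example sentences are invented.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Sentence-level ROUGE-L F1 using whitespace tokens (an assumption)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


# Invented example: the candidate omits one word of the reference.
score = rouge_l_f1(
    "the patient was discharged home",
    "the patient was discharged to home",
)
print(round(score, 4))
```

Here the LCS covers all five candidate tokens and five of the six reference tokens, so precision is 1, recall is 5/6, and the F1 score is 10/11.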
Research shows that providing an appliance-wise energy breakdown can help users save up to 15% on their energy bills. Non-intrusive load monitoring (NILM), or energy disaggregation, is the task of estimating the energy consumed by each constituent appliance in a household from the household energy measured at the aggregate level. The problem was first introduced in the 1980s by Hart. Over the past three decades, NILM has been extensively researched. NILMTK was introduced to the NILM community in 2014 in order to motivate reproducible research. Even after the introduction of the NILMTK toolkit, there has been little contribution of recent state-of-the-art algorithms back to the toolkit. In this paper, we propose a new disaggregation API, which further simplifies the rapid comparison of different state-of-the-art algorithms across a wide range of datasets. We also propose a new way of writing disaggregation algorithms for NILMTK, which is similar to scikit-learn. We demonstrate the power of the new API by conducting various complex experiments with it.
COVID-19 prevention, which combines soft approaches and best practices for public health safety, is the only solution recommended by the health science and management community during the pandemic. This process must be promoted via facilitation support to collective urban awareness programs through public dialogue and collective intelligence. Moreover, support must be provided throughout the process to perform complex public deliberation to find issues and ideas within existing approaches that can result in better approaches towards prevention. In an attempt to evaluate the validity of such claims in a conflict and COVID-19-affected country such as Afghanistan, we conducted a large-scale digital social experiment using conversational AI and social platforms from an info-epidemiology and an info-veillance perspective. This served as a means to uncover an underlying truth; give large-scale facilitation support; extend the soft impact of discussion to multiple sites; collect, diverge, converge and evaluate a large number of opinions and concerns from health experts, patients and local people; deliberate on the data collected; and explore collective prevention approaches to COVID-19. Finally, this paper shows that deciding on a prevention measure that maximizes the probability of finding the ground truth is intrinsically difficult without the support of AI-enabled discussion systems.
We propose a multi-armed bandit setting where each arm corresponds to a subpopulation, and pulling an arm is equivalent to granting an opportunity to this subpopulation. In this setting, the decision-maker’s fairness policy governs the number of opportunities each subpopulation should receive, which typically depends on the (unknown) reward from granting an opportunity to this subpopulation. The decision-maker can decide whether to provide these opportunities or pay a predefined monetary value for every withheld opportunity. The decision-maker’s objective is to maximize her utility, which is the sum of rewards minus the cost of withheld opportunities. We provide a no-regret algorithm that maximizes the decision-maker’s utility and complement our analysis with an almost-tight lower bound. The full version of the paper is available at https://tinyurl.com/y7s9avud.
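The utility described above, the sum of rewards minus the cost of withheld opportunities, can be written down directly. The sketch below is only an accounting of that objective, not the paper's no-regret algorithm. How withheld opportunities are counted (the per-arm shortfall between what the fairness policy requires and what was granted) is an assumption, and all names and numbers are hypothetical.

```python
def decision_maker_utility(rewards, required, granted, cost_per_withheld):
    """Sum of collected rewards minus the payment for withheld opportunities.

    rewards:           rewards observed from the opportunities actually granted
    required:          opportunities the fairness policy assigns to each arm
    granted:           opportunities actually granted to each arm
    cost_per_withheld: predefined payment for each withheld opportunity

    Counting withheld opportunities as the per-arm shortfall
    max(required - granted, 0) is an assumption of this sketch.
    """
    withheld = sum(max(req - got, 0) for req, got in zip(required, granted))
    return sum(rewards) - cost_per_withheld * withheld


# Hypothetical two-arm example: arm 1 receives one opportunity fewer than required.
example_utility = decision_maker_utility(
    rewards=[1.0, 0.5, 0.0],
    required=[2, 2],
    granted=[2, 1],
    cost_per_withheld=0.25,
)
print(example_utility)
```

In this toy case the decision-maker collects rewards totalling 1.5 and pays 0.25 for the single withheld opportunity.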
Monitoring the effectiveness of policy interventions that promote sustainable farming practices has always been a costly affair. It requires an extensive ground presence, which is not always available or reliable. In this paper, we present our work so far on applying deep learning techniques to automate the identification of individual parcels (farms). Our study area is located in the central state of Madhya Pradesh in India, where the average landholding size is around 0.6 hectares per farmer. We created a methodology that uses CNN models for segmentation and the Canny edge detector for generating contours. Our future work concentrates on improving the quality of the reference data and applying additional post-processing methods. Overall, we demonstrate how deep learning could be used to provide specific agronomic advice to individual farmers across large areas and to monitor its uptake, something that is essential in mitigating the effects of climate change.
Globally increasing migration pressures call for new modelling approaches in order to design effective policies. It is important not only to have efficient models to predict migration flows but also to understand how specific parameters influence these flows. In this paper, we propose an artificial neural network (ANN) to model international migration. Moreover, we use a technique for interpreting machine learning models, namely Partial Dependence Plots (PDP), to show that one can effectively study the effects of the drivers behind international migration. We train and evaluate the model on a dataset containing annual international bilateral migration from 1960 to 2010 from 175 origin countries to 33 mainly OECD destinations, along with the main determinants as identified in the migration literature. The experiments carried out confirm that: 1) the ANN model is more efficient than a traditional model, and 2) using PDP we are able to gain additional insights into the specific effects of the migration drivers. This approach provides much more information than the feature importance information used alone in previous works.
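A partial dependence curve, as used above, can be sketched in a few lines: fix one feature at each value of a grid for every row in the dataset, query the model, and average the predictions. The toy `toy_model` function below is a hypothetical stand-in for a trained ANN, and the rows and grid are invented; none of this reproduces the paper's model or migration data.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average model prediction as one feature is swept over a grid.

    predict:       callable mapping a feature list to a number
    rows:          list of feature lists (the dataset)
    feature_index: position of the feature to vary
    grid:          values to substitute for that feature
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)             # copy so the dataset is untouched
            modified[feature_index] = value  # force the feature to the grid value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(features):
    """Hypothetical stand-in for a trained model (assumption)."""
    return 2.0 * features[0] + features[1] ** 2


# Invented dataset with two features per row.
rows = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
grid = [0.0, 1.0, 2.0]

pd_curve = partial_dependence(toy_model, rows, feature_index=0, grid=grid)
print(pd_curve)
```

Because the toy model is linear in the first feature with slope 2, each step along the grid raises the averaged prediction by 2, which is the kind of marginal effect a PDP makes visible for a real model.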
The main idea of the paper is that convolutional neural networks can be applied to very high-resolution satellite imagery in order to classify New Delhi into formal (planned colony) vs. informal settlements (Jhuggi Jhopri Clusters). We show that very high-resolution satellite imagery along with convolutional neural networks can achieve a high classification accuracy of 95.81%. We find that pretrained deep learning models for computer vision trained on standard image datasets can be effective for classification of informal settlements using satellite imagery, even when there is not a significant amount of training data. Deep learning models can learn image features without hand-crafted features and, when coupled with the proliferation of cloud-based computer vision services, could democratize the analysis of satellite imagery for humanitarian and developmental purposes.