Privacy and Security

As computer systems increasingly collect and handle private and sensitive data, and as they increasingly pervade our society, the privacy of personal data and the security of computer systems are critical to the smooth functioning of society. Many CRCS members have strong interests in privacy and security and are pursuing active research projects, ranging from privacy tools for sharing research data, to the use of programming languages to define and enforce system security, to the design of the new tools and techniques required for the Internet of Things.

Some of these projects are described below.

Privacy Tools for Sharing Research Data

Project website: http://privacytools.seas.harvard.edu/

Information technology, advances in statistical computing, and the deluge of data available through the Internet are transforming social science. With the ability to collect and analyze massive amounts of data on human behavior and interactions, social scientists can hope to uncover many more phenomena, with greater detail and confidence, than traditional means such as surveys and interviews allow. In addition to advancing the state of knowledge, rich analysis of behavioral data can enable companies to better serve their customers, and governments their citizenry.

However, a major challenge for computational social science is maintaining the privacy of human subjects. At present, an individual social science researcher is left to devise her own privacy shields, such as stripping the dataset of "personally identifiable information" (PII). Such shields are often ineffective, providing limited (or no) real-world privacy protection. Indeed, in a number of cases, individuals in a supposedly anonymized dataset have been re-identified. At the same time, social scientists are increasingly analyzing complex forms of data, such as large social networks, spatial trajectories, and semistructured text, that are even less amenable to naive attempts at anonymization.
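To illustrate why stripping PII alone can fail, the following is a minimal sketch of a linkage attack, written for this page rather than taken from the project. It assumes a "de-identified" dataset that has had names removed but still contains quasi-identifiers (here ZIP code, birth date, and sex), plus a public auxiliary dataset that lists names alongside the same fields. All records, field names, and the `link` function are hypothetical and invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical "anonymized" research records: names removed, quasi-identifiers kept.
research_records = [
    {"zip": "02138", "birth_date": "1970-03-14", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_date": "1985-07-02", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02138", "birth_date": "1962-11-30", "sex": "M", "diagnosis": "hypertension"},
]

# Hypothetical public auxiliary data (e.g., a voter list) that includes names.
public_records = [
    {"name": "Alice", "zip": "02138", "birth_date": "1970-03-14", "sex": "F"},
    {"name": "Bob", "zip": "02139", "birth_date": "1985-07-02", "sex": "M"},
    {"name": "Carol", "zip": "02140", "birth_date": "1990-01-01", "sex": "F"},
]

# Fields that are not "PII" on their own but can jointly single out a person.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def key(record):
    """Project a record onto its quasi-identifier values."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(research, public):
    """Return (name, diagnosis) pairs for research rows whose
    quasi-identifiers match exactly one row in the public data."""
    index = defaultdict(list)
    for person in public:
        index[key(person)].append(person["name"])

    matches = []
    for row in research:
        candidates = index.get(key(row), [])
        if len(candidates) == 1:  # a unique match re-identifies the subject
            matches.append((candidates[0], row["diagnosis"]))
    return matches


if __name__ == "__main__":
    for name, diagnosis in link(research_records, public_records):
        print(f"{name}: {diagnosis}")
```

Running the script prints `Alice: asthma` and `Bob: diabetes`: two of the three "anonymized" rows are re-identified even though no names were released. Richer data such as social networks or location trajectories typically offer even more quasi-identifying structure to exploit.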

This project is a broad, multidisciplinary effort to enable the collection, analysis, and sharing of social science data while providing sufficient privacy for individual subjects. Bringing together computer science, social science, statistics, and law, the investigators seek to refine and develop definitions and measures of privacy and data utility, and design an array of technological, legal, and policy tools for social scientists to use when dealing with sensitive data.

These tools will be tested and deployed at the Harvard Institute for Quantitative Social Science’s Dataverse Network, an open-source digital repository that offers the largest catalogue of social science datasets in the world. Our aim is to provide social scientists with a technological and legal framework that embodies the modern computational understanding of privacy, and a reliable open infrastructure that aids in the management of confidential research data from collection through dissemination.

Language-Based Security and Privacy

The Language-Based Security group focuses on developing technology that helps detect or prevent implementation flaws in security- and privacy-critical software. Specifically, we are investigating the role of advanced type systems for tracking information flow in programs; new compiler techniques and hardware designs for enforcing security policies efficiently; new languages for specifying policies; and new techniques for minimizing the need to trust software and hardware.
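As a rough illustration of what "tracking information flow" with a type system means, the sketch below checks a toy imperative language against a two-level lattice (LOW for public, HIGH for secret). It is not the group's system: the toy syntax, the labels, and the names `check`, `Assign`, and `If` are hypothetical. The checker follows the textbook idea of rejecting both explicit flows (assigning secret data to a public variable) and implicit flows (assigning to a public variable inside a branch that depends on a secret), the latter tracked with a "program counter" label `pc`.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List, Union


class Label(IntEnum):
    LOW = 0   # public
    HIGH = 1  # secret


def join(a: Label, b: Label) -> Label:
    """Least upper bound in the two-point lattice LOW <= HIGH."""
    return Label(max(a, b))


@dataclass
class Const:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


@dataclass
class Assign:
    target: str
    expr: "Expr"


@dataclass
class If:
    cond: "Expr"
    then: List["Stmt"]
    orelse: List["Stmt"] = field(default_factory=list)


Expr = Union[Const, Var, BinOp]
Stmt = Union[Assign, If]


def expr_label(e: Expr, env: Dict[str, Label]) -> Label:
    """An expression is as secret as the most secret variable it reads."""
    if isinstance(e, Const):
        return Label.LOW
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, BinOp):
        return join(expr_label(e.left, env), expr_label(e.right, env))
    raise TypeError(f"unknown expression: {e!r}")


def check(stmts: List[Stmt], env: Dict[str, Label], pc: Label = Label.LOW) -> List[str]:
    """Return a list of illegal-flow errors; an empty list means well-typed."""
    errors: List[str] = []
    for s in stmts:
        if isinstance(s, Assign):
            # Data flows from the expression and, implicitly, from the branch context.
            if join(pc, expr_label(s.expr, env)) > env[s.target]:
                errors.append(f"illegal flow into {s.target!r}")
        elif isinstance(s, If):
            # Raise pc inside branches guarded by secret conditions.
            branch_pc = join(pc, expr_label(s.cond, env))
            errors += check(s.then, env, branch_pc)
            errors += check(s.orelse, env, branch_pc)
    return errors


ENV = {"salary": Label.HIGH, "tmp": Label.HIGH, "report": Label.LOW}

if __name__ == "__main__":
    explicit = [Assign("report", Var("salary"))]
    implicit = [If(Var("salary"), [Assign("report", Const(1))], [Assign("report", Const(0))])]
    safe = [Assign("tmp", BinOp("+", Var("salary"), Const(1))), Assign("report", Const(0))]
    for name, prog in [("explicit", explicit), ("implicit", implicit), ("safe", safe)]:
        print(name, check(prog, ENV))
```

The implicit-flow program is rejected even though it only assigns constants, because whether `report` becomes 1 or 0 reveals information about `salary`. Real research systems extend this idea to richer lattices, functions, and effects, and enforce it at compile time rather than with a standalone checker.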