177 Huntington Avenue
Boston, MA 02115
ATTN: David Lazer, 1010-177
360 Huntington Avenue
Boston, MA 02115
Computational social science, network science, collective cognition, political networks, social influence in networks, social media, deliberative democracy, predictive modeling.
- PhD in Political Science, University of Michigan
- BA in Economics, Wesleyan University
David Lazer is a Distinguished Professor of Political Science and Computer and Information Science at Northeastern University and co-director of the NULab for Texts, Maps, and Networks. Prior to joining Northeastern University, he was on the faculty at the Harvard Kennedy School (1998-2009).
His research focuses on the nexus of network science, computational social science, and collaborative intelligence. He is the founder of the citizen science website Volunteer Science and the political visualization website VisPolics. His research has been published in journals such as Science, the Proceedings of the National Academy of Sciences, the American Political Science Review, and the Administrative Science Quarterly, and has received extensive coverage in the media, including the New York Times, NPR, the Washington Post, the Wall Street Journal, and CBS Evening News.
Professor Lazer serves in multiple leadership and editorial positions, including as a board member of the International Network for Social Network Analysis (INSNA), reviewing editor for Science, associate editor of Social Networks and Network Science, and a member of numerous other editorial boards and program committees. He was a founder of the Political Networks Section, as well as a founder and founding host of the Political Networks conference.
R. Epstein, R. Robertson, D. Lazer, and C. Wilson, “Suppressing the Search Engine Manipulation Effect (SEME),” Proceedings of the ACM on Human-Computer Interaction, November 2017.
Internet search rankings have a significant impact on consumer choices, mainly because users trust and choose higher-ranked results more than lower-ranked results. Given the apparent power of search rankings, we asked whether they could be manipulated to alter the preferences of undecided voters in democratic elections. Here we report the results of five relevant double-blind, randomized controlled experiments, using a total of 4,556 undecided voters representing diverse demographic characteristics of the voting populations of the United States and India. The fifth experiment is especially notable in that it was conducted with eligible voters throughout India in the midst of India’s 2014 Lok Sabha elections just before the final votes were cast. The results of these experiments demonstrate that (i) biased search rankings can shift the voting preferences of undecided voters by 20% or more, (ii) the shift can be much higher in some demographic groups, and (iii) search ranking bias can be masked so that people show no awareness of the manipulation. We call this type of influence, which might be applicable to a variety of attitudes and beliefs, the search engine manipulation effect. Given that many elections are won by small margins, our results suggest that a search engine company has the power to influence the results of a substantial number of elections with impunity. The impact of such manipulations would be especially large in countries dominated by a single search engine company.
B. Jasny, N. Wigginton, M. McNutt, T. Bubela, S. Buck, R. Cook-Deegan, T. Gardner, B. Hanson, C. Hustad, V. Kiermer, and D. Lazer. “Fostering reproducibility in industry-academia research,” Science, 2017
Many companies have proprietary resources and/or data that are indispensable for research, and academics provide the creative fuel for much early-stage research that leads to industrial innovation. It is essential to the health of the research enterprise that collaborations between industrial and university researchers flourish. This system of collaboration is under strain. Financial motivations driving product development have led to concerns that industry-sponsored research comes at the expense of transparency. Yet many industry researchers distrust quality control in academia and question whether academics value reproducibility as much as rapid publication. Cultural differences between industry and academia can create or increase difficulties in reproducing research findings. We discuss key aspects of this problem that industry-academia collaborations must address and for which other stakeholders, from funding agencies to journals, can provide leadership and support.
M. Neblo, W. Minozzi, K. Esterling, J. Kingzette, J. Green, D. Lazer, “The need for a translational science of democracy,” Science, 2017
The bitterly factious 2016 U.S. presidential election campaign was the culmination of several trends that, taken together, constitute a syndrome of chronic ailments in the body politic. Ironically, these destructive trends have accelerated just as science has rapidly improved our understanding of them and their underlying causes. But mere understanding is not sufficient to repair our politics. The challenge is to build a translational science of democracy that maintains scientific rigor while actively promoting the health of the body politic.
D. Lazer and J. Radford, “Introduction to Big Data,” Annual Review of Sociology 43, no. 1, 2017.
Social life increasingly occurs in digital environments and continues to be mediated by digital systems. Big data represents the data being generated by the digitization of social life, which we break down into three domains: digital life, digital traces, and digitalized life. We argue that there is enormous potential in using big data to study a variety of phenomena that remain difficult to observe. However, there are some recurring vulnerabilities that should be addressed. We also outline the role institutions must play in clarifying the ethical rules of the road. Finally, we conclude by pointing to a number of nascent but important trends in the use of big data.
J. Radford, A. Pilny, A. Reichelmann, B. Keegan, B. F. Welles, J. Hoye, K. Ognyanova, W. Meleis, and D. Lazer, “Volunteer Science: An Online Laboratory for Experiments in Social Psychology,” Social Psychology Quarterly, 2016.
Experimental research in traditional laboratories comes at a significant logistic and financial cost while drawing data from demographically narrow populations. The growth of online methods of research has resulted in effective means for social psychologists to collect large-scale survey-based data in a cost-effective and timely manner. However, the same advancement has not occurred for social psychologists who rely on experimentation as their primary method of data collection. The aim of this article is to provide an overview of one online laboratory for conducting experiments, Volunteer Science, and report the results of six studies that test canonical behaviors commonly captured in social psychological experiments. Our results show that the online laboratory is capable of performing a variety of studies with large numbers of diverse volunteers. We advocate for the use of the online laboratory as a valid and cost-effective way to perform social psychological experiments with large numbers of diverse subjects.
K. Joseph, L. Friedland, O. Tsur, W. Hobbs, and D. Lazer. “ConStance: Modeling Annotation Contexts to Improve Stance Classification.” Empirical Methods in Natural Language Processing (EMNLP) 2017
Manual annotations are a prerequisite for many applications of machine learning. However, weaknesses in the annotation process itself are easy to overlook. In particular, scholars often choose what information to give to annotators without examining these decisions empirically. For subjective tasks such as sentiment analysis, sarcasm, and stance detection, such choices can impact results. Here, for the task of political stance detection on Twitter, we show that providing too little context can result in noisy and uncertain annotations, whereas providing too strong a context may cause it to outweigh other signals. To characterize and reduce these biases, we develop ConStance, a general model for reasoning about annotations across information conditions. Given conflicting labels produced by multiple annotators seeing the same instances with different contexts, ConStance simultaneously estimates gold standard labels and also learns a classifier for new instances. We show that the classifier learned by ConStance outperforms a variety of baselines at predicting political stance, while the model’s interpretable parameters shed light on the effects of each context.
W. Hobbs, L. Friedland, K. Joseph, O. Tsur, S. Wojcik, and D. Lazer, “‘Voters of the Year’: 19 Voters Who Were Unintentional Election Poll Sensors on Twitter,” ICWSM 2017.
Public opinion and election prediction models based on social media typically aggregate, weight, and average signals from a massive number of users. Here, we analyze political attention and poll movements to identify a small number of social “sensors” — individuals whose levels of social media discussion of the major parties’ candidates characterized the candidates’ ups and downs over the 2016 U.S. presidential election campaign. Starting with a sample of approximately 22,000 accounts on Twitter that we linked to voter registration records, we used penalized regressions to identify a set of 19 accounts (sensors) that were predictive for the candidates’ poll numbers (5 for Hillary Clinton, 13 for Donald Trump, and 1 for both). The predictions based on the activity of these handfuls of sensors accurately tracked later movements in poll margins. Despite the regressions allowing both supportive and opposition sensors, our separate models for Trump and Clinton poll support identified sensors for Hillary Clinton who were disproportionately women and for Donald Trump who were disproportionately white. The method did not predict changes in levels of undecideds and underestimated support for Donald Trump in September 2016, where the errors were correlated with discussions of protests of police shootings.
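The sensor-selection step described above can be illustrated with a minimal sketch: an L1-penalized (lasso) regression applied to per-account activity series, keeping only the accounts whose activity predicts the outcome. This is not the authors' code; the data below are synthetic and all sizes and indices are hypothetical.

```python
# Sketch of sparse "sensor" selection with an L1-penalized regression.
# Synthetic data: weekly candidate-mention counts per account, and a
# poll margin driven by only a handful of those accounts.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_weeks, n_accounts = 60, 200
X = rng.poisson(5, size=(n_weeks, n_accounts)).astype(float)

true_sensors = [3, 17, 42]  # planted signal accounts (hypothetical)
y = X[:, true_sensors].sum(axis=1) + rng.normal(0, 1, n_weeks)

model = Lasso(alpha=1.0).fit(X, y)
sensors = np.flatnonzero(model.coef_)  # accounts with nonzero weight
print(sorted(sensors))  # typically recovers the planted indices
```

The L1 penalty drives most coefficients to exactly zero, which is what yields a small, interpretable set of sensor accounts rather than a weighted average over all users.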
W. Wang, R. Kennedy, D. Lazer, and N. Ramakrishnan, “Growing pains for global monitoring of societal events,” Science, 2016.
There have been serious efforts over the past 40 years to use newspaper articles to create global-scale databases of events occurring in every corner of the world, to help understand and shape responses to global problems. Although most have been limited by the technology of the time (1) [see supplementary materials (SM)], two recent groundbreaking projects to provide global, real-time “event data” that take advantage of automated coding from news media have gained widespread recognition: the International Crisis Early Warning System (ICEWS), maintained by Lockheed Martin, and the Global Database of Events, Language, and Tone (GDELT), developed and maintained by Kalev Leetaru at Georgetown University (2, 3). The scale of these programs is unprecedented, and their promise has been reflected in the attention they have received from scholars, media, and governments. However, they suffer from major issues with respect to reliability and validity. Opportunities exist to use new methods and to develop an infrastructure that will yield robust and reliable “big data” to study global events, from conflict to ecological change.
Y. Lin, D. Margolin, D. Lazer, “Uncovering Social Semantics from Textual Traces: A Theory-Driven Approach and Evidence from Public Statements of U.S. Members of Congress,” Journal of the Association for Information Science and Technology, 2015
The increasing abundance of digital textual archives provides an opportunity for understanding human social systems. Yet the literature has not adequately considered the disparate social processes by which texts are produced. Drawing on communication theory, we identify three common processes by which documents might be detectably similar in their textual features: authors sharing subject matter, sharing goals, and sharing sources. We hypothesize that these processes produce distinct, detectable relationships between authors in different kinds of textual overlap. We develop a novel n-gram extraction technique to capture such signatures based on n-grams of different lengths. We test the hypothesis on a corpus where the author attributes are observable: the public statements of the members of the U.S. Congress. This article presents the first empirical finding that shows different social relationships are detectable through the structure of overlapping textual features. Our study has important implications for designing text modeling techniques to make sense of social phenomena from aggregate digital traces.
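The core primitive behind such an analysis, extracting n-grams shared between two authors' statements, can be sketched in a few lines. This is an illustrative toy, not the paper's extraction technique, and the two example sentences are invented.

```python
# Toy sketch: find word n-grams shared between two documents.
from collections import Counter

def ngrams(text, n):
    """Count all word n-grams of length n in a text."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def shared_ngrams(a, b, n):
    """N-grams appearing in both documents, with their minimum counts."""
    return ngrams(a, n) & ngrams(b, n)

s1 = "we must secure the border and protect american jobs"
s2 = "our plan will secure the border and grow american jobs"
print(shared_ngrams(s1, s2, 3))
```

Running this over n-grams of several lengths, as the abstract describes, lets longer shared spans (suggestive of shared sources) be distinguished from short topical overlaps (suggestive of shared subject matter).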
D. Lazer, “The Rise of the Social Algorithm,” Science, 2015
Humanity is in the early stages of the rise of social algorithms: programs that size us up, evaluate what we want, and provide a customized experience. This quiet but epic paradigm shift is fraught with social and policy implications. The evolution of Google exemplifies this shift. It began as a simple deterministic ranking system based on the linkage structure among Web sites—the model of algorithmic Fordism, where any color was fine as long as it was black (1). The current Google is a very different product, personalizing results (2) on the basis of information about past searches and other contextual information, like location. On page 1130 of this issue, Bakshy et al. (3) explore whether such personalized curation on Facebook prevents users from accessing posts presenting conflicting political views.
J. Toole, Y. Lin, E. Muehlegger, D. Shoag, M. Gonzalez, D. Lazer, “Tracking employment shocks using mobile phone data,” Journal of the Royal Society Interface, 2015
Can data from mobile phones be used to observe economic shocks and their consequences at multiple scales? Here we present novel methods to detect mass layoffs, identify individuals affected by them and predict changes in aggregate unemployment rates using call detail records (CDRs) from mobile phones. Using the closure of a large manufacturing plant as a case study, we first describe a structural break model to correctly detect the date of a mass layoff and estimate its size. We then use a Bayesian classification model to identify affected individuals by observing changes in calling behaviour following the plant’s closure. For these affected individuals, we observe significant declines in social behaviour and mobility following job loss. Using the features identified at the micro level, we show that the same changes in these calling behaviours, aggregated at the regional level, can improve forecasts of macro unemployment rates. These methods and results highlight the promise of new data resources to measure microeconomic behaviour and improve estimates of critical economic indicators.
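The structural-break idea can be illustrated with a minimal sketch: search for the split point that best divides a series into two segments with different means, as one would for a level shift in call activity after a plant closure. This is an assumed toy version, not the paper's model, and the data are synthetic.

```python
# Minimal single-break search: choose the split minimizing the total
# squared error of a piecewise-constant mean.
import numpy as np

def find_break(series, min_seg=5):
    """Return the index where a mean shift best splits the series."""
    best_t, best_sse = None, float("inf")
    for t in range(min_seg, len(series) - min_seg):
        left, right = series[:t], series[t:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_t, best_sse = t, sse
    return best_t

rng = np.random.default_rng(1)
# Synthetic daily call counts: a level drop at day 40 mimics a layoff.
calls = np.concatenate([rng.normal(50, 3, 40), rng.normal(30, 3, 40)])
print(find_break(calls))  # close to the true break at index 40
```

A full structural-break model would also test whether the detected shift is statistically significant; the exhaustive search above only locates the best candidate date.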
C. Kliman-Silver, A. Hannak, D. Lazer, C. Wilson, and A. Mislove, “Location, Location, Location: The Impact of Geolocation on Web Search Personalization,” Internet Measurement Conference (IMC ’15), Tokyo, 2015.
To cope with the immense amount of content on the web, search engines often use complex algorithms to personalize search results for individual users. However, personalization of search results has led to worries about the Filter Bubble Effect, where the personalization algorithm decides that some useful information is irrelevant to the user, and thus prevents them from locating it. In this paper, we propose a novel methodology to explore the impact of location-based personalization on Google Search results. Assessing the relationship between location and personalization is crucial, since users’ geolocation can be used as a proxy for other demographic traits, like race, income, educational attainment, and political affiliation. In other words, does location-based personalization trap users in geolocal Filter Bubbles?
Using our methodology, we collected 30 days of search results from Google Search in response to 240 different queries. By comparing search results gathered from 59 GPS coordinates around the US at three different granularities (county, state, and national), we are able to observe that differences in search results due to personalization grow as physical distance increases. However, these differences are highly dependent on what a user searches for: queries for local establishments receive 4-5 different results per page, while more general terms exhibit essentially no personalization.
O. Tsur, D. Calacci, D. Lazer, “A Frame of Mind: Using Statistical Models for Detection of Framing and Agenda Setting Campaigns,” Association for Computational Linguistics (ACL ’15), Beijing, 2015
Framing is a sophisticated form of discourse in which the speaker tries to induce a cognitive bias through consistent linkage between a topic and a specific context (frame). We build on political science and communication theory and use probabilistic topic models combined with time series regression analysis (autoregressive distributed-lag models) to gain insights into language dynamics in political processes. Processing four years of public statements issued by members of the U.S. Congress, our results provide a glimpse into the complex dynamic processes of framing, attention shifts and agenda setting, commonly known as ‘spin’. We further provide new evidence for the divergence in party discipline in U.S. politics.
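The autoregressive distributed-lag (ADL) component mentioned above can be sketched in its simplest ADL(1,1) form: regress a series on its own lag and on the current and lagged values of a second series. The sketch below is illustrative only (synthetic data, plain OLS), not the paper's estimation setup.

```python
# ADL(1,1) sketch: y_t = a + b*y_{t-1} + c0*x_t + c1*x_{t-1} + e_t,
# e.g. one party's topic attention (y) responding to the other's (x).
import numpy as np

def fit_adl(y, x):
    """OLS fit of an ADL(1,1) model; returns [a, b, c0, c1]."""
    Y = y[1:]
    X = np.column_stack([np.ones(len(Y)), y[:-1], x[1:], x[:-1]])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef

rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = np.empty(300)
y[0] = 0.0
for t in range(1, 300):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal(0, 0.1)

a, b, c0, c1 = fit_adl(y, x)  # b near 0.5, c1 near 0.8 by construction
```

A significant lagged coefficient (c1) with a negligible contemporaneous one (c0) is the kind of pattern that suggests one side's framing leads and the other follows.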