Speaker: Brendan O’Connor, 5th year Ph.D. candidate, Carnegie Mellon University's Machine Learning Department
Talk Title: Statistical Text Analysis for Social Science
Abstract: What can text analysis tell us about society? Corpora of news, books, and social media encode human beliefs and culture. But it is impossible for a researcher to read all of today's rapidly growing text archives. My research develops statistical text analysis methods that measure social phenomena from textual content, especially in news and social media data. For example: How do changes in public opinion appear in microblogs? What topics get censored on the Chinese Internet? What character archetypes recur in movie plots? How do geography and ethnicity affect the diffusion of new language? To answer these questions effectively, we must apply and develop scientific methods in statistics, computation, and linguistics.
In this talk I will illustrate these methods in a project that analyzes events in international politics. Political scientists are interested in studying international relations through *event data*: time series records of who did what to whom, as described in news articles. To address this event extraction problem, we develop an unsupervised hierarchical model of semantic event classes, in which we use Bayesian methods to learn the verbs and textual descriptions that correspond to types of diplomatic and military interactions between countries. The model uses dynamic logistic normal priors to drive the learning of semantic classes, but unlike a topic model, it leverages deeper linguistic analysis of syntactic argument structure. Using a corpus of several million news articles over 15 years, we evaluate whether the method matches expert judgments or corresponds to real-world conflict. The method also supports exploratory analysis, for example of the recent history of Israeli-Palestinian relations.
Bio: Brendan O'Connor is a 5th year Ph.D. candidate in Carnegie Mellon University's Machine Learning Department. He is interested in statistical machine learning and natural language processing, especially when informed by or applied to the social sciences. In the past he has been a Visiting Fellow at the Harvard Institute for Quantitative Social Science and an intern in the Facebook Data Science group, and has worked on crowdsourcing (Crowdflower/Dolores Labs) and "semantic" search (Powerset). His undergraduate degree was in Symbolic Systems.