Text as Data: Measurement and Inference Issues with Text Data with Dr. Le Bao
Abstract: Text as data has become a transformative approach to producing insights about human behavior and society. How do we use text as data? This workshop provides an overview of different applications of text as data. From constructing variables using text to employing large language models (LLMs) to scale variables, we will discuss the strengths and weaknesses of using text as data in the contexts of measurement, statistical models, and causal inference. This workshop serves as an introductory session for the other fall MDI Data Workshops, which will focus on specific machine learning and natural language processing (NLP) techniques. Basic familiarity with programming and statistical methods is expected. No NLP background is required.
Bio: Dr. Le Bao is a Postdoctoral Fellow at the Massive Data Institute working with the Environmental Impact Data Collaborative (EIDC) team and the Measuring Online Social Attitudes and Information Collaborative (MOSAIC) team. His research focuses on geospatial methods, Bayesian statistics, and survey research. Dr. Bao received his Ph.D. in political science from American University. He was previously a Visiting Fellow at the Institute for Quantitative Social Science, Harvard University.