Rohit Bhattacharya (Williams College)
Date and Time
Location
Title
Opportunities for Principled Use of AI for Causal Inference
Abstract
Modern causal inference theory has advanced significantly in handling challenges that underlie observational data. Most of this theory, however, assumes clean, structured data, whereas real-world data sources like electronic health records contain a mix of structured measurements and unstructured information (e.g., clinical notes). In this context, state-of-the-art machine learning methods, particularly generative AI models trained on unstructured data, offer new opportunities for causal inference. While these tools are trained purely on associational tasks, I argue for some principled approaches to incorporating them into causal pipelines. Time permitting, I will present two concrete examples of this. The first concerns settings where information about unobserved confounding is captured in unstructured text data. The proposed method uses zero-shot models (e.g., large language models) to infer proxies from multiple instances of pre-treatment text and plugs them into the so-called proximal g-formula. I also briefly describe falsification heuristics and opportunities for sensitivity analysis for this method. The second example concerns settings involving high-dimensional treatments in the context of computational genomics. In this context, I describe how AlphaFold can be used as a form of interpretable dimension reduction for complex interventional queries involving mutations in protein sequences.