#  Reagan Mozer (Bentley University) 

 



#### Date and Time

 **March 25, 2026** 

 12:00 PM - 1:30 PM EDT

#### Location

 **CGIS Knafel Building, Room K354**  



 

 [Join via Zoom](https://harvard.zoom.us/j/93110218231?pwd=Gmka2cTdUty8AcWec90hWmcSllXtkP.1)

 



 

### Title

Stratified Sampling for Model-Assisted Estimation with Surrogate Outcomes

### Abstract

In many randomized trials, outcomes such as essays or open-ended responses must be manually scored before impact analysis, a process that is costly and limiting. Model-assisted estimation combines surrogate outcomes from machine learning or large language models with a human-coded subset to obtain unbiased estimates, but existing approaches rely on simple random sampling and ignore systematic structure in prediction errors. We extend this framework by incorporating stratified sampling to more efficiently allocate human coding effort. We derive the exact variance of the stratified estimator, characterize conditions under which stratification improves precision, and identify a Neyman-type optimal allocation rule that oversamples strata with larger residual variance. Comprehensive simulation studies confirm that stratification consistently improves efficiency when surrogate prediction errors exhibit structured bias or heteroskedasticity. We present two empirical applications, including an education RCT and a large observational corpus, to illustrate practical implementation using ChatGPT-generated surrogate outcomes. Overall, this framework provides a practical design-based approach for leveraging surrogate outcomes and strategically allocating human coding effort to obtain unbiased estimates with greater efficiency. While motivated by text-as-data applications, the methodology applies broadly to any setting where outcome measurement is costly, including both group comparisons and single-group mean estimation.
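To make the two design ideas in the abstract concrete, here is a minimal sketch for the single-group mean case. It is not the speaker's implementation: it assumes the standard textbook difference estimator (the surrogate mean over all units plus a stratum-weighted mean of human-minus-surrogate residuals) and the classical Neyman rule, under which the number of human-coded units in a stratum is proportional to the stratum size times its residual standard deviation. The function names, arguments, and rounding rule are hypothetical choices made for illustration.

```python
import numpy as np

def neyman_allocation(stratum_sizes, residual_sds, n_total):
    """Classical Neyman allocation: n_h proportional to N_h * S_h.

    residual_sds are assumed to come from a pilot sample or prior guess;
    the abstract does not say how they are obtained.
    """
    sizes = np.asarray(stratum_sizes, dtype=float)
    sds = np.asarray(residual_sds, dtype=float)
    weights = sizes * sds
    raw = n_total * weights / weights.sum()
    # Illustrative rounding rule: at least 2 units per stratum, at most N_h.
    # After rounding and clipping, the total may differ slightly from n_total.
    return np.clip(np.round(raw), 2, sizes).astype(int)

def stratified_difference_estimate(surrogate, strata, sampled_idx, human_scores):
    """Estimate the mean human-coded outcome from surrogates plus a coded subset.

    surrogate    : surrogate outcome for every unit
    strata       : stratum label for every unit
    sampled_idx  : indices of the human-coded units
    human_scores : human-coded outcomes, aligned with sampled_idx
    """
    surrogate = np.asarray(surrogate, dtype=float)
    strata = np.asarray(strata)
    sampled_idx = np.asarray(sampled_idx)
    human_scores = np.asarray(human_scores, dtype=float)

    residuals = human_scores - surrogate[sampled_idx]
    sampled_strata = strata[sampled_idx]

    correction = 0.0
    for h in np.unique(strata):
        w_h = np.sum(strata == h) / surrogate.size  # population share of stratum h
        correction += w_h * residuals[sampled_strata == h].mean()
    return surrogate.mean() + correction
```

For example, with two strata of 100 units each, residual standard deviations of 1 and 3, and a budget of 40 human-coded units, `neyman_allocation([100, 100], [1, 3], 40)` assigns 10 and 30 units, so the noisier stratum is oversampled.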



 

 



 

 

 

