Replication materials for "Unbiased Prevalence Estimation with Multicalibrated LLMs."
The paper source is paper/paper.md (Markdown + Pandoc). To compile:

    bash paper/compile.sh

This produces paper/paper.pdf and paper/si_appendix.pdf. Requires Pandoc with citeproc and a LaTeX distribution.
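Before running the build script, it can help to confirm that the prerequisites are on your PATH. The following is a minimal sketch, not part of the repository: the helper name `missing_build_tools` is hypothetical, and the LaTeX executable names are assumptions about a typical installation. It only checks that the executables exist; it does not verify citeproc support.

```python
import shutil

# Assumed executable names; the README only says "a LaTeX distribution".
LATEX_ENGINES = ("pdflatex", "xelatex", "lualatex")


def missing_build_tools(which=shutil.which):
    """Return names of build prerequisites that were not found on PATH."""
    missing = []
    if which("pandoc") is None:
        missing.append("pandoc")
    # Any one LaTeX engine is treated as sufficient for this check.
    if not any(which(engine) is not None for engine in LATEX_ENGINES):
        missing.append("LaTeX engine (one of: " + ", ".join(LATEX_ENGINES) + ")")
    return missing


if __name__ == "__main__":
    problems = missing_build_tools()
    if problems:
        print("Missing build prerequisites:", "; ".join(problems))
    else:
        print("Pandoc and a LaTeX engine were found on PATH.")
```

The `which` parameter exists only so the lookup can be swapped out when testing; in normal use the default `shutil.which` is what you want.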
    ├── paper/
    │   ├── paper.md  # Paper source
    │   ├── si_appendix.md  # SI Appendix source
    │   ├── references.bib  # Bibliography
    │   ├── compile.sh  # Build script
    │   └── images/  # Figures (generated by scripts below)
    ├── simulation/
    │   ├── run_simulation.py  # Reproduction script (Figure 1, SI Figs S2-S3)
    │   └── helpers.py  # Simulation helper functions
    ├── acs_analysis/
    │   ├── run_acs.py  # Reproduction script (Figure 2)
    │   └── helpers.py  # ACS helper functions
    ├── cap_analysis/
    │   ├── opus/
    │   │   ├── run_cap_opus.py  # Reproduction script (Figure 3, SI Fig S4)
    │   │   └── inference/
    │   │       ├── claude_opus_inference.py  # Opus inference script
    │   │       └── README.md  # Inference documentation
    │   ├── llama/
    │   │   ├── run_llama.py  # Reproduction script (SI Appendix S2)
    │   │   └── inference/
    │   │       └── llm_inference.py  # Llama 3.3 70B 2-stage verbalized confidence
    │   ├── data/  # Data files (see Data section below)
    │   └── prepare_data.py  # Download & standardize CAP datasets
    └── requirements.txt  # Python dependencies
Set up the Python environment:

    conda create -n mcgrad_tutorials python=3.12 -y
    conda activate mcgrad_tutorials
    pip install -r requirements.txt

The simulation needs no external data:

    conda run -n mcgrad_tutorials python3 simulation/run_simulation.py

The ACS analysis downloads ~500MB of Census data on first run via the folktables package:

    conda run -n mcgrad_tutorials python3 acs_analysis/run_acs.py

For the Claude Opus CAP analysis, inference data is included in the repository. To reproduce the analysis:

    conda run -n mcgrad_tutorials python3 cap_analysis/opus/run_cap_opus.py

To reproduce the LLM inference itself, see cap_analysis/opus/inference/README.md.
The Llama replication requires running inference on a GPU (A100 80GB recommended).
The inference script is cap_analysis/llama/inference/llm_inference.py. To run the
downstream analysis on the included scores:

    conda run -n mcgrad_tutorials python3 cap_analysis/llama/run_llama.py

All scripts save figures to paper/images/.
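If you want to regenerate every figure in one pass, the individual commands can be chained from a small driver. This is a sketch under stated assumptions, not a script shipped with the repository: the names `build_commands` and `run_all` are hypothetical, the script paths and environment name are copied from this README, and the driver is assumed to be launched from the repository root with `conda` on the PATH.

```python
import subprocess

ENV_NAME = "mcgrad_tutorials"

# Script paths as listed in this README, in the order they are described.
SCRIPTS = [
    "simulation/run_simulation.py",
    "acs_analysis/run_acs.py",
    "cap_analysis/opus/run_cap_opus.py",
    "cap_analysis/llama/run_llama.py",
]


def build_commands(scripts=SCRIPTS, env_name=ENV_NAME):
    """Build one `conda run` argument list per reproduction script."""
    return [["conda", "run", "-n", env_name, "python3", s] for s in scripts]


def run_all(commands, runner=subprocess.run):
    """Run each command in turn; check=True stops at the first failure."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        runner(cmd, check=True)


if __name__ == "__main__":
    run_all(build_commands())
```

Stopping at the first failure is a deliberate choice here: a failed script usually means a missing dependency or data file, and continuing would only obscure the original error.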
The CAP analysis uses data from the Comparative Agendas Project. To download and prepare the raw data:

    python cap_analysis/prepare_data.py

Claude Opus inference results (binary classifications and probability scores for 30K documents) are included in the repository under cap_analysis/data/inference_output/. The 30K stratified sample is at cap_analysis/data/opus_30k_sample.csv.
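A quick way to confirm that the bundled CAP inputs are present before running the analysis is to check the two paths named above. The snippet below is an illustrative sketch: `missing_inputs` is a hypothetical helper, not part of the repository, and it only tests that the paths exist, not that their contents are complete.

```python
from pathlib import Path

# Paths taken from this README; the check itself is a hypothetical helper.
EXPECTED_INPUTS = (
    "cap_analysis/data/inference_output",
    "cap_analysis/data/opus_30k_sample.csv",
)


def missing_inputs(repo_root="."):
    """Return the expected input paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_INPUTS if not (root / p).exists()]


if __name__ == "__main__":
    absent = missing_inputs()
    if absent:
        print("Missing inputs:", ", ".join(absent))
    else:
        print("All bundled CAP inputs were found.")
```

Run it from the repository root, or pass the path to your checkout as `repo_root`.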
ACS data is downloaded automatically by the folktables package on first run.
- MCGrad — multicalibration algorithm
- Python 3.12; see requirements.txt for full dependencies
The materials in this repository are released under the Creative Commons Attribution-NonCommercial (CC BY-NC) license. See the LICENSE file.