Unbiased Prevalence Estimation with Multicalibrated LLMs

Replication materials for "Unbiased Prevalence Estimation with Multicalibrated LLMs."

Paper

The paper source is paper/paper.md, written in Markdown and built with Pandoc. To compile:

bash paper/compile.sh

This produces paper/paper.pdf and paper/si_appendix.pdf. Compilation requires Pandoc with citeproc support and a LaTeX distribution.
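
Before running the build, it can help to confirm that the prerequisites are on PATH. The following is a minimal sketch, not part of the repository: the function name missing_build_tools and the list of LaTeX engine names are assumptions, since the README does not say which engine compile.sh invokes. It does not check whether the installed Pandoc supports citeproc.

```python
import shutil

# Assumed engine names; the README only asks for "a LaTeX distribution".
LATEX_ENGINES = ("pdflatex", "xelatex", "lualatex")


def missing_build_tools(which=shutil.which):
    """Return a list of build prerequisites that are not found on PATH.

    `which` is injectable so the check can be exercised without
    depending on what is installed on the current machine.
    """
    missing = []
    if which("pandoc") is None:
        missing.append("pandoc")
    # Any one of the assumed engines counts as a usable LaTeX install.
    if not any(which(engine) is not None for engine in LATEX_ENGINES):
        missing.append(
            "LaTeX engine (one of: " + ", ".join(LATEX_ENGINES) + ")"
        )
    return missing
```

An empty list from missing_build_tools() suggests that bash paper/compile.sh has the tools it needs; a non-empty list names what to install first.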

Repository Structure

├── paper/
│   ├── paper.md                 # Paper source
│   ├── si_appendix.md           # SI Appendix source
│   ├── references.bib           # Bibliography
│   ├── compile.sh               # Build script
│   └── images/                  # Figures (generated by scripts below)
├── simulation/
│   ├── run_simulation.py        # Reproduction script (Figure 1, SI Figs S2-S3)
│   └── helpers.py               # Simulation helper functions
├── acs_analysis/
│   ├── run_acs.py               # Reproduction script (Figure 2)
│   └── helpers.py               # ACS helper functions
├── cap_analysis/
│   ├── opus/
│   │   ├── run_cap_opus.py      # Reproduction script (Figure 3, SI Fig S4)
│   │   └── inference/
│   │       ├── claude_opus_inference.py  # Opus inference script
│   │       └── README.md                 # Inference documentation
│   ├── llama/
│   │   ├── run_llama.py         # Reproduction script (SI Appendix S2)
│   │   └── inference/
│   │       └── llm_inference.py # Llama 3.3 70B 2-stage verbalized confidence
│   ├── data/                    # Data files (see Data section below)
│   └── prepare_data.py          # Download & standardize CAP datasets
└── requirements.txt             # Python dependencies

Setup

conda create -n mcgrad_tutorials python=3.12 -y
conda activate mcgrad_tutorials
pip install -r requirements.txt

Reproducing Results

1. Simulation (Figure 1, SI Figures S2-S3)

No external data is needed.

conda run -n mcgrad_tutorials python3 simulation/run_simulation.py

2. ACS Application (Figure 2)

On first run, the script downloads about 500 MB of Census data via the folktables package.

conda run -n mcgrad_tutorials python3 acs_analysis/run_acs.py

3. CAP Application — Claude Opus (Figure 3, SI Figure S4)

Inference data is included in the repository. To reproduce the analysis:

conda run -n mcgrad_tutorials python3 cap_analysis/opus/run_cap_opus.py

To reproduce the LLM inference itself, see cap_analysis/opus/inference/README.md.

4. CAP Application — Llama Replication (SI Appendix S2)

Reproducing the Llama inference itself requires a GPU (A100 80GB recommended); the inference script is cap_analysis/llama/inference/llm_inference.py. To run the downstream analysis on the included scores:

conda run -n mcgrad_tutorials python3 cap_analysis/llama/run_llama.py

All scripts save figures to paper/images/.
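
The four reproduction steps can also be chained from a single Python entry point. This is a hedged sketch rather than a script shipped with the repository: build_commands and run_all are hypothetical helpers, and the sketch assumes it is run from the repository root inside the activated mcgrad_tutorials environment, with the CAP data already in place.

```python
import subprocess
import sys

# Script paths as listed in this README, in section order.
SCRIPTS = [
    "simulation/run_simulation.py",
    "acs_analysis/run_acs.py",
    "cap_analysis/opus/run_cap_opus.py",
    "cap_analysis/llama/run_llama.py",
]


def build_commands(python=sys.executable, scripts=SCRIPTS):
    """Return one [interpreter, script] command per reproduction script."""
    return [[python, script] for script in scripts]


def run_all(commands, runner=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        # check=True makes subprocess.run raise CalledProcessError
        # when a script exits with a non-zero status.
        runner(cmd, check=True)
```

Calling run_all(build_commands()) runs the scripts sequentially. The ACS step still triggers the Census download on first run, so the first pass may take considerably longer than later ones.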

Data

The CAP analysis uses data from the Comparative Agendas Project. To download and prepare the raw data:

python cap_analysis/prepare_data.py

Claude Opus inference results (binary classifications and probability scores for 30K documents) are included in the repository under cap_analysis/data/inference_output/. The 30K stratified sample is at cap_analysis/data/opus_30k_sample.csv.
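
The two kinds of inference output support two different prevalence estimates: the share of positive binary classifications (often called classify-and-count) and the average of the probability scores. The sketch below illustrates the difference on invented numbers. It is not the paper's estimator and does not reflect the actual file schema; the function names, the toy scores, and the 0.5 threshold are all assumptions made for illustration.

```python
from statistics import mean


def classify_and_count(labels):
    """Share of documents whose binary classification is positive."""
    return mean(1 if y else 0 for y in labels)


def mean_score(scores):
    """Average predicted probability across documents.

    This tracks the true prevalence only to the extent that the
    scores are calibrated on the population being averaged over.
    """
    return mean(scores)


# Toy values, invented for illustration only.
scores = [0.9, 0.7, 0.4, 0.2, 0.2]
# Thresholding at 0.5 is an assumption, not the repository's rule.
labels = [s >= 0.5 for s in scores]

print("classify-and-count:", classify_and_count(labels))
print("mean score:        ", mean_score(scores))
```

On these toy values the two estimates differ, which is why the calibration of the scores, and not only the accuracy of the binary labels, matters when the target is a prevalence.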

ACS data is downloaded automatically by the folktables package on first run.

Software

MCGrad — multicalibration algorithm
Python 3.12; see requirements.txt for the full list of dependencies

License

The materials in this repository are released under the Creative Commons Attribution-NonCommercial (CC BY-NC) license. See the LICENSE file.
