facebookresearch/multicalibrated_llm_measurement

Unbiased Prevalence Estimation with Multicalibrated LLMs

Replication materials for "Unbiased Prevalence Estimation with Multicalibrated LLMs."

Paper

The paper source is paper/paper.md (Markdown + Pandoc). To compile:

bash paper/compile.sh

This produces paper/paper.pdf and paper/si_appendix.pdf. The build requires Pandoc with citeproc support and a LaTeX distribution.
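
Before running the build script, it can help to confirm that the required tools are on PATH. The following is a minimal sketch and not part of the repository. The function name `missing_build_tools` is hypothetical, and the list of LaTeX engines is an assumption, since this README does not say which engine compile.sh invokes. The check also does not verify citeproc support.

```python
import shutil

# Assumption: any one of these common engines counts as "a LaTeX distribution".
LATEX_ENGINES = ("pdflatex", "xelatex", "lualatex")


def missing_build_tools(which=shutil.which):
    """Return human-readable names of build tools that were not found on PATH."""
    missing = []
    if which("pandoc") is None:
        missing.append("pandoc")
    if not any(which(engine) is not None for engine in LATEX_ENGINES):
        missing.append("LaTeX engine (one of: " + ", ".join(LATEX_ENGINES) + ")")
    return missing


if __name__ == "__main__":
    problems = missing_build_tools()
    if problems:
        print("Missing:", "; ".join(problems))
    else:
        print("pandoc and a LaTeX engine were found on PATH.")
```

An empty list means both tools were located. Anything else names what still needs to be installed.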

Repository Structure

├── paper/
│   ├── paper.md                 # Paper source
│   ├── si_appendix.md           # SI Appendix source
│   ├── references.bib           # Bibliography
│   ├── compile.sh               # Build script
│   └── images/                  # Figures (generated by scripts below)
├── simulation/
│   ├── run_simulation.py        # Reproduction script (Figure 1, SI Figs S2-S3)
│   └── helpers.py               # Simulation helper functions
├── acs_analysis/
│   ├── run_acs.py               # Reproduction script (Figure 2)
│   └── helpers.py               # ACS helper functions
├── cap_analysis/
│   ├── opus/
│   │   ├── run_cap_opus.py      # Reproduction script (Figure 3, SI Fig S4)
│   │   └── inference/
│   │       ├── claude_opus_inference.py  # Opus inference script
│   │       └── README.md                 # Inference documentation
│   ├── llama/
│   │   ├── run_llama.py         # Reproduction script (SI Appendix S2)
│   │   └── inference/
│   │       └── llm_inference.py # Llama 3.3 70B 2-stage verbalized confidence
│   ├── data/                    # Data files (see Data section below)
│   └── prepare_data.py          # Download & standardize CAP datasets
└── requirements.txt             # Python dependencies

Setup

conda create -n mcgrad_tutorials python=3.12 -y
conda activate mcgrad_tutorials
pip install -r requirements.txt

Reproducing Results

1. Simulation (Figure 1, SI Figures S2-S3)

No external data needed.

conda run -n mcgrad_tutorials python3 simulation/run_simulation.py

2. ACS Application (Figure 2)

Downloads ~500MB of Census data on first run via the folktables package.

conda run -n mcgrad_tutorials python3 acs_analysis/run_acs.py

3. CAP Application — Claude Opus (Figure 3, SI Figure S4)

Inference data is included in the repository. To reproduce the analysis:

conda run -n mcgrad_tutorials python3 cap_analysis/opus/run_cap_opus.py

To reproduce the LLM inference itself, see cap_analysis/opus/inference/README.md.

4. CAP Application — Llama Replication (SI Appendix S2)

Reproducing the Llama inference itself requires a GPU (A100 80GB recommended); the inference script is cap_analysis/llama/inference/llm_inference.py. To run only the downstream analysis on the included scores:

conda run -n mcgrad_tutorials python3 cap_analysis/llama/run_llama.py

All scripts save figures to paper/images/.
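
The four analysis commands above can also be chained from one driver. This is a sketch under stated assumptions, not a script shipped with the repository: the helper names `build_command` and `run_all` are hypothetical, and it assumes `conda` is on PATH and the `mcgrad_tutorials` environment from the Setup section already exists. Note that the ACS step downloads Census data on its first run.

```python
import subprocess

# Script paths as listed in this README, in section order.
SCRIPTS = [
    "simulation/run_simulation.py",
    "acs_analysis/run_acs.py",
    "cap_analysis/opus/run_cap_opus.py",
    "cap_analysis/llama/run_llama.py",
]

ENV_NAME = "mcgrad_tutorials"


def build_command(script, env_name=ENV_NAME):
    """Build the same `conda run` invocation this README uses for one script."""
    return ["conda", "run", "-n", env_name, "python3", script]


def run_all(scripts=SCRIPTS, runner=subprocess.run):
    """Run each script in order; check=True makes the first failure raise."""
    for script in scripts:
        print("Running", script, flush=True)
        runner(build_command(script), check=True)
```

Calling `run_all()` from the repository root runs the steps in order and stops at the first script that exits with an error, so a failed step is not silently followed by later ones.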

Data

The CAP analysis uses data from the Comparative Agendas Project. To download and prepare the raw data:

conda run -n mcgrad_tutorials python3 cap_analysis/prepare_data.py

Claude Opus inference results (binary classifications and probability scores for 30K documents) are included in the repository under cap_analysis/data/inference_output/. The 30K stratified sample is at cap_analysis/data/opus_30k_sample.csv.
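
A quick way to confirm the included files are in place before running the CAP analysis is sketched below. It is illustrative only: the function names are hypothetical, the two paths are the ones given above, and the sample CSV is assumed to have a header row. No column names are assumed, because this README does not list them.

```python
import csv
from pathlib import Path

# Paths as given in this README, relative to the repository root.
SAMPLE_CSV = Path("cap_analysis/data/opus_30k_sample.csv")
INFERENCE_DIR = Path("cap_analysis/data/inference_output")


def count_data_rows(csv_path):
    """Count rows after the first line; assumes the file has a header row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the (assumed) header
        return sum(1 for _ in reader)


def describe_cap_data(root="."):
    """Report whether the sample and inference outputs exist under `root`."""
    root = Path(root)
    sample = root / SAMPLE_CSV
    inference = root / INFERENCE_DIR
    return {
        "sample_exists": sample.is_file(),
        "sample_rows": count_data_rows(sample) if sample.is_file() else None,
        "inference_files": sorted(p.name for p in inference.iterdir())
        if inference.is_dir()
        else [],
    }
```

Running `describe_cap_data()` from the repository root returns a small dictionary that can be printed or inspected. A missing sample file or an empty `inference_files` list suggests the checkout is incomplete.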

ACS data is downloaded automatically by the folktables package on first run.

Software

  • MCGrad — multicalibration algorithm
  • Python 3.12 (see requirements.txt for the full list of dependencies)

License

The materials in this repository are released under the Creative Commons Attribution-NonCommercial (CC BY-NC) license. See the LICENSE file.
