Lightweight, public-safe LLM evaluation harness starter kit: CSV prompt suites + run logs for refusal, boundary integrity, uncertainty, and drift tracking.
This repo is the public-safe, runnable harness component of the Driftmap program. The full Driftmap system (private) includes additional suites, scoring programs, attribution work, and longitudinal tracking. Nothing private is published here.
Deployed AI systems need behavioral consistency that can be measured over time. This harness provides reproducible test suites and run logs covering:
- Drift: Changes in refusal boundaries, reasoning patterns, or uncertainty calibration
- Boundary integrity: Whether a system maintains a clear scope and avoids absorbing user intent into its own identity
- Reproducibility: Complete audit trails via CSV-based run logs and scoring rubrics
Together, these support systematic, repeatable evaluation of the safety-relevant behaviors that reliable deployment depends on.
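Because every scored run is a CSV, drift between two runs can be surfaced by comparing scores prompt by prompt. The sketch below is a minimal illustration, not part of the harness. The column names `prompt_id` and `score` and the labels `refuse`/`comply` are hypothetical placeholders. The actual fields are defined in docs/run_log_schema_driftmap.md.

```python
import csv
import io

# Hypothetical column names for illustration only; the real fields are
# defined in docs/run_log_schema_driftmap.md and may differ.
ID_COL = "prompt_id"
SCORE_COL = "score"


def load_scores(f):
    """Read a scored run log from a file-like object into {prompt_id: score}."""
    return {row[ID_COL]: row[SCORE_COL] for row in csv.DictReader(f)}


def score_drift(baseline, current):
    """List (prompt_id, old_score, new_score) for shared prompts whose score changed."""
    shared = sorted(baseline.keys() & current.keys())
    return [(pid, baseline[pid], current[pid])
            for pid in shared
            if baseline[pid] != current[pid]]


if __name__ == "__main__":
    # In-memory stand-ins for two dated results files; score labels are hypothetical.
    run_a = io.StringIO("prompt_id,score\nR01,refuse\nR02,comply\nR03,refuse\n")
    run_b = io.StringIO("prompt_id,score\nR01,refuse\nR02,refuse\nR03,refuse\n")
    for pid, old, new in score_drift(load_scores(run_a), load_scores(run_b)):
        print(f"{pid}: {old} -> {new}")  # prints "R02: comply -> refuse"
```

To compare saved runs, open each dated results file with `open(path, newline="", encoding="utf-8")` and pass the handle to `load_scores`. Only prompts present in both runs are compared. Prompts added to or removed from a suite between runs are ignored rather than reported as drift.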
- Drift taxonomy: docs/drift_taxonomy.md
- Metrics definitions: docs/metrics_definitions_driftmap.md
- Run log schema: docs/run_log_schema_driftmap.md
- Code: MIT (see LICENSE.md)
- Documentation + prompt suites (CSV): CC BY-ND 4.0 (as noted in LICENSE.md)
Quick start (refusal suite):
- Open prompts/suite_refusal_basic.csv
- Copy each prompt into LM Studio (or AnythingLLM if testing with documents)
- Paste outputs into results/results_refusal_basic_template.csv
- Score using docs/rubric_refusal_basic.md
- Save as a new file: results/results_refusal_basic_<date>.csv
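Creating the dated file can be scripted. This minimal sketch copies the blank template to its dated name before any outputs are pasted, which keeps the template itself unmodified for the next run. It assumes an ISO `YYYY-MM-DD` date (the README only specifies `<date>`), and `start_run` is a hypothetical helper, not part of the repo.

```python
import datetime
import shutil
from pathlib import Path


def start_run(template, day=None):
    """Copy a blank results template to a dated file and return the new path.

    results_<suite>_template.csv -> results_<suite>_<YYYY-MM-DD>.csv
    (the ISO date format is an assumption; the README only says <date>).
    """
    template = Path(template)
    day = day or datetime.date.today()
    # Derive the suite name from the template filename (requires Python 3.9+).
    suite = template.stem.removeprefix("results_").removesuffix("_template")
    target = template.with_name(f"results_{suite}_{day.isoformat()}.csv")
    if target.exists():
        # Never overwrite a run that has already been saved.
        raise FileExistsError(f"{target} already exists")
    shutil.copyfile(template, target)
    return target
```

From the repository root, `start_run("results/results_refusal_basic_template.csv")` creates today's dated copy. Paste outputs and scores into that copy instead of the template.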
General workflow (any suite):
- Open a prompt suite in prompts/
- Run each prompt in LM Studio (or another model UI)
- Paste outputs into a copy of the matching file in results/
- Score each row using docs/scoring_rubric.md
- Save the scored run with a date in the filename
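Before saving a scored run, it helps to confirm that no row was left unscored. This minimal sketch assumes the results CSV has a score column named `score`. That name is a hypothetical placeholder, so check your results template and docs/scoring_rubric.md for the real one.

```python
import csv
from collections import Counter

# Hypothetical column name; use whatever your results template calls it.
SCORE_COL = "score"


def check_and_summarize(path, score_col=SCORE_COL):
    """Raise if any row lacks a score; otherwise return a count of each score value."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if score_col not in (reader.fieldnames or []):
            raise KeyError(f"column {score_col!r} not found in {path}")
        rows = list(reader)
    # Data rows are numbered from 1; pasted outputs may span several lines,
    # so these are row numbers, not file line numbers.
    unscored = [n for n, row in enumerate(rows, start=1)
                if not (row[score_col] or "").strip()]
    if unscored:
        raise ValueError(f"unscored data rows: {unscored}")
    return Counter(row[score_col].strip() for row in rows)
```

The returned `Counter` gives a per-label tally that can be recorded alongside the run.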
- prompts/ = public-safe CSV prompt suites
- results/ = results templates and sample logs
- docs/ = rubrics + methodology notes
- sample_results/ = example runs and run notes
- src/ = optional runner code (if used)
This repository contains only generic, public-safe test suites and templates. Private suites, signature phrasing, and private outputs are intentionally excluded.
Default rule: if there is any ambiguity, treat it as private and do not add it here.
Portfolio map: https://github.com/alyssadata/PORTFOLIO_MAP.md