Driftmap Public Harness (llm-eval-harness-lite)

Lightweight, public-safe LLM evaluation harness starter kit: CSV prompt suites + run logs for refusal, boundary integrity, uncertainty, and drift tracking.

This repo is the public-safe, runnable harness component of the Driftmap program. The full Driftmap system (private) includes additional suites, scoring programs, attribution work, and longitudinal tracking. Nothing private is published here.

Why This Matters

Deployed AI systems need behavior that stays measurably consistent over time. This harness provides reproducible test suites that cover:

  • Drift: changes in refusal boundaries, reasoning patterns, or uncertainty calibration between runs
  • Boundary integrity: whether a system keeps a clear scope and does not absorb user intent into its identity
  • Reproducibility: complete audit trails via CSV-based run logs and scoring rubrics

Because every run is logged and scored against a written rubric, safety-relevant behavior can be evaluated systematically and compared across runs, which reliable deployment depends on.
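As a rough illustration of drift tracking, the sketch below compares two scored run logs and reports prompts whose score moved. The column names `prompt_id` and `score`, the numeric score scale, and the `threshold` value are assumptions made for this example; the actual templates and rubrics in this repo may use different names and scales.

```python
import csv
import io

def load_scores(csv_text, id_col="prompt_id", score_col="score"):
    """Map each prompt id to its numeric score from one scored run log.

    Column names are hypothetical; adjust them to the real template.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_col]: float(row[score_col]) for row in reader}

def find_drift(baseline, current, threshold=1.0):
    """Return {prompt_id: (old, new)} for scores that moved by >= threshold."""
    drifted = {}
    for prompt_id, old in baseline.items():
        new = current.get(prompt_id)
        if new is not None and abs(new - old) >= threshold:
            drifted[prompt_id] = (old, new)
    return drifted

# Made-up run logs for illustration only (not real results).
baseline_csv = "prompt_id,score\nR01,2\nR02,2\nR03,1\n"
current_csv = "prompt_id,score\nR01,2\nR02,0\nR03,1\n"

drift = find_drift(load_scores(baseline_csv), load_scores(current_csv))
print(drift)  # {'R02': (2.0, 0.0)}
```

In practice the two CSV strings would be read from two dated files in results/; prompts present in only one run are skipped here rather than flagged.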

Driftmap docs (public-safe)

License

  • Code: MIT (see LICENSE.md)
  • Documentation + prompt suites (CSV): CC BY-ND 4.0 (as noted in LICENSE.md)

How to run (manual, no code)

  1. Open prompts/suite_refusal_basic.csv
  2. Copy each prompt into LM Studio (or AnythingLLM if testing with documents)
  3. Paste outputs into results/results_refusal_basic_template.csv
  4. Score using docs/rubric_refusal_basic.md
  5. Save as a new file: results/results_refusal_basic_<date>.csv
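Steps 3 and 5 can optionally be scripted. The sketch below builds the dated filename and copies the template to it. The ISO date format (`YYYY-MM-DD`) is an assumption, since the README only specifies `<date>`, and both helper names are hypothetical.

```python
import shutil
from datetime import date
from pathlib import Path

def dated_results_path(suite, run_date, results_dir="results"):
    """Build results/results_<suite>_<date>.csv (ISO date format assumed)."""
    return Path(results_dir) / f"results_{suite}_{run_date.isoformat()}.csv"

def start_run(suite, run_date, results_dir="results"):
    """Copy the suite's template to a new dated file without overwriting."""
    template = Path(results_dir) / f"results_{suite}_template.csv"
    target = dated_results_path(suite, run_date, results_dir)
    if target.exists():
        raise FileExistsError(f"{target} already exists; pick another name")
    shutil.copyfile(template, target)
    return target

# Example date chosen for illustration only.
path = dated_results_path("refusal_basic", date(2025, 1, 15))
print(path.as_posix())  # results/results_refusal_basic_2025-01-15.csv
```

Calling `start_run("refusal_basic", date.today())` from the repository root would create the working copy to paste outputs into.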

Quickstart (no code)

  1. Open a prompt suite in prompts/
  2. Run each prompt in LM Studio (or another model UI)
  3. Paste outputs into a copy of the matching file in results/
  4. Score each row using docs/scoring_rubric.md
  5. Save the scored run with a date in the filename
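Before saving a scored run, it can help to confirm that no row was left unscored. This is a minimal sketch of such a check, assuming the results file has a column named `score`; that name is a guess and should be matched to the real template.

```python
import csv
import io

def unscored_rows(csv_text, score_col="score"):
    """Return 1-based data-row numbers (header excluded) with an empty score."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        n for n, row in enumerate(reader, start=1)
        if not (row.get(score_col) or "").strip()
    ]

# Made-up run log: the second data row has no score yet.
run_csv = (
    "prompt_id,output,score\n"
    "R01,I can't help with that.,2\n"
    "R02,Sure,\n"
)

missing = unscored_rows(run_csv)
print(missing)  # [2]
```

An empty list means every row has a score and the run is ready to save under its dated filename.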

Repository structure

  • prompts/ = public-safe CSV prompt suites
  • results/ = results templates and sample logs
  • docs/ = rubrics + methodology notes
  • sample_results/ = example runs and run notes
  • src/ = optional runner code (if used)
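For orientation, an optional runner could look roughly like the sketch below. It is not the code in src/; it only shows the shape of the loop under assumptions: the suite has a `prompt` column, results gain `output` and `score` columns, and `generate` stands in for whatever model call is used. All of these names are hypothetical.

```python
import csv
import io
from typing import Callable

def run_suite(prompts_csv: str, generate: Callable[[str], str],
              prompt_col: str = "prompt") -> str:
    """Run every prompt through `generate` and return results as CSV text.

    The score column is left blank so a person can score it with the rubric.
    """
    reader = csv.DictReader(io.StringIO(prompts_csv))
    fieldnames = list(reader.fieldnames or []) + ["output", "score"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["output"] = generate(row[prompt_col])
        row["score"] = ""  # filled in manually after the run
        writer.writerow(row)
    return out.getvalue()

def echo_model(prompt: str) -> str:
    """Stand-in for a real model call; returns a fixed-format stub reply."""
    return f"[stub reply to: {prompt}]"

suite_csv = "prompt_id,prompt\nR01,Say hello\n"
results_csv = run_suite(suite_csv, echo_model)
print(results_csv)
```

Passing the model call in as a plain function keeps the loop independent of any particular UI or API, so the same CSV handling works with a stub during testing.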

Privacy boundary

This repository contains only generic, public-safe test suites and templates. Private suites, signature phrasing, and private outputs are intentionally excluded.

Default rule: if there is any ambiguity, treat it as private and do not add it here.

Portfolio map: https://github.com/alyssadata/PORTFOLIO_MAP.md

About

Public Driftmap harness: public-safe CSV suites + rubrics + run logs for drift detection, refusal integrity, injection resistance, and uncertainty tracking.
