Skip to content

Latest commit

 

History

History
175 lines (121 loc) · 3.52 KB

File metadata and controls

175 lines (121 loc) · 3.52 KB

StillMe Case Study Template (Before vs After Validation)

Use this template to produce audit-grade evidence for StillMe Lite performance.

Goal: show measurable improvement after enabling verification behavior.


1) Case Study Metadata

  • Case ID:
  • Date:
  • Owner:
  • Domain (research/support/policy/other):
  • Language:
  • LLM model:
  • Retrieval setup:
  • StillMe Lite mode (monitor|warn|enforce):
  • Policy file version:

2) Test Scope

  • Total prompts:
  • Prompt set source:
  • Prompt categories:
    • Factual query
    • Source-required query
    • Ambiguous query
    • Adversarial/prompt-injection

Rules:

  • Keep prompt set fixed between before/after runs.
  • Use same model and retrieval settings for fair comparison.
  • Store raw runs in JSONL for reproducibility.

3) Experimental Setup

3.1 Before (Baseline)

  • Validation disabled or bypassed.
  • Record model outputs and available context.

3.2 After (StillMe Enabled)

  • Validation enabled with selected mode.
  • Record decision, reason codes, and safe response.

3.3 Data Files

  • data/before_<case_id>.jsonl
  • data/after_<case_id>.jsonl
  • reports/<case_id>_summary.md

4) Mandatory Metrics

4.1 Hallucination Escape Rate

Definition: unsupported_factual_answers_passed / total_high_risk_factual_prompts

Interpretation:

  • Lower is better.
  • This is the primary risk metric.

4.2 Refusal Precision

Definition: correct_refusals / total_refusals

Interpretation:

  • Higher is better.
  • Measures whether refusals are justified instead of over-blocking.

4.3 Source Coverage

Definition: factual_answers_with_valid_citation / total_factual_answers

Interpretation:

  • Higher is better.
  • Measures evidence grounding quality.

5) Optional Supporting Metrics

  • False refusal rate
  • Clarification usefulness rate
  • Mean validator confidence band distribution
  • Decision latency delta (before vs after)

6) Output Tables

6.1 Core Metrics Table

Metric Before After Delta Target
Hallucination escape rate lower
Refusal precision >= 0.85
Source coverage >= 0.80

6.2 Decision Distribution

Decision Before After
answer
refuse
ask_clarify

7) Error Analysis (Required)

Provide 5-10 representative examples:

  1. Escaped hallucination (before)

    • Prompt:
    • Baseline answer:
    • Why unsafe:
  2. Correct refusal (after)

    • Prompt:
    • StillMe decision/reason:
    • Why correct:
  3. False refusal (after, if any)

    • Prompt:
    • Reason code:
    • Fix candidate:
  4. Citation quality improvement example

    • Prompt:
    • Before citation state:
    • After citation state:

8) Go / No-Go Rule

Recommended internal gate for moving from monitor to warn:

  • Hallucination escape rate reduced by at least 50% vs before
  • Refusal precision >= 0.85
  • Source coverage >= 0.80 on factual subset

Recommended gate for moving from warn to enforce:

  • Metrics stable for 2 consecutive runs
  • No critical incident in sampled production traffic

9) Risk and Limitation Notes

  • Known blind spots:
  • Data quality issues:
  • Retrieval mismatch observations:
  • Policy thresholds that need tuning:

Always include limitations. Do not claim universal safety guarantees.


10) Short Public Summary (Optional)

Use this 4-line format for release notes:

  1. What was tested (scope and sample size)
  2. What improved (with numbers)
  3. What did not improve yet
  4. Next action for the next iteration