The StillMe Core validation system ensures response quality, reduces hallucinations, and enforces transparency. It consists of a ValidationEngine that orchestrates a framework of 19 validators organized into 7 layers. The design follows these principles:
- Transparency First: All validation decisions are logged and explainable
- Mandatory Validation: Every response must pass validation
- Intellectual Humility: Honest about limitations and uncertainties
- Modular Design: Each validator is independent and can be enabled/disabled
The ValidationEngine orchestrates multiple validators:

```python
from stillme_core.validation import ValidationEngine, CitationRequired, EvidenceOverlap

# Create an engine with the desired validators
engine = ValidationEngine([
    CitationRequired(),
    EvidenceOverlap(),
    # ... more validators
])

# Validate a response
result = engine.validate(
    question="What is AI?",
    answer="AI is artificial intelligence...",
    context_docs=["doc1", "doc2"],
    # ... other parameters
)
```

All validators implement the Validator protocol:
```python
from typing import List

from stillme_core.validation import Validator, ValidationResult

class MyValidator(Validator):
    def run(self, answer: str, ctx_docs: List[str]) -> ValidationResult:
        # Validation logic
        if self._check_passes(answer):
            return ValidationResult(passed=True, reasons=[])
        else:
            return ValidationResult(
                passed=False,
                reasons=["Validation failed: reason"],
            )
```

### CitationRequired

Ensures citations are present when required.
When to use: Always (critical validator)
What it checks:
- Presence of citations in answer
- Citation format correctness
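As a sketch, a citation-presence check along these lines could implement the Validator protocol. The citation regex and the local ValidationResult stand-in are illustrative assumptions, not the shipped StillMe implementation:

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ValidationResult:
    passed: bool
    reasons: List[str] = field(default_factory=list)
    patched_answer: Optional[str] = None

class CitationPresenceCheck:
    """Illustrative stand-in for CitationRequired."""

    # Assumed citation styles: bracketed indices like [1] or (Doe, 2020)
    CITATION_RE = re.compile(r"\[\d+\]|\([A-Z][A-Za-z]+,\s*\d{4}\)")

    def run(self, answer: str, ctx_docs: List[str]) -> ValidationResult:
        if self.CITATION_RE.search(answer):
            return ValidationResult(passed=True)
        return ValidationResult(
            passed=False,
            reasons=["No citations found in answer"],
        )

check = CitationPresenceCheck()
print(check.run("AI was formalized in 1956 [1].", []).passed)  # True
```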
### CitationRelevance

Validates that citations are relevant to the answer.
When to use: When citations are present
What it checks:
- Citation relevance to answer content
- Citation context matching
### EvidenceOverlap

Checks evidence overlap between answer and context documents.
When to use: Always (critical validator)
What it checks:
- Overlap score between answer and context
- Evidence coverage
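A minimal way to compute such an overlap score, assuming a simple bag-of-words comparison (the real scorer may use stemming, embeddings, or weighting):

```python
from typing import List

def overlap_score(answer: str, ctx_docs: List[str]) -> float:
    """Fraction of answer tokens that also appear in the context.

    A deliberately simple bag-of-words sketch, not the shipped scorer.
    """
    answer_tokens = set(answer.lower().split())
    context_tokens = set()
    for doc in ctx_docs:
        context_tokens.update(doc.lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)

score = overlap_score(
    "neural networks learn representations",
    ["neural networks learn hierarchical representations from data"],
)
print(score)  # 1.0: every answer token appears in the context
```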
### ConfidenceValidator

Validates confidence scores.
When to use: Always (critical validator)
What it checks:
- Confidence score appropriateness
- Confidence calibration
### LanguageValidator

Validates language consistency.
When to use: Always (must run first)
What it checks:
- Language consistency throughout answer
- Language matching with question
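One cheap heuristic for language matching is to compare the dominant Unicode script of question and answer. This is an illustrative assumption, not the shipped check, and it cannot distinguish languages that share a script (e.g. English vs. French):

```python
import unicodedata
from collections import Counter

def dominant_script(text: str) -> str:
    """Guess the dominant Unicode script from character names (crude heuristic)."""
    scripts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            scripts[name.split()[0]] += 1  # e.g. "LATIN", "CYRILLIC", "CJK"
    return scripts.most_common(1)[0][0] if scripts else "UNKNOWN"

def languages_match(question: str, answer: str) -> bool:
    return dominant_script(question) == dominant_script(answer)

print(languages_match("What is AI?", "AI is artificial intelligence."))  # True
print(languages_match("Что такое ИИ?", "AI is artificial intelligence."))  # False
```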
### Identity Consistency

Checks StillMe identity consistency.
When to use: Always
What it checks:
- Identity markers presence
- Identity consistency
### EgoNeutralityValidator

Ensures ego-neutral responses.
When to use: Always
What it checks:
- Absence of ego markers
- Neutral tone
### SourceConsensusValidator

Validates source consensus.
When to use: When multiple sources are available
What it checks:
- Source agreement
- Consensus strength
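A toy version of an agreement check, using substring matching as a stand-in for real claim extraction:

```python
from typing import List

def consensus_strength(claim: str, sources: List[str]) -> float:
    """Fraction of sources whose text supports the claim.

    Support is approximated by substring matching: a toy proxy
    for whatever agreement measure the real validator uses.
    """
    if not sources:
        return 0.0
    supporting = sum(1 for s in sources if claim.lower() in s.lower())
    return supporting / len(sources)

sources = [
    "Water boils at 100 C at sea level.",
    "At sea level, water boils at 100 C.",
    "Some report water boils at 100 C.",
]
print(consensus_strength("boils at 100 C", sources))  # 1.0
```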
### Internal Consistency

Checks internal consistency.
When to use: Always
What it checks:
- Claim consistency
- Logical coherence
### Philosophical Depth

Validates philosophical depth.
When to use: For philosophical questions
What it checks:
- Depth of analysis
- Philosophical rigor
### NumericUnits

Validates numeric units.
When to use: When answer contains numbers
What it checks:
- Unit correctness
- Number formatting
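A sketch of a unit check built on a regex and an assumed unit whitelist. Both are illustrative; a production checker would need far more robust parsing (this one would also flag bare years followed by a word):

```python
import re
from typing import List

# Assumed whitelist of acceptable unit symbols; adjust per domain
KNOWN_UNITS = {"km", "m", "cm", "kg", "g", "s", "ms", "GB", "MB"}

NUMBER_WITH_UNIT = re.compile(r"(\d+(?:\.\d+)?)\s*([A-Za-z]+)")

def check_units(answer: str) -> List[str]:
    """Return a reason for every number paired with an unrecognized unit."""
    problems = []
    for value, unit in NUMBER_WITH_UNIT.findall(answer):
        if unit not in KNOWN_UNITS:
            problems.append(f"Unrecognized unit '{unit}' after {value}")
    return problems

print(check_units("The file is 12 MB and took 3 ms."))  # []
print(check_units("Distance: 5 furlongs"))
```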
### Schema Validation

Validates schema format.
When to use: When structured output is required
What it checks:
- Schema compliance
- Format correctness
### Step-by-Step Reasoning

Detects and validates step-by-step reasoning.
When to use: For multi-step questions
What it checks:
- Step detection
- Step validation
### Experience Claims

Self-corrects experience claims.
When to use: When answer contains experience claims
What it checks:
- Experience claim validity
- Auto-correction
### BasicEthicsAdapter

Validates ethical considerations.
When to use: Always
What it checks:
- Ethical compliance
- Ethical reasoning
### Peer Review

Simulated peer review evaluation.
When to use: Optional (can be enabled)
What it checks:
- Peer review criteria
- Quality assessment
### Fallback Handling

Handles validation failures gracefully.
When to use: Always (last validator)
What it checks:
- Fallback generation
- Error handling
Validators can run in two modes:
- Sequential: Validators that must run in order (dependencies)
- Parallel: Validators that can run concurrently (independent)
Sequential Validators:
- LanguageValidator (must run first)
- CitationRequired (must run before CitationRelevance)
- ConfidenceValidator (may depend on other results)

Parallel Validators:
- CitationRelevance
- EvidenceOverlap
- NumericUnits
- BasicEthicsAdapter
- EgoNeutralityValidator
- SourceConsensusValidator
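The parallel group can be dispatched with a thread pool. This sketch assumes only that each validator exposes a run(answer, ctx_docs) method; the two toy validators exist purely for illustration:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

class LengthCheck:
    """Toy validator: passes when the answer is non-empty."""
    def run(self, answer: str, ctx_docs: List[str]) -> bool:
        return len(answer) > 0

class ContextCheck:
    """Toy validator: passes when any context document is present."""
    def run(self, answer: str, ctx_docs: List[str]) -> bool:
        return bool(ctx_docs)

def run_parallel(validators, answer: str, ctx_docs: List[str]) -> List[bool]:
    """Run independent validators concurrently, preserving submission order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(v.run, answer, ctx_docs) for v in validators]
        return [f.result() for f in futures]

results = run_parallel([LengthCheck(), ContextCheck()], "Some answer", ["doc1"])
print(results)  # [True, True]
```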
The engine supports early exit for critical failures:
```python
# If a critical validator fails, stop immediately
if not result.passed and validator.is_critical:
    return result  # Early exit
```

Every validator returns a ValidationResult:

```python
class ValidationResult:
    passed: bool                   # Whether validation passed
    reasons: List[str]             # Failure reasons or warnings
    patched_answer: Optional[str]  # Auto-patched answer, if any
```

To write a custom validator, implement the Validator protocol:

```python
from typing import List

from stillme_core.validation import Validator, ValidationResult

class MyValidator(Validator):
    def run(self, answer: str, ctx_docs: List[str]) -> ValidationResult:
        # Your validation logic
        if self._check(answer):
            return ValidationResult(passed=True, reasons=[])
        else:
            return ValidationResult(
                passed=False,
                reasons=["My validation failed"],
            )
```

Then register it with the engine:

```python
from stillme_core.validation import ValidationEngine

engine = ValidationEngine([...])
engine.add_validator(MyValidator())
```

If your validator has dependencies, ensure it runs after them:

```python
# Validators run in registration order
engine = ValidationEngine([
    LanguageValidator(),  # Runs first
    MyValidator(),        # Runs after LanguageValidator
])
```

Best practices for validators:

- Each validator should check one specific aspect.
- Always provide clear, actionable failure reasons.
- If possible, provide patched_answer to auto-fix issues.
- All validation decisions should be logged for transparency.
- Validators should handle edge cases gracefully (empty answers, no context, etc.).
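For example, a validator can degrade to explicit warnings instead of raising on empty input. The ValidationResult stand-in below mirrors the structure described above and is defined locally just to keep the sketch self-contained:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ValidationResult:
    passed: bool
    reasons: List[str] = field(default_factory=list)
    patched_answer: Optional[str] = None

class GracefulOverlapCheck:
    """Degrades to explicit warnings instead of crashing on empty input."""

    def run(self, answer: str, ctx_docs: List[str]) -> ValidationResult:
        if not answer.strip():
            return ValidationResult(False, ["Empty answer: nothing to validate"])
        if not ctx_docs:
            # Missing context is a warning, not a crash: pass with a note
            return ValidationResult(True, ["No context documents provided"])
        return ValidationResult(True)

print(GracefulOverlapCheck().run("", []).passed)             # False
print(GracefulOverlapCheck().run("Some answer", []).passed)  # True
```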
Validation metrics are automatically recorded to UnifiedMetricsCollector:
- Total validations
- Pass/fail counts
- Failure reasons
- Confidence scores
- Overlap scores
- Fallback usage
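These counts can be derived from the stream of validation results. The summarize helper below is a hypothetical illustration of that aggregation, not the UnifiedMetricsCollector API:

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List

@dataclass
class ValidationResult:
    passed: bool
    reasons: List[str] = field(default_factory=list)

def summarize(results: List[ValidationResult]) -> dict:
    """Aggregate the kinds of counts the metrics collector records."""
    reason_counts = Counter(
        reason for res in results if not res.passed for reason in res.reasons
    )
    return {
        "total": len(results),
        "passed": sum(res.passed for res in results),
        "failed": sum(not res.passed for res in results),
        "failure_reasons": dict(reason_counts),
    }

stats = summarize([
    ValidationResult(True),
    ValidationResult(False, ["No citations found"]),
])
print(stats["total"], stats["passed"], stats["failed"])  # 2 1 1
```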
Example: integrating with RAG retrieval:

```python
# Retrieve context
context = rag.retrieve_context(query)

# Validate with context
result = engine.validate(
    question=query,
    answer=response,
    context_docs=context["knowledge_docs"],
)
```

Example: validating before post-processing:

```python
# Validate first
validation_result = engine.validate(...)

# Post-process only if validation passed
if validation_result.passed:
    processed = post_processor.process(response)
```