Confidence Validation & Fallback Handler

Overview

StillMe implements a confidence validation system that reduces hallucinations by:

Detecting when AI should express uncertainty (ConfidenceValidator)
Providing safe fallback answers when validation fails (FallbackHandler)
Calculating confidence scores based on context quality and validation results
Tracking metrics for A/B testing and hallucination reduction measurement

Components

1. ConfidenceValidator

Location: backend/validators/confidence.py

Purpose: Detects when AI should express uncertainty, especially when no context is available.

Key Features:

Requires uncertainty expressions when no context is found
Detects overconfidence patterns
Supports multiple languages (English, Vietnamese, Chinese)
Case-insensitive pattern matching

Configuration:

ConfidenceValidator(require_uncertainty_when_no_context=True)

Validation Rules:

✅ PASS: AI expresses uncertainty when no context ("I don't know", "Không biết", etc.)
❌ FAIL: AI is overconfident without context ("Definitely X", "Chắc chắn 100%", etc.)
✅ PASS: AI is confident when context exists (uncertainty optional)

Uncertainty Patterns Detected:

English: "I don't know", "I'm not certain", "I cannot answer", etc.
Vietnamese: "Không biết", "Không có đủ thông tin", "Không thể trả lời", etc.
Chinese: "不知道", "不确定", etc.
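
The rules above can be sketched roughly as follows. This is a minimal illustration, not the code in backend/validators/confidence.py: the phrase list contains only the examples quoted in this document, and the names UNCERTAINTY_PHRASES, expresses_uncertainty and passes_confidence_check are hypothetical. It covers only the "uncertainty required when no context" rule, not overconfidence detection.

```python
import re
import unicodedata


def _nfc(text: str) -> str:
    # Normalize so precomposed and decomposed Vietnamese characters compare equal.
    return unicodedata.normalize("NFC", text)


# Only the phrases quoted in this document; the real lists are longer (assumption).
UNCERTAINTY_PHRASES = [
    "I don't know", "I'm not certain", "I cannot answer",        # English
    "Không biết", "Không có đủ thông tin", "Không thể trả lời",  # Vietnamese
    "不知道", "不确定",                                            # Chinese
]

# One case-insensitive alternation over all phrases.
_UNCERTAINTY_RE = re.compile(
    "|".join(re.escape(_nfc(p)) for p in UNCERTAINTY_PHRASES),
    re.IGNORECASE,
)


def expresses_uncertainty(answer: str) -> bool:
    """Return True if the answer contains any known uncertainty phrase."""
    return _UNCERTAINTY_RE.search(_nfc(answer)) is not None


def passes_confidence_check(answer: str, context_doc_count: int) -> bool:
    """With context, uncertainty is optional; without context, it is required."""
    if context_doc_count > 0:
        return True
    return expresses_uncertainty(answer)
```

Matching on substrings keeps the check simple, but it can also fire on quoted or negated phrases, so a production validator would likely need stricter patterns.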

2. FallbackHandler

Location: backend/validators/fallback_handler.py

Purpose: Provides safe, informative fallback answers when validation fails critically.

Key Features:

Multi-language support (English, Vietnamese, Chinese)
Context-aware fallback generation
Prevents hallucinated content from reaching users
Explains StillMe's learning mechanism

Usage:

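# Import path inferred from the Location above
from backend.validators.fallback_handler import FallbackHandler

fallback_handler = FallbackHandler()
safe_answer = fallback_handler.get_fallback_answer(
    original_answer=hallucinated_response,  # the answer that failed validation
    validation_result=failed_validation,    # the failed validation result
    ctx_docs=[],                            # no context documents were retrieved
    user_question="User's question",
    detected_lang="en"
)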

Fallback Triggers:

missing_uncertainty_no_context - AI didn't express uncertainty when no context
missing_citation + no context - Missing citation when no context available
low_overlap + no context - Low overlap when no context available
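
A minimal sketch of the trigger logic listed above. It assumes that validation reasons arrive as a list of plain strings matching the identifiers in this document; should_use_fallback is a hypothetical helper name, not a function from the codebase.

```python
from typing import Sequence


def should_use_fallback(reasons: Sequence[str], ctx_docs: Sequence[object]) -> bool:
    """Decide whether a failed validation should be replaced by a fallback answer."""
    has_context = len(ctx_docs) > 0

    # Trigger 1: the answer was confident although no context was found.
    if "missing_uncertainty_no_context" in reasons:
        return True

    # Triggers 2 and 3: citation or overlap failures count only without context.
    if not has_context:
        return any(r in ("missing_citation", "low_overlap") for r in reasons)

    return False
```

If the real reason strings carry extra detail (for example a score suffix), the membership tests would need to become prefix checks.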

Fallback Content:

Acknowledges lack of information
Explains StillMe's continuous learning (every 4 hours)
Offers alternatives (reformulate, wait, related topics)
Does NOT contain original hallucinated content

3. Confidence Score Calculation

Location: backend/api/routers/chat_router.py::_calculate_confidence_score()

Purpose: Calculates AI confidence in the answer based on multiple factors.

Calculation Logic:

Base Confidence:
- 0 context docs: 0.2 (very low)
- 1 context doc: 0.5 (medium)
- 2+ context docs: 0.8 (high)

Adjustments:
- Validation passed: +0.1 (max 1.0)
- Missing uncertainty (no context): score set to 0.1 (very low)
- Missing citation (with context): -0.2
- Low overlap: -0.15
- Other failures: -0.1
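
The calculation above could look roughly like this. It is a sketch under stated assumptions, not the body of _calculate_confidence_score(): the missing-uncertainty case is treated as an override to 0.1, penalties are summed per failure reason, a missing citation without context is counted as an "other" failure, and the result is clamped to the 0.0 - 1.0 range. The function name and signature are hypothetical.

```python
from typing import Sequence


def calculate_confidence_score(
    context_doc_count: int,
    passed: bool,
    reasons: Sequence[str] = (),
) -> float:
    # Base confidence depends only on how many context documents were retrieved.
    if context_doc_count <= 0:
        score = 0.2
    elif context_doc_count == 1:
        score = 0.5
    else:
        score = 0.8

    if passed:
        score += 0.1
    elif "missing_uncertainty_no_context" in reasons:
        # Assumption: this failure overrides the score instead of subtracting.
        score = 0.1
    else:
        for reason in reasons:
            if reason == "missing_citation" and context_doc_count > 0:
                score -= 0.2
            elif reason == "low_overlap":
                score -= 0.15
            else:
                # Any other failure reason (assumption: applied once per reason).
                score -= 0.1

    # Clamp to [0.0, 1.0] and round away floating-point noise.
    return round(min(1.0, max(0.0, score)), 2)
```

For example, two context documents with a passing validation give 0.8 + 0.1, while one context document with a low_overlap failure gives 0.5 - 0.15.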

Score Ranges:

0.0 - 0.3: Very uncertain (red indicator)
0.4 - 0.6: Medium confidence (orange indicator)
0.7 - 1.0: High confidence (green indicator)
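
A small sketch of mapping a score to its indicator color. The ranges above leave gaps (for example 0.35 or 0.65), so this example assumes cut points at 0.4 and 0.7; the actual boundaries used by the dashboard may differ, and confidence_indicator is a hypothetical name.

```python
def confidence_indicator(score: float) -> str:
    """Map a confidence score in [0.0, 1.0] to an indicator color."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence score out of range: {score}")
    if score < 0.4:   # assumed upper bound of the "very uncertain" band
        return "red"
    if score < 0.7:   # assumed upper bound of the "medium confidence" band
        return "orange"
    return "green"
```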

4. Integration with ValidatorChain

Location: backend/api/routers/chat_router.py

Integration:

chain = ValidatorChain([
    CitationRequired(),
    EvidenceOverlap(threshold=0.01),
    NumericUnitsBasic(),
    ConfidenceValidator(require_uncertainty_when_no_context=True),  # NEW
    EthicsAdapter(guard_callable=None)
])

Flow:

ValidatorChain runs all validators sequentially
If ConfidenceValidator fails → triggers FallbackHandler
FallbackHandler generates safe answer
Confidence score calculated based on results
Metrics recorded for A/B testing
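
The first three steps of this flow can be illustrated with stand-in types. Nothing here is the real ValidatorChain or FallbackHandler API: ValidationResult, run_chain and respond are hypothetical names, and validators are modeled as plain callables for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass
class ValidationResult:
    passed: bool
    reasons: List[str] = field(default_factory=list)


# A validator takes the answer and the context docs and returns a result.
Validator = Callable[[str, Sequence[str]], ValidationResult]


def run_chain(
    validators: Sequence[Validator], answer: str, ctx_docs: Sequence[str]
) -> ValidationResult:
    """Run every validator in order and merge their failure reasons."""
    reasons: List[str] = []
    for validate in validators:
        reasons.extend(validate(answer, ctx_docs).reasons)
    return ValidationResult(passed=not reasons, reasons=reasons)


def respond(
    answer: str,
    ctx_docs: Sequence[str],
    validators: Sequence[Validator],
    fallback: Callable[[str], str],
) -> Tuple[str, bool]:
    """Return (final_answer, used_fallback)."""
    result = run_chain(validators, answer, ctx_docs)
    if "missing_uncertainty_no_context" in result.reasons:
        # The original answer is discarded, never merged into the fallback.
        return fallback(answer), True
    return answer, False
```

Score calculation and metrics recording would follow after respond() returns, using the merged reasons and the used_fallback flag.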

5. Metrics Tracking

Location: backend/validators/metrics.py

New Metrics:

avg_confidence_score: Average confidence across all responses
fallback_usage_count: Number of times fallback was used
fallback_usage_rate: Percentage of responses using fallback
hallucination_prevented_count: Number of hallucinations prevented
hallucination_reduction_rate: Percentage of prevented hallucinations
uncertainty_expressed_count: Number of times AI expressed uncertainty
uncertainty_expression_rate: Percentage of uncertainty expressions
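
As an illustration of how the count and rate metrics relate, here is a minimal sketch. The class ConfidenceMetrics and its record() method are hypothetical and not the interface of backend/validators/metrics.py. Rates are expressed as percentages of all recorded responses, which is an assumption; hallucination_reduction_rate is left out because this document does not define its denominator.

```python
from dataclasses import dataclass


@dataclass
class ConfidenceMetrics:
    total_responses: int = 0
    confidence_sum: float = 0.0
    fallback_usage_count: int = 0
    uncertainty_expressed_count: int = 0

    def record(
        self, confidence_score: float, used_fallback: bool, expressed_uncertainty: bool
    ) -> None:
        """Accumulate counters for one response."""
        self.total_responses += 1
        self.confidence_sum += confidence_score
        self.fallback_usage_count += int(used_fallback)
        self.uncertainty_expressed_count += int(expressed_uncertainty)

    def _percentage(self, count: int) -> float:
        # Avoid dividing by zero before any response has been recorded.
        if self.total_responses == 0:
            return 0.0
        return 100.0 * count / self.total_responses

    @property
    def avg_confidence_score(self) -> float:
        if self.total_responses == 0:
            return 0.0
        return self.confidence_sum / self.total_responses

    @property
    def fallback_usage_rate(self) -> float:
        return self._percentage(self.fallback_usage_count)

    @property
    def uncertainty_expression_rate(self) -> float:
        return self._percentage(self.uncertainty_expressed_count)
```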

A/B Testing: These metrics enable measurement of:

Hallucination reduction effectiveness
Confidence score distribution
Fallback usage patterns
Uncertainty expression patterns

API Response Changes

ChatResponse Model

New Fields:

from typing import Any, Dict, List, Optional

from pydantic import BaseModel

class ChatResponse(BaseModel):
    response: str
    confidence_score: Optional[float]  # 0.0 - 1.0
    validation_info: Optional[Dict[str, Any]]  # Validation details
    learning_suggestions: Optional[List[str]]  # Knowledge gap suggestions
    # ... existing fields

validation_info Structure:

{
    "passed": bool,
    "reasons": List[str],
    "used_fallback": bool,
    "confidence_score": float,
    "context_docs_count": int
}
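
For reference, a populated validation_info payload might look like the following. The TypedDict name ValidationInfo and the concrete values are illustrative assumptions; only the five keys and their types come from the structure above.

```python
import json
from typing import List, TypedDict


class ValidationInfo(TypedDict):
    passed: bool
    reasons: List[str]
    used_fallback: bool
    confidence_score: float
    context_docs_count: int


# Example values for a no-context answer that was replaced by a fallback.
example: ValidationInfo = {
    "passed": False,
    "reasons": ["missing_uncertainty_no_context"],
    "used_fallback": True,
    "confidence_score": 0.1,
    "context_docs_count": 0,
}

# The payload is plain JSON-serializable data.
payload = json.dumps(example)
```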

Dashboard Display

Location: dashboard.py

Features:

Expandable "Response Metadata" section for each assistant message
Color-coded confidence score (🟢 green / 🟡 orange / 🔴 red)
Validation status display
Fallback usage indicator
Learning suggestions display

UI Elements:

Confidence Score: Visual indicator with percentage
Validation Status: Pass/Fail with reasons
Fallback Indicator: Shows when safe fallback was used
Learning Suggestions: Topics to learn based on knowledge gaps

Testing

Test Files:

tests/test_confidence_validator.py - ConfidenceValidator tests
tests/test_fallback_handler.py - FallbackHandler tests
tests/test_confidence_integration.py - Integration tests

Test Philosophy:

Strict and honest - No cheating with type ignores or comments
Edge case coverage - Empty strings, special characters, long inputs
Multi-language support - English, Vietnamese, Chinese
Integration testing - Full ValidatorChain integration

Configuration

Environment Variables:

ENABLE_VALIDATORS=true  # Enable validator chain (includes ConfidenceValidator)
VALIDATOR_EVIDENCE_THRESHOLD=0.01  # Evidence overlap threshold
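
One way these variables could be read at startup is sketched below. The helper names env_flag and load_validator_settings are hypothetical, the accepted truthy spellings are an assumption, and the default of "disabled" when ENABLE_VALIDATORS is unset is a guess rather than documented behavior.

```python
import os
from typing import Dict, Mapping, Optional, Union


def env_flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a boolean environment variable such as ENABLE_VALIDATORS=true."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_validator_settings(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Union[bool, float]]:
    """Read validator settings from the given mapping (os.environ by default)."""
    if env is None:
        env = os.environ
    return {
        # Assumed default: validators stay off unless explicitly enabled.
        "enable_validators": env_flag(env, "ENABLE_VALIDATORS", default=False),
        # 0.01 mirrors the threshold value shown in this document.
        "evidence_threshold": float(env.get("VALIDATOR_EVIDENCE_THRESHOLD", "0.01")),
    }
```

Passing the environment as a mapping keeps the function easy to test without mutating os.environ.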

Code Configuration:

# In chat_router.py
ConfidenceValidator(require_uncertainty_when_no_context=True)

Best Practices

Always enable validators in production - ENABLE_VALIDATORS=true
Monitor metrics - Track hallucination reduction rate
Review fallback usage - High fallback rate may indicate knowledge gaps
Adjust confidence thresholds - Based on your use case requirements
Test edge cases - Empty context, special characters, long inputs

Troubleshooting

Issue: High fallback usage rate

Cause: Knowledge gaps in RAG system
Solution: Improve knowledge base coverage, adjust learning sources

Issue: Low confidence scores

Cause: Insufficient context or validation failures
Solution: Improve RAG retrieval, check validation reasons

Issue: False positives (valid answers rejected)

Cause: Overly strict validation
Solution: Adjust thresholds, review validation reasons

Future Enhancements

Confidence score calibration based on user feedback
Adaptive thresholds based on question type
Multi-level fallback strategies
Confidence score explanation to users
A/B testing framework for threshold optimization
