SPICE (Self-Play In Corpus Environments) is a reinforcement learning framework that enables StillMe to continuously improve its reasoning capabilities through adversarial self-play.
Current learning pipeline:

RSS Feeds → Filter → Embed → ChromaDB → RAG Query → Response

With SPICE, the corpus also drives a self-play loop:

RSS Feeds → Filter → Embed → ChromaDB (Corpus)
↓
SPICE Self-Play Loop:
├─ Challenger: Generate questions from corpus
├─ Reasoner: Answer questions using RAG
├─ Self-Evaluation: Validate answers
└─ Refinement: Improve failed challenges
↓
RAG Query → Enhanced Response (with self-improved reasoning)
Challenger

Purpose: Generate challenging reasoning questions from the corpus
Responsibilities:
- Query ChromaDB for knowledge documents
- Generate diverse reasoning questions (ethical, mathematical, logical, factual)
- Create ethical reasoning challenges based on StillMe principles
- Score difficulty and relevance
Key Methods:
- `generate_challenges()`: General question generation from the corpus
- `generate_ethical_challenges()`: Focus on StillMe ethical principles
Integration Points:
- Uses `RAGRetrieval` to access the ChromaDB corpus
- Uses `EmbeddingService` for semantic operations
- Will integrate with the AI model for question generation
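A minimal sketch of how the Challenger might take shape around these responsibilities (the `ChallengeQuestion` fields, the `ai_model.generate()` interface, the return shape of `retrieve_context()`, and the difficulty heuristic are assumptions, not the actual implementation):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChallengeQuestion:
    """One reasoning challenge generated from the corpus (fields assumed)."""
    question: str
    source_doc_id: str
    category: str      # e.g. "ethical", "mathematical", "logical", "factual"
    difficulty: float  # 0.0 (trivial) .. 1.0 (hard)


class Challenger:
    """Generates reasoning challenges from the ChromaDB corpus."""

    def __init__(self, rag_retrieval, embedding_service, ai_model):
        self.rag = rag_retrieval             # RAGRetrieval: corpus access
        self.embeddings = embedding_service  # EmbeddingService: semantic operations
        self.ai_model = ai_model             # assumed async text-generation client

    async def generate_challenges(self, corpus_query: str,
                                  num_questions: int = 5) -> List[ChallengeQuestion]:
        """Query the corpus, then turn each retrieved document into a question."""
        docs = self.rag.retrieve_context(corpus_query)  # assumed: list of document dicts
        challenges = []
        for doc in docs[:num_questions]:
            prompt = f"Write one challenging reasoning question about:\n{doc['text']}"
            text = await self.ai_model.generate(prompt)  # assumed interface
            challenges.append(ChallengeQuestion(
                question=text,
                source_doc_id=doc.get("id", ""),
                category="factual",
                difficulty=min(1.0, len(text.split()) / 50),  # placeholder heuristic
            ))
        return challenges

    async def generate_ethical_challenges(self, num_questions: int = 3) -> List[ChallengeQuestion]:
        """Bias question generation toward StillMe ethical principles."""
        return await self.generate_challenges("StillMe ethical principles", num_questions)
```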
Reasoner

Purpose: Attempt to answer the Challenger's questions and self-evaluate
Responsibilities:
- Receive `ChallengeQuestion` from the Challenger
- Use RAG to retrieve relevant context
- Generate answer using AI model
- Self-evaluate answer accuracy against source content
- Detect hallucinations and reasoning gaps
Key Methods:
- `answer_challenge()`: Generate an answer for a challenge
- `self_evaluate()`: Evaluate answer accuracy and completeness
Integration Points:
- Uses `RAGRetrieval` for context retrieval
- Uses the AI model for answer generation
- Uses validation chain for quality checks
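A comparable sketch for the Reasoner, reusing the `ChallengeQuestion` structure above (the `ReasonerResponse` fields and the self-evaluation prompt are illustrative assumptions):

```python
from dataclasses import dataclass


@dataclass
class ReasonerResponse:
    """Outcome of answering one challenge (fields assumed)."""
    challenge: ChallengeQuestion
    answer: str
    passed: bool
    evaluation_notes: str


class Reasoner:
    """Answers Challenger questions via RAG and evaluates its own answers."""

    def __init__(self, rag_retrieval, ai_model, validation_chain=None):
        self.rag = rag_retrieval            # RAGRetrieval: context retrieval
        self.ai_model = ai_model            # assumed async text-generation client
        self.validators = validation_chain  # optional quality checks

    async def answer_challenge(self, challenge: ChallengeQuestion) -> ReasonerResponse:
        context = self.rag.retrieve_context(challenge.question)  # assumed: text or documents
        prompt = f"Context:\n{context}\n\nQuestion: {challenge.question}\nAnswer:"
        answer = await self.ai_model.generate(prompt)  # assumed interface
        passed, notes = await self.self_evaluate(challenge, answer, context)
        return ReasonerResponse(challenge, answer, passed, notes)

    async def self_evaluate(self, challenge, answer, context):
        """Check the answer against the source content for hallucinations and gaps."""
        prompt = (
            "Does the answer below follow from the context, without hallucination?\n"
            f"Context:\n{context}\n\nQuestion: {challenge.question}\n\nAnswer:\n{answer}\n\n"
            "Reply PASS or FAIL with a short reason."
        )
        verdict = await self.ai_model.generate(prompt)  # assumed interface
        return verdict.strip().upper().startswith("PASS"), verdict
```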
SPICE Engine

Purpose: Orchestrate the self-play learning cycle
Responsibilities:
- Coordinate Challenger and Reasoner
- Run self-play cycles
- Track success/failure metrics
- Trigger refinement for failed challenges
Key Methods:
- `run_self_play_cycle()`: Execute one complete self-play cycle
- `_handle_failure()`: Process failed challenges
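A sketch of the orchestration around the Challenger and Reasoner sketches above (the metric keys and the refinement call are illustrative; the `add_learning_content()` arguments are assumed):

```python
class SPICEEngine:
    """Coordinates Challenger and Reasoner and tracks cycle metrics."""

    def __init__(self, challenger, reasoner, rag_retrieval):
        self.challenger = challenger
        self.reasoner = reasoner
        self.rag = rag_retrieval

    async def run_self_play_cycle(self, corpus_query: str,
                                  num_challenges: int = 5,
                                  focus_ethical: bool = False) -> dict:
        if focus_ethical:
            challenges = await self.challenger.generate_ethical_challenges(num_challenges)
        else:
            challenges = await self.challenger.generate_challenges(corpus_query, num_challenges)

        passed = failed = 0
        for challenge in challenges:
            response = await self.reasoner.answer_challenge(challenge)
            if response.passed:
                passed += 1
            else:
                failed += 1
                await self._handle_failure(response)

        return {"challenges": len(challenges), "passed": passed, "failed": failed}

    async def _handle_failure(self, response) -> None:
        # Refinement sketch: re-embed the failed material with enhanced metadata
        # so future retrieval surfaces it for review (assumed signature).
        self.rag.add_learning_content(
            content=response.challenge.question,
            metadata={"spice_failed": True, "notes": response.evaluation_notes},
        )
```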
Location: backend/services/learning_scheduler.py
Enhancement:
async def run_learning_cycle(self):
    # Existing: Fetch RSS and add to RAG
    entries = self.rss_fetcher.fetch_feeds(...)
    # Add to RAG...

    # NEW: Run SPICE self-play cycle
    if self.spice_engine:
        spice_result = await self.spice_engine.run_self_play_cycle(
            corpus_query="recent knowledge",
            num_challenges=5,
            focus_ethical=False
        )

Location: backend/validators/
Enhancement:
- Use Challenger to generate ethical reasoning questions
- Integrate ethical challenges into validation metrics
- Track validation performance on SPICE-generated questions
Priority Implementation:
# Initial focus: Ethical reasoning challenges
ethical_challenges = await challenger.generate_ethical_challenges(
    num_questions=3
)
# Use these for validation metrics

Location: backend/vector_db/rag_retrieval.py
No changes required - SPICE uses existing RAG infrastructure:
- `retrieve_context()`: Reasoner uses this to get context
- `add_learning_content()`: Used for refinement (re-embedding)
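For illustration only, the two existing calls as SPICE would use them (argument names are assumptions based on the descriptions above; no new RAG code is implied):

```python
# Reasoner: fetch context relevant to a challenge question
context = rag_retrieval.retrieve_context("What do recent entries say about open governance?")

# Refinement: re-embed failed material with enhanced metadata (assumed kwargs)
rag_retrieval.add_learning_content(
    content="Passage the Reasoner failed to answer correctly",
    metadata={"spice_failed": True, "needs_review": True},
)
```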
1. Challenger.generate_challenges()
   ├─ Query ChromaDB corpus
   ├─ Generate questions using AI
   └─ Return List[ChallengeQuestion]
2. For each ChallengeQuestion:
   ├─ Reasoner.answer_challenge()
   │   ├─ Retrieve context using RAG
   │   ├─ Generate answer using AI
   │   └─ Self-evaluate answer
   └─ ReasonerResponse
       ├─ If passed: Success count++
       └─ If failed: Trigger refinement
3. SPICE Engine aggregates results
   └─ Return cycle metrics
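Put together as a driver, the flow above might run like this (the instances come from the sketches earlier; the printed metric keys are illustrative):

```python
import asyncio

async def main():
    # challenger, reasoner, rag_retrieval are assumed to be constructed elsewhere
    engine = SPICEEngine(challenger, reasoner, rag_retrieval)
    metrics = await engine.run_self_play_cycle(corpus_query="recent knowledge",
                                               num_challenges=5)
    print(metrics)  # e.g. {"challenges": 5, "passed": 4, "failed": 1}

asyncio.run(main())
```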
# SPICE Control
POST /api/spice/run-cycle
GET /api/spice/status
GET /api/spice/metrics
# Challenger
POST /api/spice/challenger/generate
POST /api/spice/challenger/ethical
# Reasoner
POST /api/spice/reasoner/answer
POST /api/spice/reasoner/evaluate

Implementation status:
- ✅ Core classes: `Challenger`, `Reasoner`, `SPICEEngine`
- ✅ Data structures: `ChallengeQuestion`, `ReasonerResponse`
- ✅ Integration points defined
- ⏳ API endpoints (skeleton)
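A sketch of what those endpoint skeletons could look like, assuming the backend exposes them with FastAPI (the framework choice, the module-level `spice_engine` dependency, and the `last_metrics` attribute are assumptions):

```python
from fastapi import APIRouter

router = APIRouter(prefix="/api/spice")

spice_engine = None  # assumed to be set to a SPICEEngine instance at startup

@router.post("/run-cycle")
async def run_cycle(num_challenges: int = 5, focus_ethical: bool = False):
    return await spice_engine.run_self_play_cycle(
        corpus_query="recent knowledge",
        num_challenges=num_challenges,
        focus_ethical=focus_ethical,
    )

@router.get("/status")
async def status():
    return {"spice_enabled": spice_engine is not None}

@router.get("/metrics")
async def metrics():
    return spice_engine.last_metrics  # assumed attribute holding the latest cycle result
```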
- AI-powered question generation
- Difficulty scoring
- Ethical reasoning challenge generation
- Corpus query optimization
- RAG-based answer generation
- Self-evaluation logic
- Hallucination detection
- Factual accuracy checking
- Cycle orchestration
- Failure handling and refinement
- Curriculum difficulty adjustment
- Metrics tracking
- Learning scheduler integration
- Validation enhancement
- Dashboard metrics
- Performance optimization
- SPICE learns from real corpus (ChromaDB), not synthetic data
- Maintains context, realism, and semantic depth
- Enables continuous improvement without retraining
- Initial focus on ethical reasoning challenges
- Aligns with StillMe's core principles (transparency, open governance)
- Enhances validation metrics
- Reasoner evaluates its own answers
- Detects hallucinations and reasoning gaps
- Enables autonomous quality control
- Failed challenges trigger refinement
- Re-embedding with enhanced metadata
- Validation queue for manual review
- +8.9% mathematical reasoning (MATH benchmark)
- +9.8% general reasoning (reasoning benchmarks)
- Based on SPICE research paper results
- Better hallucination detection
- Improved factual accuracy
- Enhanced ethical reasoning
- No retraining required
- Self-improving system
- Adapts to new corpus content
- Risk: Self-generated challenges may reproduce biases
- Mitigation:
- Focus on ethical reasoning challenges
- Human oversight for validation
- Bias detection in self-evaluation
- Risk: Self-play cycles may be expensive
- Mitigation:
- Limit cycle frequency (e.g., once per day; see the sketch after this list)
- Optimize question generation
- Cache embeddings and context
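One way the frequency limit could be enforced inside the learning scheduler (a sketch; the interval constant and the `_last_spice_cycle` attribute are illustrative):

```python
from datetime import datetime, timedelta, timezone

SPICE_CYCLE_INTERVAL = timedelta(days=1)  # illustrative: at most one cycle per day

async def maybe_run_spice_cycle(self):
    """Skip the self-play cycle if one ran within the configured interval."""
    now = datetime.now(timezone.utc)
    last = getattr(self, "_last_spice_cycle", None)
    if last is not None and now - last < SPICE_CYCLE_INTERVAL:
        return  # too soon; skip this run to control compute cost
    self._last_spice_cycle = now
    await self.spice_engine.run_self_play_cycle(
        corpus_query="recent knowledge", num_challenges=5
    )
```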
- Risk: Generated questions may be low quality
- Mitigation:
- Difficulty scoring
- Relevance filtering
- Validation checkpoints
- Complete Phase 1: Finish API endpoint skeletons
- Start Phase 2: Implement Challenger question generation
- Test Integration: Verify with existing RAG system
- Iterate: Refine based on initial results
- SPICE Paper: https://arxiv.org/abs/2510.24684
- Meta AI Research: Self-Play In Corpus Environments
- StillMe Core Principles: Transparency, Open Governance, Acknowledging Black Box Reality and Building Transparent Solutions