Narrative Consistency Validator

Verify character backstories against long-form literary narratives using constraint-based reasoning

Quick Start • Architecture • Documentation

Mission

Determine whether a proposed character backstory is logically consistent with events, relationships, and character development in a 100,000+ word novel—without reading the entire book manually.

The Challenge

Literary narratives establish complex constraints across hundreds of pages:

Temporal constraints: Events must occur in valid chronological order
Causal constraints: Character actions must align with stated beliefs and motivations
Factual constraints: Backstory claims must not contradict established facts
Relationship constraints: Character dynamics must remain internally consistent

The Problem: A human reader would need hours to verify a single backstory claim against an entire novel. Existing NLP approaches fail because:

LLMs cannot fit 100k+ words in context
Simple search misses implicit contradictions
QA systems retrieve facts but don't reason about consistency
Summarization loses critical details

Our Solution

A constraint-based reasoning pipeline that:

Breaks backstories into atomic, verifiable claims
Retrieves relevant evidence from the full novel using semantic search
Analyzes claim-evidence relationships using rule-based logic
Makes deterministic binary decisions: Consistent or Contradictory

Key Insight: Instead of asking an LLM "is this consistent?", we decompose the problem into structured reasoning steps with traceable logic.

Key Features

Feature	Description
Claim Decomposition	Breaks backstories into atomic claims with categories (beliefs, events, relationships) and importance levels (core, significant, surface)
Long-Context Retrieval	Handles 100k+ word novels through hierarchical chunking with overlap and timeline-aware indexing
Constraint Reasoning	Detects contradictions, support, and causal inconsistencies using pattern matching and temporal logic
Deterministic Classification	Rule-based decisions ensure reproducibility—same input always produces same output
Pathway Integration	Uses Pathway for document ingestion, indexing, and retrieval orchestration
Explainable Outputs	Every prediction includes a human-readable rationale derived from decision logic

How It Works

┌─────────────┐
│  Backstory  │  "As a child, Edmond grew up wealthy..."
└──────┬──────┘
       │
       ▼
┌─────────────────────┐
│  Claim Extraction   │  → Claim 1: "Edmond grew up wealthy" [CORE]
└──────┬──────────────┘  → Claim 2: "Edmond had formal education" [SIGNIFICANT]
       │
       ▼
┌─────────────────────┐
│ Evidence Retrieval  │  → Search novel for relevant passages
└──────┬──────────────┘  → Retrieve top-k chunks with timeline coverage
       │
       ▼
┌─────────────────────┐
│ Constraint Analysis │  → Claim 1 + Evidence → CONTRADICTS (0.9 confidence)
└──────┬──────────────┘  → "Novel states Edmond was poor sailor's son"
       │
       ▼
┌─────────────────────┐
│  Classification     │  → Fatal contradiction in CORE claim
└──────┬──────────────┘  → Decision: CONTRADICTORY (label=0)
       │
       ▼
┌─────────────────────┐
│  Output Generation  │  story_id,prediction,rationale
└─────────────────────┘  1,0,"Proposed backstory contradicts established events"

System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         INPUT LAYER                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐          │
│  │  Novel Text  │  │  Backstories │  │  Test Data   │          │
│  │  (100k+ words)│  │     (CSV)    │  │    (CSV)     │          │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘          │
└─────────┼──────────────────┼──────────────────┼──────────────────┘
          │                  │                  │
          ▼                  ▼                  ▼
┌─────────────────────────────────────────────────────────────────┐
│                    PATHWAY DOCUMENT STORE                        │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌──────────┐ │
│  │   Novels   │  │   Chunks   │  │   Claims   │  │ Metadata │ │
│  │  (indexed) │  │ (embedded) │  │ (embedded) │  │(timeline)│ │
│  └────────────┘  └────────────┘  └────────────┘  └──────────┘ │
└─────────────────────────────────────────────────────────────────┘
          │                  │                  │
          ▼                  ▼                  ▼
┌─────────────────────────────────────────────────────────────────┐
│                      PROCESSING PIPELINE                         │
│                                                                   │
│  ┌──────────────────┐         ┌──────────────────┐             │
│  │ Claim Extractor  │────────▶│ Evidence Retriever│             │
│  │  (LLM-assisted)  │         │ (Semantic + KW)   │             │
│  └──────────────────┘         └─────────┬─────────┘             │
│                                          │                        │
│                                          ▼                        │
│                              ┌──────────────────────┐            │
│                              │ Constraint Engine    │            │
│                              │ (Rule-based patterns)│            │
│                              └─────────┬────────────┘            │
│                                        │                          │
│                                        ▼                          │
│                              ┌──────────────────────┐            │
│                              │ Deterministic        │            │
│                              │ Classifier           │            │
│                              └─────────┬────────────┘            │
└────────────────────────────────────────┼──────────────────────────┘
                                         │
                                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                         OUTPUT LAYER                             │
│                    results.csv (submission)                      │
│  story_id | prediction | rationale                               │
│  ──────────────────────────────────────────────────────          │
│     1     |     1      | No narrative constraints violated       │
│     2     |     0      | Proposed backstory contradicts events   │
└─────────────────────────────────────────────────────────────────┘

User Journey

Step	Action	System Response
1️⃣	User provides novel text + character backstory	System loads novel into document store
2️⃣	System chunks novel into overlapping segments	5-paragraph chunks with 2-paragraph overlap, timeline tags
3️⃣	System extracts claims from backstory	"Grew up wealthy" → [EARLY_LIFE, CORE]
4️⃣	System retrieves evidence for each claim	Top-10 relevant passages with timeline coverage
5️⃣	System analyzes claim-evidence relationships	Pattern matching detects contradiction
6️⃣	System makes binary decision	Fatal contradiction → label=0
7️⃣	System generates rationale	"Proposed backstory contradicts established events"
8️⃣	User receives CSV output	story_id, prediction, rationale

Technology Stack

Core Framework

Python 3.8+ — Primary language
Pathway — Document ingestion, indexing, and retrieval orchestration

NLP & Embeddings

Sentence Transformers — Semantic embeddings (all-MiniLM-L6-v2)
scikit-learn — Cosine similarity computation

LLM Integration (Optional)

Google Gemini 2.0 Flash — Claim extraction and explanation generation only
Temperature = 0 — Deterministic outputs enforced
Note: LLMs do NOT make classification decisions

Data Processing

pandas — CSV handling
regex — Pattern matching for constraint detection

Development

python-dotenv — Environment configuration
pathlib — Cross-platform file handling

Quick Start

Prerequisites

Python 3.8+
pip (Python package manager)

Installation

Clone the repository

git clone https://github.com/Krish-afk-bot/narrative-consistency-verifier.git
cd narrative-consistency-verifier

Install dependencies

pip install -r requirements.txt

Set up environment variables (optional, for LLM features)

# Create .env file
echo "GEMINI_API_KEY=your_api_key_here" > .env

Prepare data

# Place your novels in data/ directory
data/
  ├── The Count of Monte Cristo.txt
  ├── In search of the castaways.txt
  ├── test.csv
  └── train.csv

Running the Pipeline

python main.py

Expected Output:

============================================================
Track-A Backstory Consistency Validator
============================================================

[1/5] Initializing document store...
[2/5] Loading training data for character extraction...
[3/5] Loading and chunking novels...
[4/5] Loading test data...
[5/5] Processing test entries...

Writing results.csv...
  Output: results.csv
  Rows: 50

VALIDATION:
  test.csv rows: 50
  results.csv rows: 50
  ✓ Row count matches
  Consistent: 28
  Contradict: 22

Done.

Output Format

results.csv:

story_id,prediction,rationale
1,1,No narrative constraints violated
2,0,Proposed backstory contradicts established events
3,1,Backstory aligns with established narrative events

Project Structure

narrative-consistency-validator/
├── backstory_validator/
│   ├── config.py                    # Configuration and enums
│   ├── extraction/
│   │   └── claim_extractor.py       # Backstory → claims
│   ├── retrieval/
│   │   └── evidence_retriever.py    # Claim → evidence passages
│   ├── reasoning/
│   │   ├── constraint_engine.py     # Claim-evidence analysis
│   │   └── classifier.py            # Final decision logic
│   ├── pipeline/
│   │   ├── ingestion.py             # Novel loading
│   │   ├── chunking.py              # Hierarchical chunking
│   │   └── pathway_tables.py        # Document store
│   ├── llm/
│   │   └── gemini_client.py         # LLM interface (optional)
│   └── models/
│       └── schemas.py               # Data models
├── data/
│   ├── test.csv                     # Test backstories
│   ├── train.csv                    # Training data
│   └── *.txt                        # Novel files
├── main.py                          # Entry point
├── requirements.txt                 # Dependencies
├── results.csv                      # Output (generated)
├── TRACK_A_SUBMISSION_REPORT.md     # Technical documentation
└── README.md                        # This file

Contributing

We welcome contributions! Here's how to get started:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes
Run tests (if available)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Contribution Guidelines

Write clear commit messages
Add comments for complex logic
Update documentation for new features
Ensure code follows existing style
Test your changes before submitting

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

For questions, issues, or collaboration opportunities:

Open an issue on GitHub
Project Link: https://github.com/Krish-afk-bot/narrative-consistency-verifier

Built with 🧠 for long-context narrative reasoning

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Narrative Consistency Validator

Mission

The Challenge

Our Solution

Key Features

How It Works

System Architecture

User Journey

Technology Stack

Core Framework

NLP & Embeddings

LLM Integration (Optional)

Data Processing

Development

Quick Start

Prerequisites

Installation

Running the Pipeline

Output Format

Project Structure

Contributing

Contribution Guidelines

License

Contact

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
backstory_validator		backstory_validator
data		data
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
main.py		main.py
requirements.txt		requirements.txt
results.csv		results.csv

Folders and files

Latest commit

History

Repository files navigation

Narrative Consistency Validator

Mission

The Challenge

Our Solution

Key Features

How It Works

System Architecture

User Journey

Technology Stack

Core Framework

NLP & Embeddings

LLM Integration (Optional)

Data Processing

Development

Quick Start

Prerequisites

Installation

Running the Pipeline

Output Format

Project Structure

Contributing

Contribution Guidelines

License

Contact

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages