
Qualia Tongue

Deterministic symbolic algebra prevents mode collapse in LLM-generated synthetic training data.

That's the thesis. Here's the proof.

UMAP semantic clustering: 705 Tau-Tongue outputs (color) vs 705 unconstrained controls (grey)

1,410 outputs. 705 mathematically constrained through Tau-Tongue symbolic algebra (color, by archetype). 705 unconstrained controls (grey ✕). Same source abstract. Same model. Same temperature.

The grey cluster is tight. That's the LLM doing what LLMs do — converging on the same "different angle" over and over. Mode collapse, visualized.

The colored cloud is dispersed. That's Tau-Tongue's archetypal matrix forcing the model into genuinely different semantic territory for every braid. Each color is a different philosophical archetype. The separation between clusters — and the spread within the Tau-Tongue cluster — is the entire point.

The fence works.


The Numbers

| Metric | Value |
| --- | --- |
| Source abstract | Integrated Information Theory (IIT) |
| Model | qwen3:8b (Ollama) |
| Embedding model | bge-m3:567m (1024-dimensional) |
| Total braids | 705 |
| Total outputs | 1,410 (705 TT-guided + 705 control) |

Semantic Diversity

| Measurement | TT-Guided | Control |
| --- | --- | --- |
| Intra-cluster variance | 0.156 | 0.087 |

Variance gap: TT-guided is 79.5% greater than control.

The control group's internal variance (0.087) is tight. Left to its own devices, the LLM writes the same rewrite 705 different ways. The Tau-Tongue group (0.156) is nearly twice as spread out. That's not a marginal improvement. That's a different distribution.
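The intra-cluster numbers can be reproduced from the raw embeddings. A minimal sketch, assuming spread is measured as mean pairwise cosine distance within each group (the runner's exact metric lives in packages/runner and may differ):

```typescript
// Sketch: intra-cluster spread as mean pairwise cosine distance.
// ASSUMPTION: this mirrors the runner's metric; the authoritative
// implementation is in packages/runner.

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, v, i) => sum + v * b[i], 0);
}

function cosineDistance(a: number[], b: number[]): number {
  const norm = (v: number[]) => Math.sqrt(dot(v, v));
  return 1 - dot(a, b) / (norm(a) * norm(b));
}

// Mean distance over all unordered pairs in a group of embeddings.
function intraClusterSpread(embeddings: number[][]): number {
  let total = 0;
  let pairs = 0;
  for (let i = 0; i < embeddings.length; i++) {
    for (let j = i + 1; j < embeddings.length; j++) {
      total += cosineDistance(embeddings[i], embeddings[j]);
      pairs++;
    }
  }
  return pairs === 0 ? 0 : total / pairs;
}
```

A tight control cluster shows up directly as a small value here; a dispersed TT-guided cluster as a large one.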

Inter-Cluster Distance (TT vs Paired Control)

| Metric | Value |
| --- | --- |
| Mean | 0.218 |
| Std Dev | 0.059 |
| Min | 0.083 |
| Max | 0.354 |
| Median | 0.218 |

Only 4 of 705 pairs (0.6%) had cosine distance below 0.1. 68 pairs (9.6%) exceeded 0.3. The engine doesn't just nudge the outputs — it moves them.
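The summary rows and tail counts above follow mechanically from the 705 per-pair distances. A minimal sketch (whether the reported Std Dev is population or sample is an assumption; population is used here):

```typescript
// Summary statistics over an array of per-pair cosine distances.
function summarize(distances: number[]) {
  const n = distances.length;
  const sorted = [...distances].sort((a, b) => a - b);
  const mean = distances.reduce((s, d) => s + d, 0) / n;
  // ASSUMPTION: population variance (divide by n, not n - 1).
  const variance = distances.reduce((s, d) => s + (d - mean) ** 2, 0) / n;
  const median =
    n % 2 === 1
      ? sorted[(n - 1) / 2]
      : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
  return {
    mean,
    stdDev: Math.sqrt(variance),
    min: sorted[0],
    max: sorted[n - 1],
    median,
  };
}

// The tail behavior reported above: pairs the engine barely moved,
// and pairs it moved a long way.
function tailCounts(distances: number[]) {
  return {
    below01: distances.filter((d) => d < 0.1).length,
    above03: distances.filter((d) => d > 0.3).length,
  };
}
```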

Per-Archetype Performance

Every archetype in the Qualia config contributes meaningful divergence. No archetype collapses.

| Archetype | Braids | Mean Distance | Range |
| --- | --- | --- | --- |
| Phi | 10 | 0.264 | 0.171–0.315 |
| The Definer | 71 | 0.234 | 0.098–0.351 |
| The Unifier | 98 | 0.224 | 0.083–0.339 |
| The Functionalist | 82 | 0.222 | 0.107–0.347 |
| The Panpsychist | 74 | 0.220 | 0.115–0.326 |
| The Integrator | 82 | 0.217 | 0.105–0.335 |
| The Phenomenologist | 77 | 0.214 | 0.090–0.348 |
| The Mysterion | 67 | 0.212 | 0.114–0.354 |
| The Eliminativist | 79 | 0.210 | 0.091–0.345 |
| Qualia | 43 | 0.205 | 0.106–0.339 |
| The Distinctor | 22 | 0.204 | 0.115–0.317 |

Verdict

PASS. TT intra-cluster variance (0.156) is 79.5% greater than control (0.087), with significant inter-cluster distance (0.218). The hypothesis holds.

Read the full writeup: MAD Science: How to Cure Mode Collapse (MAD) with Simple Math on Astral Architecture.


How It Works

  1. Input — A dense academic abstract (IIT, in this case) is fed into the Tau-Tongue interpreter with a domain-specific archetype configuration (QUALIA_CONFIG).

  2. Fracture — Tau-Tongue decomposes the input into 705 symbolic "braids" — each one a unique equation encoding a philosophical lens, pressure density, and an archetypal presence matrix.

  3. Constrain — Each braid's equation maps to a specific archetype (The Eliminativist, The Phenomenologist, Phi, etc.) at a specific pressure. This becomes the mathematical cage the LLM must operate within.

  4. Generate — For each braid, two outputs are produced:

    • TT-guided: The LLM rewrites the abstract through the braid's archetypal lens, pressure, and matrix constraints
    • Control: The same LLM rewrites the same abstract with no constraints — just "pick your own angle"

  5. Embed — Both outputs are embedded into 1024-dimensional space via bge-m3.

  6. Measure — Cosine distance between each TT/control pair (inter-cluster) and pairwise distances within each group (intra-cluster variance).

  7. Visualize — UMAP dimensionality reduction projects all 1,410 embeddings into 2D for the scatter plot above.

The archetypal matrix is the fence. The math determines the archetype, the pressure determines the intensity, and the LLM does what it does best — generates text — but within the bounds of the constraint. The result is synthetic data with genuine perspective diversity instead of 705 flavors of the same take.
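The constrain-and-generate steps can be sketched as two prompt builders. The field names and wording below are hypothetical; the real braid type and prompt templates live in packages/runner:

```typescript
// Hypothetical braid shape -- the runner's actual type may differ.
interface Braid {
  index: number;
  archetype: string;       // e.g. "The Eliminativist"
  pressureDensity: number; // 0..1, intensity of the lens
}

// TT-guided prompt: the braid's math becomes hard constraints in the prompt.
function ttGuidedPrompt(braid: Braid, abstract: string): string {
  return [
    `LENS: ${braid.archetype}`,
    `PRESSURE: ${braid.pressureDensity.toFixed(2)}`,
    "Rewrite the abstract strictly through this archetypal lens,",
    "at this pressure, staying inside the matrix constraints.",
    "",
    abstract,
  ].join("\n");
}

// Control prompt: same abstract, no constraints.
function controlPrompt(abstract: string): string {
  return `Rewrite the abstract from a different angle of your choosing.\n\n${abstract}`;
}
```

The only difference between the two conditions is the constraint header; everything downstream (model, temperature, abstract) is held constant.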


Architecture

This is an npm workspaces monorepo with three packages:

packages/
├── runner/     # CLI experiment runner (TypeScript ESM)
├── api/        # Express REST API for experiment data (TypeScript ESM)
└── frontend/   # React + Vite analysis dashboard (TypeScript)

Runner — Orchestrates the full pipeline: Tau-Tongue interpretation, Ollama generation + embedding, SQLite persistence, JSONL corpus output, UMAP chart generation, and summary report.

API — Serves experiment data from SQLite over REST. Includes a server-side UMAP endpoint with seeded PRNG for deterministic layouts.

Frontend — Interactive analysis dashboard. UMAP scatter chart with archetype color coding, side-by-side braid comparison, markdown rendering, screenshot export.


Run Your Own Experiment

Prerequisites

  • Node.js v22+
  • Ollama running locally or on your network
  • Pull the models: ollama pull qwen3:8b and ollama pull bge-m3:567m

Setup

git clone https://github.com/astralarkitekt/qualia-tongue.git
cd qualia-tongue
npm install
npm run build

Configure

cp packages/runner/.env.example packages/runner/.env

Edit packages/runner/.env:

OLLAMA_HOST=http://localhost:11434/api   # Your Ollama endpoint
OLLAMA_MODEL=qwen3:8b                    # Or any model Ollama serves
OLLAMA_EMBED_MODEL=bge-m3:567m           # Embedding model
ABSTRACT_PATH=abstracts/iit-corpus-abstract.md
OUTPUT_ROOT=experiments
BRAID_LIMIT=0                            # 0 = all braids, N = first N for testing
DRY_RUN=false                            # true = skip LLM calls, log prompts only

Run

# Full experiment (705 braids — takes hours depending on hardware)
npm run runner

# Test run (first 5 braids)
npm run runner -- --braid-limit 5

# Dry run (log prompts, skip generation)
npm run runner -- --dry-run

View Results

# Start the API
npm run api

# Start the frontend (in another terminal)
npm run frontend

Open http://localhost:5173. Select your experiment and run from the dropdowns.


The Corpus

The runner produces a chat-format JSONL corpus at experiments/<id>/corpus/corpus.jsonl. Each entry follows the modern fine-tuning format:

{
  "messages": [
    { "role": "system", "content": "<Qualia-Tongue Interpreter Primer>" },
    { "role": "user", "content": "<LENS + pressure + archetypal matrix + abstract>" },
    { "role": "assistant", "content": "<rewritten abstract>" }
  ],
  "metadata": {
    "braidIndex": 42,
    "microCrucible": "The Eliminativist",
    "pressureDensity": 0.71,
    "dominantArchetype": "The Eliminativist",
    "cosineDistance": 0.284,
    "type": "tt-guided"
  }
}

This is ready for supervised fine-tuning with OpenAI, Axolotl, LLaMA-Factory, or any pipeline that expects messages with system/user/assistant roles.
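Before handing the corpus to a fine-tuning pipeline, it is worth validating the role structure. A minimal loader, following the example entry above (everything beyond those documented fields is a sketch):

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface CorpusEntry {
  messages: ChatMessage[];
  metadata: { braidIndex: number; type: string; [key: string]: unknown };
}

// Parse a corpus.jsonl string and verify each entry's role order.
function parseCorpus(jsonl: string): CorpusEntry[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line, i) => {
      const entry = JSON.parse(line) as CorpusEntry;
      const roles = entry.messages.map((m) => m.role).join(",");
      if (roles !== "system,user,assistant") {
        throw new Error(`entry ${i + 1}: unexpected roles [${roles}]`);
      }
      return entry;
    });
}
```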


The Source Abstract

The experiment uses a single, dense IIT abstract as its input — the same text processed 705 times through different symbolic lenses. You can find it at abstracts/iit-corpus-abstract.md.

You can swap in your own abstract on any topic. The Tau-Tongue engine doesn't care what the content is — it cares about the math. Define your own QUALIA_CONFIG with domain-relevant archetypes, point it at your text, and let it run.

Note: the length and density of your input determine how many variants Tau-Tongue can generate from your source material. A denser, longer abstract produces more braids; the IIT abstract used here yielded 705.


Customize Your Archetypes

The archetypes in this experiment are tuned for consciousness research (The Phenomenologist, The Eliminativist, Phi, etc.). Your domain will have its own perspectives.

Building AI safety training data? Your archetypes might be The Alignment Researcher, The Capabilities Hawk, The Governance Advocate. Medical literature? The Clinician, The Researcher, The Ethicist, The Patient Advocate.

See packages/runner/src/qualia-config.ts for the consciousness-research config used in this experiment as a starting point.

The archetypal matrix is your fence. You define the perspectives, Tau-Tongue does the math, and the LLM stays inside the yard.
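As a starting point, a domain config for the AI-safety example above might look like the following. The shape here is hypothetical, including the lens strings; adapt it to whatever packages/runner/src/qualia-config.ts actually exports:

```typescript
// Hypothetical archetype config shape for a new domain.
// The authoritative shape is packages/runner/src/qualia-config.ts.
interface ArchetypeDef {
  name: string;
  lens: string; // one-line stance the prompt embeds
}

interface DomainConfig {
  domain: string;
  archetypes: ArchetypeDef[];
}

const AI_SAFETY_CONFIG: DomainConfig = {
  domain: "ai-safety",
  archetypes: [
    {
      name: "The Alignment Researcher",
      lens: "Every capability claim is also a misalignment surface.",
    },
    {
      name: "The Capabilities Hawk",
      lens: "Progress itself is the most reliable safety signal.",
    },
    {
      name: "The Governance Advocate",
      lens: "Institutions, not individual models, are the unit of control.",
    },
  ],
};
```

The important property is that each archetype encodes a genuinely distinct stance; the divergence Tau-Tongue can force is bounded by the diversity you define here.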


License

MIT


Remember, you can read the full writeup: MAD Science: How to Cure Mode Collapse (MAD) with Simple Math on Astral Architecture.
