⚡ Token Optimization Mastery Skill

Complete Guide for Maximum AI Efficiency

Validated: 2026-04-26 | Tested on: Ghost in the Archive project (Python files)

🎯 Purpose

Combine aggressive token optimization with strict behavioral enforcement to achieve efficient AND predictable agents:

10x faster responses (less context to process)
Lower cost per session
Zero behavioral drift via Caliber-inspired scope and schema constraints
Prevent context window overflow on long projects
Never lose critical instructions due to memory saturation

🔬 Live Test Results (Measured 2026-04-26)

Test 1: Code Reading — Full File vs. Smart Navigation

Method	Tool Used	Tokens Consumed	Result Quality
❌ Wasteful	`view_file` (full read)	~2,500–3,000 tokens	All 189 lines returned
✅ Efficient	`grep_search` (function names only)	~150 tokens	Only function signatures
Savings		~95% reduction	Same actionable info

Concrete Example:

generate_figures.py = 189 lines, 7,778 bytes
Reading it fully = ~2,800 tokens consumed in one call
grep_search for def → returned 4 function names + line numbers in ~150 tokens
Saving: ~2,650 tokens per file access

Test 2: Code Editing — Full Rewrite vs. Targeted Replace

Method	Tool Used	Output Tokens	Risk
❌ Wasteful	Rewrite entire file	~3,000 output tokens	High (can introduce bugs)
✅ Efficient	`multi_replace_file_content`	~50 output tokens	Near-zero (surgical)
Savings		~98% reduction

Rule: Never regenerate an entire file for a minor change. Always target the exact TargetContent line range.

Test 3: Memory Retrieval — Bulk Fetch vs. 3-Layer Workflow

Method	Approach	Tokens Consumed
❌ Wasteful	Fetch all observations at once	~40,000 tokens
✅ Layer 1	`search(query)` — get IDs only	~100 tokens
✅ Layer 2	`timeline(anchor=ID)` — context	~300 tokens
✅ Layer 3	`get_observations([3-5 IDs])` — targeted	~1,500 tokens
Savings		~38,000 tokens saved (95%)

📋 The 5-Protocol System (Token Compression + Behavioral Enforcement)

Protocol A: 3-Layer MCP Memory Workflow (MANDATORY)

STEP 1 → search(query)          ← ~100 tokens, returns IDs
STEP 2 → timeline(anchor=ID)    ← ~300 tokens, returns context
STEP 3 → get_observations([IDs])← ~1,500 tokens, max 3-5 IDs only

NEVER call get_observations on all search results simultaneously.

Protocol B: Smart Codebase Navigation

❌ WRONG:  view_file("script.py")         → reads 800 lines blindly
✅ RIGHT:  grep_search("def ", file.py)   → gets function list only
✅ RIGHT:  smart_outline(file.py)         → AST folded view (if supported)
✅ RIGHT:  smart_unfold(symbol)           → one function only

Protocol C: Surgical Code Editing

❌ WRONG:  Rewrite entire file (write_to_file with Overwrite=true)
✅ RIGHT:  multi_replace_file_content with specific TargetContent
✅ RIGHT:  replace_file_content for single contiguous block only

Protocol D: Knowledge Corpora for Dense Research

For files > 500 lines or > 20,000 tokens:
  1. build_corpus(name, query)   → indexes the content
  2. prime_corpus(name)          → creates focused AI session
  3. query_corpus(name, question)→ targeted Q&A on the corpus

This avoids dumping entire documents into the active conversation.

Protocol E: Behavioral Enforcement Layer (Caliber-Inspired)

Even with 95% token savings, compressed contexts can lead to "behavioral drift," where actions stray from original intent. To counter this, layer declarative behavioral enforcement on top of token optimization:

Schema Compliance: Validate every tool call against exact schema definitions before execution.
Scope Constraints: Define clear boundaries for edits and searches. Do not wander outside the immediate task scope.
Tool Call Limits: Enforce strict caps on consecutive tool usage to prevent runaway loops.
Declarative Validation: Treat rules as an infrastructure-level proxy. Mentally validate every LLM API/tool action against task constraints before outputting. (Combining high token compression with strict runtime enforcement yields efficient AND predictable execution.)

⚠️ Anti-Pattern Blacklist

These patterns are strictly forbidden:

Forbidden Action	Why	Use Instead
`view_file` on any file > 100 lines	Wastes 1,000–5,000 tokens	`grep_search` + `smart_unfold`
`get_observations` without prior `search`	Fetches irrelevant data	Always use 3-layer workflow
`write_to_file(Overwrite=true)` for edits	Wastes all output tokens	`multi_replace_file_content`
Reading the full `.docx` output to check it	Extremely expensive	Use `word_count.py` or spot-check sections
Fetching all KI artifacts at conversation start	Floods context window	Only read KIs relevant to current task

🏆 Summary: Token Budget Per Task Type

Task Type	Budget	Primary Tools
Find a function in code	< 200 tokens	`grep_search`
Understand a file structure	< 500 tokens	`smart_outline`
Edit one section of code	< 100 tokens	`multi_replace_file_content`
Search past conversations	< 500 tokens	`search` → `timeline` → `get_observations`
Analyze a large research file	< 2,000 tokens	`build_corpus` → `query_corpus`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

⚡ Token Optimization Mastery Skill

Complete Guide for Maximum AI Efficiency

🎯 Purpose

🔬 Live Test Results (Measured 2026-04-26)

Test 1: Code Reading — Full File vs. Smart Navigation

Test 2: Code Editing — Full Rewrite vs. Targeted Replace

Test 3: Memory Retrieval — Bulk Fetch vs. 3-Layer Workflow

📋 The 5-Protocol System (Token Compression + Behavioral Enforcement)

Protocol A: 3-Layer MCP Memory Workflow (MANDATORY)

Protocol B: Smart Codebase Navigation

Protocol C: Surgical Code Editing

Protocol D: Knowledge Corpora for Dense Research

Protocol E: Behavioral Enforcement Layer (Caliber-Inspired)

⚠️ Anti-Pattern Blacklist

🏆 Summary: Token Budget Per Task Type

FilesExpand file tree

Token_Optimization_EN.md

Latest commit

History

Token_Optimization_EN.md

File metadata and controls

⚡ Token Optimization Mastery Skill

Complete Guide for Maximum AI Efficiency

🎯 Purpose

🔬 Live Test Results (Measured 2026-04-26)

Test 1: Code Reading — Full File vs. Smart Navigation

Test 2: Code Editing — Full Rewrite vs. Targeted Replace

Test 3: Memory Retrieval — Bulk Fetch vs. 3-Layer Workflow

📋 The 5-Protocol System (Token Compression + Behavioral Enforcement)

Protocol A: 3-Layer MCP Memory Workflow (MANDATORY)

Protocol B: Smart Codebase Navigation

Protocol C: Surgical Code Editing

Protocol D: Knowledge Corpora for Dense Research

Protocol E: Behavioral Enforcement Layer (Caliber-Inspired)

⚠️ Anti-Pattern Blacklist

🏆 Summary: Token Budget Per Task Type