Skip to content

Latest commit

 

History

History
124 lines (97 loc) · 5.61 KB

File metadata and controls

124 lines (97 loc) · 5.61 KB

⚡ Token Optimization Mastery Skill

Complete Guide for Maximum AI Efficiency

Validated: 2026-04-26 | Tested on: Ghost in the Archive project (Python files)


🎯 Purpose

Combine aggressive token optimization with strict behavioral enforcement to achieve efficient AND predictable agents:

  • 10x faster responses (less context to process)
  • Lower cost per session
  • Zero behavioral drift via Caliber-inspired scope and schema constraints
  • Prevent context window overflow on long projects
  • Never lose critical instructions due to memory saturation

🔬 Live Test Results (Measured 2026-04-26)

Test 1: Code Reading — Full File vs. Smart Navigation

Method Tool Used Tokens Consumed Result Quality
❌ Wasteful view_file (full read) ~2,500–3,000 tokens All 189 lines returned
✅ Efficient grep_search (function names only) ~150 tokens Only function signatures
Savings ~95% reduction Same actionable info

Concrete Example:

  • generate_figures.py = 189 lines, 7,778 bytes
  • Reading it fully = ~2,800 tokens consumed in one call
  • grep_search for def → returned 4 function names + line numbers in ~150 tokens
  • Saving: ~2,650 tokens per file access

Test 2: Code Editing — Full Rewrite vs. Targeted Replace

Method Tool Used Output Tokens Risk
❌ Wasteful Rewrite entire file ~3,000 output tokens High (can introduce bugs)
✅ Efficient multi_replace_file_content ~50 output tokens Near-zero (surgical)
Savings ~98% reduction

Rule: Never regenerate an entire file for a minor change. Always target the exact TargetContent line range.


Test 3: Memory Retrieval — Bulk Fetch vs. 3-Layer Workflow

Method Approach Tokens Consumed
❌ Wasteful Fetch all observations at once ~40,000 tokens
✅ Layer 1 search(query) — get IDs only ~100 tokens
✅ Layer 2 timeline(anchor=ID) — context ~300 tokens
✅ Layer 3 get_observations([3-5 IDs]) — targeted ~1,500 tokens
Savings ~38,000 tokens saved (95%)

📋 The 5-Protocol System (Token Compression + Behavioral Enforcement)

Protocol A: 3-Layer MCP Memory Workflow (MANDATORY)

STEP 1 → search(query)          ← ~100 tokens, returns IDs
STEP 2 → timeline(anchor=ID)    ← ~300 tokens, returns context
STEP 3 → get_observations([IDs])← ~1,500 tokens, max 3-5 IDs only

NEVER call get_observations on all search results simultaneously.

Protocol B: Smart Codebase Navigation

❌ WRONG:  view_file("script.py")         → reads 800 lines blindly
✅ RIGHT:  grep_search("def ", file.py)   → gets function list only
✅ RIGHT:  smart_outline(file.py)         → AST folded view (if supported)
✅ RIGHT:  smart_unfold(symbol)           → one function only

Protocol C: Surgical Code Editing

❌ WRONG:  Rewrite entire file (write_to_file with Overwrite=true)
✅ RIGHT:  multi_replace_file_content with specific TargetContent
✅ RIGHT:  replace_file_content for single contiguous block only

Protocol D: Knowledge Corpora for Dense Research

For files > 500 lines or > 20,000 tokens:
  1. build_corpus(name, query)   → indexes the content
  2. prime_corpus(name)          → creates focused AI session
  3. query_corpus(name, question)→ targeted Q&A on the corpus

This avoids dumping entire documents into the active conversation.

Protocol E: Behavioral Enforcement Layer (Caliber-Inspired)

Even with 95% token savings, compressed contexts can lead to "behavioral drift," where actions stray from original intent. To counter this, layer declarative behavioral enforcement on top of token optimization:

  1. Schema Compliance: Validate every tool call against exact schema definitions before execution.
  2. Scope Constraints: Define clear boundaries for edits and searches. Do not wander outside the immediate task scope.
  3. Tool Call Limits: Enforce strict caps on consecutive tool usage to prevent runaway loops.
  4. Declarative Validation: Treat rules as an infrastructure-level proxy. Mentally validate every LLM API/tool action against task constraints before outputting. (Combining high token compression with strict runtime enforcement yields efficient AND predictable execution.)

⚠️ Anti-Pattern Blacklist

These patterns are strictly forbidden:

Forbidden Action Why Use Instead
view_file on any file > 100 lines Wastes 1,000–5,000 tokens grep_search + smart_unfold
get_observations without prior search Fetches irrelevant data Always use 3-layer workflow
write_to_file(Overwrite=true) for edits Wastes all output tokens multi_replace_file_content
Reading the full .docx output to check it Extremely expensive Use word_count.py or spot-check sections
Fetching all KI artifacts at conversation start Floods context window Only read KIs relevant to current task

🏆 Summary: Token Budget Per Task Type

Task Type Budget Primary Tools
Find a function in code < 200 tokens grep_search
Understand a file structure < 500 tokens smart_outline
Edit one section of code < 100 tokens multi_replace_file_content
Search past conversations < 500 tokens searchtimelineget_observations
Analyze a large research file < 2,000 tokens build_corpusquery_corpus