Validated: 2026-04-26 | Tested on: Ghost in the Archive project (Python files)
Combine aggressive token optimization with strict behavioral enforcement to achieve efficient AND predictable agents:
- 10x faster responses (less context to process)
- Lower cost per session
- Zero behavioral drift via Caliber-inspired scope and schema constraints
- Prevent context window overflow on long projects
- Never lose critical instructions due to memory saturation
| Method | Tool Used | Tokens Consumed | Result Quality |
|---|---|---|---|
| ❌ Wasteful | view_file (full read) |
~2,500–3,000 tokens | All 189 lines returned |
| ✅ Efficient | grep_search (function names only) |
~150 tokens | Only function signatures |
| Savings | ~95% reduction | Same actionable info |
Concrete Example:
generate_figures.py= 189 lines, 7,778 bytes- Reading it fully = ~2,800 tokens consumed in one call
grep_searchfordef→ returned 4 function names + line numbers in ~150 tokens- Saving: ~2,650 tokens per file access
| Method | Tool Used | Output Tokens | Risk |
|---|---|---|---|
| ❌ Wasteful | Rewrite entire file | ~3,000 output tokens | High (can introduce bugs) |
| ✅ Efficient | multi_replace_file_content |
~50 output tokens | Near-zero (surgical) |
| Savings | ~98% reduction |
Rule: Never regenerate an entire file for a minor change. Always target the exact TargetContent line range.
| Method | Approach | Tokens Consumed |
|---|---|---|
| ❌ Wasteful | Fetch all observations at once | ~40,000 tokens |
| ✅ Layer 1 | search(query) — get IDs only |
~100 tokens |
| ✅ Layer 2 | timeline(anchor=ID) — context |
~300 tokens |
| ✅ Layer 3 | get_observations([3-5 IDs]) — targeted |
~1,500 tokens |
| Savings | ~38,000 tokens saved (95%) |
STEP 1 → search(query) ← ~100 tokens, returns IDs
STEP 2 → timeline(anchor=ID) ← ~300 tokens, returns context
STEP 3 → get_observations([IDs])← ~1,500 tokens, max 3-5 IDs only
NEVER call get_observations on all search results simultaneously.
❌ WRONG: view_file("script.py") → reads 800 lines blindly
✅ RIGHT: grep_search("def ", file.py) → gets function list only
✅ RIGHT: smart_outline(file.py) → AST folded view (if supported)
✅ RIGHT: smart_unfold(symbol) → one function only
❌ WRONG: Rewrite entire file (write_to_file with Overwrite=true)
✅ RIGHT: multi_replace_file_content with specific TargetContent
✅ RIGHT: replace_file_content for single contiguous block only
For files > 500 lines or > 20,000 tokens:
1. build_corpus(name, query) → indexes the content
2. prime_corpus(name) → creates focused AI session
3. query_corpus(name, question)→ targeted Q&A on the corpus
This avoids dumping entire documents into the active conversation.
Even with 95% token savings, compressed contexts can lead to "behavioral drift," where actions stray from original intent. To counter this, layer declarative behavioral enforcement on top of token optimization:
- Schema Compliance: Validate every tool call against exact schema definitions before execution.
- Scope Constraints: Define clear boundaries for edits and searches. Do not wander outside the immediate task scope.
- Tool Call Limits: Enforce strict caps on consecutive tool usage to prevent runaway loops.
- Declarative Validation: Treat rules as an infrastructure-level proxy. Mentally validate every LLM API/tool action against task constraints before outputting. (Combining high token compression with strict runtime enforcement yields efficient AND predictable execution.)
These patterns are strictly forbidden:
| Forbidden Action | Why | Use Instead |
|---|---|---|
view_file on any file > 100 lines |
Wastes 1,000–5,000 tokens | grep_search + smart_unfold |
get_observations without prior search |
Fetches irrelevant data | Always use 3-layer workflow |
write_to_file(Overwrite=true) for edits |
Wastes all output tokens | multi_replace_file_content |
Reading the full .docx output to check it |
Extremely expensive | Use word_count.py or spot-check sections |
| Fetching all KI artifacts at conversation start | Floods context window | Only read KIs relevant to current task |
| Task Type | Budget | Primary Tools |
|---|---|---|
| Find a function in code | < 200 tokens | grep_search |
| Understand a file structure | < 500 tokens | smart_outline |
| Edit one section of code | < 100 tokens | multi_replace_file_content |
| Search past conversations | < 500 tokens | search → timeline → get_observations |
| Analyze a large research file | < 2,000 tokens | build_corpus → query_corpus |