All notable changes to NadirClaw will be documented in this file.
- Application Default Credentials (ADC) for Gemini: when no `GOOGLE_API_KEY` is set, the Gemini path now falls back to `google.auth.default()` so users can authenticate via `GOOGLE_CLOUD_PROJECT` / `GOOGLE_CLOUD_LOCATION` (Vertex AI / gcloud-managed creds) instead of pasting a key. Original work by @froody (#57).
- `nadirclaw status` displays the mid-tier model when one is configured, alongside simple/complex (#57).
- Gemini streaming was broken: `_dispatch_model_stream` consumed `_stream_gemini` (an async generator) with a plain `for` loop, which raised `TypeError: 'async_generator' object is not iterable` on any actual streaming Gemini call. It now uses `async for`, and chunk / `finish_reason` parsing is robust to the google-genai SDK returning enum-like objects (#57).
- `savings` no longer crashes on `None` values in the request log for `selected_model` and `tier`. These show up for failed / aborted requests and previously broke the report (#57).
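The streaming bug above comes down to how an async generator is consumed. A minimal sketch of the failure and the fix, where `stream_chunks` is a hypothetical stand-in and not NadirClaw's actual `_stream_gemini`:

```python
import asyncio


async def stream_chunks():
    """Hypothetical stand-in for an async streaming generator."""
    for piece in ("Hel", "lo"):
        await asyncio.sleep(0)  # yield control, as a real network read would
        yield piece


async def collect_wrong():
    # A plain `for` over an async generator raises TypeError.
    try:
        for piece in stream_chunks():
            pass
    except TypeError as exc:
        return str(exc)
    return None


async def collect_right():
    # `async for` awaits each chunk as it arrives.
    return [piece async for piece in stream_chunks()]


error_message = asyncio.run(collect_wrong())
chunks = asyncio.run(collect_right())
```

Here `error_message` holds the `TypeError` text from the broken loop, and `chunks` holds the pieces collected by the corrected one.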
- Configurable embedding backends for the centroid classifier: `NADIRCLAW_EMBEDDING_BACKEND` (default `sentence-transformers`; also `ollama` via `/api/embed`), `NADIRCLAW_EMBEDDING_MODEL`, `NADIRCLAW_EMBEDDING_API_BASE`, and `NADIRCLAW_CENTROID_DIR`. Custom centroid directories require a `centroid_metadata.json` (schema-versioned, with `prototypes_hash` for traceability) so users never silently mismatch a self-built centroid against a different encoder. `nadirclaw build-centroids` gains `--backend`, `--model`, `--api-base`, `--output-dir` flags. Original work by @clawSean (#50).
- Optional prompt-injection guard: `nadirclaw/prompt_guard.py`. Heuristic detection of 7 patterns (instruction override, role reassignment, prompt extraction, JSON role confusion, delimiter injection, encoded payloads, DAN/jailbreak). `NADIRCLAW_PROMPT_GUARD`: `log` (default) / `warn` / `block`. Scans only user/tool messages; system/assistant messages are treated as trusted. Original work by @pradumna-gautam (#55, supersedes #31).
- Optional PII redactor: `nadirclaw/pii_redactor.py`. Detects email, US phone, SSN, and Luhn-validated credit-card numbers. `NADIRCLAW_PII_REDACTION`: `none` (default) / `log_only` / `redact`. Non-streaming responses only. Original work by @pradumna-gautam (#55).
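To illustrate what "Luhn-validated" means for the PII redactor, here is a rough sketch of email and card-number redaction. The regular expressions, placeholder strings, and function names are assumptions made for this example and are not taken from `nadirclaw/pii_redactor.py`:

```python
import re

# Hypothetical patterns: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    # Double every second digit, counting from the right.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def redact(text: str) -> str:
    """Replace emails and Luhn-valid card numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return CARD_RE.sub(
        lambda m: "[CARD]" if luhn_valid(m.group()) else m.group(), text
    )
```

The checksum step is what keeps arbitrary long digit runs (order IDs, timestamps) from being redacted as card numbers.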
- Production hardening baseline: recommended for anyone exposing `nadirclaw serve` beyond localhost. Original work by @pradumna-gautam (#30).
  - CORS: explicit allowlist via `NADIRCLAW_CORS_ORIGINS`; localhost regex default; never wildcard + credentials.
  - Auth: constant-time token comparison via `hmac.compare_digest` to defeat timing-side-channel guessing.
  - Security headers on every response: `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`, `Referrer-Policy: strict-origin-when-cross-origin`, `Cache-Control: no-store` on `/v1/*`, opt-in HSTS via `NADIRCLAW_HSTS=true`.
  - Bounds validation on `ChatCompletionRequest`: caps on messages (500), `max_tokens` (100K), `temperature` (0–2), `top_p` (0–1), `n` (1–8). This closes a cost-amplification surface.
  - Sanitized validation errors: Pydantic internals no longer leak to clients; full details are still logged server-side.
  - Async logging: SQLite writes moved off the event loop into a `ThreadPoolExecutor`, with `done_callback` exception logging and `shutdown(wait=True)` on SIGTERM so queued entries drain instead of dropping.
  - Prompt truncation: 500-char default in SQLite request logs (configurable via `NADIRCLAW_LOG_PROMPT_TRUNCATE`); API-key shaped tokens (`sk-…`, `AIza…`, `ghp_…`, `gho_…`, `xox[bpars]-…`) redacted from logged system prompts.
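The constant-time comparison in the auth item above can be sketched with the standard library alone. The helper name `token_matches` is hypothetical; only `hmac.compare_digest` comes from the changelog entry:

```python
import hmac


def token_matches(presented: str, expected: str) -> bool:
    """Compare two tokens without leaking where they first differ."""
    # compare_digest takes time independent of the position of the first
    # mismatch, unlike `==`, which can return at the first differing byte.
    # Encoding to bytes also allows non-ASCII input.
    return hmac.compare_digest(
        presented.encode("utf-8"), expected.encode("utf-8")
    )
```

A server would call this with the token from the request and the configured secret, and reject the request when it returns `False`.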
- Anthropic-compatible `/v1/messages` endpoint: Anthropic-native clients (Claude Code) now route through NadirClaw. The proxy classifies, rewrites the `model` field, forwards to `api.anthropic.com`, and pipes SSE streaming through byte-for-byte (#51).
- Seamless Claude Code integration: `nadirclaw claude onboard` / `shim` / `uninstall`. Onboarding detects models, maps them into tiers, persists `ANTHROPIC_BASE_URL` + `ANTHROPIC_MODEL` into `~/.claude/settings.json`, and installs a launchd / systemd auto-start unit (#51).
- Live model detection: onboarding queries Anthropic's `/v1/models` using the stored token (Bearer for subscription tokens, `x-api-key` for API keys) instead of a hardcoded list; `--interactive` lets you pick a model per tier (#51).
- Pluggable complexity classifier: `NADIRCLAW_COMPLEXITY_ANALYZER=binary` (default, ~10ms centroid) or `distilbert` (3-class fine-tuned DistilBERT predicting simple/mid/complex natively). The DistilBERT artifact downloads from the Hugging Face Hub on first use with a graceful fallback to binary (#51, #52).
- Pro upsell surfaces: `nadirclaw savings` / `serve` / `report` and the README now surface Nadir Pro at high-intent moments with attribution-tagged URLs; new `demo/cost_vs_opus.py` zero-API-key demo (#53).
- Enriched `/v1/models`: responses now include Anthropic-style `type` / `display_name` / `description` / `created_at` alongside the OpenAI-style fields.
- `ANTHROPIC_BASE_URL` is written as the bare host (Claude Code appends `/v1/messages` itself; a `/v1` suffix produced a broken `/v1/v1/messages` path) (#51).
- Updated the stale Claude model fallback list from the 4.5/4.1 generation to the 4.6 family (#51).
- `nadirclaw update-models` command: writes refreshable model metadata to `~/.nadirclaw/models.json`, optionally merging a published registry JSON via `--source-url` or `NADIRCLAW_MODEL_REGISTRY_URL`.
- Local model metadata overrides: the router now merges `~/.nadirclaw/models.json` and user-managed `~/.nadirclaw/models.local.json` into the runtime model registry.
- DeepSeek V4 explicit aliases: added `deepseek-v4`, `deepseek-v4-flash`, and `deepseek-v4-pro` while preserving the existing `deepseek` alias for `deepseek/deepseek-chat`.
- Model pool weighted load balancing: pool tier configuration with weighted round-robin across multiple models in the same tier (#36).
- Selective context compression module — opt-in compression for tool-heavy contexts (#40).
- Complex coding detection and enhanced reasoning markers — improved tier classification for coding-heavy prompts and Chinese reasoning markers (#38).
- Upgrade-only session cache for agent frameworks — caches routing decisions per session to avoid repeated downgrades on multi-turn agent flows (#27).
- Agent role detection for AI coding assistants — recognizes Claude Code / Cursor-style system prompts and routes accordingly (#37/#45).
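For the weighted round-robin pool listed above (#36), one common approach is "smooth" weighted round-robin, which spreads picks evenly instead of sending bursts to the heaviest model. This is a sketch of that general technique, not NadirClaw's confirmed implementation; the class name and model names are hypothetical:

```python
from typing import Dict


class WeightedRoundRobin:
    """Smooth weighted round-robin over a fixed set of named models."""

    def __init__(self, weights: Dict[str, int]) -> None:
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("weights must be non-empty positive integers")
        self._weights = dict(weights)
        self._current = {name: 0 for name in weights}
        self._total = sum(weights.values())

    def next(self) -> str:
        # Raise every model's running score by its weight, pick the highest,
        # then pull the winner back down by the total weight.
        for name, weight in self._weights.items():
            self._current[name] += weight
        chosen = max(self._current, key=self._current.get)
        self._current[chosen] -= self._total
        return chosen


pool = WeightedRoundRobin({"model-a": 3, "model-b": 1})
picks = [pool.next() for _ in range(8)]
```

Over any full cycle each model is chosen in proportion to its weight, so `model-a` receives three of every four requests here.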
- Fallback reasons logging: failed fallback attempts now record ordered per-model `fallback_reasons` with compact error types and sanitized messages (#47).
- Provider health-aware fallback routing: optional `NADIRCLAW_PROVIDER_HEALTH=true` mode tracks in-process model health and tries healthy fallback candidates before cooling-down ones; debug snapshot via `/internal/provider_health` (#48).
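The "healthy candidates before cooling-down ones" ordering can be pictured with a small in-process tracker. Everything below is an assumption for illustration: the class name, the 30-second cooldown, and the rule that a single failure starts a cooldown are not taken from the NadirClaw source:

```python
import time
from typing import Dict, List, Optional


class ProviderHealth:
    """Hypothetical in-process health tracker with a fixed cooldown."""

    def __init__(self, cooldown_seconds: float = 30.0) -> None:
        self._cooldown = cooldown_seconds
        self._failed_at: Dict[str, float] = {}

    def record_failure(self, model: str, now: Optional[float] = None) -> None:
        self._failed_at[model] = time.monotonic() if now is None else now

    def record_success(self, model: str) -> None:
        self._failed_at.pop(model, None)

    def is_cooling_down(self, model: str, now: Optional[float] = None) -> bool:
        failed_at = self._failed_at.get(model)
        if failed_at is None:
            return False
        current = time.monotonic() if now is None else now
        return current - failed_at < self._cooldown

    def order(self, candidates: List[str], now: Optional[float] = None) -> List[str]:
        # sorted() is stable, so the configured order is kept within each group:
        # healthy models (False) sort before cooling-down ones (True).
        return sorted(candidates, key=lambda m: self.is_cooling_down(m, now))
```

The optional `now` argument exists only to make the sketch deterministic in tests; a real caller would omit it.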
- Thinking/reasoning token passthrough: transparently forwards thinking parameters and extracts reasoning content from all provider paths.
  - Request forwarding: `reasoning_effort` (OpenAI o-series), `thinking` (Anthropic extended thinking), `thinking_config` (Gemini), and `response_format` are now passed through to LiteLLM, Anthropic OAuth, and Gemini native paths.
  - Response extraction: `reasoning_content` (DeepSeek), `thinking` blocks (Anthropic), and `thought` parts (Gemini) are captured from LLM responses and included in `choices[].message`.
  - Usage reporting: `completion_tokens_details.reasoning_tokens` surfaced when providers report thinking token counts.
  - Works in both streaming (real SSE and fake/cached SSE) and non-streaming response formats.
- 15 new tests covering thinking parameter forwarding, response extraction, JSON serialization safety, and streaming passthrough.
- Context Optimize: new preprocessing stage that compacts bloated context before LLM dispatch, reducing input token cost by 30-70%. Two modes:
  - `safe`: five deterministic, lossless transforms (JSON minification, whitespace normalization, system prompt dedup, tool schema dedup, chat history trimming).
  - `aggressive`: all safe transforms plus diff-preserving semantic deduplication. Uses sentence embeddings (`all-MiniLM-L6-v2`) to detect near-duplicate messages (cosine similarity >= 0.85), then extracts only the unique diff phrases using `difflib.SequenceMatcher`. Refinements survive dedup: "return values, not indices" is preserved even when 90% similar to an earlier message.
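The diff-extraction half of aggressive mode can be approximated with `difflib.SequenceMatcher` alone. This sketch skips the embedding-similarity step (which needs a sentence encoder) and assumes word-level diffing; the function name and the sample messages are hypothetical:

```python
from difflib import SequenceMatcher
from typing import List


def unique_phrases(earlier: str, later: str) -> List[str]:
    """Return the word runs that appear in `later` but not in `earlier`."""
    old_words = earlier.split()
    new_words = later.split()
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    phrases = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both mean `later` has words here
        # that `earlier` does not.
        if tag in ("insert", "replace"):
            phrases.append(" ".join(new_words[j1:j2]))
    return phrases


earlier = "Write a function that finds the two numbers that sum to the target"
later = earlier + " and return values, not indices"
kept = unique_phrases(earlier, later)
```

Only the refinement survives in `kept`, which is the behaviour the entry describes: the near-duplicate body is dropped and the new instruction is preserved.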
- Accurate token counting with tiktoken: uses the `cl100k_base` BPE tokenizer instead of the `len//4` heuristic. Falls back gracefully if tiktoken is not installed.
- Shared sentence encoder: lazy-loaded `SentenceTransformer` singleton in `nadirclaw/encoder.py` for aggressive mode. No import cost when using safe mode or off.
- `nadirclaw optimize` command: dry-run CLI tool to test context compaction on files or stdin. Supports `--mode safe|aggressive` and `--format text|json`.
- `--optimize` flag on `nadirclaw serve`: set optimization mode at startup (`off`, `safe`, `aggressive`).
- Per-request `optimize` override: pass `"optimize": "safe"` in the request body to override the server default for individual requests.
- Optimization metrics: `tokens_saved`, `original_tokens`, `optimized_tokens`, and `optimizations_applied` logged per request in JSONL, SQLite, and Prometheus. Web dashboard shows aggregate savings.
- New env vars: `NADIRCLAW_OPTIMIZE` (default: `off`), `NADIRCLAW_OPTIMIZE_MAX_TURNS` (default: `40`).
- 60 automated tests covering safe transforms, aggressive semantic dedup, accuracy preservation, edge cases, and roundtrip integrity.
- SQLite schema: added columns `optimization_mode`, `original_tokens`, `optimized_tokens`, `tokens_saved`, `optimizations_applied` (auto-migrated on startup).
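An "auto-migrated on startup" step for the new columns might look roughly like the following. The column names come from the entry above; the column types, the `requests` table name, and the `migrate` helper are assumptions for this sketch:

```python
import sqlite3

# Column names are from the changelog entry; the SQL types are assumptions.
NEW_COLUMNS = {
    "optimization_mode": "TEXT",
    "original_tokens": "INTEGER",
    "optimized_tokens": "INTEGER",
    "tokens_saved": "INTEGER",
    "optimizations_applied": "TEXT",
}


def migrate(conn: sqlite3.Connection, table: str = "requests") -> list:
    """Add any missing columns and return the names that were added."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, sql_type in NEW_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
            added.append(name)
    conn.commit()
    return added


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (id INTEGER PRIMARY KEY, prompt TEXT)")
first_run = migrate(conn)
second_run = migrate(conn)
```

Checking `PRAGMA table_info` first makes the migration idempotent, so running it on every startup is harmless.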
- `nadirclaw test` command: probes each configured model tier with a short live request and reports latency, response, and pass/fail. Exits with code 1 on failure so it works in CI. Supports `--simple-model`, `--complex-model`, and `--timeout` overrides.
- `classify --format json`: new `--format text|json` flag on `nadirclaw classify`. JSON output includes `tier`, `is_complex`, `confidence`, `score`, `model`, and `prompt`. Composable with `jq`.
- Multi-word prompt support for `classify`: `nadirclaw classify What is 2+2?` now works without quoting. Previously only the first word was captured.
- `nadirclaw savings` now prefers SQLite: mirrors `nadirclaw report` by reading from `requests.db` when available and falling back to `requests.jsonl`. Previously only JSONL was read, giving empty or stale results for users without a JSONL file.
- `nadirclaw dashboard` now prefers SQLite: same fix as savings; dashboard no longer shows empty data when only `requests.db` exists.
- `SessionCache` LRU eviction is now O(1): replaced `List[str]` + `list.remove()` (O(n) per cache hit) with `collections.OrderedDict` + `move_to_end()` / `popitem(last=False)`, both O(1). Affects `routing.py`.
- `ModelRateLimiter.get_status` is now thread-safe: all reads of `_limits`, `_hits`, and `_default_rpm` are now taken inside the lock, eliminating a potential data race under concurrent requests.
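The `OrderedDict` pattern named in the LRU entry above works as follows. This is a generic sketch of the technique; the class below is hypothetical and is not the actual `SessionCache` from `routing.py`:

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Minimal LRU cache using OrderedDict for O(1) touch and eviction."""

    def __init__(self, max_size: int) -> None:
        self._max_size = max_size
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used, O(1)
        return self._entries[key]

    def put(self, key: str, value: str) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict least recently used, O(1)
```

With a plain list, every cache hit needs a linear scan to move the key; `move_to_end()` avoids that scan entirely.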
- `auth status` indentation: the "no credentials" help block was over-indented (12 spaces) and the provider hint strings were misaligned. Fixed to consistent 4-space indentation.
- Removed redundant `load_dotenv()` in `serve`: `settings.py` already loads `~/.nadirclaw/.env` at import time; the extra bare `load_dotenv()` call in the `serve` command was a no-op that could cause confusion when debugging env resolution.
- OpenClaw onboard: register nadirclaw provider without overriding the agent's primary model
- Configurable fallback chains: when a model fails (429, 5xx, timeout), cascade through a configurable list of fallback models. Set `NADIRCLAW_FALLBACK_CHAIN` to customize the order.
- Real-time spend tracking and budget alerts: every request's cost is tracked by model, daily, and monthly. Set `NADIRCLAW_DAILY_BUDGET` and `NADIRCLAW_MONTHLY_BUDGET` for alerts at configurable thresholds. New `nadirclaw budget` CLI command and `/v1/budget` API endpoint.
- Prompt caching: LRU cache for identical prompts. Configurable TTL (`NADIRCLAW_CACHE_TTL`, default 5min) and max size (`NADIRCLAW_CACHE_MAX_SIZE`, default 1000). New `nadirclaw cache` CLI command and `/v1/cache` API endpoint. Toggle with `NADIRCLAW_CACHE_ENABLED`.
- Web dashboard: browser-based dashboard at `/dashboard` with auto-refresh. Shows routing distribution, per-model stats, cost tracking, budget status, and recent requests. Dark theme, zero dependencies.
- Docker support: official Dockerfile and docker-compose.yml. `docker compose up` gives you NadirClaw + Ollama for a fully local zero-cost setup.
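The fallback-chain cascade described above can be sketched as a loop that moves to the next model on a retryable failure. The comma-separated format for `NADIRCLAW_FALLBACK_CHAIN`, the `RetryableError` class, and the model names are all assumptions for this example:

```python
from typing import Callable, List, Tuple


class RetryableError(Exception):
    """Hypothetical marker for 429, 5xx, and timeout failures."""


def parse_chain(raw: str) -> List[str]:
    # Assumes a comma-separated value; the real format may differ.
    return [name.strip() for name in raw.split(",") if name.strip()]


def call_with_fallback(
    chain: List[str], send: Callable[[str], str]
) -> Tuple[str, str, List[str]]:
    """Try each model in order; return (model, reply, models_that_failed)."""
    failed: List[str] = []
    for model in chain:
        try:
            return model, send(model), failed
        except RetryableError:
            failed.append(model)
    raise RuntimeError(f"all models failed: {failed}")


def fake_send(model: str) -> str:
    # Stand-in for a real provider call: the first model is rate limited.
    if model == "model-a":
        raise RetryableError("429")
    return f"reply from {model}"


result = call_with_fallback(parse_chain("model-a, model-b, model-c"), fake_send)
```

Non-retryable errors are deliberately not caught here, so a bad request fails fast instead of being replayed against every model in the chain.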
- Fallback logic upgraded from simple tier-swap to full chain cascade
- Request logs now include per-request cost and daily spend
- Budget state persists across restarts via `budget_state.json`
- OAuth login for all major providers: OpenAI, Anthropic, Google Gemini, Google Antigravity
- Interactive Anthropic login — choose between setup token or API key
- Gemini OAuth PKCE flow with browser-based authorization
- Antigravity OAuth with hardcoded public client credentials (matching OpenClaw)
- Provider-specific token refresh (OpenAI, Anthropic, Gemini, Antigravity)
- Atomic credential file writes to prevent corruption
- Port-in-use error handling for OAuth callback server
- Test suite with pytest (credentials, OAuth, classifier, server)
- CONTRIBUTING.md and CHANGELOG.md
- Version is now single source of truth in `nadirclaw/__init__.py`
- Credential file writes use atomic temp-file-and-rename pattern
- Token refresh failures return `None` instead of silently returning stale tokens
- OAuth callback server binds to `localhost` (was `127.0.0.1`)
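The temp-file-and-rename pattern for credential writes mentioned above generally looks like this. It is a sketch of the standard technique with a hypothetical helper name, not the project's actual code:

```python
import json
import os
import tempfile


def write_json_atomically(path: str, data: dict) -> None:
    """Write `data` to `path` so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem)
    # for the final rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # atomically swaps in the new file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "credentials.json")
    write_json_atomically(target, {"token": "old"})
    write_json_atomically(target, {"token": "new"})
    with open(target, encoding="utf-8") as handle:
        loaded = json.load(handle)
    leftovers = os.listdir(workdir)
```

If the process dies mid-write, the previous file is still intact; only a fully written temp file ever replaces it.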
- Version mismatch between `__init__.py`, `cli.py`, `server.py`, and `pyproject.toml`
- README references to `nadirclaw auth gemini-cli` (now `nadirclaw auth gemini`)
- OAuth callback server getting stuck (now uses `serve_forever()`)
- OpenAI OAuth login via Codex CLI
- Credential storage in `~/.nadirclaw/credentials.json`
- Environment variable fallback for API keys
- `nadirclaw auth` command group
- Initial release
- Binary complexity classifier with sentence embeddings
- Smart routing between simple and complex models
- OpenAI-compatible API (`/v1/chat/completions`)
- SSE streaming support
- Rate limit fallback between tiers
- Gemini native SDK integration
- LiteLLM support for 100+ providers
- CLI: `serve`, `classify`, `status`, `build-centroids`
- OpenClaw and Codex onboarding commands