The following public API exports have been removed. Update imports to use the canonical names:
| Removed | Replacement |
|---|---|
| `CodeModeConfig` | `SdkSurfaceConfig` |
| `McpModeConfig` | `McpSurfaceConfig` |
| `ExpectedTool` | `ExpectedAction` |
| `ToolMatch` | `ActionMatch` |
| `LEGACY_PROJECT_CONFIG_NAME` | hard-code `".skill-optimizer/skill-optimizer.json"` |
| `toLegacyOptimizeManifest` | removed with no replacement |
| `SurfaceSnapshotArg` | removed with no replacement |

`TaskResult` fields renamed: `toolMatches` → `actionMatches`, `hallucinatedCalls` → `hallucinatedActions` (on `metrics`), `unnecessaryCalls` → `unnecessaryActions` (on `metrics`). `loadReport` does not validate old field names, so old report JSON files may produce unexpected output in detail views. Re-run the benchmark to generate a current-format report.
Existing `tasks.json` files that use `expected_tools` (instead of `expected_actions`) or `method` (instead of `name`) on action entries now fail to load with an error. Update affected task files: rename `expected_tools` to `expected_actions` and rename each action's `method` field to `name`.
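
The two renames above could be applied mechanically. The following is a minimal sketch, not part of skill-optimizer: the helper names (`renameKey`, `migrateTask`, `migrateTasksFile`) are hypothetical, and it assumes the file is either a bare array of tasks or an object with a top-level `tasks` array, which may not match your actual `tasks.json` layout.

```typescript
// Hypothetical migration helpers; the real tasks.json schema may differ.
type JsonObject = Record<string, unknown>;

function isObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Return a copy of `obj` with key `from` renamed to `to`.
// Leaves the object untouched if `from` is absent or `to` already exists.
function renameKey(obj: JsonObject, from: string, to: string): JsonObject {
  if (!(from in obj) || to in obj) return obj;
  const { [from]: value, ...rest } = obj;
  return { ...rest, [to]: value };
}

// expected_tools -> expected_actions, then method -> name on each action entry.
function migrateTask(task: JsonObject): JsonObject {
  const renamed = renameKey(task, "expected_tools", "expected_actions");
  const actions = renamed["expected_actions"];
  if (!Array.isArray(actions)) return renamed;
  return {
    ...renamed,
    expected_actions: actions.map((action) =>
      isObject(action) ? renameKey(action, "method", "name") : action
    ),
  };
}

// Assumed layouts: a bare array of tasks, or { tasks: [...] }.
function migrateTasksFile(file: unknown): unknown {
  const migrateAll = (tasks: unknown[]) =>
    tasks.map((task) => (isObject(task) ? migrateTask(task) : task));
  if (Array.isArray(file)) return migrateAll(file);
  if (isObject(file) && Array.isArray(file["tasks"])) {
    return { ...file, tasks: migrateAll(file["tasks"]) };
  }
  return file; // Unknown layout: leave unchanged rather than guess.
}
```

To use it, parse the old file with `JSON.parse`, pass the result through `migrateTasksFile`, and write the output back with `JSON.stringify`. Keep a backup of the original, since the sketch does not validate the result against the current schema.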
The config file `skill-benchmark.json` is no longer auto-detected. Rename it to `skill-optimizer.json`.
- `prompt` surface type: benchmark and optimize prompt templates, Claude Code skills, and agent instructions. Discovers phases and capabilities from markdown and evaluates output quality with content-based criteria.
- Codex auth: direct OpenAI model runs can use browser-login tokens stored by Codex (`~/.codex/auth.json`) instead of requiring `OPENAI_API_KEY`. Set `benchmark.authMode: "codex"` and use `openai/<model>` IDs.
- SKILL folder: bundled AI-agent guidance (`SKILL/SKILL.md`) so agents can use skill-optimizer reliably without extra setup.
- Optimizer loop diagram: README now includes a visual workflow diagram of the optimizer loop.
- Stable task IDs: task IDs are now derived from a SHA-1 hash of the action names (SDK/CLI/MCP surfaces) or prompt text (prompt surface). For SDK/CLI/MCP surfaces, where action names come from discovered code rather than LLM output, IDs are stable across regenerations and the `--task <id>` filter works reliably. For the prompt surface, IDs are stable when the LLM produces identical wording; if it rephrases a task the ID changes (fixes #17).
- benchmark: Strip provider prefix from model ID when using direct `anthropic` or `openai` formats. Previously, `anthropic/claude-sonnet-4-6` was sent as-is to the Anthropic API, which expects `claude-sonnet-4-6`. The `pi` format is unaffected.
- model IDs: OpenRouter model slugs now preserve dots in version numbers (e.g. `openrouter/anthropic/claude-sonnet-4.6`). Presets updated to match OpenRouter's catalog exactly. The dot→hyphen rewrite in `validate/fix` now applies only to the `anthropic/` direct-API prefix; `openrouter/` and `openai/` slugs are exempt.
- error message: `E_MODEL_ID_FORMAT` now lists all three valid provider prefixes (`openrouter/`, `anthropic/`, `openai/`) instead of directing all users to use `openrouter/`.
- Prompt-surface benchmarks no longer hard-FAIL on `scopeCoverage.coverageViolation`; coverage is informational for prompt runs (`src/benchmark/scoring.ts`).
- Prompt-surface tasks are now scored against the specific capability they exercise via a required `capabilityId` on `GeneratedTask`. Previously every task was scored against the first discovered capability (`src/benchmark/runner.ts`, `src/benchmark/prompt-criteria.ts`, `src/tasks/generate.ts`).
- Prompt evaluator surfaces `noActiveCriteria: true` (score 0, runner-level FAIL with an actionable message) when a capability's section produces empty criteria, replacing the previous vacuous 1.0 pass (`src/benchmark/prompt-evaluator.ts`).
- `openai/` direct-API model IDs are exempt from dot→hyphen rewriting in `applyFixes`. OpenAI's API slugs use dots (`gpt-5.4`, `gpt-4.1`). (`src/project/fix.ts`)
- Removed dead `src/discovery/prompt.ts`. Active discovery path is `src/project/discover-prompt.ts`.

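
The model ID entries above describe two separate rules: prefix stripping for direct API calls, and a dot→hyphen rewrite that applies only to `anthropic/` IDs. As an illustration only, they might look like the sketch below. The function names `stripProviderPrefix` and `rewriteVersionDots` are hypothetical, the `Format` union lists only the formats named in these notes, and the regular expression is an assumption about what counts as a version dot; the actual logic lives in the project's own sources.

```typescript
// Formats named in the notes above; the real config may accept others.
type Format = "anthropic" | "openai" | "pi";

// Hypothetical: drop the provider prefix before calling a direct API.
// The "pi" format is passed through unchanged.
function stripProviderPrefix(modelId: string, format: Format): string {
  if (format === "pi") return modelId;
  const prefix = `${format}/`;
  return modelId.startsWith(prefix) ? modelId.slice(prefix.length) : modelId;
}

// Hypothetical: rewrite dots between digits to hyphens, but only for IDs
// with the anthropic/ direct-API prefix. openrouter/ and openai/ slugs
// keep their dots.
function rewriteVersionDots(modelId: string): string {
  if (!modelId.startsWith("anthropic/")) return modelId;
  return modelId.replace(/(?<=\d)\.(?=\d)/g, "-");
}
```

Under these assumptions, `stripProviderPrefix("anthropic/claude-sonnet-4-6", "anthropic")` yields `claude-sonnet-4-6`, while `rewriteVersionDots` leaves `openai/gpt-5.4` and `openrouter/anthropic/claude-sonnet-4.6` untouched.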
First public release.