feat(settings): add configurable image-model picker (Image role)#464
Open
LissaGreense wants to merge 30 commits into
Open
feat(settings): add configurable image-model picker (Image role)#464LissaGreense wants to merge 30 commits into
LissaGreense wants to merge 30 commits into
Conversation
The Settings image picker needs a freshness signal — the set of image models the gateway/provider currently advertises — to intersect against the hand-curated capability overlay. CatalogEntry gains an `images: ModelInfo[]` field populated alongside the existing `models` (language) field. Routing rules in groupGatewayModels: - Gateway entries with `modelType: 'image'` route to the appropriate provider's images bucket (catches Imagen under vertex). - Google (`vertex` + `google/gemini-`) `-image` ids invert the existing language filter — they ship as `modelType: 'language'` so the id-based rule is the only thing that catches them. - OpenAI `gpt-image-*` / `dall-e-*` id patterns route to images regardless of modelType, defensive against gateway misclassification. Direct-fetch providers (groq / openrouter / local) keep `images: []` — they have no image surface. Existing language partitioning is unchanged; all 13 prior tests pass without modification. ## Progress - Task: #6 Extend model-catalog with `images: ModelInfo[]` bucket - Decisions: - Belt-and-suspenders on OpenAI: both modelType=image AND id-pattern route to images. Gateway has shipped both shapes for dall-e historically; pattern match alone catches future regressions. - Added a vertex-fallthrough branch for `modelType: 'image'` so Imagen ids (no `-image` substring) land in google.images without needing a separate `imagen-` prefix check. - `images` is non-optional on CatalogEntry — every entry carries an empty array rather than `undefined`, so consumers don't branch. - Key Learnings: - Google ships `-image` variants under `modelType: 'language'` in the gateway response — the id-substring check is load-bearing for image discovery, not a defensive duplicate. - The Svelte components duplicate CatalogEntry locally rather than importing from `@atlas/llm` — adding fields is non-breaking but consuming new fields requires touching each duplicate. - Files: - packages/llm/src/model-catalog.ts - packages/llm/src/model-catalog.test.ts
Tracer bullet for the image-model picker — schema → resolver → agent.
Adds `image` as a fifth `PlatformRole` and a sibling `getImage(): ImageModelV3`
on `PlatformModels` so the image-generation agent stops calling
`registry.imageModel("google:gemini-3.1-flash-image-preview")` directly and
defaults to the stable `google:gemini-2.5-flash-image` instead.
Changes:
- `PlatformModelsSchema` (config/atlas.ts) gains `image: ModelIdSchema.optional()`.
- `DEFAULT_PLATFORM_MODELS.image` = `["google:gemini-2.5-flash-image"]`.
- `PlatformModels` interface (llm + mirrored in agent-sdk) gains `getImage()`.
- New `resolveImageRole` private function mirrors `resolveRole`'s chain-walk
semantics (format-strict, unknown-provider-strict, missing-cred-tolerant
across multi-entry chains) but constructs via `registry.imageModel(id)`.
- `getImage()` logs `Image model resolved` on success matching the boot
pattern; throws `PlatformModelsConfigError` when no entry has credentials.
- New `image-capabilities.ts` overlay with one entry for the default model;
full 6-entry overlay deferred to task #2.
- Agent (image-generation/agent.ts:99) now calls `platformModels.getImage()`.
- `createStubPlatformModels` extended with `getImage()` for downstream tests.
Boot-time overlay validation (`unknown_image_model` ErrorKind) is explicitly
deferred to task #3 — tracer bullet covers the seam only.
## Progress
- Task: #1 — Tracer Bullet: Wire image model through PlatformModels resolver to agent
- Decisions:
- Forked `resolveImageRole` from `resolveRole` rather than parameterized.
`resolveRole` threads through `traceModel()` (LanguageModelV3-specific);
no `traceImageModel` exists. Generic helper at two callers buys less
than it costs.
- Used `ImageModelV3` (not the design-doc's "ImageModelV4" — the SDK only
ships V3). Confirmed with team-lead before committing.
- Narrowed `get(role)` to `LanguageRole = Exclude<PlatformRole, "image">`
so the type system enforces the right method per model type. `PlatformRole`
still includes `"image"` so `DEFAULT_PLATFORM_MODELS` and config-shape
machinery key uniformly across all five roles.
- Dropped stale `registry.imageModel` mock in agent.test.ts now that the
agent no longer touches the registry directly; replaced with a structurally-
valid `ImageModelV3` stub on the `platformModels` context.
- Key Learnings:
- `PlatformModels` interface is intentionally duplicated in both
`packages/llm/src/platform-models.ts` and `packages/agent-sdk/src/types.ts`
(agent-sdk is a leaf package — no cross-package imports). Comment at
agent-sdk/types.ts L40 explicitly documents the mirror. Any future
interface change must update both.
- The AI SDK exposes `ImageModelV3` (latest); there is no `ImageModelV4`
despite occasional references in design docs.
- Combined `deno check` over many files can intermittently emit a spurious
TS2589 "type instantiation excessively deep" error that disappears on
rerun or when checking files individually — likely a workspace-resolution
flake, not a real type problem.
- Files:
- packages/config/src/atlas.ts — schema
- packages/llm/src/platform-models.ts — PlatformRole, resolver, getImage()
- packages/llm/src/image-capabilities.ts — new overlay module
- packages/llm/mod.ts — re-exports
- packages/llm/src/test-utils.ts — stub helper extended
- packages/llm/src/platform-models.test.ts — 2 new getImage() tests
- packages/agent-sdk/src/types.ts — mirrored interface
- packages/bundled-agents/src/image-generation/agent.ts — swap to getImage()
- packages/bundled-agents/src/image-generation/agent.test.ts — drop dead mock
Extends IMAGE_OVERLAY from the tracer bullet's single Gemini entry to the full v1 set of six verified models — three Google (Gemini 2.5 Flash Image + Imagen 4 quality/fast) and three OpenAI (GPT Image 1.5, DALL·E 3, DALL·E 2). The capability-check (Task #4) and controlAxis-dispatch (Task #5) work, plus the validation harness (Task #9), all iterate this overlay, so locking the full matrix unblocks them. Defaults rationale: - Google → `1:1` PNG. Square is Gemini/Imagen's natural output; PNG keeps lossless transport for edit flows. - OpenAI → `1024x1024` PNG. Baseline supported by every entry; avoids paying for higher-res tiers the picker doesn't surface. GPT Image 1.5 carries an explicit `note` about the verified-org requirement so users hit that gate at config time rather than at first generation attempt. New `image-capabilities.test.ts`: - shape: every entry has non-empty displayName, type-round-trips, ISO date parses - provider→controlAxis split (google=aspectRatio, openai=size) is locked in — protects Task #5's dispatch site - freshness: scans `lastValidatedAt` and fires a non-failing `logger.warn` for entries older than 180 days (harness updates these) - lookup: hit + miss cases ## Progress - Task: #2 Populate full image-capabilities overlay (6 entries) with round-trip test - Decisions: - Stale check is advisory, not a hard fail. The validation harness (Task #9) is the source of truth for freshness; the test just surfaces drift so reviewers notice before merging stale data. - Asserted the provider→controlAxis split in tests explicitly. Even though the discriminated union catches structural mismatches at compile time, a future entry could be structurally valid but semantically wrong (e.g. an OpenAI entry with `aspectRatio`). - Used a `if (entry === null) throw` narrowing in the lookup test instead of an `as` assertion (CLAUDE.md: no `as`). - Key Learnings: - `logger.warn(msg, ctx)` is the established shape across the codebase (`packages/llm/src/small.ts`, `core/credential-fetcher.ts`). Spy on it via `vi.spyOn(logger, "warn")` rather than `console`. - Discriminated union on `controlAxis` is genuinely load-bearing — TypeScript blocks the wrong-axis param at the IMAGE_OVERLAY literal, so a typo at the data layer fails the compile, not the agent. - Files: - packages/llm/src/image-capabilities.ts - packages/llm/src/image-capabilities.test.ts
Extend createPlatformModels to pre-flight `models.image` at boot: each chain entry must format-parse, name a known provider, AND have an IMAGE_OVERLAY entry. A typo like `google:gemini-2.5-flas-image` now fails daemon startup with `unknown_image_model` instead of surfacing on the first image-gen attempt. Credential checks remain deferred to getImage() — daemons must still boot when image providers are intentionally unset. Boot errors aggregate with language-role errors into a single PlatformModelsConfigError so operators see every problem in one startup attempt. The error formatter renders `unknown_image_model` as a new arm listing the offending id alongside the full set of known overlay ids, copy-pastable for fixes. resolveImageRole keeps the overlay check inline as a defensive last line for hot-reload / test-injection paths that bypass boot. ## Progress - Task: #3 Add boot-time overlay validation with `unknown_image_model` ErrorKind - Decisions: - Separate `preflightImageRoleAtBoot` instead of folding image into the `LanguageRole[]` loop — the language path returns LanguageModelV3 via resolveRole, the image path returns ImageModelV3, and a generic abstraction across two callers buys less than it costs. - Boot-time validation does NOT check credentials for image (departs from language-role behavior) so unconfigured image providers don't block daemon startup — the credential check is `getImage()`'s job. - Per-chain first-error-wins (same as `resolveRole`); aggregation happens across roles via the shared `bootErrors` array. - Key Learnings: - The image role's boot-validation shape intentionally diverges from language roles: language defaults assume keys are required, image defaults assume the provider may be unconfigured. Future "optional" roles should follow this pattern rather than the eager-resolve pattern. - Files: packages/llm/src/platform-models.ts, packages/llm/src/platform-models.test.ts
Resolve the image model once via platformModels.getImage(), look up its capability entry in the IMAGE_OVERLAY, and short-circuit with a clean err(string) when the user pairs an edit-mode prompt (artifact UUID reference) with a generation-only model (DALL·E 3, Imagen 4, Imagen 4 Fast). Without this gate the agent forwarded edit prompts to providers that cannot honour them and produced nonsense output. The error names the displayName, states "supports generation only", and points at Settings → Image — everything a user needs to act. A defensive null branch surfaces overlay/resolver desync if it ever slips past boot validation. The single `entry` binding is positioned so task #5 can reuse it for controlAxis dispatch without re-querying the overlay. ## Progress - Task: #4 Agent capability check — err() when edit requested on gen-only model - Decisions: - Lookup runs BEFORE source-image binary reads — fail fast, no wasted I/O. - Kept the defensive `entry === null` branch even though #3 now boot-validates; a logger.error fires so a regression in the resolver/overlay contract is loud. - Default test stub modelId switched from "stub-image" (overlay-missing) to "google:gemini-2.5-flash-image" (edit-capable) so existing tests pass the new capability gate without per-test plumbing; gen-only tests override via `stubPlatformModels.getImage.mockReturnValueOnce(...)`. - Key Learnings: - `vi.mock("@atlas/llm", () => ({...}))` is a FULL replacement — adding a second consumer of the module (lookupImageEntry) breaks silently with "No 'X' export is defined on the mock." Fix: `vi.importActual` and re-export the pure helpers, only stub what actually needs faking. - `vi.clearAllMocks()` does NOT clear queued `mockReturnValueOnce` — but the queue is naturally drained by the single call per test, so `Once` is the safest way to vary platformModels.getImage per test without leaking state. - Files: - packages/bundled-agents/src/image-generation/agent.ts - packages/bundled-agents/src/image-generation/agent.test.ts
Drop the hardcoded IMAGE_SIZE constant and dispatch on the overlay
entry's `controlAxis` discriminator. Size-axis entries (DALL·E, GPT
Image) now receive `{ size }` and aspectRatio-axis entries (Imagen,
Gemini) receive `{ aspectRatio }` — each provider gets the param
shape it actually expects, no cross-axis bleed.
Tightened `ImageDefaults.aspectRatio` from `string` to
`${number}:${number}` so the overlay value narrows cleanly into the
AI SDK's expected template literal. All existing entries already
match the tighter type ("1:1"); no entry change needed.
The stale generation-complete log field `imageSize: IMAGE_SIZE` is
removed alongside the constant.
## Progress
- Task: #5 Agent controlAxis dispatch — size vs aspectRatio param shape
- Decisions:
- Computed `controlParams` object spread into generateImage rather than
duplicating the call across the discriminator branches — keeps the
happy path single-call and lets the type-level union enforce
"exactly one axis" at the use site.
- Tightened the overlay's aspectRatio template literal at the type
layer rather than asserting at the agent boundary; no `as`, no Zod
surface area added, and the change strictly narrows what the
overlay can ever contain.
- Replaced the stale "passes 1024x1024 size" test with two per-axis
tests (aspectRatio default via Gemini stub, size via DALL·E
override). The old test wired-in the previous default and no longer
reflects how the agent dispatches.
- Key Learnings:
- Spreading a discriminated union into a function that accepts both
keys as `?:` optional works in practice but blows TS instantiation
depth (TS2589) if the inner property types don't *also* line up with
the SDK's narrow types. Fix at the data layer (tighten the overlay)
rather than the call site — same end state, no spread-with-cast
gymnastics.
- `deno check <file>` may transitively pull in workspace files outside
the file's import graph; unrelated WIP in `apps/atlasd/routes/config.ts`
surfaces here even though my agent code is independently clean.
Spot-check via `deno check` on the leaf file only when in doubt,
and trust `deno task test` against the test file for behaviour.
- Files:
- packages/bundled-agents/src/image-generation/agent.ts
- packages/bundled-agents/src/image-generation/agent.test.ts
- packages/llm/src/image-capabilities.ts
… capture
scripts/validate-image-models.ts iterates listImageEntries() and exercises
each entry against live providers under both direct and proxy transports.
Captures envelope-only fixtures ({warnings, providerMetadata, mediaType,
base64Length, imageCount}) keyed under {direct, proxy} in a single per-model
JSON file. Failed transports write null (not omitted) so Task #10's
parameterized tests can distinguish "not yet validated" from "validated as
broken."
Dual-transport mechanics: a single process mutates LITELLM_API_KEY and calls
resetRegistry() between passes, restoring original env on exit. Pre-flight
cost preview + y/N gate guards spend; MAX_CALLS=24 is a hard ceiling.
`lastValidatedAt` updates: the script prints a unified diff for manual
application rather than codemodding image-capabilities.ts directly —
source-mutation from a one-shot script fights the formatter and is fragile.
Thumbnails (design v3 step 4) are intentionally not written; a true ≤64×64
thumbnail needs a PNG decoder + resampler, which is too much dependency
surface for a one-shot. The _thumbnails/ dir is still gitignored so a future
operator can drop sanity-check PNGs locally without polluting the repo.
## Progress
- Task: #9 — Validation harness with dual-transport capture
- Decisions:
- Diff-print over codemod for lastValidatedAt updates (option b per task brief)
- Plain stdout writes over @atlas/logger (JSON output is hostile to interactive CLI UX)
- Skip thumbnail writing; documented as a future enhancement requiring a resampler
- Per-entry fixture write after each transport pass — mid-run crashes preserve partial progress
- Key Learnings:
- `prompt()` global (DOM lib) reads from TTY only — piped stdin returns null, so the y/N gate aborts safely when run non-interactively without --yes
- The AI SDK's `aspectRatio: \`${number}:${number}\`` template-literal type can be narrowed without `as` via `z.custom<T>` with a type-guard predicate
- `resetRegistry()` is required between transport switches — the registry caches provider clients with their apiKey captured at construction
- Files:
- scripts/validate-image-models.ts (new)
- .gitignore (+_thumbnails/ entry)
Parameterize agent tests on every overlay entry × {direct, proxy},
reading captured envelopes from `__fixtures__/<provider>__<model>.json`
(produced by the validation harness in scripts/validate-image-models.ts).
Coverage matrix is self-documenting: passed / skipped tests appear in
verbose test output with the reason inline:
✓ google:gemini-2.5-flash-image · direct · dispatches aspectRatio param
↓ google:gemini-2.5-flash-image · proxy · skipped (proxy fixture null)
↓ openai:dall-e-2 · direct · skipped (no fixture file)
✓ openai:dall-e-3 · direct · returns err on edit prompt (gen-only)
Edit-capable entries run a gen-mode prompt and assert controlAxis
param shape + success (exercises #5). Gen-only entries run an edit
prompt and assert the capability `err()` (exercises #4); generateImage
is never called for those, so controlAxis is covered exclusively via
edit-capable entries within the matrix.
When the fixture file is absent OR the transport key is `null`, the
pair emits `test.skip` with the reason in the title. The matrix never
fails the suite for missing fixtures — operators fill in coverage by
re-running the harness over time.
The `@atlas/llm` vi.mock spreads `...actual` instead of enumerating
exports, so future overlay helpers don't need a mock-update PR. Only
`smallLLM` remains overridden.
## Progress
- Task: #10 Fixture-driven `(model × transport)` parameterized agent tests
- Decisions:
- Manual `for` loop over `listImageEntries() × TRANSPORTS` instead of
`describe.each` — gives explicit per-pair control of the test
variant (dispatch vs capability err) and keeps skip reasons in
titles rather than table fragments.
- Fixture loader is sync (existsSync + readFileSync) because vitest
needs the skip-vs-run decision at describe-time, before any
`beforeEach` runs.
- Synthesize an 8-byte PNG signature for the generateImage mock
response — the agent only copies bytes through to ArtifactStorage,
so signature is sufficient; matching the envelope's base64Length
exactly would buy nothing.
- FixtureSchema mirrors the harness's EnvelopeSchema exactly
(warnings/providerMetadata/mediaType/base64Length/imageCount).
`providerMetadata` is `z.unknown()` not `z.record(...)` to match
the harness — providers return wildly different metadata shapes.
- Key Learnings:
- When a `vi.mock(...)` factory passes most of a module through
(`...actual`), enumerate explicit exports only when you want to
block re-exports — spreading is the right default and survives
future module additions without test churn.
- Test-time fixture I/O at describe-time is the right place for
the file-exists check: vitest re-runs the describe function on
every invocation, so adding a fixture file later picks up
automatically without code change.
- Files: packages/bundled-agents/src/image-generation/agent.test.ts
…verlay picker Surfaces the image-model layer through the Settings UI so users can configure a primary→fallback chain without hand-editing friday.yml. The Image row reuses the existing `<ModelChain>` component but routes its slot-edit clicks to a new `<ImageModelPicker>` that renders the static `listImageEntries()` overlay grouped by provider, each with `gen ✓` / `edit ✓` / `edit ✗` badges and the optional `note` field (e.g. "Requires a verified OpenAI organization." on gpt-image-1.5). Locked-provider banner reuses the same inline "Save & unlock" flow the language picker provides. Daemon-side wiring: `apps/atlasd/routes/config.ts` adds `"image"` to `PLATFORM_ROLES`, accepts `image: roleValueSchema.optional()` on PUT, and documents `images: ModelInfo[]` on the catalog response schema (already on the wire from task #6). The GET handler special-cases image — `getImage()` can throw post-boot when no credentials are present (boot only validates format + overlay membership), so the route falls back to the user's configured chain head (or `DEFAULT_PLATFORM_MODELS.image[0]`) so the tile renders the would-be model and the locked-banner UX carries the missing-credential signal. Browser-safe import: `@atlas/llm`'s mod.ts transitively pulls `node:path` through `@atlas/utils/paths`, so importing `listImageEntries` from the bare package crashed the SvelteKit bundle. Added an `./image-capabilities` subpath export (same pattern as `./pricing`) and updated both call sites to use it. `save-api-key.ts`'s local `CatalogEntry` duplicate gained the `images` field since the splice now flows the post-unlock catalog through the typed UI state in `+page.svelte` (which now requires `images`). The other duplicate (`model-pill.ts`) is untouched — chat consumers only read `entry.models`. ## Progress - Task: #7 Settings UI — Image row with capability badges and overlay intersection - Decisions: - "Intersection" rule resolves to "render all overlay entries" — overlay membership is the gate, gateway listing is a freshness signal. AC's contradictory wording is resolved by the design doc's "overlay entries the gateway doesn't surface are still shown." - New `ImageModelPicker.svelte` rather than conditional-laden `ModelPicker` — overlay-driven entries + badges + notes diverge enough that branching inside the language picker would bury both. - `imageCatalog` derived view (overlay entries grouped by provider, mapped onto each catalog entry's `models` field) lets `<ModelChain>` render friendly displayNames for image picks without touching ModelChain. - Daemon GET catches `getImage()` throws and falls back to configured head or default head — agent path still wants the throw (loud failure at gen time), only settings-display wants the tolerant fallback. No new `tryGetImage()` accessor to pollute the agent-sdk mirror. - Key Learnings: - `@atlas/llm`'s main entry pulls server modules transitively — anything Svelte-side needs a narrower subpath export. The existing `./pricing` entry is precedent for this pattern; `image-capabilities` is the second. Future browser-side imports from `@atlas/llm` should add their own subpath rather than depending on the main entry being browser-safe. - Vite caches `package.json` `exports` at startup — adding a new subpath export requires bouncing the dev server to take effect. - `resolver()` from `hono-openapi` only emits OpenAPI docs, not runtime parsing — `catalogResponseSchema` missing the `images` field never silently stripped data; it just left the OpenAPI doc out of sync. - The daemon already serializes `images` on `/api/config/models/catalog` via `c.json(catalog.entries)` because the underlying `CatalogEntry` from `@atlas/llm` (task #6) carries it — the wire was already correct, only types were lagging. - `getImage()` throws a `PlatformModelsConfigError` (not just an Error) when no chain entry has credentials, even though boot validation succeeded — credential checks are deferred to call time by design. - Files: - apps/atlasd/routes/config.ts - packages/llm/deno.json - packages/llm/package.json - tools/agent-playground/src/lib/components/settings/image-model-picker.svelte (new) - tools/agent-playground/src/lib/save-api-key.test.ts - tools/agent-playground/src/lib/save-api-key.ts - tools/agent-playground/src/lib/server/lib/context.ts - tools/agent-playground/src/routes/settings/+page.svelte
Adds a dismissible banner inside the Settings Image row that softens the default-model swap (gemini-3.1-flash-image-preview → gemini-2.5-flash-image) for upgrade users. Banner renders only when all three hold: friday.yml exists, `models.image` has no override, and `localStorage["image-picker-intro-seen"]` is not "true". Dismiss writes the localStorage key so the banner never returns. Fresh installs see nothing — there's no prior aesthetic to compare against. To detect fresh-install vs upgrade without server state, `GET /api/config/models` now sets the optional `configPath` field only when friday.yml is actually on disk (previously always set). The playground reads presence-of-`configPath` as the upgrade signal. Schema is already `.optional()` and no other consumer reads the field, so the change is backward-compatible. Implementation notes: - `imagePickerIntroDismissed` is read once on mount inside `$effect` (guarded against SSR) and starts `true` so the banner can't flash before the localStorage read settles. - `imageRoleUnset` is `$derived` from the loaded `models` array, not from the in-flight `chains` edit state — the banner reflects persisted server config, not unsaved picks. - A small `.role-body` wrapper stacks the banner above `<ModelChain>` inside the Image role-card's right grid column without disturbing the language roles or the `200px 1fr` layout. ## Progress - Task: #8 Settings UI — one-time migration nudge banner for image default change - Decisions: - Reuse `configPath` as the upgrade signal instead of introducing a new `configExists` boolean. The schema is already optional, the field had no consumers, and "configPath present iff file exists" is the expectation the design-doc settled on. - Read `models[role].configured === null` from the loaded state, not from `chains[role].length === 0`. The latter would hide the banner the instant the user opens the picker, before any save. - Plain `try/catch` around the `localStorage.setItem` write so private mode / quota-full doesn't take down the dismiss UX — the in-memory flag still hides the banner for the session even if the persist fails. - No animation, no "remind me later", no toast. Banner shows once, dismiss button hides it forever. Matches the task's "exactly what's asked" guidance. - Key Learnings: - `GET /api/config/models` had `configPath` declared as `.optional()` in the response schema but always populated it — making it conditional matches the schema's already-permissive intent and gives the playground a clean upgrade-detection signal without a new field. - The dev-watcher only respawns the daemon when the old child exits; if a stale daemon from an earlier session is holding port 29080, the new watcher's spawn fails with "address already in use" and your code edits never take effect. Kill the running daemon PID and the watcher will respawn it with the new source. - Svelte 5 `$effect` doesn't run on the server, but the script body does — anything that touches `localStorage` at top level still needs a `typeof` guard for SvelteKit SSR. - Files: - apps/atlasd/routes/config.ts - tools/agent-playground/src/routes/settings/+page.svelte
Use `"configPath" in data` to narrow before reading, instead of an `as` assertion. Matches the surrounding `in`-narrow pattern in `loadModels` and keeps the file `as`-free per project convention. ## Progress - Task: #8 Settings UI — one-time migration nudge banner for image default change - Decisions: lead-requested polish on top of c13eecc; no behavior change. - Files: tools/agent-playground/src/routes/settings/+page.svelte
Per-task commits passed `deno check` but didn't run `deno task fmt`. Biome collapses short multi-line object literals and re-sorts `mod.ts` re-exports alphabetically. No behavior change.
PR #464 added `getImage(): ImageModelV3` to the `PlatformModels` interface but missed several inline stub implementations across the workspace. The canonical `createStubPlatformModels` helper was updated, but stub literals embedded in test files and the evals harness still satisfied the old shape — workspace-wide `deno check` failed with 9 errors. Mirror each file's existing stub idiom (vi.fn for vitest sites, throw-on-invoke for non-vitest contexts). No test invokes getImage directly, so throw-on-invoke is sufficient everywhere. Files touched: - apps/atlasd/routes/workspaces/draft.test.ts (4 sites) - apps/atlasd/src/chat-skill-discovery.test.ts - packages/bundled-agents/src/web/tools/search.test.ts - packages/workspace/src/__tests__/context-memory-mounts.test.ts - tools/evals/lib/context.ts
Contributor
Author
…odelId
QA case 4 surfaced that every image generation fails with "unknown to
Friday's capability overlay" — including the default model on a fresh
setup. Root cause: the agent looked up `IMAGE_OVERLAY` by `model.modelId`
(bare id like `gemini-2.5-flash-image`), but the overlay is keyed by
`provider:model` (e.g. `google:gemini-2.5-flash-image`). The SDK
`ImageModelV3` also reports `provider` as the transport string
(`google.generative-ai`), so concatenating wouldn't have helped either.
Extend `PlatformModels` with `getImageOverlayKey(): string`, which returns
the resolved `provider:model` spec the resolver used to build the model.
The resolver already had this string — it just dropped it on the floor
returning only the SDK model. Refactor `resolveImageRole` to return
`{ key, model }` so `getImage()` and `getImageOverlayKey()` share one
resolution pass and can't diverge if process.env flips between calls.
The agent now calls `platformModels.getImageOverlayKey()` for the overlay
lookup. The error log/message also switches to the overlay key (more
useful for debugging — includes the provider half).
Regression-guard: the agent test's stub `ImageModelV3` now carries
SDK-shaped values (provider="google.generative-ai", modelId="gemini-2.5-flash-image"
— neither is an overlay key). If the agent ever regresses to looking up
by `model.modelId`, the overlay lookup fails and tests catch it. Added an
explicit assertion test that the agent calls `getImageOverlayKey()`.
Synced the duplicate `PlatformModels` interface in `@atlas/agent-sdk`
(SDK leaf can't import from `@atlas/llm`). Added `getImageOverlayKey`
stubs to every test fixture listed in the AC.
## Progress
- Task: #1 Fix: image-agent overlay lookup uses wrong key
- Decisions: Refactored resolver to return {key,model} for atomic resolution
(avoids theoretical race when process.env flips between getImage and
getImageOverlayKey). Stub model uses SDK-shaped values rather than
embedding the overlay key in modelId — turns the test into a regression
guard for this specific bug class.
- Key Learnings: SDK ImageModelV3.provider is the transport string
("google.generative-ai"), not the overlay/registry provider ("google").
Any capability lookup keyed on overlay shape must go through the configured
spec, not the SDK model's reported fields. PlatformModels is duplicated
in @atlas/agent-sdk (SDK is a leaf, can't import @atlas/llm) — interface
changes must land in both.
- Files: packages/llm/src/platform-models.ts, packages/llm/src/test-utils.ts,
packages/agent-sdk/src/types.ts,
packages/bundled-agents/src/image-generation/{agent.ts,agent.test.ts},
apps/atlasd/routes/workspaces/draft.test.ts,
apps/atlasd/src/chat-skill-discovery.test.ts,
packages/bundled-agents/src/web/tools/search.test.ts,
packages/workspace/src/__tests__/context-memory-mounts.test.ts,
tools/evals/lib/context.ts
The Settings UI's PUT /api/config/models wrote the new models block to friday.yml but only constructed a throwaway PlatformModels for validation; the daemon's live instance kept serving the boot-time config until restart. Every model role (conversational, labels, classifier, planner, image) was affected. Make PlatformModels itself reloadable in place: the closure-captured userConfig moves into a mutable cell, and a new `reload(input)` re-runs the same boot-time validation before swapping. Long-lived references held by WorkspaceRuntime, MCP servers, and route handlers stay valid — every `get()`/`getImage()` call reads through the cell and picks up the new config. The PUT handler now calls `reload` after writing friday.yml; the response's `restartRequired` flag flips to false when the swap succeeds. Adversarial: `reload` validates BEFORE mutating, so a bad config throws without corrupting in-memory state. Pre-flight in the route still runs first, so disk-write only happens on a config the daemon will also accept. ## Progress - Task: #2 Fix: PUT /api/config/models doesn't reload live PlatformModels - Decisions: - Option B (in-place reload) over Option A (daemon setter + indirection) because WorkspaceRuntime captures the reference at construction — a setter that swaps `daemon.platformModels` would not reach those long-lived holders. Mutating the closed-over cell is invisible to consumers and requires zero plumbing changes. - Mirrored `reload` into `@atlas/agent-sdk`'s PlatformModels interface (typed as `unknown` input there — SDK is a leaf, can't import PlatformModelsInput). Keeps cross-package assignability without breaking the leaf-node invariant. - Key Learnings: - The Hono context already exposes `platformModels` via a lazy getter (atlas-daemon.ts:429), so routes read through `daemon.platformModels` on each request — perfect substrate for in-place reload. - WorkspaceRuntime captures `options.platformModels` once at construction (runtime.ts:1757) — a daemon-level setter that swaps the reference would NOT reach already-running workspaces. Any future "swap the instance" approach has to also reconstruct or notify every WorkspaceRuntime. - `daemon-startup.test.ts` reads the dev machine's real friday.yml (no FRIDAY_CONFIG_PATH override) — 5 tests fail with "local:google/gemma-4-e4b missing creds" purely due to local-env contamination, unrelated to this change. Pre-existing. - Files: - packages/llm/src/platform-models.ts: reload method, mutable userConfig cell - packages/llm/src/platform-models.test.ts: reload unit tests - packages/llm/src/test-utils.ts: stub reload no-op - packages/agent-sdk/src/types.ts: mirror reload in SDK interface - apps/atlasd/routes/config.ts: PUT handler calls reload, docstring updated - test stubs: draft, chat-skill-discovery, web/tools/search, context-memory-mounts
Biome's auto-format prefers multi-line object literals here. No logic changes — just whitespace.
Two hand-rolled `PlatformModels` stubs were missed when `getImageOverlayKey` (447e52b) and `reload` (6d06e01) landed on the interface, breaking CI type-check on #464. Add throw-on-invoke methods matching each file's local idiom — playground points callers at `PlaygroundContextOpts.platformModels`, evals points at "once evals adopt the platform model path". ## Progress - Task: #3 Add getImageOverlayKey + reload to two missing PlatformModels stubs - Files: - tools/agent-playground/src/lib/server/lib/context.ts: getImageOverlayKey + reload stubs - tools/evals/lib/context.ts: reload stub (other two already present)
The Settings page hardcoded "Restart the daemon to apply." and ignored
the PUT /api/config/models response body. Since the daemon now
hot-reloads `PlatformModels` and returns `{ restartRequired: false }`,
users were told to restart needlessly. Parse the response with Zod and
branch the flash text: "Saved. Active now." on hot-reload,
restart-required message only when the daemon couldn't hot-swap. Also
update the stale header doc-comment.
## Progress
- Task: #4 Fix Settings flash + comment to honor restartRequired
- Decisions: Used Zod safeParse to match the file's existing schema
pattern (TunnelStatusSchema, etc.) and avoid `as` assertions. Reused
the body already read at line 505 to keep the change surgical.
- Key Learnings: agent-playground svelte files are excluded from
\`deno check\` — use \`npx svelte-check --tsconfig ./tsconfig.json\` from
the package root (run via the \`check\` npm script).
- Files: tools/agent-playground/src/routes/settings/+page.svelte
PUT /api/config/models previously wrote `friday.yml` (just `version` + `workspace` blocks) even when the request's `newModels` map was empty and the file didn't exist. This promoted fresh-install users into the "upgrader" state, causing the migration banner in Settings to appear spuriously on next session — the banner gates on file presence. Wrap the existing write/reload block in `if (!skipWrite)` where `skipWrite = empty newModels && !file exists`. Falls through to the existing `success: true, restartRequired: false` exit. No new return arms, so Hono's RPC return-type inference stays within TS's depth budget. All other paths (file exists with models, partial state, first explicit role) write normally, per the plan's "absent = fresh, present = upgrade" definition. ## Progress - Task: #7 — Skip friday.yml write when PUT /api/config/models has empty models AND file doesn't exist - Decisions: Restructured as guard around the write block rather than an early-return short-circuit. Adding a sixth `c.json()` site tripped TS2589 "Type instantiation is excessively deep" on Hono's inferred response union — the existing handler already has 5 arms. Falling through to the same success exit avoids the issue without adding a helper or annotation. - Key Learnings: Hono's RPC return-type inference unions every `c.json()` call-site verbatim — even structurally identical ones aren't deduplicated. Adding "just one more" return arm to a handler with several existing arms can blow past TS's instantiation-depth limit (TS2589). Mitigation: reuse exit points (fall-through over early-return) when the existing handler is already near the limit. Documented in `apps/atlasd/CLAUDE.md` as the broader Hono RPC inference gotcha. - Files: apps/atlasd/routes/config.ts
Adds two HTTP-level tests for the new PUT /api/config/models handler: one valid-body round-trip proving disk write + hot-reload + GET reflect the new config, and one Zod-failing body proving 4xx + no disk write + live PlatformModels unchanged. Closes the only HTTP-level coverage gap on the new route — prior coverage in platform-models.test.ts hit only createPlatformModels/reload, not the route's validation/write/reload wiring. ## Progress - Task: #8 Add HTTP tests for PUT /api/config/models - Decisions: Use real createPlatformModels (LITELLM_API_KEY satisfies every provider) so reload() actually swaps state and the GET resolved field proves the hot-reload took effect — not just that the disk write happened. Follows daemon.test.ts's partial-AppContext stub pattern ({ platformModels } as unknown as AppContext) for context wiring. Picked groq for the labels happy-path role so resolved.provider observably changes from the anthropic default — a silent-reload regression would flip this assertion red. - Key Learnings: hono-openapi validator returns 400 (not 422) on Zod failure — confirmed against existing newline-bearing-values test. The PUT /models route reads friday.yml from FRIDAY_CONFIG_PATH (not FRIDAY_HOME), so tests must pin that env var separately. LITELLM_API_KEY is the cleanest credential hatch for resolver tests because hasCredential() short-circuits on it for every provider. - Files: apps/atlasd/routes/config.test.ts
…solved
The two image methods each invoked `resolveImageRole()` from scratch — full
chain walk, credential probe, registry construction — so the image agent's
back-to-back calls doubled the work per request. The internal `ResolvedImage`
type was explicitly created to keep the `{ key, model }` pair bonded (see the
comment block at platform-models.ts:314-318); the split API contradicted the
invariant and invited future callers to observe divergence if `process.env`
flipped between lookups.
Collapse both methods into a single `getImageResolved(): { key, model }` that
runs the resolver once and returns the pair atomically. The image agent
destructures `{ key: overlayKey, model }`, halving the per-call resolution
cost.
Mirror the change in `packages/agent-sdk/src/types.ts` PlatformModels
interface. Reload() left as `reload(input: unknown)` — Task #6 removes it
from the SDK separately.
Update every stub and call site:
- packages/llm/src/test-utils.ts (createStubPlatformModels helper)
- packages/bundled-agents/src/image-generation/agent.test.ts (stub + helper)
- packages/workspace/src/__tests__/context-memory-mounts.test.ts
- packages/bundled-agents/src/web/tools/search.test.ts
- apps/atlasd/src/chat-skill-discovery.test.ts
- apps/atlasd/routes/workspaces/draft.test.ts (4 instances)
- tools/agent-playground/src/lib/server/lib/context.ts
- tools/evals/lib/context.ts
## Progress
- Task: #5 — Collapse getImage + getImageOverlayKey into a single getImageResolved on PlatformModels
- Decisions: Returned the existing `ResolvedImage` type alias from the implementation rather than spelling out `{ key, model }` inline — the type was already exported-shape-equivalent and its doc-comment documents the bonding invariant. Kept the doc-comment on the interface terse: the SDK transport-string vs overlay-key distinction is the only thing callers really need to remember.
- Key Learnings: When two methods on an interface MUST be invoked together to stay consistent (same closure state, same env snapshot), collapsing them into one that returns the pair atomically is mechanically a better fit than relying on call-order discipline — even if no current caller misuses the split. The pre-existing internal `ResolvedImage = { key, model }` type and its bonding-invariant comment were a leading indicator that the API surface was wrong.
- Files: packages/llm/src/platform-models.ts, packages/llm/src/platform-models.test.ts, packages/llm/src/test-utils.ts, packages/agent-sdk/src/types.ts, packages/bundled-agents/src/image-generation/agent.ts, packages/bundled-agents/src/image-generation/agent.test.ts, packages/bundled-agents/src/web/tools/search.test.ts, packages/workspace/src/__tests__/context-memory-mounts.test.ts, apps/atlasd/src/chat-skill-discovery.test.ts, apps/atlasd/routes/workspaces/draft.test.ts, tools/agent-playground/src/lib/server/lib/context.ts, tools/evals/lib/context.ts
The `(model × transport)` matrix ran against fixture JSON files that aren't committed (produced by operator-run `validate-image-models.ts`). In CI every cell was emitting `test.skip(...)` — 12 skip lines that asserted nothing while the comment claimed the matrix was the "audit trail for coverage." Gate the matrix on `fixturesExist`: when no fixture files are present, register exactly one skip and bail; when fixtures land locally (or in a future commit), the full matrix runs. Update the comment to reflect the actual behavior. ## Progress - Task: #9 Collapse skipped fixture matrix to skipIf in agent.test.ts - Decisions: Picked Option B (skipIf-equivalent) over Option A (commit one fixture per provider) — Option A needs OPENAI / GEMINI API keys + ~1-2 hours running the cost-gated harness; B is the ~30-min path the team-lead recommended. Used a plain `if (!fixturesExist) { test.skip(...); return; }` rather than `describe.skipIf` because the latter still registers every inner `test.skip` from the `for` loop — yields N skip lines, not one. - Key Learnings: vitest's `describe.skipIf(true)` does NOT prevent the describe callback from running. It executes the body, registers every inner `test()` / `test.skip()`, and marks them all as skipped — so a matrix wrapped in `describe.skipIf` still emits N skip lines, not one. To get a single skip line, gate the registration loop itself with a plain `if`. - Files: packages/bundled-agents/src/image-generation/agent.test.ts
…er types
`packages/agent-sdk/src/types.ts` declared `reload(input: unknown): void` on
the `PlatformModels` interface purely "to mirror @atlas/llm" — a misreading
of structural subtyping. Agents never call `reload`; the only real caller
is `apps/atlasd/routes/config.ts` (PUT /api/config/models), which operates
on the richer `@atlas/llm` `PlatformModels` and is unaffected.
This commit:
- Removes `reload` from the agent-sdk `PlatformModels` interface and trims
the matching doc-comment paragraph (keeps the `getImageResolved` para).
- Deletes the dead 3-line throwing `reload` stubs from three test files:
`chat-skill-discovery.test.ts`, `search.test.ts`,
`context-memory-mounts.test.ts`.
- Narrows seven downstream parameter sites from `@atlas/llm`'s rich
`PlatformModels` to `@atlas/agent-sdk`'s narrow one. None of these
callers invoke `reload`; they only need `get` / `getImageResolved`.
Sites narrowed:
- `packages/llm/src/small.ts` — `smallLLM` params
- `packages/core/src/delegate/index.ts` — `DelegateDeps`
- `packages/bundled-agents/src/web/tools/search.ts` — `SearchToolContext`
- `packages/bundled-agents/src/web/tools/search-execution.ts` — `executeSearch`
- `packages/bundled-agents/src/claude-code/agent.ts` — `generateProgress`
- `packages/system/agents/workspace-chat/workspace-chat.agent.ts` — `generateChatTitle`
- `apps/atlasd/src/chat-skill-discovery.ts` — two inline `import()` sites
`@atlas/llm`'s richer `PlatformModels` (which keeps `reload`) still flows
into agent-sdk-typed positions via structural subtyping. The lone reload
caller in `apps/atlasd/routes/config.ts` continues to type-check against
the rich interface.
## Progress
- Task: #6 — Drop reload() from agent-sdk PlatformModels interface and delete dead stubs
- Decisions: Expanded scope by 7 files beyond the originally-scoped 4. Team-lead's research found `reload` *call* sites (just one) but missed *declaration* sites — places that typed `platformModels: PlatformModels` from `@atlas/llm` without ever calling reload. Removing reload from agent-sdk broke narrow→rich assignment at every agent that forwards `ctx.platformModels` into those slots. Pinged team-lead with the discovery, got approval, narrowed each consumer to the SDK type. Also dropped my own "no reload" explanatory comment paragraph to honor the "zero hits" verification gate — the deleted method is self-documenting in the diff.
- Key Learnings: When narrowing a shared interface, "find all callers of method X" is insufficient — search for declaration sites that USE the interface as a parameter type too. `grep "PlatformModels"` (the interface name) catches both. Narrow→rich assignment is the failure direction: rich → narrow is free under structural subtyping, but a value of the narrow type cannot satisfy a slot typed as rich (TS rejects: missing required member). Deno workspace cross-package imports resolve automatically via workspace member names — no `deno.json` `imports` entry needed when a workspace package imports from another workspace package.
- Files: packages/agent-sdk/src/types.ts (+ 3 test files for stub removal) + 7 consumer narrowings = 11 files total
…hared helper Routes every hand-rolled PlatformModels stub through createStubPlatformModels so the next interface change ripples through one file instead of five. To preserve each test's existing assertion behavior (vi.fn spying in agent.test, "should not be called" throws in the other three), extends the helper to accept wholesale `get` / `getImageResolved` overrides — picks up the team lead's suggested pattern verbatim. Per-role model overrides still work for tests that need real LanguageModelV3 behavior; the wholesale overrides take precedence when supplied. The two consumers that wholesale-mock @atlas/llm (search.test, chat-skill- discovery.test) now thread importOriginal so the helper survives the mock, and search.test's pre-existing top-level vi.fn() / vi.mock factory pairing gets the canonical vi.hoisted() fix (vitest's hoisting check fired loudly once the @atlas/llm mock went async). ## Progress - Task: #10 Migrate hand-rolled PlatformModels stubs to createStubPlatformModels - Decisions: Extended the helper's override surface (added `get`/`getImageResolved`) rather than working around it from the call sites — the override is the documented pattern the team lead's example implied, and putting it in the helper means future "throws if called" guards land in one line instead of five. Migrated context-memory-mounts to drop its `satisfies PlatformModels` altogether: the agent-sdk PlatformModels type is now structurally satisfied by the helper's @atlas/llm PlatformModels return (extra `reload` is a superset). For agent.test, kept vi.fn() spy handles in module-scope vars (`getSpy`, `getImageResolvedSpy`) so `mockReturnValueOnce` / `toHaveBeenCalled` still target the spies — passed those into the helper as the wholesale overrides. - Key Learnings: `vi.mock("@atlas/llm", () => ({...}))` (wholesale form) silently drops every re-export the test file consumes from the same module — including helpers imported at module scope before the mock applies. Use `(importOriginal) => ({ ...await importOriginal(), ...stubs })` whenever the file imports anything else from the mocked module. Separately: introducing an async vi.mock factory in a file that already has non-async `vi.mock` factories referencing top-level `const mockX = vi.fn()` vars exposed Vitest's hoisting rule — those vars need `vi.hoisted()`. The original test files happened to work because Vitest's hoisting was lenient on sync-only factory graphs; adding an async factory tightens it. - Files: packages/llm/src/test-utils.ts (helper extended), apps/atlasd/src/chat-skill-discovery.test.ts, packages/bundled-agents/src/image-generation/agent.test.ts, packages/bundled-agents/src/web/tools/search.test.ts, packages/workspace/src/__tests__/context-memory-mounts.test.ts
After the team's 7-task PR-review punch list landed, `deno task lint` auto-fixed line-length and object-literal formatting in 5 files: - apps/atlasd/routes/config.test.ts - apps/atlasd/routes/config.ts - apps/atlasd/routes/workspaces/draft.test.ts - apps/atlasd/src/chat-skill-discovery.test.ts - packages/bundled-agents/src/image-generation/agent.test.ts Pure formatting; no semantic changes.
…rlay The image-generation agent's defensive null-branch on the overlay lookup trusts boot-time pre-flight — but pre-flight skips `models.image` when the user hasn't pinned anything, so the guarantee depends entirely on every `DEFAULT_PLATFORM_MODELS.image` entry being keyed in `IMAGE_OVERLAY`. This invariant test fails CI if a future change adds a non-overlay default. ## Progress - Task: #2 Add invariant test asserting DEFAULT_PLATFORM_MODELS.image entries exist in IMAGE_OVERLAY - Decisions: Placed in image-capabilities.test.ts (where overlay tests live) rather than platform-models.test.ts — the invariant is a coupling check and reads more naturally next to other overlay assertions. Used filter+toEqual([]) over .every() so the failure message names the offending ids. - Key Learnings: None - Files: packages/llm/src/image-capabilities.test.ts
The boot $effect reads `image-picker-intro-seen` to suppress a one-time migration banner. Hardened-WebView and certain Safari ITP contexts throw on `localStorage` access entirely, not just on writes. An unguarded read aborts the effect before the async load chain (loadEnv → loadModels → loadCatalog → loadTunnel → refreshFromServer) runs, bricking the Settings page over a cosmetic banner. Wrap the read in try/catch matching the existing setItem guard in dismissImagePickerIntro. On failure the default `imagePickerIntroDismissed = true` keeps the banner hidden so the user isn't presented with an undismissable nudge. ## Progress - Task: #1 — Guard localStorage.getItem against access exceptions in Settings boot effect - Decisions: try/catch nested inside the existing `typeof localStorage` SSR guard, matching the symmetric setItem pattern at line 664. Catch is empty — default state already suppresses the banner; no logging needed for a cosmetic gate. - Key Learnings: Hardened-WebView / Safari ITP throws apply to reads as well as writes — wrapping only setItem is insufficient if the read sits in a boot effect that gates downstream loads. - Files: tools/agent-playground/src/routes/settings/+page.svelte
…sertions The freshness drift suite logged a warning inside the test, then asserted its own log count — nothing in production emits a freshness warning, so the whole mechanism was decorative. `lastValidatedAt` showing up in PR diffs is the rot surface we actually rely on; the in-test warn was just noise. Also trimmed the round-trip test of assertions the type system already guarantees (capabilities.generation === true, typeof edit === "boolean", size regex, aspectRatio length, format enum). The kept assertions (displayName non-empty + trimmed, lastValidatedAt is a real calendar date) cover what TS can't. ## Progress - Task: #3 Clean up image-capabilities tests: delete decorative freshness suite, trim type-system tautologies - Decisions: Removed `logger` and `vi` imports (orphaned by deleting the freshness suite — no other test used them). Kept the ISO_DATE regex const; the round-trip test still uses it. - Key Learnings: None - Files: packages/llm/src/image-capabilities.test.ts
Adds explicit coverage for branches that feed Settings UI behavior but had no test: - reload failure after writeFile → restartRequired: true (flips the success-flash from "active now" to "restart the daemon") - resolveImageRoleForGet catch-branch → parses the configured chain head when the live resolver can't credential the image model - skipWrite gate (fresh install, no friday.yml) → response succeeds with no file materialized (preserves the migration-banner gate) - skipWrite gate (existing friday.yml, empty payload) → models key dropped, version + workspace preserved - configPath GET assertions — undefined when friday.yml is absent, absolute path when present (gates the playground's migration nudge) ## Progress - Task: #4 Extend PUT /api/config/models test coverage: 4 untested route branches - Decisions: used vi.spyOn over extending createStubPlatformModels — the existing test uses the real createPlatformModels so reload() actually mutates state, and one-test spy overrides are cheaper than widening the shared helper's surface. mock.results[0].type === "throw" is the clean way to verify the catch-branch fired without re-throwing. - Key Learnings: stringifyYaml emits version "1.0" with SINGLE quotes (defensive against YAML 1.1 float coercion) — asserts on the on-disk YAML must accept either quote style. resolveImageRole defers credential checks but pre-flights overlay membership at boot, so you can construct PlatformModels with an uncredentialed image entry as long as the language defaults have a credential and the image entry is in IMAGE_OVERLAY. LITELLM_API_KEY universally satisfies hasCredential — any test that needs an uncredentialed provider must delete it AND not set the provider-specific env var. - Files: apps/atlasd/routes/config.test.ts
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Why
Image generation was the only model layer users couldn't configure. The agent hardcoded
google:gemini-3.1-flash-image-preview— a Google preview ID that can be retired without notice — and ignored capability differences between providers (DALL·E 3 is gen-only, GPT Image 1.5 needs a verified org, Imagen takesaspectRationotsize). Swapping the model meant editing source code; trying to edit an image on a gen-only model produced silent nonsense.What
gen ✓always,edit ✓/edit ✗) and any provider note (e.g. "Requires a verified OpenAI organization." on GPT Image 1.5).-previewID withgoogle:gemini-2.5-flash-image. One-time migration banner for upgrade users; fresh installs see nothing.friday.yml'smodels.imagenow fails daemon boot with the list of known model ids, not at first image-gen attempt.scripts/validate-image-models.tsexercises every overlay entry against live providers (direct + LiteLLM proxy) with a cost preview and y/N gate — operators can confirm what actually works in their deployment.QA path
Spin up
deno task dev:playgroundand walk through:OPENAI_API_KEY, the OpenAI section shows the inline "Save & unlock" banner.friday.yml—models.image: openai:dall-e-3should be written.models.imagefrom Settings ("Use default chain"). Generate an image from chat — output uses Gemini 2.5 Flash Image.image-picker-intro-seenkey fromlocalStorageand reload Settings. Banner appears in the Image row. Dismiss → reload → banner stays gone.Notes
(model × transport)test matrix that skips cleanly when fixtures are absent.