Skip to content

feat(settings): add configurable image-model picker (Image role)#464

Open
LissaGreense wants to merge 30 commits into
mainfrom
feat/image-model-picker
Open

feat(settings): add configurable image-model picker (Image role)#464
LissaGreense wants to merge 30 commits into
mainfrom
feat/image-model-picker

Conversation

@LissaGreense

@LissaGreense LissaGreense commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Why

Image generation was the only model layer users couldn't configure. The agent hardcoded google:gemini-3.1-flash-image-preview — a Google preview ID that can be retired without notice — and ignored capability differences between providers (DALL·E 3 is gen-only, GPT Image 1.5 needs a verified org, Imagen takes aspectRatio not size). Swapping the model meant editing source code; trying to edit an image on a gen-only model produced silent nonsense.

What

  • Image picker in Settings. New "Image" row alongside the four LLM roles, with primary + fallback chain. Each option shows capability badges (gen ✓ always, edit ✓ / edit ✗) and any provider note (e.g. "Requires a verified OpenAI organization." on GPT Image 1.5).
  • Stable default. Replaces the -preview ID with google:gemini-2.5-flash-image. One-time migration banner for upgrade users; fresh installs see nothing.
  • Capability-aware agent. When the chosen model can't satisfy the request (edit prompt on gen-only model), the agent returns a clean error pointing back to Settings instead of producing nonsense.
  • Fail-fast configuration. A typo in friday.yml's models.image now fails daemon boot with the list of known model ids, not at first image-gen attempt.
  • Validation harness. scripts/validate-image-models.ts exercises every overlay entry against live providers (direct + LiteLLM proxy) with a cost preview and y/N gate — operators can confirm what actually works in their deployment.

QA path

Spin up deno task dev:playground and walk through:

  1. Settings → Models shows 5 role cards. Image card defaults to "Gemini 2.5 Flash Image".
  2. Override the default. Click the Image tile → picker opens with 6 entries grouped by provider. Verify badges (gen/edit) render and the GPT Image 1.5 note appears.
  3. Locked provider. If you don't have OPENAI_API_KEY, the OpenAI section shows the inline "Save & unlock" banner.
  4. Persist. Pick DALL·E 3 → Save → reload the page → tile still shows DALL·E 3. Check friday.ymlmodels.image: openai:dall-e-3 should be written.
  5. Capability mismatch. From a chat, attach an image and ask to edit it. With DALL·E 3 selected, you should get a clean error mentioning "supports generation only" and pointing to Settings. Switch to GPT Image 1.5 (or Gemini), retry — the edit should succeed.
  6. Default flow. Clear models.image from Settings ("Use default chain"). Generate an image from chat — output uses Gemini 2.5 Flash Image.
  7. Migration banner. Delete the image-picker-intro-seen key from localStorage and reload Settings. Banner appears in the Image row. Dismiss → reload → banner stays gone.

Notes

  • Validation harness is manual / cost-gated; not wired into CI. Fixtures captured by the harness drive a (model × transport) test matrix that skips cleanly when fixtures are absent.
  • Out of scope (deferred to follow-ups): mask/inpainting, multi-image generation, additional providers, per-workspace defaults overrides, quality evals.

The Settings image picker needs a freshness signal — the set of image
models the gateway/provider currently advertises — to intersect against
the hand-curated capability overlay. CatalogEntry gains an `images:
ModelInfo[]` field populated alongside the existing `models` (language)
field.

Routing rules in groupGatewayModels:
- Gateway entries with `modelType: 'image'` route to the appropriate
  provider's images bucket (catches Imagen under vertex).
- Google (`vertex` + `google/gemini-`) `-image` ids invert the existing
  language filter — they ship as `modelType: 'language'` so the id-based
  rule is the only thing that catches them.
- OpenAI `gpt-image-*` / `dall-e-*` id patterns route to images
  regardless of modelType, defensive against gateway misclassification.

Direct-fetch providers (groq / openrouter / local) keep `images: []` —
they have no image surface. Existing language partitioning is unchanged;
all 13 prior tests pass without modification.

## Progress
- Task: #6 Extend model-catalog with `images: ModelInfo[]` bucket
- Decisions:
  - Belt-and-suspenders on OpenAI: both modelType=image AND id-pattern
    route to images. Gateway has shipped both shapes for dall-e historically;
    pattern match alone catches future regressions.
  - Added a vertex-fallthrough branch for `modelType: 'image'` so Imagen
    ids (no `-image` substring) land in google.images without needing a
    separate `imagen-` prefix check.
  - `images` is non-optional on CatalogEntry — every entry carries an
    empty array rather than `undefined`, so consumers don't branch.
- Key Learnings:
  - Google ships `-image` variants under `modelType: 'language'` in the
    gateway response — the id-substring check is load-bearing for image
    discovery, not a defensive duplicate.
  - The Svelte components duplicate CatalogEntry locally rather than
    importing from `@atlas/llm` — adding fields is non-breaking but
    consuming new fields requires touching each duplicate.
- Files:
  - packages/llm/src/model-catalog.ts
  - packages/llm/src/model-catalog.test.ts
Tracer bullet for the image-model picker — schema → resolver → agent.
Adds `image` as a fifth `PlatformRole` and a sibling `getImage(): ImageModelV3`
on `PlatformModels` so the image-generation agent stops calling
`registry.imageModel("google:gemini-3.1-flash-image-preview")` directly and
defaults to the stable `google:gemini-2.5-flash-image` instead.

Changes:
- `PlatformModelsSchema` (config/atlas.ts) gains `image: ModelIdSchema.optional()`.
- `DEFAULT_PLATFORM_MODELS.image` = `["google:gemini-2.5-flash-image"]`.
- `PlatformModels` interface (llm + mirrored in agent-sdk) gains `getImage()`.
- New `resolveImageRole` private function mirrors `resolveRole`'s chain-walk
  semantics (format-strict, unknown-provider-strict, missing-cred-tolerant
  across multi-entry chains) but constructs via `registry.imageModel(id)`.
- `getImage()` logs `Image model resolved` on success matching the boot
  pattern; throws `PlatformModelsConfigError` when no entry has credentials.
- New `image-capabilities.ts` overlay with one entry for the default model;
  full 6-entry overlay deferred to task #2.
- Agent (image-generation/agent.ts:99) now calls `platformModels.getImage()`.
- `createStubPlatformModels` extended with `getImage()` for downstream tests.

Boot-time overlay validation (`unknown_image_model` ErrorKind) is explicitly
deferred to task #3 — tracer bullet covers the seam only.

## Progress
- Task: #1 — Tracer Bullet: Wire image model through PlatformModels resolver to agent
- Decisions:
  - Forked `resolveImageRole` from `resolveRole` rather than parameterized.
    `resolveRole` threads through `traceModel()` (LanguageModelV3-specific);
    no `traceImageModel` exists. Generic helper at two callers buys less
    than it costs.
  - Used `ImageModelV3` (not the design-doc's "ImageModelV4" — the SDK only
    ships V3). Confirmed with team-lead before committing.
  - Narrowed `get(role)` to `LanguageRole = Exclude<PlatformRole, "image">`
    so the type system enforces the right method per model type. `PlatformRole`
    still includes `"image"` so `DEFAULT_PLATFORM_MODELS` and config-shape
    machinery key uniformly across all five roles.
  - Dropped stale `registry.imageModel` mock in agent.test.ts now that the
    agent no longer touches the registry directly; replaced with a structurally-
    valid `ImageModelV3` stub on the `platformModels` context.
- Key Learnings:
  - `PlatformModels` interface is intentionally duplicated in both
    `packages/llm/src/platform-models.ts` and `packages/agent-sdk/src/types.ts`
    (agent-sdk is a leaf package — no cross-package imports). Comment at
    agent-sdk/types.ts L40 explicitly documents the mirror. Any future
    interface change must update both.
  - The AI SDK exposes `ImageModelV3` (latest); there is no `ImageModelV4`
    despite occasional references in design docs.
  - Combined `deno check` over many files can intermittently emit a spurious
    TS2589 "type instantiation excessively deep" error that disappears on
    rerun or when checking files individually — likely a workspace-resolution
    flake, not a real type problem.
- Files:
  - packages/config/src/atlas.ts — schema
  - packages/llm/src/platform-models.ts — PlatformRole, resolver, getImage()
  - packages/llm/src/image-capabilities.ts — new overlay module
  - packages/llm/mod.ts — re-exports
  - packages/llm/src/test-utils.ts — stub helper extended
  - packages/llm/src/platform-models.test.ts — 2 new getImage() tests
  - packages/agent-sdk/src/types.ts — mirrored interface
  - packages/bundled-agents/src/image-generation/agent.ts — swap to getImage()
  - packages/bundled-agents/src/image-generation/agent.test.ts — drop dead mock
Extends IMAGE_OVERLAY from the tracer bullet's single Gemini entry to
the full v1 set of six verified models — three Google (Gemini 2.5 Flash
Image + Imagen 4 quality/fast) and three OpenAI (GPT Image 1.5, DALL·E
3, DALL·E 2). The capability-check (Task #4) and controlAxis-dispatch
(Task #5) work, plus the validation harness (Task #9), all iterate this
overlay, so locking the full matrix unblocks them.

Defaults rationale:
- Google → `1:1` PNG. Square is Gemini/Imagen's natural output; PNG
  keeps lossless transport for edit flows.
- OpenAI → `1024x1024` PNG. Baseline supported by every entry; avoids
  paying for higher-res tiers the picker doesn't surface.

GPT Image 1.5 carries an explicit `note` about the verified-org
requirement so users hit that gate at config time rather than at first
generation attempt.

New `image-capabilities.test.ts`:
- shape: every entry has non-empty displayName, type-round-trips, ISO
  date parses
- provider→controlAxis split (google=aspectRatio, openai=size) is
  locked in — protects Task #5's dispatch site
- freshness: scans `lastValidatedAt` and fires a non-failing
  `logger.warn` for entries older than 180 days (harness updates these)
- lookup: hit + miss cases

## Progress
- Task: #2 Populate full image-capabilities overlay (6 entries) with round-trip test
- Decisions:
  - Stale check is advisory, not a hard fail. The validation harness
    (Task #9) is the source of truth for freshness; the test just
    surfaces drift so reviewers notice before merging stale data.
  - Asserted the provider→controlAxis split in tests explicitly. Even
    though the discriminated union catches structural mismatches at
    compile time, a future entry could be structurally valid but
    semantically wrong (e.g. an OpenAI entry with `aspectRatio`).
  - Used a `if (entry === null) throw` narrowing in the lookup test
    instead of an `as` assertion (CLAUDE.md: no `as`).
- Key Learnings:
  - `logger.warn(msg, ctx)` is the established shape across the
    codebase (`packages/llm/src/small.ts`, `core/credential-fetcher.ts`).
    Spy on it via `vi.spyOn(logger, "warn")` rather than `console`.
  - Discriminated union on `controlAxis` is genuinely load-bearing —
    TypeScript blocks the wrong-axis param at the IMAGE_OVERLAY literal,
    so a typo at the data layer fails the compile, not the agent.
- Files:
  - packages/llm/src/image-capabilities.ts
  - packages/llm/src/image-capabilities.test.ts
Extend createPlatformModels to pre-flight `models.image` at boot:
each chain entry must format-parse, name a known provider, AND have an
IMAGE_OVERLAY entry. A typo like `google:gemini-2.5-flas-image` now
fails daemon startup with `unknown_image_model` instead of surfacing
on the first image-gen attempt.

Credential checks remain deferred to getImage() — daemons must still
boot when image providers are intentionally unset. Boot errors
aggregate with language-role errors into a single
PlatformModelsConfigError so operators see every problem in one
startup attempt.

The error formatter renders `unknown_image_model` as a new arm
listing the offending id alongside the full set of known overlay
ids, copy-pastable for fixes.

resolveImageRole keeps the overlay check inline as a defensive last
line for hot-reload / test-injection paths that bypass boot.

## Progress
- Task: #3 Add boot-time overlay validation with `unknown_image_model` ErrorKind
- Decisions:
  - Separate `preflightImageRoleAtBoot` instead of folding image into
    the `LanguageRole[]` loop — the language path returns
    LanguageModelV3 via resolveRole, the image path returns
    ImageModelV3, and a generic abstraction across two callers buys
    less than it costs.
  - Boot-time validation does NOT check credentials for image
    (departs from language-role behavior) so unconfigured image
    providers don't block daemon startup — the credential check is
    `getImage()`'s job.
  - Per-chain first-error-wins (same as `resolveRole`); aggregation
    happens across roles via the shared `bootErrors` array.
- Key Learnings:
  - The image role's boot-validation shape intentionally diverges
    from language roles: language defaults assume keys are required,
    image defaults assume the provider may be unconfigured. Future
    "optional" roles should follow this pattern rather than the
    eager-resolve pattern.
- Files: packages/llm/src/platform-models.ts, packages/llm/src/platform-models.test.ts
Resolve the image model once via platformModels.getImage(), look up its
capability entry in the IMAGE_OVERLAY, and short-circuit with a clean
err(string) when the user pairs an edit-mode prompt (artifact UUID
reference) with a generation-only model (DALL·E 3, Imagen 4, Imagen 4
Fast). Without this gate the agent forwarded edit prompts to providers
that cannot honour them and produced nonsense output.

The error names the displayName, states "supports generation only", and
points at Settings → Image — everything a user needs to act. A defensive
null branch surfaces overlay/resolver desync if it ever slips past boot
validation. The single `entry` binding is positioned so task #5 can
reuse it for controlAxis dispatch without re-querying the overlay.

## Progress
- Task: #4 Agent capability check — err() when edit requested on gen-only model
- Decisions:
  - Lookup runs BEFORE source-image binary reads — fail fast, no wasted I/O.
  - Kept the defensive `entry === null` branch even though #3 now boot-validates;
    a logger.error fires so a regression in the resolver/overlay contract is loud.
  - Default test stub modelId switched from "stub-image" (overlay-missing) to
    "google:gemini-2.5-flash-image" (edit-capable) so existing tests pass the
    new capability gate without per-test plumbing; gen-only tests override via
    `stubPlatformModels.getImage.mockReturnValueOnce(...)`.
- Key Learnings:
  - `vi.mock("@atlas/llm", () => ({...}))` is a FULL replacement — adding a
    second consumer of the module (lookupImageEntry) breaks silently with
    "No 'X' export is defined on the mock." Fix: `vi.importActual` and
    re-export the pure helpers, only stub what actually needs faking.
  - `vi.clearAllMocks()` does NOT clear queued `mockReturnValueOnce` — but the
    queue is naturally drained by the single call per test, so `Once` is the
    safest way to vary platformModels.getImage per test without leaking state.
- Files:
  - packages/bundled-agents/src/image-generation/agent.ts
  - packages/bundled-agents/src/image-generation/agent.test.ts
Drop the hardcoded IMAGE_SIZE constant and dispatch on the overlay
entry's `controlAxis` discriminator. Size-axis entries (DALL·E, GPT
Image) now receive `{ size }` and aspectRatio-axis entries (Imagen,
Gemini) receive `{ aspectRatio }` — each provider gets the param
shape it actually expects, no cross-axis bleed.

Tightened `ImageDefaults.aspectRatio` from `string` to
`${number}:${number}` so the overlay value narrows cleanly into the
AI SDK's expected template literal. All existing entries already
match the tighter type ("1:1"); no entry change needed.

The stale generation-complete log field `imageSize: IMAGE_SIZE` is
removed alongside the constant.

## Progress
- Task: #5 Agent controlAxis dispatch — size vs aspectRatio param shape
- Decisions:
  - Computed `controlParams` object spread into generateImage rather than
    duplicating the call across the discriminator branches — keeps the
    happy path single-call and lets the type-level union enforce
    "exactly one axis" at the use site.
  - Tightened the overlay's aspectRatio template literal at the type
    layer rather than asserting at the agent boundary; no `as`, no Zod
    surface area added, and the change strictly narrows what the
    overlay can ever contain.
  - Replaced the stale "passes 1024x1024 size" test with two per-axis
    tests (aspectRatio default via Gemini stub, size via DALL·E
    override). The old test wired-in the previous default and no longer
    reflects how the agent dispatches.
- Key Learnings:
  - Spreading a discriminated union into a function that accepts both
    keys as `?:` optional works in practice but blows TS instantiation
    depth (TS2589) if the inner property types don't *also* line up with
    the SDK's narrow types. Fix at the data layer (tighten the overlay)
    rather than the call site — same end state, no spread-with-cast
    gymnastics.
  - `deno check <file>` may transitively pull in workspace files outside
    the file's import graph; unrelated WIP in `apps/atlasd/routes/config.ts`
    surfaces here even though my agent code is independently clean.
    Spot-check via `deno check` on the leaf file only when in doubt,
    and trust `deno task test` against the test file for behaviour.
- Files:
  - packages/bundled-agents/src/image-generation/agent.ts
  - packages/bundled-agents/src/image-generation/agent.test.ts
  - packages/llm/src/image-capabilities.ts
… capture

scripts/validate-image-models.ts iterates listImageEntries() and exercises
each entry against live providers under both direct and proxy transports.
Captures envelope-only fixtures ({warnings, providerMetadata, mediaType,
base64Length, imageCount}) keyed under {direct, proxy} in a single per-model
JSON file. Failed transports write null (not omitted) so Task #10's
parameterized tests can distinguish "not yet validated" from "validated as
broken."

Dual-transport mechanics: a single process mutates LITELLM_API_KEY and calls
resetRegistry() between passes, restoring original env on exit. Pre-flight
cost preview + y/N gate guards spend; MAX_CALLS=24 is a hard ceiling.

`lastValidatedAt` updates: the script prints a unified diff for manual
application rather than codemodding image-capabilities.ts directly —
source-mutation from a one-shot script fights the formatter and is fragile.

Thumbnails (design v3 step 4) are intentionally not written; a true ≤64×64
thumbnail needs a PNG decoder + resampler, which is too much dependency
surface for a one-shot. The _thumbnails/ dir is still gitignored so a future
operator can drop sanity-check PNGs locally without polluting the repo.

## Progress
- Task: #9 — Validation harness with dual-transport capture
- Decisions:
  - Diff-print over codemod for lastValidatedAt updates (option b per task brief)
  - Plain stdout writes over @atlas/logger (JSON output is hostile to interactive CLI UX)
  - Skip thumbnail writing; documented as a future enhancement requiring a resampler
  - Per-entry fixture write after each transport pass — mid-run crashes preserve partial progress
- Key Learnings:
  - `prompt()` global (DOM lib) reads from TTY only — piped stdin returns null, so the y/N gate aborts safely when run non-interactively without --yes
  - The AI SDK's `aspectRatio: \`${number}:${number}\`` template-literal type can be narrowed without `as` via `z.custom<T>` with a type-guard predicate
  - `resetRegistry()` is required between transport switches — the registry caches provider clients with their apiKey captured at construction
- Files:
  - scripts/validate-image-models.ts (new)
  - .gitignore (+_thumbnails/ entry)
Parameterize agent tests on every overlay entry × {direct, proxy},
reading captured envelopes from `__fixtures__/<provider>__<model>.json`
(produced by the validation harness in scripts/validate-image-models.ts).

Coverage matrix is self-documenting: passed / skipped tests appear in
verbose test output with the reason inline:
  ✓ google:gemini-2.5-flash-image · direct · dispatches aspectRatio param
  ↓ google:gemini-2.5-flash-image · proxy · skipped (proxy fixture null)
  ↓ openai:dall-e-2 · direct · skipped (no fixture file)
  ✓ openai:dall-e-3 · direct · returns err on edit prompt (gen-only)

Edit-capable entries run a gen-mode prompt and assert controlAxis
param shape + success (exercises #5). Gen-only entries run an edit
prompt and assert the capability `err()` (exercises #4); generateImage
is never called for those, so controlAxis is covered exclusively via
edit-capable entries within the matrix.

When the fixture file is absent OR the transport key is `null`, the
pair emits `test.skip` with the reason in the title. The matrix never
fails the suite for missing fixtures — operators fill in coverage by
re-running the harness over time.

The `@atlas/llm` vi.mock spreads `...actual` instead of enumerating
exports, so future overlay helpers don't need a mock-update PR. Only
`smallLLM` remains overridden.

## Progress
- Task: #10 Fixture-driven `(model × transport)` parameterized agent tests
- Decisions:
  - Manual `for` loop over `listImageEntries() × TRANSPORTS` instead of
    `describe.each` — gives explicit per-pair control of the test
    variant (dispatch vs capability err) and keeps skip reasons in
    titles rather than table fragments.
  - Fixture loader is sync (existsSync + readFileSync) because vitest
    needs the skip-vs-run decision at describe-time, before any
    `beforeEach` runs.
  - Synthesize an 8-byte PNG signature for the generateImage mock
    response — the agent only copies bytes through to ArtifactStorage,
    so signature is sufficient; matching the envelope's base64Length
    exactly would buy nothing.
  - FixtureSchema mirrors the harness's EnvelopeSchema exactly
    (warnings/providerMetadata/mediaType/base64Length/imageCount).
    `providerMetadata` is `z.unknown()` not `z.record(...)` to match
    the harness — providers return wildly different metadata shapes.
- Key Learnings:
  - When a `vi.mock(...)` factory passes most of a module through
    (`...actual`), enumerate explicit exports only when you want to
    block re-exports — spreading is the right default and survives
    future module additions without test churn.
  - Test-time fixture I/O at describe-time is the right place for
    the file-exists check: vitest re-runs the describe function on
    every invocation, so adding a fixture file later picks up
    automatically without code change.
- Files: packages/bundled-agents/src/image-generation/agent.test.ts
…verlay picker

Surfaces the image-model layer through the Settings UI so users can configure
a primary→fallback chain without hand-editing friday.yml. The Image row reuses
the existing `<ModelChain>` component but routes its slot-edit clicks to a new
`<ImageModelPicker>` that renders the static `listImageEntries()` overlay
grouped by provider, each with `gen ✓` / `edit ✓` / `edit ✗` badges and the
optional `note` field (e.g. "Requires a verified OpenAI organization." on
gpt-image-1.5). Locked-provider banner reuses the same inline "Save & unlock"
flow the language picker provides.

Daemon-side wiring: `apps/atlasd/routes/config.ts` adds `"image"` to
`PLATFORM_ROLES`, accepts `image: roleValueSchema.optional()` on PUT, and
documents `images: ModelInfo[]` on the catalog response schema (already on the
wire from task #6). The GET handler special-cases image — `getImage()` can
throw post-boot when no credentials are present (boot only validates format +
overlay membership), so the route falls back to the user's configured chain
head (or `DEFAULT_PLATFORM_MODELS.image[0]`) so the tile renders the would-be
model and the locked-banner UX carries the missing-credential signal.

Browser-safe import: `@atlas/llm`'s mod.ts transitively pulls `node:path`
through `@atlas/utils/paths`, so importing `listImageEntries` from the bare
package crashed the SvelteKit bundle. Added an `./image-capabilities` subpath
export (same pattern as `./pricing`) and updated both call sites to use it.

`save-api-key.ts`'s local `CatalogEntry` duplicate gained the `images` field
since the splice now flows the post-unlock catalog through the typed UI state
in `+page.svelte` (which now requires `images`). The other duplicate
(`model-pill.ts`) is untouched — chat consumers only read `entry.models`.

## Progress
- Task: #7 Settings UI — Image row with capability badges and overlay intersection
- Decisions:
  - "Intersection" rule resolves to "render all overlay entries" — overlay
    membership is the gate, gateway listing is a freshness signal. AC's
    contradictory wording is resolved by the design doc's "overlay entries
    the gateway doesn't surface are still shown."
  - New `ImageModelPicker.svelte` rather than conditional-laden `ModelPicker`
    — overlay-driven entries + badges + notes diverge enough that branching
    inside the language picker would bury both.
  - `imageCatalog` derived view (overlay entries grouped by provider, mapped
    onto each catalog entry's `models` field) lets `<ModelChain>` render
    friendly displayNames for image picks without touching ModelChain.
  - Daemon GET catches `getImage()` throws and falls back to configured head
    or default head — agent path still wants the throw (loud failure at gen
    time), only settings-display wants the tolerant fallback. No new
    `tryGetImage()` accessor to pollute the agent-sdk mirror.
- Key Learnings:
  - `@atlas/llm`'s main entry pulls server modules transitively — anything
    Svelte-side needs a narrower subpath export. The existing `./pricing`
    entry is precedent for this pattern; `image-capabilities` is the second.
    Future browser-side imports from `@atlas/llm` should add their own
    subpath rather than depending on the main entry being browser-safe.
  - Vite caches `package.json` `exports` at startup — adding a new subpath
    export requires bouncing the dev server to take effect.
  - `resolver()` from `hono-openapi` only emits OpenAPI docs, not runtime
    parsing — `catalogResponseSchema` missing the `images` field never
    silently stripped data; it just left the OpenAPI doc out of sync.
  - The daemon already serializes `images` on `/api/config/models/catalog`
    via `c.json(catalog.entries)` because the underlying `CatalogEntry` from
    `@atlas/llm` (task #6) carries it — the wire was already correct, only
    types were lagging.
  - `getImage()` throws a `PlatformModelsConfigError` (not just an Error)
    when no chain entry has credentials, even though boot validation
    succeeded — credential checks are deferred to call time by design.
- Files:
  - apps/atlasd/routes/config.ts
  - packages/llm/deno.json
  - packages/llm/package.json
  - tools/agent-playground/src/lib/components/settings/image-model-picker.svelte (new)
  - tools/agent-playground/src/lib/save-api-key.test.ts
  - tools/agent-playground/src/lib/save-api-key.ts
  - tools/agent-playground/src/lib/server/lib/context.ts
  - tools/agent-playground/src/routes/settings/+page.svelte
Adds a dismissible banner inside the Settings Image row that softens the
default-model swap (gemini-3.1-flash-image-preview → gemini-2.5-flash-image)
for upgrade users. Banner renders only when all three hold: friday.yml exists,
`models.image` has no override, and `localStorage["image-picker-intro-seen"]`
is not "true". Dismiss writes the localStorage key so the banner never
returns. Fresh installs see nothing — there's no prior aesthetic to compare
against.

To detect fresh-install vs upgrade without server state, `GET
/api/config/models` now sets the optional `configPath` field only when
friday.yml is actually on disk (previously always set). The playground reads
presence-of-`configPath` as the upgrade signal. Schema is already
`.optional()` and no other consumer reads the field, so the change is
backward-compatible.

Implementation notes:
- `imagePickerIntroDismissed` is read once on mount inside `$effect`
  (guarded against SSR) and starts `true` so the banner can't flash before
  the localStorage read settles.
- `imageRoleUnset` is `$derived` from the loaded `models` array, not from
  the in-flight `chains` edit state — the banner reflects persisted server
  config, not unsaved picks.
- A small `.role-body` wrapper stacks the banner above `<ModelChain>`
  inside the Image role-card's right grid column without disturbing the
  language roles or the `200px 1fr` layout.

## Progress
- Task: #8 Settings UI — one-time migration nudge banner for image default change
- Decisions:
  - Reuse `configPath` as the upgrade signal instead of introducing a new
    `configExists` boolean. The schema is already optional, the field had
    no consumers, and "configPath present iff file exists" is the
    expectation the design-doc settled on.
  - Read `models[role].configured === null` from the loaded state, not
    from `chains[role].length === 0`. The latter would hide the banner
    the instant the user opens the picker, before any save.
  - Plain `try/catch` around the `localStorage.setItem` write so private
    mode / quota-full doesn't take down the dismiss UX — the in-memory
    flag still hides the banner for the session even if the persist fails.
  - No animation, no "remind me later", no toast. Banner shows once,
    dismiss button hides it forever. Matches the task's "exactly what's
    asked" guidance.
- Key Learnings:
  - `GET /api/config/models` had `configPath` declared as `.optional()` in
    the response schema but always populated it — making it conditional
    matches the schema's already-permissive intent and gives the
    playground a clean upgrade-detection signal without a new field.
  - The dev-watcher only respawns the daemon when the old child exits; if
    a stale daemon from an earlier session is holding port 29080, the new
    watcher's spawn fails with "address already in use" and your code
    edits never take effect. Kill the running daemon PID and the watcher
    will respawn it with the new source.
  - Svelte 5 `$effect` doesn't run on the server, but the script body
    does — anything that touches `localStorage` at top level still needs
    a `typeof` guard for SvelteKit SSR.
- Files:
  - apps/atlasd/routes/config.ts
  - tools/agent-playground/src/routes/settings/+page.svelte
Use `"configPath" in data` to narrow before reading, instead of an `as`
assertion. Matches the surrounding `in`-narrow pattern in `loadModels` and
keeps the file `as`-free per project convention.

## Progress
- Task: #8 Settings UI — one-time migration nudge banner for image default change
- Decisions: lead-requested polish on top of c13eecc; no behavior change.
- Files: tools/agent-playground/src/routes/settings/+page.svelte
Per-task commits passed `deno check` but didn't run `deno task fmt`. Biome
collapses short multi-line object literals and re-sorts `mod.ts` re-exports
alphabetically. No behavior change.
@LissaGreense LissaGreense requested a review from ljagiello as a code owner June 2, 2026 18:14
PR #464 added `getImage(): ImageModelV3` to the `PlatformModels`
interface but missed several inline stub implementations across
the workspace. The canonical `createStubPlatformModels` helper
was updated, but stub literals embedded in test files and the
evals harness still satisfied the old shape — workspace-wide
`deno check` failed with 9 errors.

Mirror each file's existing stub idiom (vi.fn for vitest sites,
throw-on-invoke for non-vitest contexts). No test invokes
getImage directly, so throw-on-invoke is sufficient everywhere.

Files touched:
- apps/atlasd/routes/workspaces/draft.test.ts (4 sites)
- apps/atlasd/src/chat-skill-discovery.test.ts
- packages/bundled-agents/src/web/tools/search.test.ts
- packages/workspace/src/__tests__/context-memory-mounts.test.ts
- tools/evals/lib/context.ts
@LissaGreense

Copy link
Copy Markdown
Contributor Author
image

…odelId

QA case 4 surfaced that every image generation fails with "unknown to
Friday's capability overlay" — including the default model on a fresh
setup. Root cause: the agent looked up `IMAGE_OVERLAY` by `model.modelId`
(bare id like `gemini-2.5-flash-image`), but the overlay is keyed by
`provider:model` (e.g. `google:gemini-2.5-flash-image`). The SDK
`ImageModelV3` also reports `provider` as the transport string
(`google.generative-ai`), so concatenating wouldn't have helped either.

Extend `PlatformModels` with `getImageOverlayKey(): string`, which returns
the resolved `provider:model` spec the resolver used to build the model.
The resolver already had this string — it just dropped it on the floor
returning only the SDK model. Refactor `resolveImageRole` to return
`{ key, model }` so `getImage()` and `getImageOverlayKey()` share one
resolution pass and can't diverge if process.env flips between calls.

The agent now calls `platformModels.getImageOverlayKey()` for the overlay
lookup. The error log/message also switches to the overlay key (more
useful for debugging — includes the provider half).

Regression-guard: the agent test's stub `ImageModelV3` now carries
SDK-shaped values (provider="google.generative-ai", modelId="gemini-2.5-flash-image"
— neither is an overlay key). If the agent ever regresses to looking up
by `model.modelId`, the overlay lookup fails and tests catch it. Added an
explicit assertion test that the agent calls `getImageOverlayKey()`.

Synced the duplicate `PlatformModels` interface in `@atlas/agent-sdk`
(SDK leaf can't import from `@atlas/llm`). Added `getImageOverlayKey`
stubs to every test fixture listed in the AC.

## Progress
- Task: #1 Fix: image-agent overlay lookup uses wrong key
- Decisions: Refactored resolver to return {key,model} for atomic resolution
  (avoids theoretical race when process.env flips between getImage and
  getImageOverlayKey). Stub model uses SDK-shaped values rather than
  embedding the overlay key in modelId — turns the test into a regression
  guard for this specific bug class.
- Key Learnings: SDK ImageModelV3.provider is the transport string
  ("google.generative-ai"), not the overlay/registry provider ("google").
  Any capability lookup keyed on overlay shape must go through the configured
  spec, not the SDK model's reported fields. PlatformModels is duplicated
  in @atlas/agent-sdk (SDK is a leaf, can't import @atlas/llm) — interface
  changes must land in both.
- Files: packages/llm/src/platform-models.ts, packages/llm/src/test-utils.ts,
  packages/agent-sdk/src/types.ts,
  packages/bundled-agents/src/image-generation/{agent.ts,agent.test.ts},
  apps/atlasd/routes/workspaces/draft.test.ts,
  apps/atlasd/src/chat-skill-discovery.test.ts,
  packages/bundled-agents/src/web/tools/search.test.ts,
  packages/workspace/src/__tests__/context-memory-mounts.test.ts,
  tools/evals/lib/context.ts
The Settings UI's PUT /api/config/models wrote the new models block to
friday.yml but only constructed a throwaway PlatformModels for validation;
the daemon's live instance kept serving the boot-time config until restart.
Every model role (conversational, labels, classifier, planner, image)
was affected.

Make PlatformModels itself reloadable in place: the closure-captured
userConfig moves into a mutable cell, and a new `reload(input)`
re-runs the same boot-time validation before swapping. Long-lived
references held by WorkspaceRuntime, MCP servers, and route handlers
stay valid — every `get()`/`getImage()` call reads through the cell
and picks up the new config. The PUT handler now calls `reload`
after writing friday.yml; the response's `restartRequired` flag
flips to false when the swap succeeds.

Adversarial: `reload` validates BEFORE mutating, so a bad config
throws without corrupting in-memory state. Pre-flight in the route
still runs first, so disk-write only happens on a config the daemon
will also accept.

## Progress
- Task: #2 Fix: PUT /api/config/models doesn't reload live PlatformModels
- Decisions:
  - Option B (in-place reload) over Option A (daemon setter + indirection)
    because WorkspaceRuntime captures the reference at construction —
    a setter that swaps `daemon.platformModels` would not reach those
    long-lived holders. Mutating the closed-over cell is invisible to
    consumers and requires zero plumbing changes.
  - Mirrored `reload` into `@atlas/agent-sdk`'s PlatformModels interface
    (typed as `unknown` input there — SDK is a leaf, can't import
    PlatformModelsInput). Keeps cross-package assignability without
    breaking the leaf-node invariant.
- Key Learnings:
  - The Hono context already exposes `platformModels` via a lazy getter
    (atlas-daemon.ts:429), so routes read through `daemon.platformModels`
    on each request — perfect substrate for in-place reload.
  - WorkspaceRuntime captures `options.platformModels` once at
    construction (runtime.ts:1757) — a daemon-level setter that swaps
    the reference would NOT reach already-running workspaces. Any future
    "swap the instance" approach has to also reconstruct or notify
    every WorkspaceRuntime.
  - `daemon-startup.test.ts` reads the dev machine's real friday.yml
    (no FRIDAY_CONFIG_PATH override) — 5 tests fail with
    "local:google/gemma-4-e4b missing creds" purely due to local-env
    contamination, unrelated to this change. Pre-existing.
- Files:
  - packages/llm/src/platform-models.ts: reload method, mutable userConfig cell
  - packages/llm/src/platform-models.test.ts: reload unit tests
  - packages/llm/src/test-utils.ts: stub reload no-op
  - packages/agent-sdk/src/types.ts: mirror reload in SDK interface
  - apps/atlasd/routes/config.ts: PUT handler calls reload, docstring updated
  - test stubs: draft, chat-skill-discovery, web/tools/search, context-memory-mounts
Biome's auto-format prefers multi-line object literals here. No logic
changes — just whitespace.
Two hand-rolled `PlatformModels` stubs were missed when `getImageOverlayKey`
(447e52b) and `reload` (6d06e01) landed on the interface, breaking CI
type-check on #464. Add throw-on-invoke methods matching each file's local
idiom — playground points callers at `PlaygroundContextOpts.platformModels`,
evals points at "once evals adopt the platform model path".

## Progress
- Task: #3 Add getImageOverlayKey + reload to two missing PlatformModels stubs
- Files:
  - tools/agent-playground/src/lib/server/lib/context.ts: getImageOverlayKey + reload stubs
  - tools/evals/lib/context.ts: reload stub (other two already present)
The Settings page hardcoded "Restart the daemon to apply." and ignored
the PUT /api/config/models response body. Since the daemon now
hot-reloads `PlatformModels` and returns `{ restartRequired: false }`,
users were told to restart needlessly. Parse the response with Zod and
branch the flash text: "Saved. Active now." on hot-reload,
restart-required message only when the daemon couldn't hot-swap. Also
update the stale header doc-comment.

## Progress
- Task: #4 Fix Settings flash + comment to honor restartRequired
- Decisions: Used Zod safeParse to match the file's existing schema
  pattern (TunnelStatusSchema, etc.) and avoid `as` assertions. Reused
  the body already read at line 505 to keep the change surgical.
- Key Learnings: agent-playground svelte files are excluded from
  \`deno check\` — use \`npx svelte-check --tsconfig ./tsconfig.json\` from
  the package root (run via the \`check\` npm script).
- Files: tools/agent-playground/src/routes/settings/+page.svelte
PUT /api/config/models previously wrote `friday.yml` (just `version` +
`workspace` blocks) even when the request's `newModels` map was empty
and the file didn't exist. This promoted fresh-install users into the
"upgrader" state, causing the migration banner in Settings to appear
spuriously on next session — the banner gates on file presence.

Wrap the existing write/reload block in `if (!skipWrite)` where
`skipWrite = empty newModels && !file exists`. Falls through to the
existing `success: true, restartRequired: false` exit. No new return
arms, so Hono's RPC return-type inference stays within TS's depth
budget.

All other paths (file exists with models, partial state, first explicit
role) write normally, per the plan's "absent = fresh, present = upgrade"
definition.

## Progress
- Task: #7 — Skip friday.yml write when PUT /api/config/models has empty models AND file doesn't exist
- Decisions: Restructured as guard around the write block rather than an early-return short-circuit. Adding a sixth `c.json()` site tripped TS2589 "Type instantiation is excessively deep" on Hono's inferred response union — the existing handler already has 5 arms. Falling through to the same success exit avoids the issue without adding a helper or annotation.
- Key Learnings: Hono's RPC return-type inference unions every `c.json()` call-site verbatim — even structurally identical ones aren't deduplicated. Adding "just one more" return arm to a handler with several existing arms can blow past TS's instantiation-depth limit (TS2589). Mitigation: reuse exit points (fall-through over early-return) when the existing handler is already near the limit. Documented in `apps/atlasd/CLAUDE.md` as the broader Hono RPC inference gotcha.
- Files: apps/atlasd/routes/config.ts
Adds two HTTP-level tests for the new PUT /api/config/models handler:
one valid-body round-trip proving disk write + hot-reload + GET reflect
the new config, and one Zod-failing body proving 4xx + no disk write +
live PlatformModels unchanged. Closes the only HTTP-level coverage gap
on the new route — prior coverage in platform-models.test.ts hit only
createPlatformModels/reload, not the route's validation/write/reload
wiring.

## Progress
- Task: #8 Add HTTP tests for PUT /api/config/models
- Decisions: Use real createPlatformModels (LITELLM_API_KEY satisfies
  every provider) so reload() actually swaps state and the GET resolved
  field proves the hot-reload took effect — not just that the disk write
  happened. Follows daemon.test.ts's partial-AppContext stub pattern
  ({ platformModels } as unknown as AppContext) for context wiring.
  Picked groq for the labels happy-path role so resolved.provider
  observably changes from the anthropic default — a silent-reload
  regression would flip this assertion red.
- Key Learnings: hono-openapi validator returns 400 (not 422) on Zod
  failure — confirmed against existing newline-bearing-values test.
  The PUT /models route reads friday.yml from FRIDAY_CONFIG_PATH (not
  FRIDAY_HOME), so tests must pin that env var separately. LITELLM_API_KEY
  is the cleanest credential hatch for resolver tests because
  hasCredential() short-circuits on it for every provider.
- Files: apps/atlasd/routes/config.test.ts
…solved

The two image methods each invoked `resolveImageRole()` from scratch — full
chain walk, credential probe, registry construction — so the image agent's
back-to-back calls doubled the work per request. The internal `ResolvedImage`
type was explicitly created to keep the `{ key, model }` pair bonded (see the
comment block at platform-models.ts:314-318); the split API contradicted the
invariant and invited future callers to observe divergence if `process.env`
flipped between lookups.

Collapse both methods into a single `getImageResolved(): { key, model }` that
runs the resolver once and returns the pair atomically. The image agent
destructures `{ key: overlayKey, model }`, halving the per-call resolution
cost.

Mirror the change in `packages/agent-sdk/src/types.ts` PlatformModels
interface. Reload() left as `reload(input: unknown)` — Task #6 removes it
from the SDK separately.

Update every stub and call site:
- packages/llm/src/test-utils.ts (createStubPlatformModels helper)
- packages/bundled-agents/src/image-generation/agent.test.ts (stub + helper)
- packages/workspace/src/__tests__/context-memory-mounts.test.ts
- packages/bundled-agents/src/web/tools/search.test.ts
- apps/atlasd/src/chat-skill-discovery.test.ts
- apps/atlasd/routes/workspaces/draft.test.ts (4 instances)
- tools/agent-playground/src/lib/server/lib/context.ts
- tools/evals/lib/context.ts

## Progress
- Task: #5 — Collapse getImage + getImageOverlayKey into a single getImageResolved on PlatformModels
- Decisions: Returned the existing `ResolvedImage` type alias from the implementation rather than spelling out `{ key, model }` inline — the type was already exported-shape-equivalent and its doc-comment documents the bonding invariant. Kept the doc-comment on the interface terse: the SDK transport-string vs overlay-key distinction is the only thing callers really need to remember.
- Key Learnings: When two methods on an interface MUST be invoked together to stay consistent (same closure state, same env snapshot), collapsing them into one that returns the pair atomically is mechanically a better fit than relying on call-order discipline — even if no current caller misuses the split. The pre-existing internal `ResolvedImage = { key, model }` type and its bonding-invariant comment were a leading indicator that the API surface was wrong.
- Files: packages/llm/src/platform-models.ts, packages/llm/src/platform-models.test.ts, packages/llm/src/test-utils.ts, packages/agent-sdk/src/types.ts, packages/bundled-agents/src/image-generation/agent.ts, packages/bundled-agents/src/image-generation/agent.test.ts, packages/bundled-agents/src/web/tools/search.test.ts, packages/workspace/src/__tests__/context-memory-mounts.test.ts, apps/atlasd/src/chat-skill-discovery.test.ts, apps/atlasd/routes/workspaces/draft.test.ts, tools/agent-playground/src/lib/server/lib/context.ts, tools/evals/lib/context.ts
The `(model × transport)` matrix ran against fixture JSON files that
aren't committed (produced by operator-run `validate-image-models.ts`).
In CI every cell was emitting `test.skip(...)` — 12 skip lines that
asserted nothing while the comment claimed the matrix was the "audit
trail for coverage." Gate the matrix on `fixturesExist`: when no
fixture files are present, register exactly one skip and bail; when
fixtures land locally (or in a future commit), the full matrix runs.
Update the comment to reflect the actual behavior.

## Progress
- Task: #9 Collapse skipped fixture matrix to skipIf in agent.test.ts
- Decisions: Picked Option B (skipIf-equivalent) over Option A
  (commit one fixture per provider) — Option A needs OPENAI / GEMINI
  API keys + ~1-2 hours running the cost-gated harness; B is the
  ~30-min path the team-lead recommended. Used a plain `if (!fixturesExist) { test.skip(...); return; }` rather than `describe.skipIf` because the latter still registers every inner `test.skip` from the `for` loop — yields N skip lines, not one.
- Key Learnings: vitest's `describe.skipIf(true)` does NOT prevent the
  describe callback from running. It executes the body, registers every
  inner `test()` / `test.skip()`, and marks them all as skipped — so a
  matrix wrapped in `describe.skipIf` still emits N skip lines, not
  one. To get a single skip line, gate the registration loop itself
  with a plain `if`.
- Files: packages/bundled-agents/src/image-generation/agent.test.ts
…er types

`packages/agent-sdk/src/types.ts` declared `reload(input: unknown): void` on
the `PlatformModels` interface purely "to mirror @atlas/llm" — a misreading
of structural subtyping. Agents never call `reload`; the only real caller
is `apps/atlasd/routes/config.ts` (PUT /api/config/models), which operates
on the richer `@atlas/llm` `PlatformModels` and is unaffected.

This commit:
- Removes `reload` from the agent-sdk `PlatformModels` interface and trims
  the matching doc-comment paragraph (keeps the `getImageResolved` para).
- Deletes the dead 3-line throwing `reload` stubs from three test files:
  `chat-skill-discovery.test.ts`, `search.test.ts`,
  `context-memory-mounts.test.ts`.
- Narrows seven downstream parameter sites from `@atlas/llm`'s rich
  `PlatformModels` to `@atlas/agent-sdk`'s narrow one. None of these
  callers invoke `reload`; they only need `get` / `getImageResolved`.
  Sites narrowed:
    - `packages/llm/src/small.ts` — `smallLLM` params
    - `packages/core/src/delegate/index.ts` — `DelegateDeps`
    - `packages/bundled-agents/src/web/tools/search.ts` — `SearchToolContext`
    - `packages/bundled-agents/src/web/tools/search-execution.ts` — `executeSearch`
    - `packages/bundled-agents/src/claude-code/agent.ts` — `generateProgress`
    - `packages/system/agents/workspace-chat/workspace-chat.agent.ts` — `generateChatTitle`
    - `apps/atlasd/src/chat-skill-discovery.ts` — two inline `import()` sites

`@atlas/llm`'s richer `PlatformModels` (which keeps `reload`) still flows
into agent-sdk-typed positions via structural subtyping. The lone reload
caller in `apps/atlasd/routes/config.ts` continues to type-check against
the rich interface.

## Progress
- Task: #6 — Drop reload() from agent-sdk PlatformModels interface and delete dead stubs
- Decisions: Expanded scope by 7 files beyond the originally-scoped 4. Team-lead's research found `reload` *call* sites (just one) but missed *declaration* sites — places that typed `platformModels: PlatformModels` from `@atlas/llm` without ever calling reload. Removing reload from agent-sdk broke narrow→rich assignment at every agent that forwards `ctx.platformModels` into those slots. Pinged team-lead with the discovery, got approval, narrowed each consumer to the SDK type. Also dropped my own "no reload" explanatory comment paragraph to honor the "zero hits" verification gate — the deleted method is self-documenting in the diff.
- Key Learnings: When narrowing a shared interface, "find all callers of method X" is insufficient — search for declaration sites that USE the interface as a parameter type too. `grep "PlatformModels"` (the interface name) catches both. Narrow→rich assignment is the failure direction: rich → narrow is free under structural subtyping, but a value of the narrow type cannot satisfy a slot typed as rich (TS rejects: missing required member). Deno workspace cross-package imports resolve automatically via workspace member names — no `deno.json` `imports` entry needed when a workspace package imports from another workspace package.
- Files: packages/agent-sdk/src/types.ts (+ 3 test files for stub removal) + 7 consumer narrowings = 11 files total
…hared helper

Routes every hand-rolled PlatformModels stub through createStubPlatformModels
so the next interface change ripples through one file instead of five. To
preserve each test's existing assertion behavior (vi.fn spying in agent.test,
"should not be called" throws in the other three), extends the helper to
accept wholesale `get` / `getImageResolved` overrides — picks up the team
lead's suggested pattern verbatim. Per-role model overrides still work for
tests that need real LanguageModelV3 behavior; the wholesale overrides
take precedence when supplied.

The two consumers that wholesale-mock @atlas/llm (search.test, chat-skill-
discovery.test) now thread importOriginal so the helper survives the mock,
and search.test's pre-existing top-level vi.fn() / vi.mock factory pairing
gets the canonical vi.hoisted() fix (vitest's hoisting check fired loudly
once the @atlas/llm mock went async).

## Progress
- Task: #10 Migrate hand-rolled PlatformModels stubs to createStubPlatformModels
- Decisions: Extended the helper's override surface (added `get`/`getImageResolved`)
  rather than working around it from the call sites — the override is the
  documented pattern the team lead's example implied, and putting it in the
  helper means future "throws if called" guards land in one line instead of
  five. Migrated context-memory-mounts to drop its `satisfies PlatformModels`
  altogether: the agent-sdk PlatformModels type is now structurally satisfied
  by the helper's @atlas/llm PlatformModels return (extra `reload` is a
  superset). For agent.test, kept vi.fn() spy handles in module-scope vars
  (`getSpy`, `getImageResolvedSpy`) so `mockReturnValueOnce` /
  `toHaveBeenCalled` still target the spies — passed those into the helper as
  the wholesale overrides.
- Key Learnings: `vi.mock("@atlas/llm", () => ({...}))` (wholesale form)
  silently drops every re-export the test file consumes from the same module
  — including helpers imported at module scope before the mock applies. Use
  `(importOriginal) => ({ ...await importOriginal(), ...stubs })` whenever
  the file imports anything else from the mocked module. Separately:
  introducing an async vi.mock factory in a file that already has non-async
  `vi.mock` factories referencing top-level `const mockX = vi.fn()` vars
  exposed Vitest's hoisting rule — those vars need `vi.hoisted()`. The
  original test files happened to work because Vitest's hoisting was lenient
  on sync-only factory graphs; adding an async factory tightens it.
- Files: packages/llm/src/test-utils.ts (helper extended),
  apps/atlasd/src/chat-skill-discovery.test.ts,
  packages/bundled-agents/src/image-generation/agent.test.ts,
  packages/bundled-agents/src/web/tools/search.test.ts,
  packages/workspace/src/__tests__/context-memory-mounts.test.ts
After the team's 7-task PR-review punch list landed, `deno task lint`
auto-fixed line-length and object-literal formatting in 5 files:

- apps/atlasd/routes/config.test.ts
- apps/atlasd/routes/config.ts
- apps/atlasd/routes/workspaces/draft.test.ts
- apps/atlasd/src/chat-skill-discovery.test.ts
- packages/bundled-agents/src/image-generation/agent.test.ts

Pure formatting; no semantic changes.
…rlay

The image-generation agent's defensive null-branch on the overlay lookup
trusts boot-time pre-flight — but pre-flight skips `models.image` when
the user hasn't pinned anything, so the guarantee depends entirely on
every `DEFAULT_PLATFORM_MODELS.image` entry being keyed in `IMAGE_OVERLAY`.
This invariant test fails CI if a future change adds a non-overlay default.

## Progress
- Task: #2 Add invariant test asserting DEFAULT_PLATFORM_MODELS.image entries exist in IMAGE_OVERLAY
- Decisions: Placed in image-capabilities.test.ts (where overlay tests live) rather than platform-models.test.ts — the invariant is a coupling check and reads more naturally next to other overlay assertions. Used filter+toEqual([]) over .every() so the failure message names the offending ids.
- Key Learnings: None
- Files: packages/llm/src/image-capabilities.test.ts
The boot $effect reads `image-picker-intro-seen` to suppress a one-time
migration banner. Hardened-WebView and certain Safari ITP contexts throw
on `localStorage` access entirely, not just on writes. An unguarded read
aborts the effect before the async load chain (loadEnv → loadModels →
loadCatalog → loadTunnel → refreshFromServer) runs, bricking the
Settings page over a cosmetic banner.

Wrap the read in try/catch matching the existing setItem guard in
dismissImagePickerIntro. On failure the default `imagePickerIntroDismissed
= true` keeps the banner hidden so the user isn't presented with an
undismissable nudge.

## Progress
- Task: #1 — Guard localStorage.getItem against access exceptions in Settings boot effect
- Decisions: try/catch nested inside the existing `typeof localStorage` SSR guard, matching the symmetric setItem pattern at line 664. Catch is empty — default state already suppresses the banner; no logging needed for a cosmetic gate.
- Key Learnings: Hardened-WebView / Safari ITP throws apply to reads as well as writes — wrapping only setItem is insufficient if the read sits in a boot effect that gates downstream loads.
- Files: tools/agent-playground/src/routes/settings/+page.svelte
…sertions

The freshness drift suite logged a warning inside the test, then asserted
its own log count — nothing in production emits a freshness warning, so
the whole mechanism was decorative. `lastValidatedAt` showing up in PR
diffs is the rot surface we actually rely on; the in-test warn was just
noise.

Also trimmed the round-trip test of assertions the type system already
guarantees (capabilities.generation === true, typeof edit === "boolean",
size regex, aspectRatio length, format enum). The kept assertions
(displayName non-empty + trimmed, lastValidatedAt is a real calendar
date) cover what TS can't.

## Progress
- Task: #3 Clean up image-capabilities tests: delete decorative freshness suite, trim type-system tautologies
- Decisions: Removed `logger` and `vi` imports (orphaned by deleting the freshness suite — no other test used them). Kept the ISO_DATE regex const; the round-trip test still uses it.
- Key Learnings: None
- Files: packages/llm/src/image-capabilities.test.ts
Adds explicit coverage for branches that feed Settings UI behavior
but had no test:

- reload failure after writeFile → restartRequired: true (flips the
  success-flash from "active now" to "restart the daemon")
- resolveImageRoleForGet catch-branch → parses the configured chain
  head when the live resolver can't credential the image model
- skipWrite gate (fresh install, no friday.yml) → response succeeds
  with no file materialized (preserves the migration-banner gate)
- skipWrite gate (existing friday.yml, empty payload) → models key
  dropped, version + workspace preserved
- configPath GET assertions — undefined when friday.yml is absent,
  absolute path when present (gates the playground's migration nudge)

## Progress
- Task: #4 Extend PUT /api/config/models test coverage: 4 untested route branches
- Decisions: used vi.spyOn over extending createStubPlatformModels — the
  existing test uses the real createPlatformModels so reload() actually
  mutates state, and one-test spy overrides are cheaper than widening the
  shared helper's surface. mock.results[0].type === "throw" is the clean
  way to verify the catch-branch fired without re-throwing.
- Key Learnings: stringifyYaml emits version "1.0" with SINGLE quotes
  (defensive against YAML 1.1 float coercion) — asserts on the on-disk
  YAML must accept either quote style. resolveImageRole defers credential
  checks but pre-flights overlay membership at boot, so you can construct
  PlatformModels with an uncredentialed image entry as long as the
  language defaults have a credential and the image entry is in
  IMAGE_OVERLAY. LITELLM_API_KEY universally satisfies hasCredential — any
  test that needs an uncredentialed provider must delete it AND not
  set the provider-specific env var.
- Files: apps/atlasd/routes/config.test.ts
Reformats a multi-line array literal in the PUT /api/config/models
fallback test landed in 4158de3 — same pattern as 4a65a7c.

## Progress
- Task: team-lead post-wave lint
- Decisions: none — pure formatting, no behavioral change
- Files: apps/atlasd/routes/config.test.ts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant