fix(model): default synthetic models to text-only input to avoid image 400s #252

Open
craigamcw wants to merge 1 commit into OpenCoworkAI:dev from craigamcw:fix/synthetic-model-text-only-input

@craigamcw

Summary

Fixes #251.

Text-only models that aren't in pi-ai's registry (e.g. deepseek-v4-pro via Ollama Cloud) hard-fail with a provider 400 whenever the conversation includes an image (screenshots from the GUI/computer-use tools, pasted images). The app surfaces this as an opaque "invalid message format" error.

Root cause: buildSyntheticPiModel hard-coded input: ['text', 'image'], so synthetic models falsely advertise vision support. The openai-completions provider only filters image content when !model.input.includes("image"), so images were sent to text-only endpoints. Ollama rejects them with HTTP 400 "this model does not support image input".
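As a rough illustration of that filter, the sketch below shows how the advertised `input` list decides whether image parts reach the endpoint. The type names, `filterContentForModel`, and the message shapes are hypothetical stand-ins, not the actual pi-ai or openai-completions code; only the `!model.input.includes("image")` condition comes from the description above.

```typescript
// Hypothetical shapes for illustration only; the real pi-ai types may differ.
type Modality = "text" | "image";

interface SketchModel {
  id: string;
  input: Modality[];
}

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

// Drops image parts only when the model does not advertise image input,
// mirroring the `!model.input.includes("image")` check described above.
function filterContentForModel(model: SketchModel, parts: ContentPart[]): ContentPart[] {
  if (model.input.includes("image")) {
    return parts; // model claims vision, so images pass through untouched
  }
  return parts.filter((part) => part.type !== "image_url");
}

const parts: ContentPart[] = [
  { type: "text", text: "What is on screen?" },
  { type: "image_url", image_url: { url: "data:image/png;base64,AAAA" } },
];

// Old synthetic default: falsely advertises vision, so the image is sent.
const oldSynthetic: SketchModel = { id: "deepseek-v4-pro", input: ["text", "image"] };
// New synthetic default: text only, so the image is dropped before the request.
const newSynthetic: SketchModel = { id: "deepseek-v4-pro", input: ["text"] };

const sentBefore = filterContentForModel(oldSynthetic, parts);
const sentAfter = filterContentForModel(newSynthetic, parts);
```

With the old default the image part survives and the text-only endpoint rejects the request; with the new default only the text part is sent.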

This PR defaults synthetic models to input: ['text']. We can't know whether an arbitrary unknown model supports vision, and a false vision claim hard-fails the entire request, whereas a text-only declaration just drops images gracefully. Vision-capable models resolved from the pi-ai registry keep their real modalities; only synthetic fallbacks change.
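The resolution behaviour described here might look roughly like the following. This is a minimal sketch under stated assumptions: `registrySketch`, `buildSyntheticSketch`, `resolveModelSketch`, and the `example-vision-model` entry are invented for illustration and do not reflect the real signature of `buildSyntheticPiModel` or the pi-ai registry API.

```typescript
type Modality = "text" | "image";

interface ModelSketch {
  id: string;
  input: Modality[];
  synthetic: boolean;
}

// Hypothetical stand-in for the pi-ai registry; the entry is invented for this sketch.
const registrySketch = new Map<string, ModelSketch>([
  [
    "example-vision-model",
    { id: "example-vision-model", input: ["text", "image"], synthetic: false },
  ],
]);

// Synthetic fallback: an unknown model is assumed to accept text only.
function buildSyntheticSketch(id: string): ModelSketch {
  return { id, input: ["text"], synthetic: true };
}

// Registry entries keep their declared modalities; only misses fall back.
function resolveModelSketch(id: string): ModelSketch {
  return registrySketch.get(id) ?? buildSyntheticSketch(id);
}

const known = resolveModelSketch("example-vision-model");
const unknown = resolveModelSketch("deepseek-v4-pro");
```

Here `known` retains its image modality from the registry, while `unknown` is synthetic and text-only.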

Type of change

  • Bug fix (fix)

Checklist

  • Code follows the project style (TypeScript strict, ESLint, Prettier) — changed lines only
  • Commit messages follow Conventional Commits
  • Self-review completed — no debug logs, no commented-out code
  • Tests added or updated for the changed behaviour
  • npm run test passes locally — see Testing (new test + typecheck pass; full suite not run locally)
  • npm run lint passes locally — see Testing (no new findings from this change)
  • UI changes tested on both macOS and Windows — N/A, no UI; provider-agnostic model-resolution logic
  • New user-facing strings added to i18n files — N/A

Testing

Repro and root cause are in #251. Branch is based on dev.

Verified locally:

  • Direct API check: posting image_url content to deepseek-v4-pro at https://ollama.com/v1 returns 400 "this model does not support image input"; text-only requests to the same model return 200. With this change, the synthetic model reports input: ['text'], so convertMessages drops image content and the request stays text-only.
  • New test tests/synthetic-model-input.test.ts (2 assertions) — passes via npx vitest run.
  • npx tsc --noEmit — the change introduces no type errors (one unrelated pre-existing error remains on dev: src/main/config/config-store.ts:369 'getConfigKey' is declared but never read).
  • eslint src/main/claude/pi-model-resolution.ts — no new findings on the changed lines (the file has one pre-existing @typescript-eslint/no-explicit-any at an unrelated location, left untouched to keep the diff focused).

Note: the full npm run test suite was not run locally (deps installed with --ignore-scripts, so native better-sqlite3 isn't built). CI will run the complete suite.

Trade-off

For a custom vision endpoint that happens to be resolved as a synthetic model (not in the pi-ai registry), images would now be filtered out rather than sent. That's a deliberate, conservative default: dropping images degrades gracefully, whereas the current false-positive vision claim hard-fails every request to text-only models. If desired, this could later be made configurable or driven by KNOWN_MODEL_SPECS.
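One way the configurable follow-up could be shaped is sketched below. Everything in it is an assumption: `inputOverridesSketch`, `syntheticInputFor`, and the `my-custom-vision-model` id are hypothetical, and the real `KNOWN_MODEL_SPECS` structure may look quite different.

```typescript
type Modality = "text" | "image";

// Hypothetical override table keyed by model id; not the real KNOWN_MODEL_SPECS shape.
const inputOverridesSketch: ReadonlyMap<string, readonly Modality[]> = new Map<
  string,
  readonly Modality[]
>([["my-custom-vision-model", ["text", "image"]]]);

// Returns the declared modalities for a synthetic model when an override exists,
// otherwise falls back to the conservative text-only default.
function syntheticInputFor(
  id: string,
  overrides: ReadonlyMap<string, readonly Modality[]> = inputOverridesSketch,
): Modality[] {
  const declared = overrides.get(id);
  return declared ? [...declared] : ["text"];
}

const customVision = syntheticInputFor("my-custom-vision-model");
const unknownModel = syntheticInputFor("deepseek-v4-pro");
```

An opt-in table like this would keep the safe default for unknown ids while letting users of a custom vision endpoint restore image input explicitly.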

…e 400s

Synthetic models (built for ids not in the pi-ai registry, e.g. deepseek-v4-pro
via Ollama) hard-coded `input: ['text', 'image']`, falsely claiming vision
support. Because the model advertised image input, the openai-completions
provider did not filter image content, so screenshots from the GUI/computer-use
tools were sent to text-only endpoints. Ollama rejects these with HTTP 400
"this model does not support image input", surfaced to users as an opaque
"invalid message format" error.

Default synthetic models to text-only input. Vision-capable models resolved from
the pi-ai registry keep their real modalities; only synthetic fallbacks change.
For a custom vision endpoint resolved as synthetic this drops images gracefully
instead of hard-failing the whole request.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>