Problem
Models fail to load across multiple surfaces (chat.webllm.ai, JSFiddle examples, Chrome extensions) as reported in #85. Current error handling lacks:
- Structured error classification
- Automatic retry mechanisms
- Cache recovery logic
- User-actionable error messages
- Self-hosting capabilities
Root Causes Identified
- Insufficient Error Diagnostics - Generic error messages without classification codes
- No Retry Logic - Transient network/CDN failures cause hard stops
- Cache Corruption - No automatic cache clearing and retry
- No Self-Hosting Support - Users locked into default CDN with no override option
Proposed Solution
Phase 1: Enhanced Error Diagnostics (High Priority)
- Add a `ModelLoadErrorCode` enum (`manifest_fetch_failed`, `artifact_fetch_failed`, `worker_init_failed`, `webgpu_init_failed`, `cache_invalid`)
- Implement error classification in `webllm.ts`
- Add structured error display with actionable guidance
- Include "Copy Diagnostics" feature for bug reports
Files: `app/client/api.ts`, `app/client/webllm.ts`, `app/store/chat.ts`
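As a rough sketch of Phase 1, the enum below uses the error codes listed in this issue; the `classifyLoadError` helper and its message-matching heuristics are illustrative assumptions, not existing `webllm.ts` code:

```typescript
// Error codes proposed in this issue.
export enum ModelLoadErrorCode {
  ManifestFetchFailed = "manifest_fetch_failed",
  ArtifactFetchFailed = "artifact_fetch_failed",
  WorkerInitFailed = "worker_init_failed",
  WebGPUInitFailed = "webgpu_init_failed",
  CacheInvalid = "cache_invalid",
}

export interface ModelLoadError {
  code: ModelLoadErrorCode;
  message: string; // user-actionable guidance for display
  cause?: unknown; // original error, kept for "Copy Diagnostics"
}

// Hypothetical classifier: maps a raw failure to a structured error.
// The regex heuristics here are placeholders for real classification logic.
export function classifyLoadError(err: unknown): ModelLoadError {
  const text = err instanceof Error ? err.message : String(err);
  if (/manifest/i.test(text)) {
    return {
      code: ModelLoadErrorCode.ManifestFetchFailed,
      message: "Could not fetch the model manifest. Check your network or CDN settings.",
      cause: err,
    };
  }
  if (/cache|integrity/i.test(text)) {
    return {
      code: ModelLoadErrorCode.CacheInvalid,
      message: "Cached model artifacts appear corrupted. Clear the cache and retry.",
      cause: err,
    };
  }
  if (/gpu/i.test(text)) {
    return {
      code: ModelLoadErrorCode.WebGPUInitFailed,
      message: "WebGPU failed to initialize. Update your browser or GPU drivers.",
      cause: err,
    };
  }
  return {
    code: ModelLoadErrorCode.ArtifactFetchFailed,
    message: "Model download failed. Retry, or configure a custom artifact source.",
    cause: err,
  };
}
```

The structured `{ code, message, cause }` shape is what the "Copy Diagnostics" feature would serialize into a bug report.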
Phase 2: Retry Logic & Self-Recovery
- Automatic retry with exponential backoff (max 3 attempts, 1s → 2s → 4s)
- Automatic cache clearing on cache_invalid errors
- Progress indication during retries
- Only retry on retryable error types
Files: `app/client/webllm.ts`
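The retry policy above can be sketched as follows; the function name `retryModelLoad` and the `classify`/`clearCache` callbacks are hypothetical, not existing webllm.ts APIs:

```typescript
// Error codes considered transient and therefore safe to retry.
const RETRYABLE = new Set([
  "manifest_fetch_failed",
  "artifact_fetch_failed",
  "cache_invalid",
]);

// Up to 3 attempts with exponential backoff (1s -> 2s -> 4s by default).
// Non-retryable errors propagate immediately; cache_invalid triggers a
// cache clear before the next attempt (the self-recovery step).
export async function retryModelLoad<T>(
  load: () => Promise<T>,
  classify: (err: unknown) => { code: string },
  clearCache: () => Promise<void>,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await load();
    } catch (err) {
      lastErr = err;
      const { code } = classify(err);
      if (!RETRYABLE.has(code)) throw err; // hard stop: not transient
      if (code === "cache_invalid") await clearCache(); // self-recovery
      if (attempt < maxAttempts - 1) {
        const delayMs = baseDelayMs * 2 ** attempt; // 1s, 2s, 4s
        await new Promise((r) => setTimeout(r, delayMs));
      }
    }
  }
  throw lastErr;
}
```

A progress callback could be threaded through the same loop to surface "retrying (attempt 2 of 3)…" in the UI.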
Phase 3: Custom Artifact Source Support
- Add `customModelBaseUrl` config option to override the default CDN
Files: `app/store/config.ts`, `app/components/model-config.tsx`, `app/client/webllm.ts`
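A minimal sketch of how a custom base URL override might resolve artifact URLs; `resolveArtifactUrl` and the placeholder default CDN constant are assumptions for illustration, not existing code in this repo:

```typescript
// Placeholder default; the real project would use its actual CDN base.
const DEFAULT_CDN = "https://example-cdn.invalid/models";

// Resolve an artifact path against either the user-configured base URL
// (when set and non-empty) or the default CDN, normalizing slashes so
// the join always produces exactly one separator.
export function resolveArtifactUrl(
  path: string,
  customModelBaseUrl?: string,
): string {
  const base = customModelBaseUrl?.trim() || DEFAULT_CDN;
  return base.replace(/\/+$/, "") + "/" + path.replace(/^\/+/, "");
}
```

For example, with `customModelBaseUrl` set to `https://models.internal.example.com`, all artifact fetches would go to that host instead of the default CDN, which is what enables self-hosting.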
Phase 4: Documentation
- Troubleshooting guide with error code explanations
- Self-hosting setup instructions
- Updated issue templates with diagnostic fields
Files: `docs/TROUBLESHOOTING.md`, `docs/SELF_HOSTING.md`, `.github/ISSUE_TEMPLATE/bug_report.md`
Acceptance Criteria
Implementation Details
Full implementation plan available in plan-85.md with:
- Detailed code examples for each phase
- Testing strategy (unit, integration, manual)
- Rollout strategy with risk assessment
- Success metrics and monitoring approach
Related Issues
Estimated Effort
Time: 3-4 weeks (1 developer)
Priority: High (affects user experience across all surfaces)
Risk: Low-Medium (Phase 1-2), Low (Phase 3-4)