feat(ocr): add per-document prompt rendering with existing content support#925
feat(ocr): add per-document prompt rendering with existing content support#925adamflagg wants to merge 4 commits into
Conversation
Red phase: tests for SetPrompt/GetPrompt on LLMProvider and renderOCRPrompt function that renders OCR templates per-document with existing document content. Ref: icereed#882
Moves OCR template rendering from startup-only to per-document,
following the same pattern used by title, tag, correspondent, and
all other prompt templates. Enables the OCR prompt to reference the
document's existing text via {{.Content}}.
Changes:
- Add renderOCRPrompt() for per-document template rendering
- Add SetPrompt/GetPrompt on LLMProvider for per-document overrides
- Add ExistingContent to OCROptions, passed from background processor
- Update default ocr_prompt.tmpl with conditional {{.Content}} block
- Startup rendering passes Content="" as fallback (backward compatible)
Templates without {{.Content}} work identically to before. Users with
custom prompts opt in by adding {{if .Content}}...{{end}} to their
template.
Ref: icereed#882
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughPass existing document OCR text into per-document OCR prompt rendering and apply the rendered prompt to LLM-based OCR providers; add OCROptions.ExistingContent, per-document prompt renderer, LLMProvider.WithPrompt/GetPrompt, tests, and template updates to include {{.Content}}. Changes
Sequence Diagram(s)sequenceDiagram
participant Doc as Document
participant BG as BackgroundProcessor
participant Renderer as OCRPromptRenderer
participant Provider as OCRProvider (LLM)
Doc->>BG: ProcessAutoOcrTagDocuments(document)
BG->>BG: OCROptions.ExistingContent = document.Content
BG->>Provider: call ProcessDocumentOCR(options)
Provider->>Renderer: renderOCRPrompt(options.ExistingContent)
Renderer->>Renderer: execute template (Language, Content)
Renderer-->>Provider: renderedPrompt
Provider->>Provider: clone := WithPrompt(renderedPrompt)
Provider->>Provider: clone.ProcessImage / OCR flow
Provider-->>BG: OCR results
BG-->>Doc: persist OCR updates
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
ocr_prompt_test.go (1)
17-25: Extract theocrTemplateswap into a test helper to reduce duplication and improve test maintainability.All three tests duplicate the same pattern for swapping the package-global template under mutex lock. Use a helper function with
t.Cleanup()to centralize this setup. This also prevents future flakiness if other OCR tests are added witht.Parallel().♻️ Example refactor
+func swapOCRTemplate(t *testing.T, tmpl *template.Template) { + t.Helper() + templateMutex.Lock() + prev := ocrTemplate + ocrTemplate = tmpl + templateMutex.Unlock() + + t.Cleanup(func() { + templateMutex.Lock() + ocrTemplate = prev + templateMutex.Unlock() + }) +} + func TestRenderOCRPrompt_WithContent(t *testing.T) { tmpl, err := template.New("test").Funcs(sprig.FuncMap()).Parse( `OCR this.{{- if .Content}} Existing text: {{.Content}}{{- end}}`) require.NoError(t, err) - - templateMutex.Lock() - origTemplate := ocrTemplate - ocrTemplate = tmpl - templateMutex.Unlock() - defer func() { - templateMutex.Lock() - ocrTemplate = origTemplate - templateMutex.Unlock() - }() + swapOCRTemplate(t, tmpl)Also applies to: lines 38–46, 59–67
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ocr_prompt_test.go` around lines 17 - 25, Extract the repeated code that swaps the package-global ocrTemplate under templateMutex into a test helper (e.g., withTemplate or swapOCRTemplate) that accepts the new template and *testing.T; the helper should lock templateMutex, set ocrTemplate to the provided tmpl, and call t.Cleanup() to restore origTemplate inside a deferred lock so the original is restored after the test; replace the duplicated blocks in the tests (the sections using templateMutex, origTemplate, ocrTemplate, and defer) with calls to this helper to remove duplication and make tests safe for t.Parallel().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ocr_prompt_test.go`:
- Around line 54-73: The test TestRenderOCRPrompt_LanguageIncluded currently
only checks for the label and can miss an empty .Language; make the environment
deterministic and assert the full rendered value by setting LLM_LANGUAGE via
t.Setenv (or clearing it to exercise the default from getLikelyLanguage) before
calling renderOCRPrompt, then assert that result contains the expected string
like "Language: English" (or the explicit value you set); you can continue to
swap ocrTemplate under templateMutex as-is but change the assertion to verify
the rendered language content rather than just "Language:".
In `@ocr_prompt.go`:
- Around line 11-23: The renderOCRPrompt function injects the entire
existingContent into the template; before executing ocrTemplate you must
budget/truncate that text using the same helpers used elsewhere: call
getAvailableTokensForContent to determine the allowed token count and then call
truncateContentByTokens(existingContent, availableTokens) (or the project's
truncate helper signature) to produce truncatedContent, and pass that
truncatedContent into the template map ("Content") instead of existingContent;
keep the templateMutex.RLock/RUnlock and error handling as-is.
In `@ocr.go`:
- Around line 57-65: The code is mutating the shared singleton app.ocrProvider
by calling SetPrompt on ocr.LLMProvider, causing document-scoped prompts to leak
between requests; instead avoid mutating shared state by using a call-scoped
prompt: either (A) extend or call a provider API that accepts the prompt as a
parameter for this single OCR call (so renderOCRPrompt(options.ExistingContent)
is passed into the OCR invocation rather than calling SetPrompt), or (B) clone
the provider per document (create a per-document LLMProvider copy/WithPrompt
clone, use that for the OCR call, then discard) so app.ocrProvider is never
modified; update the code paths around renderOCRPrompt, app.ocrProvider,
ocr.LLMProvider and SetPrompt to implement one of these non-mutating approaches.
In `@ocr/llm_provider.go`:
- Around line 34-44: SetPrompt can store full document text in p.prompt and
ProcessImage currently logs that full prompt; change ProcessImage's debug
logging to avoid dumping p.prompt content by logging only redacted metadata
(e.g., indicate presence of a prompt, its length, and a hash/ID) or a fixed
redaction string instead of the raw p.prompt. Update the debug log call in
ProcessImage to reference p.prompt length or a safe placeholder and ensure
GetPrompt remains unchanged; preserve functions SetPrompt/GetPrompt but never
emit p.prompt contents to logs.
---
Nitpick comments:
In `@ocr_prompt_test.go`:
- Around line 17-25: Extract the repeated code that swaps the package-global
ocrTemplate under templateMutex into a test helper (e.g., withTemplate or
swapOCRTemplate) that accepts the new template and *testing.T; the helper should
lock templateMutex, set ocrTemplate to the provided tmpl, and call t.Cleanup()
to restore origTemplate inside a deferred lock so the original is restored after
the test; replace the duplicated blocks in the tests (the sections using
templateMutex, origTemplate, ocrTemplate, and defer) with calls to this helper
to remove duplication and make tests safe for t.Parallel().
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ef5d76f2-7d00-4de3-a209-4ae4899fd38b
📒 Files selected for processing (9)
background.godefault_prompts/ocr_prompt.tmplmain.goocr.goocr/llm_provider.goocr/llm_provider_prompt_test.goocr_prompt.goocr_prompt_test.gotypes.go
| func renderOCRPrompt(existingContent string) (string, error) { | ||
| templateMutex.RLock() | ||
| defer templateMutex.RUnlock() | ||
|
|
||
| var buf bytes.Buffer | ||
| err := ocrTemplate.Execute(&buf, map[string]interface{}{ | ||
| "Language": getLikelyLanguage(), | ||
| "Content": existingContent, | ||
| }) | ||
| if err != nil { | ||
| return "", fmt.Errorf("error executing OCR template: %w", err) | ||
| } | ||
| return buf.String(), nil |
There was a problem hiding this comment.
Budget existingContent before rendering the OCR prompt.
Unlike the other per-document prompt builders, this injects the full prior OCR text without any token budgeting or truncation. On long documents, every page OCR call will carry the entire document text, which can blow model context limits and add a lot of avoidable latency/cost. Please reuse the existing getAvailableTokensForContent / truncateContentByTokens path here before executing the template.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ocr_prompt.go` around lines 11 - 23, The renderOCRPrompt function injects the
entire existingContent into the template; before executing ocrTemplate you must
budget/truncate that text using the same helpers used elsewhere: call
getAvailableTokensForContent to determine the allowed token count and then call
truncateContentByTokens(existingContent, availableTokens) (or the project's
truncate helper signature) to produce truncatedContent, and pass that
truncatedContent into the template map ("Content") instead of existingContent;
keep the templateMutex.RLock/RUnlock and error handling as-is.
- Replace SetPrompt with WithPrompt to avoid mutating the shared ocrProvider singleton (concurrency safety) - Cap existing content to 8000 chars before injecting into OCR prompt to avoid blowing vision model context on long documents - Tighten language test assertion with t.Setenv and assert.Equal Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
ocr/llm_provider.go (1)
34-45:⚠️ Potential issue | 🟠 MajorRedact the per-document prompt from debug logs.
With
WithPromptnow carrying{{.Content}}intop.prompt, Line 126 still emits the full prompt body. In debug mode that will write document text/PII to logs. Please log prompt presence/length only, or a fixed redaction string instead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ocr/llm_provider.go` around lines 34 - 45, The code currently logs the full per-document prompt (p.prompt) which may contain PII; update the logging site that references LLMProvider.prompt (triggered after WithPrompt is used) to avoid emitting the prompt body — replace any direct inclusion of p.prompt with either a fixed redaction string like "<redacted-prompt>" or non-sensitive metadata such as fmt.Sprintf("prompt_len=%d", len(p.prompt)); ensure this change touches the logging call that currently prints p.prompt (referencing LLMProvider, WithPrompt, GetPrompt) so debug mode logs only presence/length and never the prompt contents.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ocr.go`:
- Around line 60-65: The current guard skips applying a rendered empty OCR
prompt, preventing intentional empty templates from taking effect; change the
logic so that when renderOCRPrompt returns without error you always apply the
result (including an empty string) to the provider. Specifically, in the block
around renderOCRPrompt and ocrPrompt, keep the err check and warning path
unchanged, but replace the "else if ocrPrompt != \"\"" branch with an "else"
branch that type-asserts provider to *ocr.LLMProvider and calls
WithPrompt(ocrPrompt) regardless of its emptiness; only the error path should
fall back to the provider default.
---
Duplicate comments:
In `@ocr/llm_provider.go`:
- Around line 34-45: The code currently logs the full per-document prompt
(p.prompt) which may contain PII; update the logging site that references
LLMProvider.prompt (triggered after WithPrompt is used) to avoid emitting the
prompt body — replace any direct inclusion of p.prompt with either a fixed
redaction string like "<redacted-prompt>" or non-sensitive metadata such as
fmt.Sprintf("prompt_len=%d", len(p.prompt)); ensure this change touches the
logging call that currently prints p.prompt (referencing LLMProvider,
WithPrompt, GetPrompt) so debug mode logs only presence/length and never the
prompt contents.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9b105d4b-6182-4032-947f-7b2e44d5be70
📒 Files selected for processing (5)
ocr.goocr/llm_provider.goocr/llm_provider_prompt_test.goocr_prompt.goocr_prompt_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- ocr_prompt.go
- ocr_prompt_test.go
…m debug log - Remove empty-string guard so intentionally-empty templates take effect - Log prompt length instead of full content in debug output Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Allows the OCR prompt template to reference a document's existing text (e.g., Tesseract output) via
{{.Content}}. This helps vision models cross-reference basic OCR when transcribing difficult words or numbers.Changes
SetPrompt/GetPromptonLLMProvider: Enables per-document prompt overrides without changing theProviderinterfaceExistingContentinOCROptions: Callers pass existing document text, which flows into the template as{{.Content}}How it works
The OCR prompt template already supported
{{.Language}}. This adds{{.Content}}following the exact same per-document rendering pattern used by every other prompt template (title_prompt.tmpl,tag_prompt.tmpl, etc.).The default template adds a conditional block:
Backward Compatibility
{{.Content}}work identically to before{{.Content}}to their templateCloses #882
Test Plan
SetPrompt/GetPrompton LLMProviderrenderOCRPromptwith and without contentgo vet ./...cleanSummary by CodeRabbit
New Features
Tests