Skip to content

feat(ocr): add per-document prompt rendering with existing content support#925

Open
adamflagg wants to merge 4 commits into
icereed:mainfrom
adamflagg:feat/ocr-prompt-content
Open

feat(ocr): add per-document prompt rendering with existing content support#925
adamflagg wants to merge 4 commits into
icereed:mainfrom
adamflagg:feat/ocr-prompt-content

Conversation

@adamflagg

@adamflagg adamflagg commented Mar 7, 2026

Copy link
Copy Markdown

Summary

Allows the OCR prompt template to reference a document's existing text (e.g., Tesseract output) via {{.Content}}. This helps vision models cross-reference basic OCR when transcribing difficult words or numbers.

Changes

  • Per-document prompt rendering: OCR template is now rendered per-document (matching the pattern used by title, tag, correspondent, and all other prompts) instead of once at startup
  • SetPrompt/GetPrompt on LLMProvider: Enables per-document prompt overrides without changing the Provider interface
  • ExistingContent in OCROptions: Callers pass existing document text, which flows into the template as {{.Content}}
  • Updated default template: Conditionally includes existing text as reference material for the vision model

How it works

The OCR prompt template already supported {{.Language}}. This adds {{.Content}} following the exact same per-document rendering pattern used by every other prompt template (title_prompt.tmpl, tag_prompt.tmpl, etc.).

The default template adds a conditional block:

{{- if .Content}}
The document already has the following text extracted via basic OCR.
Use this as a reference to help identify difficult words or numbers,
but rely primarily on what you see in the image.
{{.Content}}
{{- end}}

Backward Compatibility

  • Templates without {{.Content}} work identically to before
  • A static fallback prompt (rendered without Content at startup) ensures the LLM provider has a default
  • Non-LLM providers (Azure, Google DocAI, Mistral OCR, Docling) are unaffected
  • Users with custom prompts opt in by adding {{.Content}} to their template

Closes #882

Test Plan

  • Unit tests for SetPrompt/GetPrompt on LLMProvider
  • Unit tests for renderOCRPrompt with and without content
  • Existing test suite passes unchanged
  • go vet ./... clean
  • Manual test: OCR a document with existing Tesseract text, verify prompt includes it

Summary by CodeRabbit

  • New Features

    • OCR now surfaces previously extracted text so users can review and correct it before re-processing.
    • Prompts are rendered per-document and include detected language for clearer, context-aware OCR guidance.
  • Tests

    • Added tests to validate per-document prompt rendering and prompt-handling behavior.

Red phase: tests for SetPrompt/GetPrompt on LLMProvider and
renderOCRPrompt function that renders OCR templates per-document
with existing document content.

Ref: icereed#882
Moves OCR template rendering from startup-only to per-document,
following the same pattern used by title, tag, correspondent, and
all other prompt templates. Enables the OCR prompt to reference the
document's existing text via {{.Content}}.

Changes:
- Add renderOCRPrompt() for per-document template rendering
- Add SetPrompt/GetPrompt on LLMProvider for per-document overrides
- Add ExistingContent to OCROptions, passed from background processor
- Update default ocr_prompt.tmpl with conditional {{.Content}} block
- Startup rendering passes Content="" as fallback (backward compatible)

Templates without {{.Content}} work identically to before. Users with
custom prompts opt in by adding {{if .Content}}...{{end}} to their
template.

Ref: icereed#882
@coderabbitai

coderabbitai Bot commented Mar 7, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b9a60a63-5f87-4cdd-8541-aab46cc7d071

📥 Commits

Reviewing files that changed from the base of the PR and between e157708 and 50d2eb4.

📒 Files selected for processing (2)
  • ocr.go
  • ocr/llm_provider.go

📝 Walkthrough

Walkthrough

Pass existing document OCR text into per-document OCR prompt rendering and apply the rendered prompt to LLM-based OCR providers; add OCROptions.ExistingContent, per-document prompt renderer, LLMProvider.WithPrompt/GetPrompt, tests, and template updates to include {{.Content}}.

Changes

Cohort / File(s) Summary
Types & Integration
types.go, background.go, main.go
Add ExistingContent to OCROptions; populate it from document.Content in processAutoOcrTagDocuments; adjust OCR template execution fallback and error message text.
OCR Prompt Template & Rendering
default_prompts/ocr_prompt.tmpl, ocr_prompt.go, ocr_prompt_test.go
Template gains conditional {{.Content}} block and guidance for existing OCR; add renderOCRPrompt(existingContent string) with content length cap and mutex-protected template rendering; unit tests for content inclusion and language handling.
OCR Provider API & Tests
ocr/llm_provider.go, ocr/llm_provider_prompt_test.go
Add WithPrompt(prompt string) *LLMProvider and GetPrompt() string to support per-call prompt cloning; reduce prompt logging to length; tests verify cloning and prompt preservation.
OCR Pipeline Integration
ocr.go, background.go (usage)
Render per-document OCR prompt from OCROptions.ExistingContent, clone/wrap LLM provider with the rendered prompt before OCR calls, and use local provider variable for capability checks and ProcessImage calls.
Minor
background.go (single-line)
Set OCROptions.ExistingContent = document.Content when initiating OCR options.

Sequence Diagram(s)

sequenceDiagram
    participant Doc as Document
    participant BG as BackgroundProcessor
    participant Renderer as OCRPromptRenderer
    participant Provider as OCRProvider (LLM)

    Doc->>BG: ProcessAutoOcrTagDocuments(document)
    BG->>BG: OCROptions.ExistingContent = document.Content
    BG->>Provider: call ProcessDocumentOCR(options)
    Provider->>Renderer: renderOCRPrompt(options.ExistingContent)
    Renderer->>Renderer: execute template (Language, Content)
    Renderer-->>Provider: renderedPrompt
    Provider->>Provider: clone := WithPrompt(renderedPrompt)
    Provider->>Provider: clone.ProcessImage / OCR flow
    Provider-->>BG: OCR results
    BG-->>Doc: persist OCR updates
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐇 I nibble at the fractured text,

I fold the crumbs to make them whole.
Feed me the old lines, I’ll mend the rest,
With tiny hops and bright-eyed soul.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: enabling per-document OCR prompt rendering with support for existing document content, which directly addresses the PR's core objective.
Linked Issues check ✅ Passed All coding requirements from issue #882 are met: {{.Content}} variable is implemented in the OCR prompt, existing OCR text flows through ExistingContent in OCROptions, and LLM can cross-reference prior OCR with page image.
Out of Scope Changes check ✅ Passed All changes directly support per-document OCR prompt rendering with existing content: OCROptions extension, template rendering logic, LLMProvider methods, tests, and prompt template updates are all within scope.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
ocr_prompt_test.go (1)

17-25: Extract the ocrTemplate swap into a test helper to reduce duplication and improve test maintainability.

All three tests duplicate the same pattern for swapping the package-global template under mutex lock. Use a helper function with t.Cleanup() to centralize this setup. This also prevents future flakiness if other OCR tests are added with t.Parallel().

♻️ Example refactor
+func swapOCRTemplate(t *testing.T, tmpl *template.Template) {
+	t.Helper()
+	templateMutex.Lock()
+	prev := ocrTemplate
+	ocrTemplate = tmpl
+	templateMutex.Unlock()
+
+	t.Cleanup(func() {
+		templateMutex.Lock()
+		ocrTemplate = prev
+		templateMutex.Unlock()
+	})
+}
+
 func TestRenderOCRPrompt_WithContent(t *testing.T) {
 	tmpl, err := template.New("test").Funcs(sprig.FuncMap()).Parse(
 		`OCR this.{{- if .Content}} Existing text: {{.Content}}{{- end}}`)
 	require.NoError(t, err)
-
-	templateMutex.Lock()
-	origTemplate := ocrTemplate
-	ocrTemplate = tmpl
-	templateMutex.Unlock()
-	defer func() {
-		templateMutex.Lock()
-		ocrTemplate = origTemplate
-		templateMutex.Unlock()
-	}()
+	swapOCRTemplate(t, tmpl)

Also applies to: lines 38–46, 59–67

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ocr_prompt_test.go` around lines 17 - 25, Extract the repeated code that
swaps the package-global ocrTemplate under templateMutex into a test helper
(e.g., withTemplate or swapOCRTemplate) that accepts the new template and
*testing.T; the helper should lock templateMutex, set ocrTemplate to the
provided tmpl, and call t.Cleanup() to restore origTemplate inside a deferred
lock so the original is restored after the test; replace the duplicated blocks
in the tests (the sections using templateMutex, origTemplate, ocrTemplate, and
defer) with calls to this helper to remove duplication and make tests safe for
t.Parallel().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ocr_prompt_test.go`:
- Around line 54-73: The test TestRenderOCRPrompt_LanguageIncluded currently
only checks for the label and can miss an empty .Language; make the environment
deterministic and assert the full rendered value by setting LLM_LANGUAGE via
t.Setenv (or clearing it to exercise the default from getLikelyLanguage) before
calling renderOCRPrompt, then assert that result contains the expected string
like "Language: English" (or the explicit value you set); you can continue to
swap ocrTemplate under templateMutex as-is but change the assertion to verify
the rendered language content rather than just "Language:".

In `@ocr_prompt.go`:
- Around line 11-23: The renderOCRPrompt function injects the entire
existingContent into the template; before executing ocrTemplate you must
budget/truncate that text using the same helpers used elsewhere: call
getAvailableTokensForContent to determine the allowed token count and then call
truncateContentByTokens(existingContent, availableTokens) (or the project's
truncate helper signature) to produce truncatedContent, and pass that
truncatedContent into the template map ("Content") instead of existingContent;
keep the templateMutex.RLock/RUnlock and error handling as-is.

In `@ocr.go`:
- Around line 57-65: The code is mutating the shared singleton app.ocrProvider
by calling SetPrompt on ocr.LLMProvider, causing document-scoped prompts to leak
between requests; instead avoid mutating shared state by using a call-scoped
prompt: either (A) extend or call a provider API that accepts the prompt as a
parameter for this single OCR call (so renderOCRPrompt(options.ExistingContent)
is passed into the OCR invocation rather than calling SetPrompt), or (B) clone
the provider per document (create a per-document LLMProvider copy/WithPrompt
clone, use that for the OCR call, then discard) so app.ocrProvider is never
modified; update the code paths around renderOCRPrompt, app.ocrProvider,
ocr.LLMProvider and SetPrompt to implement one of these non-mutating approaches.

In `@ocr/llm_provider.go`:
- Around line 34-44: SetPrompt can store full document text in p.prompt and
ProcessImage currently logs that full prompt; change ProcessImage's debug
logging to avoid dumping p.prompt content by logging only redacted metadata
(e.g., indicate presence of a prompt, its length, and a hash/ID) or a fixed
redaction string instead of the raw p.prompt. Update the debug log call in
ProcessImage to reference p.prompt length or a safe placeholder and ensure
GetPrompt remains unchanged; preserve functions SetPrompt/GetPrompt but never
emit p.prompt contents to logs.

---

Nitpick comments:
In `@ocr_prompt_test.go`:
- Around line 17-25: Extract the repeated code that swaps the package-global
ocrTemplate under templateMutex into a test helper (e.g., withTemplate or
swapOCRTemplate) that accepts the new template and *testing.T; the helper should
lock templateMutex, set ocrTemplate to the provided tmpl, and call t.Cleanup()
to restore origTemplate inside a deferred lock so the original is restored after
the test; replace the duplicated blocks in the tests (the sections using
templateMutex, origTemplate, ocrTemplate, and defer) with calls to this helper
to remove duplication and make tests safe for t.Parallel().

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ef5d76f2-7d00-4de3-a209-4ae4899fd38b

📥 Commits

Reviewing files that changed from the base of the PR and between d445bfa and c574a57.

📒 Files selected for processing (9)
  • background.go
  • default_prompts/ocr_prompt.tmpl
  • main.go
  • ocr.go
  • ocr/llm_provider.go
  • ocr/llm_provider_prompt_test.go
  • ocr_prompt.go
  • ocr_prompt_test.go
  • types.go

Comment thread ocr_prompt_test.go
Comment thread ocr_prompt.go
Comment on lines +11 to +23
func renderOCRPrompt(existingContent string) (string, error) {
templateMutex.RLock()
defer templateMutex.RUnlock()

var buf bytes.Buffer
err := ocrTemplate.Execute(&buf, map[string]interface{}{
"Language": getLikelyLanguage(),
"Content": existingContent,
})
if err != nil {
return "", fmt.Errorf("error executing OCR template: %w", err)
}
return buf.String(), nil

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Budget existingContent before rendering the OCR prompt.

Unlike the other per-document prompt builders, this injects the full prior OCR text without any token budgeting or truncation. On long documents, every page OCR call will carry the entire document text, which can blow model context limits and add a lot of avoidable latency/cost. Please reuse the existing getAvailableTokensForContent / truncateContentByTokens path here before executing the template.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ocr_prompt.go` around lines 11 - 23, The renderOCRPrompt function injects the
entire existingContent into the template; before executing ocrTemplate you must
budget/truncate that text using the same helpers used elsewhere: call
getAvailableTokensForContent to determine the allowed token count and then call
truncateContentByTokens(existingContent, availableTokens) (or the project's
truncate helper signature) to produce truncatedContent, and pass that
truncatedContent into the template map ("Content") instead of existingContent;
keep the templateMutex.RLock/RUnlock and error handling as-is.

Comment thread ocr.go Outdated
Comment thread ocr/llm_provider.go Outdated
- Replace SetPrompt with WithPrompt to avoid mutating the shared
  ocrProvider singleton (concurrency safety)
- Cap existing content to 8000 chars before injecting into OCR prompt
  to avoid blowing vision model context on long documents
- Tighten language test assertion with t.Setenv and assert.Equal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
ocr/llm_provider.go (1)

34-45: ⚠️ Potential issue | 🟠 Major

Redact the per-document prompt from debug logs.

With WithPrompt now carrying {{.Content}} into p.prompt, Line 126 still emits the full prompt body. In debug mode that will write document text/PII to logs. Please log prompt presence/length only, or a fixed redaction string instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ocr/llm_provider.go` around lines 34 - 45, The code currently logs the full
per-document prompt (p.prompt) which may contain PII; update the logging site
that references LLMProvider.prompt (triggered after WithPrompt is used) to avoid
emitting the prompt body — replace any direct inclusion of p.prompt with either
a fixed redaction string like "<redacted-prompt>" or non-sensitive metadata such
as fmt.Sprintf("prompt_len=%d", len(p.prompt)); ensure this change touches the
logging call that currently prints p.prompt (referencing LLMProvider,
WithPrompt, GetPrompt) so debug mode logs only presence/length and never the
prompt contents.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ocr.go`:
- Around line 60-65: The current guard skips applying a rendered empty OCR
prompt, preventing intentional empty templates from taking effect; change the
logic so that when renderOCRPrompt returns without error you always apply the
result (including an empty string) to the provider. Specifically, in the block
around renderOCRPrompt and ocrPrompt, keep the err check and warning path
unchanged, but replace the "else if ocrPrompt != \"\"" branch with an "else"
branch that type-asserts provider to *ocr.LLMProvider and calls
WithPrompt(ocrPrompt) regardless of its emptiness; only the error path should
fall back to the provider default.

---

Duplicate comments:
In `@ocr/llm_provider.go`:
- Around line 34-45: The code currently logs the full per-document prompt
(p.prompt) which may contain PII; update the logging site that references
LLMProvider.prompt (triggered after WithPrompt is used) to avoid emitting the
prompt body — replace any direct inclusion of p.prompt with either a fixed
redaction string like "<redacted-prompt>" or non-sensitive metadata such as
fmt.Sprintf("prompt_len=%d", len(p.prompt)); ensure this change touches the
logging call that currently prints p.prompt (referencing LLMProvider,
WithPrompt, GetPrompt) so debug mode logs only presence/length and never the
prompt contents.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9b105d4b-6182-4032-947f-7b2e44d5be70

📥 Commits

Reviewing files that changed from the base of the PR and between c574a57 and e157708.

📒 Files selected for processing (5)
  • ocr.go
  • ocr/llm_provider.go
  • ocr/llm_provider_prompt_test.go
  • ocr_prompt.go
  • ocr_prompt_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • ocr_prompt.go
  • ocr_prompt_test.go

Comment thread ocr.go Outdated
…m debug log

- Remove empty-string guard so intentionally-empty templates take effect
- Log prompt length instead of full content in debug output

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ocr-prompt should have a {{.Content}} variable with existing OCR

1 participant