feat(ocr): add per-document prompt rendering with existing content support by adamflagg · Pull Request #925 · icereed/paperless-gpt

adamflagg · 2026-03-07T05:10:09Z

Summary

Allows the OCR prompt template to reference a document's existing text (e.g., Tesseract output) via {{.Content}}. This helps vision models cross-reference basic OCR when transcribing difficult words or numbers.

Changes

Per-document prompt rendering: OCR template is now rendered per-document (matching the pattern used by title, tag, correspondent, and all other prompts) instead of once at startup
SetPrompt/GetPrompt on LLMProvider: Enables per-document prompt overrides without changing the Provider interface
ExistingContent in OCROptions: Callers pass existing document text, which flows into the template as {{.Content}}
Updated default template: Conditionally includes existing text as reference material for the vision model

How it works

The OCR prompt template already supported {{.Language}}. This adds {{.Content}} following the exact same per-document rendering pattern used by every other prompt template (title_prompt.tmpl, tag_prompt.tmpl, etc.).

The default template adds a conditional block:

{{- if .Content}}
The document already has the following text extracted via basic OCR.
Use this as a reference to help identify difficult words or numbers,
but rely primarily on what you see in the image.
{{.Content}}
{{- end}}

Backward Compatibility

Templates without {{.Content}} work identically to before
A static fallback prompt (rendered without Content at startup) ensures the LLM provider has a default
Non-LLM providers (Azure, Google DocAI, Mistral OCR, Docling) are unaffected
Users with custom prompts opt in by adding {{.Content}} to their template

Closes #882

Test Plan

Unit tests for SetPrompt/GetPrompt on LLMProvider
Unit tests for renderOCRPrompt with and without content
Existing test suite passes unchanged
go vet ./... clean
Manual test: OCR a document with existing Tesseract text, verify prompt includes it

Summary by CodeRabbit

New Features
- OCR now surfaces previously extracted text so users can review and correct it before re-processing.
- Prompts are rendered per-document and include detected language for clearer, context-aware OCR guidance.
Tests
- Added tests to validate per-document prompt rendering and prompt-handling behavior.

Red phase: tests for SetPrompt/GetPrompt on LLMProvider and renderOCRPrompt function that renders OCR templates per-document with existing document content. Ref: icereed#882

Moves OCR template rendering from startup-only to per-document, following the same pattern used by title, tag, correspondent, and all other prompt templates. Enables the OCR prompt to reference the document's existing text via {{.Content}}. Changes: - Add renderOCRPrompt() for per-document template rendering - Add SetPrompt/GetPrompt on LLMProvider for per-document overrides - Add ExistingContent to OCROptions, passed from background processor - Update default ocr_prompt.tmpl with conditional {{.Content}} block - Startup rendering passes Content="" as fallback (backward compatible) Templates without {{.Content}} work identically to before. Users with custom prompts opt in by adding {{if .Content}}...{{end}} to their template. Ref: icereed#882

coderabbitai · 2026-03-07T05:10:22Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b9a60a63-5f87-4cdd-8541-aab46cc7d071

📥 Commits

Reviewing files that changed from the base of the PR and between e157708 and 50d2eb4.

📒 Files selected for processing (2)

ocr.go
ocr/llm_provider.go

📝 Walkthrough

Walkthrough

Pass existing document OCR text into per-document OCR prompt rendering and apply the rendered prompt to LLM-based OCR providers; add OCROptions.ExistingContent, per-document prompt renderer, LLMProvider.WithPrompt/GetPrompt, tests, and template updates to include {{.Content}}.

Changes

Cohort / File(s)	Summary
Types & Integration `types.go`, `background.go`, `main.go`	Add `ExistingContent` to `OCROptions`; populate it from `document.Content` in `processAutoOcrTagDocuments`; adjust OCR template execution fallback and error message text.
OCR Prompt Template & Rendering `default_prompts/ocr_prompt.tmpl`, `ocr_prompt.go`, `ocr_prompt_test.go`	Template gains conditional `{{.Content}}` block and guidance for existing OCR; add `renderOCRPrompt(existingContent string)` with content length cap and mutex-protected template rendering; unit tests for content inclusion and language handling.
OCR Provider API & Tests `ocr/llm_provider.go`, `ocr/llm_provider_prompt_test.go`	Add `WithPrompt(prompt string) *LLMProvider` and `GetPrompt() string` to support per-call prompt cloning; reduce prompt logging to length; tests verify cloning and prompt preservation.
OCR Pipeline Integration `ocr.go`, `background.go` (usage)	Render per-document OCR prompt from `OCROptions.ExistingContent`, clone/wrap LLM provider with the rendered prompt before OCR calls, and use local provider variable for capability checks and ProcessImage calls.
Minor `background.go` (single-line)	Set `OCROptions.ExistingContent = document.Content` when initiating OCR options.

Sequence Diagram(s)

sequenceDiagram
    participant Doc as Document
    participant BG as BackgroundProcessor
    participant Renderer as OCRPromptRenderer
    participant Provider as OCRProvider (LLM)

    Doc->>BG: ProcessAutoOcrTagDocuments(document)
    BG->>BG: OCROptions.ExistingContent = document.Content
    BG->>Provider: call ProcessDocumentOCR(options)
    Provider->>Renderer: renderOCRPrompt(options.ExistingContent)
    Renderer->>Renderer: execute template (Language, Content)
    Renderer-->>Provider: renderedPrompt
    Provider->>Provider: clone := WithPrompt(renderedPrompt)
    Provider->>Provider: clone.ProcessImage / OCR flow
    Provider-->>BG: OCR results
    BG-->>Doc: persist OCR updates

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Refactor - Move background logic to background.go (and background_test.go) #321: Direct overlap — modifies processAutoOcrTagDocuments and OCROptions handling.
fix(ocr): Fix custom OCR prompt #323: Related OCR prompt plumbing and provider-level prompt configuration changes.
feat: add support automatic OCR #75: Modifies automatic OCR flow and OCROptions passed into OCR processing; related to background OCR adjustments.

Poem

🐇 I nibble at the fractured text,

I fold the crumbs to make them whole.
Feed me the old lines, I’ll mend the rest,
With tiny hops and bright-eyed soul.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: enabling per-document OCR prompt rendering with support for existing document content, which directly addresses the PR's core objective.
Linked Issues check	✅ Passed	All coding requirements from issue `#882` are met: {{.Content}} variable is implemented in the OCR prompt, existing OCR text flows through ExistingContent in OCROptions, and LLM can cross-reference prior OCR with page image.
Out of Scope Changes check	✅ Passed	All changes directly support per-document OCR prompt rendering with existing content: OCROptions extension, template rendering logic, LLMProvider methods, tests, and prompt template updates are all within scope.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

ocr_prompt_test.go (1)

17-25: Extract the ocrTemplate swap into a test helper to reduce duplication and improve test maintainability.

All three tests duplicate the same pattern for swapping the package-global template under mutex lock. Use a helper function with t.Cleanup() to centralize this setup. This also prevents future flakiness if other OCR tests are added with t.Parallel().

♻️ Example refactor

+func swapOCRTemplate(t *testing.T, tmpl *template.Template) {
+	t.Helper()
+	templateMutex.Lock()
+	prev := ocrTemplate
+	ocrTemplate = tmpl
+	templateMutex.Unlock()
+
+	t.Cleanup(func() {
+		templateMutex.Lock()
+		ocrTemplate = prev
+		templateMutex.Unlock()
+	})
+}
+
 func TestRenderOCRPrompt_WithContent(t *testing.T) {
 	tmpl, err := template.New("test").Funcs(sprig.FuncMap()).Parse(
 		`OCR this.{{- if .Content}} Existing text: {{.Content}}{{- end}}`)
 	require.NoError(t, err)
-
-	templateMutex.Lock()
-	origTemplate := ocrTemplate
-	ocrTemplate = tmpl
-	templateMutex.Unlock()
-	defer func() {
-		templateMutex.Lock()
-		ocrTemplate = origTemplate
-		templateMutex.Unlock()
-	}()
+	swapOCRTemplate(t, tmpl)

Also applies to: lines 38–46, 59–67

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ocr_prompt_test.go` around lines 17 - 25, Extract the repeated code that
swaps the package-global ocrTemplate under templateMutex into a test helper
(e.g., withTemplate or swapOCRTemplate) that accepts the new template and
*testing.T; the helper should lock templateMutex, set ocrTemplate to the
provided tmpl, and call t.Cleanup() to restore origTemplate inside a deferred
lock so the original is restored after the test; replace the duplicated blocks
in the tests (the sections using templateMutex, origTemplate, ocrTemplate, and
defer) with calls to this helper to remove duplication and make tests safe for
t.Parallel().

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ocr_prompt_test.go`:
- Around line 54-73: The test TestRenderOCRPrompt_LanguageIncluded currently
only checks for the label and can miss an empty .Language; make the environment
deterministic and assert the full rendered value by setting LLM_LANGUAGE via
t.Setenv (or clearing it to exercise the default from getLikelyLanguage) before
calling renderOCRPrompt, then assert that result contains the expected string
like "Language: English" (or the explicit value you set); you can continue to
swap ocrTemplate under templateMutex as-is but change the assertion to verify
the rendered language content rather than just "Language:".

In `@ocr_prompt.go`:
- Around line 11-23: The renderOCRPrompt function injects the entire
existingContent into the template; before executing ocrTemplate you must
budget/truncate that text using the same helpers used elsewhere: call
getAvailableTokensForContent to determine the allowed token count and then call
truncateContentByTokens(existingContent, availableTokens) (or the project's
truncate helper signature) to produce truncatedContent, and pass that
truncatedContent into the template map ("Content") instead of existingContent;
keep the templateMutex.RLock/RUnlock and error handling as-is.

In `@ocr.go`:
- Around line 57-65: The code is mutating the shared singleton app.ocrProvider
by calling SetPrompt on ocr.LLMProvider, causing document-scoped prompts to leak
between requests; instead avoid mutating shared state by using a call-scoped
prompt: either (A) extend or call a provider API that accepts the prompt as a
parameter for this single OCR call (so renderOCRPrompt(options.ExistingContent)
is passed into the OCR invocation rather than calling SetPrompt), or (B) clone
the provider per document (create a per-document LLMProvider copy/WithPrompt
clone, use that for the OCR call, then discard) so app.ocrProvider is never
modified; update the code paths around renderOCRPrompt, app.ocrProvider,
ocr.LLMProvider and SetPrompt to implement one of these non-mutating approaches.

In `@ocr/llm_provider.go`:
- Around line 34-44: SetPrompt can store full document text in p.prompt and
ProcessImage currently logs that full prompt; change ProcessImage's debug
logging to avoid dumping p.prompt content by logging only redacted metadata
(e.g., indicate presence of a prompt, its length, and a hash/ID) or a fixed
redaction string instead of the raw p.prompt. Update the debug log call in
ProcessImage to reference p.prompt length or a safe placeholder and ensure
GetPrompt remains unchanged; preserve functions SetPrompt/GetPrompt but never
emit p.prompt contents to logs.

---

Nitpick comments:
In `@ocr_prompt_test.go`:
- Around line 17-25: Extract the repeated code that swaps the package-global
ocrTemplate under templateMutex into a test helper (e.g., withTemplate or
swapOCRTemplate) that accepts the new template and *testing.T; the helper should
lock templateMutex, set ocrTemplate to the provided tmpl, and call t.Cleanup()
to restore origTemplate inside a deferred lock so the original is restored after
the test; replace the duplicated blocks in the tests (the sections using
templateMutex, origTemplate, ocrTemplate, and defer) with calls to this helper
to remove duplication and make tests safe for t.Parallel().

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ef5d76f2-7d00-4de3-a209-4ae4899fd38b

📥 Commits

Reviewing files that changed from the base of the PR and between d445bfa and c574a57.

📒 Files selected for processing (9)

background.go
default_prompts/ocr_prompt.tmpl
main.go
ocr.go
ocr/llm_provider.go
ocr/llm_provider_prompt_test.go
ocr_prompt.go
ocr_prompt_test.go
types.go

coderabbitai · 2026-03-07T05:16:01Z

+func renderOCRPrompt(existingContent string) (string, error) {
+	templateMutex.RLock()
+	defer templateMutex.RUnlock()
+
+	var buf bytes.Buffer
+	err := ocrTemplate.Execute(&buf, map[string]interface{}{
+		"Language": getLikelyLanguage(),
+		"Content":  existingContent,
+	})
+	if err != nil {
+		return "", fmt.Errorf("error executing OCR template: %w", err)
+	}
+	return buf.String(), nil


⚠️ Potential issue | 🟠 Major

Budget existingContent before rendering the OCR prompt.

Unlike the other per-document prompt builders, this injects the full prior OCR text without any token budgeting or truncation. On long documents, every page OCR call will carry the entire document text, which can blow model context limits and add a lot of avoidable latency/cost. Please reuse the existing getAvailableTokensForContent / truncateContentByTokens path here before executing the template.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ocr_prompt.go` around lines 11 - 23, The renderOCRPrompt function injects the entire existingContent into the template; before executing ocrTemplate you must budget/truncate that text using the same helpers used elsewhere: call getAvailableTokensForContent to determine the allowed token count and then call truncateContentByTokens(existingContent, availableTokens) (or the project's truncate helper signature) to produce truncatedContent, and pass that truncatedContent into the template map ("Content") instead of existingContent; keep the templateMutex.RLock/RUnlock and error handling as-is.

- Replace SetPrompt with WithPrompt to avoid mutating the shared ocrProvider singleton (concurrency safety) - Cap existing content to 8000 chars before injecting into OCR prompt to avoid blowing vision model context on long documents - Tighten language test assertion with t.Setenv and assert.Equal Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

ocr/llm_provider.go (1)
34-45: ⚠️ Potential issue | 🟠 Major

Redact the per-document prompt from debug logs.

With WithPrompt now carrying {{.Content}} into p.prompt, Line 126 still emits the full prompt body. In debug mode that will write document text/PII to logs. Please log prompt presence/length only, or a fixed redaction string instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ocr/llm_provider.go` around lines 34 - 45, The code currently logs the full
per-document prompt (p.prompt) which may contain PII; update the logging site
that references LLMProvider.prompt (triggered after WithPrompt is used) to avoid
emitting the prompt body — replace any direct inclusion of p.prompt with either
a fixed redaction string like "<redacted-prompt>" or non-sensitive metadata such
as fmt.Sprintf("prompt_len=%d", len(p.prompt)); ensure this change touches the
logging call that currently prints p.prompt (referencing LLMProvider,
WithPrompt, GetPrompt) so debug mode logs only presence/length and never the
prompt contents.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ocr.go`:
- Around line 60-65: The current guard skips applying a rendered empty OCR
prompt, preventing intentional empty templates from taking effect; change the
logic so that when renderOCRPrompt returns without error you always apply the
result (including an empty string) to the provider. Specifically, in the block
around renderOCRPrompt and ocrPrompt, keep the err check and warning path
unchanged, but replace the "else if ocrPrompt != \"\"" branch with an "else"
branch that type-asserts provider to *ocr.LLMProvider and calls
WithPrompt(ocrPrompt) regardless of its emptiness; only the error path should
fall back to the provider default.

---

Duplicate comments:
In `@ocr/llm_provider.go`:
- Around line 34-45: The code currently logs the full per-document prompt
(p.prompt) which may contain PII; update the logging site that references
LLMProvider.prompt (triggered after WithPrompt is used) to avoid emitting the
prompt body — replace any direct inclusion of p.prompt with either a fixed
redaction string like "<redacted-prompt>" or non-sensitive metadata such as
fmt.Sprintf("prompt_len=%d", len(p.prompt)); ensure this change touches the
logging call that currently prints p.prompt (referencing LLMProvider,
WithPrompt, GetPrompt) so debug mode logs only presence/length and never the
prompt contents.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9b105d4b-6182-4032-947f-7b2e44d5be70

📥 Commits

Reviewing files that changed from the base of the PR and between c574a57 and e157708.

📒 Files selected for processing (5)

ocr.go
ocr/llm_provider.go
ocr/llm_provider_prompt_test.go
ocr_prompt.go
ocr_prompt_test.go

🚧 Files skipped from review as they are similar to previous changes (2)

ocr_prompt.go
ocr_prompt_test.go

…m debug log - Remove empty-string guard so intentionally-empty templates take effect - Log prompt length instead of full content in debug output Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…isting content

adamflagg added 2 commits March 6, 2026 21:09

test(ocr): add tests for per-document OCR prompt rendering

f54ee41

Red phase: tests for SetPrompt/GetPrompt on LLMProvider and renderOCRPrompt function that renders OCR templates per-document with existing document content. Ref: icereed#882

coderabbitai Bot reviewed Mar 7, 2026

View reviewed changes

Comment thread ocr.go Outdated

fix(ocr): apply rendered prompt unconditionally and redact prompt fro…

50d2eb4

…m debug log - Remove empty-string guard so intentionally-empty templates take effect - Log prompt length instead of full content in debug output Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

adamflagg mentioned this pull request Mar 7, 2026

feat: chain classification after OCR with AUTO_OCR_THEN_CLASSIFY #926

Open

4 tasks

ivanzud added a commit to ivanzud/paperless-gpt that referenced this pull request Mar 8, 2026

Merge upstream PR icereed#925: render OCR prompt per document with ex…

74ac656

…isting content

ivanzud mentioned this pull request Mar 8, 2026

Resolve unresolved upstream March OCR/Ollama items ivanzud/paperless-gpt#131

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(ocr): add per-document prompt rendering with existing content support#925

feat(ocr): add per-document prompt rendering with existing content support#925
adamflagg wants to merge 4 commits into
icereed:mainfrom
adamflagg:feat/ocr-prompt-content

adamflagg commented Mar 7, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Mar 7, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot Mar 7, 2026

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

adamflagg commented Mar 7, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

How it works

Backward Compatibility

Test Plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Mar 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Mar 7, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

adamflagg commented Mar 7, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 7, 2026 •

edited

Loading