Conversation
WalkthroughRefines markdown conversion: adds Changes
Sequence Diagram(s)(omitted) Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/src/markdownify/markdown.ts`:
- Around line 71-72: Normalize the href before the javascript check: after
retrieving href via node.getAttribute("href") and trimming, strip
control/whitespace characters and convert to lowercase, then test
startsWith("javascript:"); update the href handling (the variable named href) so
mixed-case or control-character-padded values like "JavaScript:" or
"java\nscript:" are normalized and blocked.
- Around line 221-236: The fixBrokenLinks function fails to recognize GFM fenced
code blocks with 3+ backticks or tilde fences; replace the simple
/(```[\s\S]*?```)/g split with a more robust fencedCodeBlock regex that matches
(?:^|\n)[ \t]{0,3}(`{3,}|~{3,})...closing backreference so it captures 3+
backticks or tildes and surrounding indentation, then split md with that
fencedCodeBlock variable and adjust the parity check from i % 2 === 1 to i % 3
=== 1 to account for the extra capture groups returned by split; keep the rest
of fixBrokenLinks logic unchanged (depth tracking and newline escaping inside
bracketed text).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: bb5e94d0-ed94-4204-84fc-d529db1dfedd
📒 Files selected for processing (1)
server/src/markdownify/markdown.ts
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
server/src/markdownify/markdown.ts (1)
222-238:⚠️ Potential issue | 🟠 MajorHandle indented GFM fences before mutating link text.
The new splitter still misses valid fenced blocks indented by up to three spaces, so
fixBrokenLinkscan continue escaping newlines inside code. This is the same fenced-code preservation gap previously raised, just partially addressed.Proposed fix
function fixBrokenLinks(md: string): string { - const parts = md.split(/((?:^|\n)(`{3,}|~{3,})[\s\S]*?\n\2(?:\n|$))/g); + const fencedCodeBlock = /((?:^|\n)[ \t]{0,3}(`{3,}|~{3,})[^\n]*\n[\s\S]*?\n[ \t]{0,3}\2[ \t]*(?=\n|$))/g; + const parts = md.split(fencedCodeBlock); return parts.map((part, i) => { if (i % 3 === 1) return part; if (i % 3 === 2) return "";Verification script:
#!/bin/bash # Description: Demonstrate that the current splitter misses valid indented fenced code blocks. node - <<'NODE' const current = /((?:^|\n)(`{3,}|~{3,})[\s\S]*?\n\2(?:\n|$))/g; const proposed = /((?:^|\n)[ \t]{0,3}(`{3,}|~{3,})[^\n]*\n[\s\S]*?\n[ \t]{0,3}\2[ \t]*(?=\n|$))/g; const md = 'Before\n\n ```js\nconst value = "[broken\\nlink";\n ```\n\nAfter'; console.log({ currentMatchesIndentedFence: current.test(md), proposedMatchesIndentedFence: proposed.test(md), }); NODE🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/src/markdownify/markdown.ts` around lines 222 - 238, fixBrokenLinks currently uses a splitter regex that fails to match fenced code blocks indented up to three spaces, so newlines inside those indented fences are still being mutated; update the splitter to a GFM-compatible regex that allows up to three leading spaces before opening and closing fences and accepts an optional info-string (for example replace the existing /((?:^|\n)(`{3,}|~{3,})[\s\S]*?\n\2(?:\n|$))/g with a pattern that matches (?:^|\n)[ \t]{0,3} followed by the opening fence and optional info, then the fence contents, and a closing fence optionally preceded by up to three spaces and followed by newline or end-of-file), so that the part-splitting in fixBrokenLinks correctly preserves indented fenced code blocks before escaping link text.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/src/markdownify/markdown.ts`:
- Around line 212-216: The code inserts raw title text into HTML (variable title
-> resultHtml) without escaping, so markup in titles becomes active; fix by
HTML-escaping the title before interpolation (or create the H1 via DOM/text APIs
instead of string interpolation) when building resultHtml in the markdown
conversion code that uses title, resultHtml and $content (e.g., replace
`<h1>${title}</h1>\n${resultHtml}` with an escaped-title variant or construct a
h1 element and use its safe .text() output).
---
Duplicate comments:
In `@server/src/markdownify/markdown.ts`:
- Around line 222-238: fixBrokenLinks currently uses a splitter regex that fails
to match fenced code blocks indented up to three spaces, so newlines inside
those indented fences are still being mutated; update the splitter to a
GFM-compatible regex that allows up to three leading spaces before opening and
closing fences and accepts an optional info-string (for example replace the
existing /((?:^|\n)(`{3,}|~{3,})[\s\S]*?\n\2(?:\n|$))/g with a pattern that
matches (?:^|\n)[ \t]{0,3} followed by the opening fence and optional info, then
the fence contents, and a closing fence optionally preceded by up to three
spaces and followed by newline or end-of-file), so that the part-splitting in
fixBrokenLinks correctly preserves indented fenced code blocks before escaping
link text.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 55b4b8b5-01ff-47dc-8b23-a83b01636dd7
📒 Files selected for processing (1)
server/src/markdownify/markdown.ts
| const title = $("title").text().trim() || $("h1").first().text().trim(); | ||
| let resultHtml = contentToProcess.html() || ""; | ||
| let resultHtml = $content.html() || ""; | ||
|
|
||
| if (title && !resultHtml.includes(title)) { | ||
| resultHtml = `<h1>${title}</h1>\n${resultHtml}`; |
There was a problem hiding this comment.
Escape the title before injecting it as HTML.
title is text, but Line 216 interpolates it as markup. A page title like <b>Sale</b> will be converted as an actual <b> element instead of literal title text.
Proposed fix
const title = $("title").text().trim() || $("h1").first().text().trim();
let resultHtml = $content.html() || "";
if (title && !resultHtml.includes(title)) {
- resultHtml = `<h1>${title}</h1>\n${resultHtml}`;
+ const escapedTitle = $("<div>").text(title).html() || "";
+ resultHtml = `<h1>${escapedTitle}</h1>\n${resultHtml}`;
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/src/markdownify/markdown.ts` around lines 212 - 216, The code inserts
raw title text into HTML (variable title -> resultHtml) without escaping, so
markup in titles becomes active; fix by HTML-escaping the title before
interpolation (or create the H1 via DOM/text APIs instead of string
interpolation) when building resultHtml in the markdown conversion code that
uses title, resultHtml and $content (e.g., replace
`<h1>${title}</h1>\n${resultHtml}` with an escaped-title variant or construct a
h1 element and use its safe .text() output).
What this PR does?
Improves preprocessing of raw data for markdown conversion.
Summary by CodeRabbit