Skip to content

fix(walker): escape markdown special chars in ri:content-title#156

Merged
pchuri merged 1 commit into
mainfrom
fix/issue-143-escape-content-title
May 1, 2026
Merged

fix(walker): escape markdown special chars in ri:content-title#156
pchuri merged 1 commit into
mainfrom
fix/issue-143-escape-content-title

Conversation

@pchuri

@pchuri pchuri commented Apr 30, 2026

Copy link
Copy Markdown
Owner

Summary

Closes #143.

ri:content-title is interpolated directly into markdown link syntax. If a title contains (, ), [, ], or \, the link breaks or — for an adversarially-crafted title like evil) [pwn](http://attacker.com — allows injection of an unintended sibling link. This is pre-existing on main (the regex pipeline didn't escape either), but it's a real correctness bug for any Confluence page whose title legitimately contains parentheses.

Changes

  • Added escapeMarkdownText(s) helper that escapes \ [ ] ( ).
  • Applied at the three sites that splice ri:content-title into markdown:
    • handleAcLink[title] form for <ri:page> without link body.
    • handleInclude — link text in include-page output. URL path still uses the unescaped title via encodeURIComponent, so URL encoding is unaffected.
    • handleSharedBlockpageTitle interpolated into prose between surrounding parens.

Out of scope: ri:filename (handleViewFile / handleImage) and the generic <a> link path. Those are pre-existing concerns the issue did not include.

Test plan

  • New tests cover parentheses, brackets, backslash, the include-page macro path (verifying URL encoding is preserved), include-shared-block prose, and the adversarial sibling-link injection case.
  • npx jest — 560/560 passing.
  • npx eslint — clean on changed files.

Confluence page titles routinely contain `()` (e.g. "Release notes (v2.0)")
and could in theory contain `[]` or `\`. Interpolating them raw into markdown
link syntax breaks downstream parsers and — for an adversarially-crafted title
— allows a sibling link to be injected.

Add an `escapeMarkdownText` helper and apply it at the three sites that splice
`ri:content-title` into markdown: `handleAcLink`'s `[title]` form,
`handleInclude`'s link text (URL still uses the unescaped title via
`encodeURIComponent`), and `handleSharedBlock`'s prose interpolation.
@pchuri pchuri merged commit 3e3c53e into main May 1, 2026
6 checks passed
@pchuri pchuri deleted the fix/issue-143-escape-content-title branch May 1, 2026 02:15
github-actions Bot pushed a commit that referenced this pull request May 1, 2026
## [2.1.9](v2.1.8...v2.1.9) (2026-05-01)

### Bug Fixes

* **walker:** escape markdown special chars in ri:content-title ([#143](#143)) ([#156](#156)) ([3e3c53e](3e3c53e))
@github-actions

github-actions Bot commented May 1, 2026

Copy link
Copy Markdown

🎉 This PR is included in version 2.1.9 🎉

The release is available on:

Your semantic-release bot 📦🚀

pchuri added a commit that referenced this pull request May 1, 2026
… (#159)

* test(walker): add byte-equal parity corpus for storageToMarkdown (#138)

Add tests/fixtures/storage-samples/ with 11 storage XML samples paired
with checked-in .expected.md files. tests/storage-walker-parity.test.js
iterates the corpus and asserts byte-equal output, so future structural
refactors of the walker face every dispatch path on PR open instead of
needing a human reviewer to spot a regression.

Background: the regex → walker migration in #137 took three review
rounds because each round surfaced one more entity-decoding path the
walker had silently dropped. Subsequent fixes (#143, #152, #154, #156,
#157) repeated the pattern. The corpus pins the current behaviour for
each handler so the next refactor can iterate against CI.

Corpus coverage:
- prose, headings, inline formatting, entities (named, ASCII map, numeric)
- callouts (info / warning / note + empty body)
- expand (with title, without title), anchor
- code (lang / no-lang / fence escape), mermaid
- panel variants (title+body, body-only, title-only)
- include (shared / personal space), shared-block, include-shared-block
- tables with formatting and br collapse
- lists (ul / ol / nested) and task lists
- images (ri:attachment / ri:url / no source) and view-file
- ac:link variants (anchor, ri:url, link-body, ri:page)
- layouts, CDATA + entities, toc/floatmenu drop, blockquote, unknown tag

To regenerate after an intentional output change:
  WRITE_PARITY_EXPECTED=1 npx jest tests/storage-walker-parity.test.js

* test(walker): self-review fixes for parity corpus (#138)

Address four issues from self-review of #159:

- cleanupWithFences ends with .trim(), so walker output has no trailing
  newline. Editors / pre-commit hooks that auto-add a final newline on
  save would silently break byte-equal compare. Now serialize and
  compare with a final '\n', keeping the .expected.md POSIX-compliant.
- Test name was "round-trips to its pinned markdown" — confusing,
  storageToMarkdown is one-way. Rename to "matches its pinned markdown".
- WRITE_PARITY_EXPECTED=1 silently rewrote the corpus and passed without
  asserting. A stray env var in a shell rc could overwrite the corpus
  during CI. Now log a console.warn so the regenerate path is visible.
- Three handler paths were not exercised: handleMacro default branch
  (unknown macro name), handleAcLink fallback drop (empty <ac:link/>),
  handleImage drop on empty ri:url value. Extend fixtures 09 / 10 / 11.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ri:content-title containing markdown special chars breaks link syntax

1 participant