test(walker): add byte-equal parity corpus for storageToMarkdown (#138)#159

Merged: pchuri merged 2 commits into main from test/walker-parity-corpus-138 on May 1, 2026

@pchuri (Owner) commented May 1, 2026

Summary

Adds a parity corpus (tests/fixtures/storage-samples/*.xml + .expected.md) and tests/storage-walker-parity.test.js, which pins the current storageToMarkdown output for 11 representative storage samples. Future structural refactors of the walker now run against every dispatch path in CI as soon as a PR opens, instead of relying on a human reviewer to spot a regression.
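As a rough illustration of the shape such a parity check can take, the sketch below pairs each sample with its pinned file and reports any sample whose output is not byte-equal. It is not the code in tests/storage-walker-parity.test.js: the helper names `listSamples` and `findParityMismatches` are hypothetical, and `convert` is a stand-in parameter for storageToMarkdown so the snippet runs on its own. The appended final newline follows the convention described in the second commit of this PR.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: pair every <name>.xml in `dir` with its
// <name>.expected.md sibling, in a stable (sorted) order.
function listSamples(dir) {
  return fs
    .readdirSync(dir)
    .filter((file) => file.endsWith('.xml'))
    .sort()
    .map((file) => {
      const name = path.basename(file, '.xml');
      return {
        name,
        inputPath: path.join(dir, file),
        expectedPath: path.join(dir, `${name}.expected.md`),
      };
    });
}

// Hypothetical helper: run `convert` (standing in for storageToMarkdown)
// over each sample and return the names whose output differs from the
// pinned file. Comparison is on raw bytes, with a final '\n' appended to
// the converter output to match the pinned files.
function findParityMismatches(dir, convert) {
  return listSamples(dir)
    .filter(({ inputPath, expectedPath }) => {
      const xml = fs.readFileSync(inputPath, 'utf8');
      const actual = Buffer.from(`${convert(xml)}\n`, 'utf8');
      const expected = fs.readFileSync(expectedPath); // Buffer, no decoding
      return !actual.equals(expected);
    })
    .map(({ name }) => name);
}
```

In a Jest suite the same pairing would more naturally feed one test case per fixture, so a failure names the exact sample that drifted.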

Closes #138.

Why

The regex → walker migration in #137 went through three review rounds, and each round surfaced the same shape of regression: an entity-decoding path the walker had silently dropped (text nodes, attributes, CDATA bodies, params, URLs). Subsequent walker fixes (#143, #152, #154, #156, #157) repeated the pattern. A pinned-output corpus closes that loop: the next time someone restructures the walker, every path is exercised and any output drift surfaces in the diff of one of the .expected.md files.

Corpus coverage

Each fixture is one coherent group of paths that travel together, sized so a .expected.md diff is reviewable.

| Fixture | Paths exercised |
| --- | --- |
| 01-prose-and-entities | h1–h6, p, strong/em, s/del, code inline, br, hr, a, time, named/ASCII-map/numeric entity decoding |
| 02-callouts | info / warning / note macros + empty-body callout |
| 03-expand-and-anchor | expand with title, expand without title (details/summary), anchor with id |
| 04-code-and-mermaid | code with language, code without language, fence escape for embedded backticks, mermaid |
| 05-panels | panel title+body, body-only, title-only |
| 06-includes-and-shared-blocks | include for shared/personal space, shared-block with body, include-shared-block with page link, escaped markdown specials in title |
| 07-tables | table with formatting in cells, br collapse, nbsp cell |
| 08-lists-and-tasks | ul, ol, nested ul, task list (complete / incomplete / empty) |
| 09-images-and-files | ri:attachment, ri:url, no-source drop, view-file |
| 10-links | ac:link with ac:anchor, with ri:url, with ac:link-body, with ri:page only, escaped markdown specials in title |
| 11-layouts-and-edge-cases | ac:layout/section/cell, toc/floatmenu drop, mixed CDATA + entities, blockquote, unknown-tag fallback |
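To make the "fence escape for embedded backticks" path in fixture 04 concrete, here is a minimal sketch of one common way to do it: choose a fence one backtick longer than the longest backtick run in the code. The walker's actual implementation is not shown in this PR, so the function name `fenceCodeBlock` and its exact behaviour are assumptions for illustration only.

```javascript
// Hypothetical sketch, not the walker's code: wrap `code` in a Markdown
// fence that cannot be closed early by backticks inside the code itself.
function fenceCodeBlock(code, language = '') {
  // Find every run of consecutive backticks anywhere in the code.
  const runs = code.match(/`+/g) || [];
  const longest = runs.reduce((max, run) => Math.max(max, run.length), 0);
  // A fence needs at least three backticks; go one past the longest run.
  const fence = '`'.repeat(Math.max(3, longest + 1));
  return `${fence}${language}\n${code}\n${fence}`;
}
```

Measuring runs anywhere in the code, rather than only at the start of a line, is a deliberately conservative over-approximation: it may produce a longer fence than strictly needed, but never one that is too short.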

Regenerating after an intentional output change

```
WRITE_PARITY_EXPECTED=1 npx jest tests/storage-walker-parity.test.js
```

The diff in .expected.md becomes part of the PR for review.
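The regenerate switch can be pictured roughly as follows. This is a sketch under assumptions, not the test file's code: `checkOrWriteExpected` is a hypothetical helper, and the injectable `env` and `warn` options exist only so the snippet can be exercised in isolation. The visible warning and the final newline mirror the behaviour described in the self-review commit of this PR.

```javascript
const fs = require('node:fs');

// Hypothetical helper: in regenerate mode, rewrite the pinned file and say
// so loudly; otherwise compare the converter output against it.
// Returns 'written', 'match', or 'mismatch'.
function checkOrWriteExpected(
  expectedPath,
  markdown,
  { env = process.env, warn = console.warn } = {}
) {
  // The converter output is trimmed, while pinned files end with '\n'.
  const serialized = `${markdown}\n`;
  if (env.WRITE_PARITY_EXPECTED === '1') {
    // Make the overwrite visible so a stray env var does not pass silently.
    warn(`WRITE_PARITY_EXPECTED=1: rewriting ${expectedPath}`);
    fs.writeFileSync(expectedPath, serialized, 'utf8');
    return 'written';
  }
  const pinned = fs.readFileSync(expectedPath, 'utf8');
  return pinned === serialized ? 'match' : 'mismatch';
}
```

Keeping the write path and the compare path in one helper means both serialize the output the same way, so a freshly regenerated file always passes the next plain run.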

Test plan

  • WRITE_PARITY_EXPECTED=1 npx jest tests/storage-walker-parity.test.js generates 11 expected files
  • npx jest tests/storage-walker-parity.test.js (no env) → 11 / 11 pass byte-equal
  • npm test → 578 / 578 pass
  • npm run lint clean

pchuri added 2 commits May 1, 2026 14:46
Add tests/fixtures/storage-samples/ with 11 storage XML samples paired
with checked-in .expected.md files. tests/storage-walker-parity.test.js
iterates the corpus and asserts byte-equal output, so future structural
refactors of the walker face every dispatch path on PR open instead of
needing a human reviewer to spot a regression.

Background: the regex → walker migration in #137 took three review
rounds because each round surfaced one more entity-decoding path the
walker had silently dropped. Subsequent fixes (#143, #152, #154, #156,
#157) repeated the pattern. The corpus pins the current behaviour for
each handler so the next refactor can iterate against CI.

Corpus coverage:
- prose, headings, inline formatting, entities (named, ASCII map, numeric)
- callouts (info / warning / note + empty body)
- expand (with title, without title), anchor
- code (lang / no-lang / fence escape), mermaid
- panel variants (title+body, body-only, title-only)
- include (shared / personal space), shared-block, include-shared-block
- tables with formatting and br collapse
- lists (ul / ol / nested) and task lists
- images (ri:attachment / ri:url / no source) and view-file
- ac:link variants (anchor, ri:url, link-body, ri:page)
- layouts, CDATA + entities, toc/floatmenu drop, blockquote, unknown tag

To regenerate after an intentional output change:
  WRITE_PARITY_EXPECTED=1 npx jest tests/storage-walker-parity.test.js

Address four issues from self-review of #159:

- cleanupWithFences ends with .trim(), so walker output has no trailing
  newline. Editors / pre-commit hooks that auto-add a final newline on
  save would silently break byte-equal compare. Now serialize and
  compare with a final '\n', keeping the .expected.md POSIX-compliant.
- Test name was "round-trips to its pinned markdown" — confusing,
  storageToMarkdown is one-way. Rename to "matches its pinned markdown".
- WRITE_PARITY_EXPECTED=1 silently rewrote the corpus and passed without
  asserting. A stray env var in a shell rc could overwrite the corpus
  during CI. Now log a console.warn so the regenerate path is visible.
- Three handler paths were not exercised: handleMacro default branch
  (unknown macro name), handleAcLink fallback drop (empty <ac:link/>),
  handleImage drop on empty ri:url value. Extend fixtures 09 / 10 / 11.
@pchuri pchuri merged commit 010f11b into main May 1, 2026
6 checks passed
@pchuri pchuri deleted the test/walker-parity-corpus-138 branch May 1, 2026 05:55
github-actions Bot commented May 1, 2026

🎉 This PR is included in version 2.1.11 🎉

Your semantic-release bot 📦🚀

Linked issue: Add byte-equal round-trip parity test against pre-walker output