perf: use padded BitReader fast path in codestream sections #719

Open: hjanuschka wants to merge 1 commit into libjxl:main from hjanuschka:perf/pr13-parser-bitreader-padded

@hjanuschka
Collaborator

This adds a padded BitReader constructor and uses it for codestream sections that already have complete buffered data. It keeps incremental/incomplete paths unchanged while avoiding slow refill handling on complete section reads.

@github-actions

github-actions Bot commented Mar 16, 2026

Benchmark @ 2f1c348

MULTI-FILE BENCHMARK RESULTS (8 files)
  CPU architecture: x86_64
  WARNING: System appears noisy: high system load (2.45). Results may be unreliable.
Statistics:
  Confidence:               99.0%
  Max relative error:        3.0%

Comparing: 7cf3a662 (Base) vs ff54ec06 (PR)

| File | Base (MP/s) | PR (MP/s) | Δ% |
|---|---|---|---|
| bicycles.jxl | 8.350 | 8.475 | +1.49% ±0.8% |
| bike.jxl | 27.093 | 27.137 | +0.16% ±1.0% |
| delta_palette.jxl | 6.827 | 6.830 | +0.04% ±0.8% |
| green_queen_modular_e3.jxl | 8.950 | 8.845 | -1.17% ±0.6% |
| green_queen_vardct_e3.jxl | 25.930 | 25.742 | -0.73% ±1.8% |
| lz77_flower.jxl | 3.691 | 3.693 | +0.05% ±1.1% |
| patches_lossless.jxl | 3.195 | 3.182 | -0.41% ±1.9% |
| sunset_logo.jxl | 3.187 | 3.294 | +3.37% ±1.4% |

Member

@veluca93 veluca93 left a comment

I think I would prefer adding a SectionBuffer type, which wraps a vector while keeping 8 bytes of additional space (and pretending the vector is shorter).

This should significantly simplify the logic.

Also, note that the code as it currently is breaks out-of-bounds detection.
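A minimal sketch of what such a wrapper could look like, assuming a plain `Vec<u8>` as backing storage. The method names (`new`, `data`, `data_mut`, `padded`) and the `PADDING` constant are hypothetical and chosen only for illustration; they are not taken from the PR.

```rust
/// Hypothetical padded section buffer; names and layout are assumptions
/// made for illustration, not the code in this PR.
const PADDING: usize = 8;

pub struct SectionBuffer {
    /// Backing storage: `len` real bytes followed by PADDING zero bytes.
    storage: Vec<u8>,
    /// Number of real data bytes; the padding is not counted.
    len: usize,
}

impl SectionBuffer {
    /// Allocates room for `len` real bytes plus zeroed padding.
    pub fn new(len: usize) -> Self {
        SectionBuffer {
            storage: vec![0u8; len + PADDING],
            len,
        }
    }

    /// Reported length excludes the padding ("pretends to be shorter").
    pub fn len(&self) -> usize {
        self.len
    }

    pub fn is_empty(&self) -> bool {
        self.len == 0
    }

    /// Writable view for I/O: real data only, so the padding stays zero.
    pub fn data_mut(&mut self) -> &mut [u8] {
        &mut self.storage[..self.len]
    }

    /// Read-only view of the real data.
    pub fn data(&self) -> &[u8] {
        &self.storage[..self.len]
    }

    /// Full view including padding, for a reader that loads 8 bytes at a time.
    pub fn padded(&self) -> &[u8] {
        &self.storage
    }
}
```

Because callers only ever see `len()` and the unpadded views, out-of-bounds checks can keep using the real length while a reader is still free to load 8 bytes from any valid offset.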

@hjanuschka hjanuschka force-pushed the perf/pr13-parser-bitreader-padded branch from a3453d2 to 0c6045a Compare April 2, 2026 08:09
Rework per review feedback: instead of appending 8 zero bytes at dequeue
time and tracking real_len separately, SectionBuffer now always allocates
len + 8 bytes with zero padding built in.

- Add SectionBuffer::ensure_allocated() and bit_reader() methods
- Add BitReader::new_with_initial_bits() for padded data with OOB
  detection based on real data length
- I/O writes bounded to buf.len (real data), never touching padding
- Incomplete/flush paths still use regular BitReader::new()

The zero padding ensures refill() always takes the fast 8-byte LE read
path, avoiding refill_slow() for small/tail sections.
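The interaction between zero padding and out-of-bounds detection can be illustrated with a simplified reader. This is a sketch under stated assumptions, not the `BitReader` from this PR: `PaddedBitReader`, `OutOfBounds`, and `read` are hypothetical names, and the sketch reloads a word on every read instead of keeping a refill buffer. It only shows why an unconditional 8-byte little-endian load is safe once 8 padding bytes follow the data, and how bounds can still be checked against the real length.

```rust
/// Hypothetical error type for reads past the real data.
#[derive(Debug, PartialEq)]
pub struct OutOfBounds;

/// Simplified illustration (not the PR's BitReader): reads bits LSB-first
/// from data that is followed by at least 8 padding bytes.
pub struct PaddedBitReader<'a> {
    /// Real data followed by at least 8 bytes of padding.
    padded: &'a [u8],
    /// Length of the real data in bytes (padding excluded).
    real_len: usize,
    /// Total number of bits consumed so far.
    bit_pos: usize,
}

impl<'a> PaddedBitReader<'a> {
    pub fn new(padded: &'a [u8], real_len: usize) -> Self {
        assert!(padded.len() >= real_len + 8, "need 8 bytes of padding");
        PaddedBitReader { padded, real_len, bit_pos: 0 }
    }

    /// Reads `n` bits (at most 56). Bounds are checked against the real
    /// length, so padding bits are never returned as data.
    pub fn read(&mut self, n: usize) -> Result<u64, OutOfBounds> {
        assert!(n <= 56);
        if self.bit_pos + n > self.real_len * 8 {
            return Err(OutOfBounds);
        }
        let byte = self.bit_pos / 8;
        let shift = self.bit_pos % 8;
        // byte <= real_len here, so byte + 8 <= padded.len(): the 8-byte
        // load never needs a slow tail path.
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(&self.padded[byte..byte + 8]);
        let word = u64::from_le_bytes(bytes);
        let value = (word >> shift) & ((1u64 << n) - 1);
        self.bit_pos += n;
        Ok(value)
    }
}
```

Constructing the reader over a two-byte section plus eight zero bytes and reading 16 bits in total succeeds, while asking for one more bit yields `Err(OutOfBounds)` even though the load itself would still be in bounds of the padded storage.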
@hjanuschka hjanuschka force-pushed the perf/pr13-parser-bitreader-padded branch from 0c6045a to 2f1c348 Compare April 2, 2026 08:18