Return SubSequence from SQLCommenter.getFirstWord to avoid per-inject substring #11736
Draft
dougqh wants to merge 28 commits into
- Fast replaceAll for a fixed string and replacement: 3x the throughput and half the allocation of regex-based solutions.
- Added SubSequence, which provides a view into a subsequence of a String without incurring extra allocation.
- Strings.split returns an Iterable<SubSequence> that can be used for lightweight processing of a String.
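For readers without the diff at hand, the fixed-string replacement can be sketched roughly as below. This is a minimal illustration only: `FixedReplaceSketch` and its `replaceAll` signature are hypothetical names, and the PR's actual implementation may differ. The idea shown is just to scan with `indexOf` and copy ranges into a `StringBuilder`, with no regex compilation or matcher involved.

```java
/** Hypothetical sketch; not the PR's actual implementation. */
final class FixedReplaceSketch {

  /** Replaces every literal occurrence of target in input, without using regex. */
  static String replaceAll(String input, String target, String replacement) {
    if (target.isEmpty()) {
      throw new IllegalArgumentException("target must not be empty");
    }
    int idx = input.indexOf(target);
    if (idx < 0) {
      return input; // nothing to replace: return the same instance, no allocation
    }
    StringBuilder sb = new StringBuilder(input.length() + replacement.length());
    int from = 0;
    while (idx >= 0) {
      // Copy the untouched range before the match, then the replacement.
      sb.append(input, from, idx).append(replacement);
      from = idx + target.length();
      idx = input.indexOf(target, from);
    }
    sb.append(input, from, input.length()); // trailing remainder
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(replaceAll("a.b.c", ".", "::"));
  }
}
```

The early return on a miss is the design choice worth noting: when nothing matches, the caller gets the original String back and nothing is allocated.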
These benchmarks exist to show why the APIs are forbidden
…race-java into dougqh/strings-improvements
…tion) Per bric3 review: the doc said what SubSequence avoids allocating but not why it matters. Explain the use case, allocation-free lightweight parsing: substring/subSequence copy per call, so splitting a string into many pieces on a hot path allocates O(pieces) Strings; SubSequence is a zero-copy view. Also fixes a typo. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
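To make the "zero-copy view" idea concrete, here is a minimal sketch, assuming a view is just a backing String plus two offsets. `StringView` is a hypothetical stand-in for the PR's SubSequence, not the real class, and `split` here eagerly returns a `List` for brevity, whereas the PR describes an `Iterable<SubSequence>`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the PR's SubSequence; not the real dd-trace-java class. */
final class StringView implements CharSequence {
  private final String backing; // shared with the caller, never copied
  private final int start; // inclusive
  private final int end; // exclusive

  StringView(String backing, int start, int end) {
    if (start < 0 || end > backing.length() || start > end) {
      throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
    }
    this.backing = backing;
    this.start = start;
    this.end = end;
  }

  @Override
  public int length() {
    return end - start;
  }

  @Override
  public char charAt(int index) {
    if (index < 0 || index >= length()) {
      throw new IndexOutOfBoundsException("index=" + index);
    }
    return backing.charAt(start + index);
  }

  @Override
  public CharSequence subSequence(int from, int to) {
    if (from < 0 || to > length() || from > to) {
      throw new IndexOutOfBoundsException("from=" + from + ", to=" + to);
    }
    return new StringView(backing, start + from, start + to); // still no char copy
  }

  /** Materializes a String; this is the only place characters are copied. */
  @Override
  public String toString() {
    return backing.substring(start, end);
  }

  /** Splits on a single delimiter; each piece is a view, not a new String. */
  static List<StringView> split(String s, char delim) {
    List<StringView> parts = new ArrayList<>();
    int from = 0;
    for (int i = 0; i < s.length(); i++) {
      if (s.charAt(i) == delim) {
        parts.add(new StringView(s, from, i));
        from = i + 1;
      }
    }
    parts.add(new StringView(s, from, s.length()));
    return parts;
  }

  public static void main(String[] args) {
    for (StringView part : split("a,b,,c", ',')) {
      System.out.println("[" + part + "]");
    }
  }
}
```

Each piece holds a reference to the whole backing String, which is why such a view should be treated as transient: retaining one small view keeps the entire original alive.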
Faithful 1:1 migration of the jdbc SQLCommenterTest: all 91 cases preserved (8 getFirstWord + 68 inject + 12 base-hash + 3 peer-service). Spock `where:` tables become @ParameterizedTest + @MethodSource (the data carries embedded quotes, whitespace-significant, and null cells that @TableTest can't represent), each row led by a human-readable scenario for the display name. Config injection uses the imperative WithConfigExtension.injectSysConfig (values are parameterized). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…o dougqh/sqlcommenter-getfirstword-subseq
…comparisons)
These are the ops a real classification parse needs but CharSequence lacks —
surfaced by porting SQLCommenter.getFirstWord (firstWord.startsWith("{"),
firstWord.equalsIgnoreCase("call")). Lets transient parse views be compared
without materializing a String. + JUnit 5 tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
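As an illustration of how such comparisons can avoid materializing a String, the sketch below compares a range of a backing String in place using `String.regionMatches`. The class and method shapes are assumptions made for the example (`ViewComparisons` is hypothetical); the PR adds these as methods on SubSequence, and its internals may differ.

```java
/** Hypothetical helpers; the PR exposes these as methods on SubSequence. */
final class ViewComparisons {

  /** True if the range [start, end) of backing begins with prefix. */
  static boolean startsWith(String backing, int start, int end, String prefix) {
    int len = prefix.length();
    // The prefix must fit inside the range before comparing in place.
    return len <= end - start && backing.regionMatches(start, prefix, 0, len);
  }

  /** True if the range [start, end) of backing equals other, ignoring case. */
  static boolean equalsIgnoreCase(String backing, int start, int end, String other) {
    int len = end - start;
    return other.length() == len && backing.regionMatches(true, start, other, 0, len);
  }

  public static void main(String[] args) {
    String sql = "  CALL proc()";
    // The first word occupies [2, 6) in this input.
    System.out.println(equalsIgnoreCase(sql, 2, 6, "call"));
    System.out.println(startsWith(sql, 2, 6, "{"));
  }
}
```

Both checks read characters directly from the backing String, so no intermediate substring is created for the comparison.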
… substring
getFirstWord's result is only consumed by startsWith("{") and
equalsIgnoreCase("call"), so materializing a substring on every inject() was
pure waste. Return a zero-copy SubSequence view instead and use its
startsWith/equalsIgnoreCase. Behavior is unchanged (91 SQLCommenterTest cases
pass); the test compares getFirstWord(sql).toString().
This is the motivating consumer for SubSequence.startsWith/equalsIgnoreCase
(retracted from #10640 and carried here so they land with their use case).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
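A rough before/after sketch of the change described above. It assumes getFirstWord skips leading whitespace and then scans to the next whitespace character; that scan, the nested `View` class, and the `bracesOrCall` consumer are all hypothetical stand-ins and not the actual SQLCommenter code.

```java
/** Hypothetical sketch of the port; names and scan logic are assumptions. */
final class FirstWordSketch {

  /** Minimal zero-copy view standing in for the PR's SubSequence. */
  static final class View {
    final String backing;
    final int start; // inclusive
    final int end; // exclusive

    View(String backing, int start, int end) {
      this.backing = backing;
      this.start = start;
      this.end = end;
    }

    boolean startsWith(String prefix) {
      int len = prefix.length();
      return len <= end - start && backing.regionMatches(start, prefix, 0, len);
    }

    boolean equalsIgnoreCase(String other) {
      int len = end - start;
      return other.length() == len && backing.regionMatches(true, start, other, 0, len);
    }

    @Override
    public String toString() {
      return backing.substring(start, end); // only copies when explicitly asked
    }
  }

  private static int skipWhitespace(String sql) {
    int i = 0;
    while (i < sql.length() && Character.isWhitespace(sql.charAt(i))) {
      i++;
    }
    return i;
  }

  private static int wordEnd(String sql, int begin) {
    int i = begin;
    while (i < sql.length() && !Character.isWhitespace(sql.charAt(i))) {
      i++;
    }
    return i;
  }

  /** Before: allocates a new String on every call. */
  static String firstWordCopy(String sql) {
    int begin = skipWhitespace(sql);
    return sql.substring(begin, wordEnd(sql, begin));
  }

  /** After: same scan, but returns a view over the original String. */
  static View firstWordView(String sql) {
    int begin = skipWhitespace(sql);
    return new View(sql, begin, wordEnd(sql, begin));
  }

  /** Transient consumption: the view never leaves this method. */
  static boolean bracesOrCall(String sql) {
    View word = firstWordView(sql);
    return word.startsWith("{") || word.equalsIgnoreCase("call");
  }

  public static void main(String[] args) {
    System.out.println(bracesOrCall("  call proc()"));
    System.out.println(bracesOrCall("SELECT 1"));
  }
}
```

Because the view is created and consumed inside a single method and never stored, the JIT has the opportunity to elide it via escape analysis, which is the effect the benchmark in this PR measures.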
Benchmarks the getFirstWord first-word scan, consuming the result via the real transient pattern (startsWith, returning the boolean) so escape analysis (EA) isn't faked away. @Threads(8), @Fork(2), -prof gc: the old substring allocated 48 B/op per call; the SubSequence view is fully EA-elided (~0 B/op), ~1.6x throughput. Numbers in the class Javadoc. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
🟢 Java Benchmark SLOs — All performance SLOs passed
PR vs. master results
Commit: Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.
Benchmark results (zulu-17, JDK 17.0.7,
| | throughput | gc.alloc.rate.norm |
|---|---|---|
| before (substring) | 258.1M ± 21.0M ops/s | 48 B/op |
| after (SubSequence) | 508.0M ± 21.6M ops/s | ~0 B/op (10⁻⁴) |
The SubSequence view is escape-analysis–elided in the transient firstWord check, so the old substring's 48 B/op (a String + its backing array) drops to ~0 and throughput rises ~2× at @Threads(8). @Fork(5) tightens the earlier bimodal @Fork(2) spread. Benchmark Javadoc refreshed in 89ca7b8.
⛔ DO NOT REVIEW YET — blocked on #10640 (and #11735)
Draft placeholder so the work isn't lost. The diff is currently inflated: this change stacks on two not-yet-merged PRs whose contents therefore show up here:

- (SubSequence + startsWith/equalsIgnoreCase): the string utils this consumes.
- (SQLCommenterTest → JUnit 5): the migrated test this edits.

Once both land on master and this is rebased, the diff reduces to just this PR's real change (the getFirstWord port + the two SubSequence methods + the benchmark). Please hold review until then.

What this actually does (post-rebase)

SQLCommenter.getFirstWord returns a zero-copy SubSequence instead of sql.substring(...). Its result is only consumed by startsWith("{") / equalsIgnoreCase("call") in inject, so the per-inject substring allocation was pure waste. The returned view carries a "transient: do not retain" warning (it pins its backing String).

It also carries SubSequence.startsWith / equalsIgnoreCase (retracted from #10640 so they land with their motivating consumer rather than as speculative API).

Measured: SQLCommenterGetFirstWordBenchmark (JDK 25, @Threads(8), @Fork(2), -prof gc)

gc.alloc.rate.norm, substring vs. SubSequence: escape analysis fully elides the view in the transient consumption: 48 B/op → 0, ~1.6× throughput. (The win is contingent on non-escape, confirming why the view must not be retained.)
🤖 Generated with Claude Code