Skip to content

Return SubSequence from SQLCommenter.getFirstWord to avoid per-inject substring#11736

Draft
dougqh wants to merge 28 commits into
masterfrom
dougqh/sqlcommenter-getfirstword-subseq
Draft

Return SubSequence from SQLCommenter.getFirstWord to avoid per-inject substring#11736
dougqh wants to merge 28 commits into
masterfrom
dougqh/sqlcommenter-getfirstword-subseq

Conversation

@dougqh

@dougqh dougqh commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

⛔ DO NOT REVIEW YET — blocked on #10640 (and #11735)

Draft placeholder so the work isn't lost. The diff is currently inflated: this change stacks on two not-yet-merged PRs whose contents therefore show up here —

Once both land on master and this is rebased, the diff reduces to just this PR's real change (the getFirstWord port + the two SubSequence methods + the benchmark). Please hold review until then.

What this actually does (post-rebase)

SQLCommenter.getFirstWord returns a zero-copy SubSequence instead of sql.substring(...). Its result is only consumed by startsWith("{") / equalsIgnoreCase("call") in inject, so the per-inject substring allocation was pure waste. The returned view carries a "transient — do not retain" warning (it pins its backing String).

It also carries SubSequence.startsWith/equalsIgnoreCase (retracted from #10640 so they land with their motivating consumer rather than as speculative API).

Measured — SQLCommenterGetFirstWordBenchmark (JDK 25, @Threads(8), @Fork(2), -prof gc)

throughput gc.alloc.rate.norm
before (substring) 234.8M ± 23.2M ops/s 48 B/op
after (SubSequence) 378.2M ± 30.4M ops/s ≈ 0

Escape analysis fully elides the view in the transient consumption: 48 B/op → 0, ~1.6× throughput. (The win is contingent on non-escape — confirming why the view must not be retained.)

🤖 Generated with Claude Code

dougqh and others added 27 commits February 19, 2026 10:34
- fast replaceAll for a fixed string & replacement, 3x throughput compared to regex based solutions, 1/2x allocation compared to regex solutions

- added SubSequence which provides a view into a subsequence of a String without incurring extra allocation

- Strings.spliit returns an Iterable<SubSequence> can be used to do light weight processing of a String
These benchmarks exist to show why the APIs are forbidden
…tion)

Per bric3 review: the doc said what SubSequence avoids allocating but not why it
matters. Explain the use case — allocation-free lightweight parsing: substring/
subSequence copy per call, so splitting a string into many pieces on a hot path
allocates O(pieces) Strings; SubSequence is a zero-copy view. Also fixes a typo.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Faithful 1:1 migration of the jdbc SQLCommenterTest: all 91 cases preserved
(8 getFirstWord + 68 inject + 12 base-hash + 3 peer-service). Spock `where:`
tables become @ParameterizedTest + @MethodSource (the data carries embedded
quotes, whitespace-significant, and null cells that @TableTest can't represent),
each row led by a human-readable scenario for the display name. Config injection
uses the imperative WithConfigExtension.injectSysConfig (values are parameterized).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…comparisons)

These are the ops a real classification parse needs but CharSequence lacks —
surfaced by porting SQLCommenter.getFirstWord (firstWord.startsWith("{"),
firstWord.equalsIgnoreCase("call")). Lets transient parse views be compared
without materializing a String. + JUnit 5 tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… substring

getFirstWord's result is only consumed by startsWith("{") and
equalsIgnoreCase("call"), so materializing a substring on every inject() was
pure waste. Return a zero-copy SubSequence view instead and use its
startsWith/equalsIgnoreCase. Behavior is unchanged (91 SQLCommenterTest cases
pass); the test compares getFirstWord(sql).toString().

This is the motivating consumer for SubSequence.startsWith/equalsIgnoreCase
(retracted from #10640 and carried here so they land with their use case).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Benchmarks the getFirstWord first-word scan, consuming the result via the real
transient pattern (startsWith, returning the boolean) so EA isn't faked away.
@threads(8), @fork(2), -prof gc: the old substring allocated 48 B/op per call;
the SubSequence view is fully EA-elided (~0 B/op), ~1.6x throughput. Numbers in
the class Javadoc.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@dougqh dougqh added tag: performance Performance related changes tag: no release notes Changes to exclude from release notes type: refactoring labels Jun 24, 2026
@dougqh dougqh added inst: jdbc JDBC instrumentation tag: ai generated Largely based on code generated by an AI or LLM labels Jun 24, 2026
@dd-octo-sts

dd-octo-sts Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

🟢 Java Benchmark SLOs — All performance SLOs passed

Suite Status
Startup 🟢 pass

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results
Scenario Candidate master Δ (95% CI of mean)
startup:insecure-bank:iast:Agent 14.02 s 13.90 s [+0.1%; +1.6%] (maybe worse)
startup:insecure-bank:tracing:Agent 12.89 s 13.00 s [-1.9%; +0.1%] (no difference)
startup:petclinic:appsec:Agent 16.86 s 16.74 s [-0.2%; +1.7%] (no difference)
startup:petclinic:iast:Agent 16.90 s 17.05 s [-1.6%; -0.2%] (maybe better)
startup:petclinic:profiling:Agent 16.85 s 16.86 s [-1.3%; +1.2%] (no difference)
startup:petclinic:sca:Agent 16.95 s 16.12 s [+0.8%; +9.4%] (maybe worse)
startup:petclinic:tracing:Agent 16.08 s 16.15 s [-1.4%; +0.5%] (no difference)

Commit: 89ca7b84 · CI Pipeline · Benchmarking Platform UI


Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

zulu-17 @fork(5), -prof gc: 258.1M -> 508.0M ops/s (~2x), 48 -> ~0 B/op.
@fork(5) tightens the earlier bimodal @fork(2) spread.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@dougqh

dougqh commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Benchmark results (zulu-17, JDK 17.0.7, @Threads(8), @Fork(5), -prof gc)

SQLCommenterGetFirstWordBenchmark.firstWordCheck:

throughput gc.alloc.rate.norm
before (substring) 258.1M ± 21.0M ops/s 48 B/op
after (SubSequence) 508.0M ± 21.6M ops/s ~0 B/op (10⁻⁴)

The SubSequence view is escape-analysis–elided in the transient firstWord check, so the old substring's 48 B/op (a String + its backing array) drops to ~0 and throughput rises ~2× at @Threads(8). @Fork(5) tightens the earlier bimodal @Fork(2) spread. Benchmark Javadoc refreshed in 89ca7b8.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

inst: jdbc JDBC instrumentation tag: ai generated Largely based on code generated by an AI or LLM tag: no release notes Changes to exclude from release notes tag: performance Performance related changes type: refactoring

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant