Return SubSequence from SQLCommenter.getFirstWord to avoid per-inject substring by dougqh · Pull Request #11736 · DataDog/dd-trace-java

dougqh · 2026-06-24T22:15:19Z

⛔ DO NOT REVIEW YET — blocked on #10640 (and #11735)

Draft placeholder so the work isn't lost. The diff is currently inflated: this change stacks on two not-yet-merged PRs whose contents therefore show up here —

Adding lighter String processing methods to Strings #10640 (SubSequence + startsWith/equalsIgnoreCase) — the string utils this consumes.
Migrate SQLCommenterTest from Groovy/Spock to JUnit 5 Java #11735 (SQLCommenterTest → JUnit 5) — the migrated test this edits.

Once both land on master and this is rebased, the diff reduces to just this PR's real change (the getFirstWord port + the two SubSequence methods + the benchmark). Please hold review until then.

What this actually does (post-rebase)

SQLCommenter.getFirstWord returns a zero-copy SubSequence instead of sql.substring(...). Its result is only consumed by startsWith("{") / equalsIgnoreCase("call") in inject, so the per-inject substring allocation was pure waste. The returned view carries a "transient — do not retain" warning (it pins its backing String).

It also carries SubSequence.startsWith/equalsIgnoreCase (retracted from #10640 so they land with their motivating consumer rather than as speculative API).

Measured — `SQLCommenterGetFirstWordBenchmark` (JDK 25, `@Threads(8)`, `@Fork(2)`, `-prof gc`)

	throughput	`gc.alloc.rate.norm`
before (`substring`)	234.8M ± 23.2M ops/s	48 B/op
after (`SubSequence`)	378.2M ± 30.4M ops/s	≈ 0

Escape analysis fully elides the view in the transient consumption: 48 B/op → 0, ~1.6× throughput. (The win is contingent on non-escape — confirming why the view must not be retained.)

🤖 Generated with Claude Code

- fast replaceAll for a fixed string & replacement, 3x throughput compared to regex based solutions, 1/2x allocation compared to regex solutions - added SubSequence which provides a view into a subsequence of a String without incurring extra allocation - Strings.spliit returns an Iterable<SubSequence> can be used to do light weight processing of a String

These benchmarks exist to show why the APIs are forbidden

…race-java into dougqh/strings-improvements

…tion) Per bric3 review: the doc said what SubSequence avoids allocating but not why it matters. Explain the use case — allocation-free lightweight parsing: substring/ subSequence copy per call, so splitting a string into many pieces on a hot path allocates O(pieces) Strings; SubSequence is a zero-copy view. Also fixes a typo. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ovements

@TableTest

Faithful 1:1 migration of the jdbc SQLCommenterTest: all 91 cases preserved (8 getFirstWord + 68 inject + 12 base-hash + 3 peer-service). Spock `where:` tables become @ParameterizedTest + @MethodSource (the data carries embedded quotes, whitespace-significant, and null cells that @TableTest can't represent), each row led by a human-readable scenario for the display name. Config injection uses the imperative WithConfigExtension.injectSysConfig (values are parameterized). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…o dougqh/sqlcommenter-getfirstword-subseq

…comparisons) These are the ops a real classification parse needs but CharSequence lacks — surfaced by porting SQLCommenter.getFirstWord (firstWord.startsWith("{"), firstWord.equalsIgnoreCase("call")). Lets transient parse views be compared without materializing a String. + JUnit 5 tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… substring getFirstWord's result is only consumed by startsWith("{") and equalsIgnoreCase("call"), so materializing a substring on every inject() was pure waste. Return a zero-copy SubSequence view instead and use its startsWith/equalsIgnoreCase. Behavior is unchanged (91 SQLCommenterTest cases pass); the test compares getFirstWord(sql).toString(). This is the motivating consumer for SubSequence.startsWith/equalsIgnoreCase (retracted from #10640 and carried here so they land with their use case). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@threads

Benchmarks the getFirstWord first-word scan, consuming the result via the real transient pattern (startsWith, returning the boolean) so EA isn't faked away. @threads(8), @fork(2), -prof gc: the old substring allocated 48 B/op per call; the SubSequence view is fully EA-elided (~0 B/op), ~1.6x throughput. Numbers in the class Javadoc. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

dd-octo-sts · 2026-06-24T22:38:39Z

🟢 Java Benchmark SLOs — All performance SLOs passed

Suite	Status
Startup	🟢 pass

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results

Scenario	Candidate	master	Δ (95% CI of mean)
startup:insecure-bank:iast:Agent	14.02 s	13.90 s	[+0.1%; +1.6%] (maybe worse)
startup:insecure-bank:tracing:Agent	12.89 s	13.00 s	[-1.9%; +0.1%] (no difference)
startup:petclinic:appsec:Agent	16.86 s	16.74 s	[-0.2%; +1.7%] (no difference)
startup:petclinic:iast:Agent	16.90 s	17.05 s	[-1.6%; -0.2%] (maybe better)
startup:petclinic:profiling:Agent	16.85 s	16.86 s	[-1.3%; +1.2%] (no difference)
startup:petclinic:sca:Agent	16.95 s	16.12 s	[+0.8%; +9.4%] (maybe worse)
startup:petclinic:tracing:Agent	16.08 s	16.15 s	[-1.4%; +0.5%] (no difference)

Commit: 89ca7b84 · CI Pipeline · Benchmarking Platform UI

Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

zulu-17 @fork(5), -prof gc: 258.1M -> 508.0M ops/s (~2x), 48 -> ~0 B/op. @fork(5) tightens the earlier bimodal @fork(2) spread. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

dougqh · 2026-06-25T11:35:28Z

Benchmark results (zulu-17, JDK 17.0.7, `@Threads(8)`, `@Fork(5)`, `-prof gc`)

SQLCommenterGetFirstWordBenchmark.firstWordCheck:

	throughput	gc.alloc.rate.norm
before (substring)	258.1M ± 21.0M ops/s	48 B/op
after (SubSequence)	508.0M ± 21.6M ops/s	~0 B/op (10⁻⁴)

The SubSequence view is escape-analysis–elided in the transient firstWord check, so the old substring's 48 B/op (a String + its backing array) drops to ~0 and throughput rises ~2× at @Threads(8). @Fork(5) tightens the earlier bimodal @Fork(2) spread. Benchmark Javadoc refreshed in 89ca7b8.

dougqh and others added 27 commits February 19, 2026 10:34

Checking hashCode

b0a04fc

Reducing whitespace

f75da1c

Fixing ugly spotless formatting

ab70701

Fixing ugly spotless formatting

fb3ae00

Renaming benchmark method

a71b449

Clarifying Javadoc

aa070b4

grammar

fa1b8bd

Allowing forbidden API usage

2822711

These benchmarks exist to show why the APIs are forbidden

Merge branch 'master' into dougqh/strings-improvements

6f9ca53

Merge branch 'master' into dougqh/strings-improvements

d6623eb

Adding String.replace to the benchmark

1beb704

grammar

f516f32

Added comment explaining implementation approach

a149be6

Clean-up imports

479a3e0

spotless

c03a85a

Merge branch 'master' into dougqh/strings-improvements

018496b

Fixing package after moving code from stand-alone reproduction

0bfdbd4

Merge branch 'dougqh/strings-improvements' of github.com:DataDog/dd-t…

80b454c

…race-java into dougqh/strings-improvements

Merge remote-tracking branch 'origin/master' into dougqh/strings-impr…

db6ebcd

…ovements

Merge branch 'master' into dougqh/strings-improvements

7ba4558

Merge remote-tracking branch 'origin/dougqh/strings-improvements' int…

3f40dc0

…o dougqh/sqlcommenter-getfirstword-subseq

dougqh added tag: performance Performance related changes tag: no release notes Changes to exclude from release notes type: refactoring labels Jun 24, 2026

dougqh added inst: jdbc JDBC instrumentation tag: ai generated Largely based on code generated by an AI or LLM labels Jun 24, 2026

Refresh SQLCommenterGetFirstWordBenchmark results to JDK 17 @fork(5)

89ca7b8

zulu-17 @fork(5), -prof gc: 258.1M -> 508.0M ops/s (~2x), 48 -> ~0 B/op. @fork(5) tightens the earlier bimodal @fork(2) spread. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Return SubSequence from SQLCommenter.getFirstWord to avoid per-inject substring#11736

Return SubSequence from SQLCommenter.getFirstWord to avoid per-inject substring#11736
dougqh wants to merge 28 commits into
masterfrom
dougqh/sqlcommenter-getfirstword-subseq

dougqh commented Jun 24, 2026

Uh oh!

dd-octo-sts Bot commented Jun 24, 2026 •

edited

Loading

Uh oh!

dougqh commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

dougqh commented Jun 24, 2026

⛔ DO NOT REVIEW YET — blocked on #10640 (and #11735)

What this actually does (post-rebase)

Measured — SQLCommenterGetFirstWordBenchmark (JDK 25, @Threads(8), @Fork(2), -prof gc)

Uh oh!

dd-octo-sts Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🟢 Java Benchmark SLOs — All performance SLOs passed

Uh oh!

dougqh commented Jun 25, 2026

Benchmark results (zulu-17, JDK 17.0.7, @Threads(8), @Fork(5), -prof gc)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Measured — `SQLCommenterGetFirstWordBenchmark` (JDK 25, `@Threads(8)`, `@Fork(2)`, `-prof gc`)

dd-octo-sts Bot commented Jun 24, 2026 •

edited

Loading

Benchmark results (zulu-17, JDK 17.0.7, `@Threads(8)`, `@Fork(5)`, `-prof gc`)