Adding lighter String processing methods to Strings #10640
- fast replaceAll for a fixed target string and replacement: 3x the throughput of regex-based solutions, with half the allocation
- added SubSequence, which provides a view into a subsequence of a String without incurring extra allocation
- Strings.split returns an Iterable<SubSequence> that can be used for lightweight processing of a String
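The Strings.split idea can be illustrated with a minimal sketch, assuming a single-character separator. It uses the JDK's CharBuffer.wrap(CharSequence, int, int) as a stand-in for SubSequence, because that returns a read-only view over the String's characters without copying them. The class and method names (LiteralSplit, split) are hypothetical and are not the PR's API.

```java
import java.nio.CharBuffer;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch: lazily splits a String on one char, yielding zero-copy views. */
public final class LiteralSplit {
  private LiteralSplit() {}

  public static Iterable<CharSequence> split(String str, char separator) {
    // Each call to iterator() starts a fresh scan, so the Iterable can be reused.
    return () ->
        new Iterator<CharSequence>() {
          private int start = 0;
          private boolean done = false;

          @Override
          public boolean hasNext() {
            return !done;
          }

          @Override
          public CharSequence next() {
            if (done) {
              throw new NoSuchElementException();
            }
            int end = str.indexOf(separator, start);
            if (end < 0) {
              end = str.length(); // the last piece runs to the end of the string
              done = true;
            }
            // The view shares the String's characters; only a small buffer object is allocated.
            CharSequence piece = CharBuffer.wrap(str, start, end);
            start = end + 1;
            return piece;
          }
        };
  }

  public static void main(String[] args) {
    StringBuilder out = new StringBuilder();
    for (CharSequence piece : split("env:prod,service:web,,version:1", ',')) {
      out.append('[').append(piece).append(']');
    }
    System.out.println(out); // [env:prod][service:web][][version:1]
  }
}
```

Unlike String.split, this sketch keeps trailing empty pieces and never materializes a String per piece; a caller that needs a real String can still call toString() on an individual piece.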
Benchmarks
Startup
Summary: Found 1 performance improvement and 1 performance regression. Performance is the same for 59 metrics; 10 metrics are unstable.
Startup time reports for insecure-bank
insecure-bank - global startup overhead: candidate=1.60.0-SNAPSHOT~80b454c348, baseline=1.60.0-SNAPSHOT~c6896b7cf7
tracing:
Agent: baseline 1.066 s, candidate 1.067 s
Total: baseline 8.764 s, candidate 8.757 s
iast:
Agent: baseline 1.245 s, candidate 1.229 s
Total: baseline 9.368 s, candidate 9.379 s
insecure-bank - break down per module: candidate=1.60.0-SNAPSHOT~80b454c348, baseline=1.60.0-SNAPSHOT~c6896b7cf7
tracing:
crashtracking: baseline 1.206 ms, candidate 1.197 ms
BytebuddyAgent: baseline 627.517 ms, candidate 628.677 ms
AgentMeter: baseline 29.248 ms, candidate 29.072 ms
GlobalTracer: baseline 258.78 ms, candidate 257.763 ms
AppSec: baseline 33.328 ms, candidate 32.966 ms
Debugger: baseline 61.009 ms, candidate 65.733 ms
Remote Config: baseline 646.378 µs, candidate 600.278 µs
Telemetry: baseline 12.258 ms, candidate 9.979 ms
Flare Poller: baseline 5.398 ms, candidate 4.595 ms
iast:
crashtracking: baseline 1.229 ms, candidate 1.192 ms
BytebuddyAgent: baseline 807.466 ms, candidate 794.468 ms
AgentMeter: baseline 11.527 ms, candidate 11.319 ms
GlobalTracer: baseline 248.466 ms, candidate 246.98 ms
IAST: baseline 27.189 ms, candidate 27.138 ms
AppSec: baseline 33.537 ms, candidate 34.206 ms
Debugger: baseline 66.221 ms, candidate 65.342 ms
Remote Config: baseline 535.424 µs, candidate 535.886 µs
Telemetry: baseline 8.687 ms, candidate 8.624 ms
Flare Poller: baseline 3.448 ms, candidate 3.454 ms
Startup time reports for petclinic
petclinic - global startup overhead: candidate=1.60.0-SNAPSHOT~80b454c348, baseline=1.60.0-SNAPSHOT~c6896b7cf7
tracing:
Agent: baseline 1.064 s, candidate 1.073 s
Total: baseline 10.782 s, candidate 11.01 s
appsec:
Agent: baseline 1.241 s, candidate 1.253 s
Total: baseline 11.076 s, candidate 11.099 s
iast:
Agent: baseline 1.231 s, candidate 1.23 s
Total: baseline 11.209 s, candidate 11.19 s
profiling:
Agent: baseline 1.193 s, candidate 1.198 s
Total: baseline 11.023 s, candidate 10.948 s
petclinic - break down per module: candidate=1.60.0-SNAPSHOT~80b454c348, baseline=1.60.0-SNAPSHOT~c6896b7cf7
tracing:
crashtracking: baseline 1.181 ms, candidate 1.197 ms
BytebuddyAgent: baseline 626.725 ms, candidate 631.737 ms
AgentMeter: baseline 29.046 ms, candidate 29.35 ms
GlobalTracer: baseline 257.328 ms, candidate 259.221 ms
AppSec: baseline 32.968 ms, candidate 33.329 ms
Debugger: baseline 65.037 ms, candidate 66.23 ms
Remote Config: baseline 612.862 µs, candidate 607.301 µs
Telemetry: baseline 10.755 ms, candidate 10.777 ms
Flare Poller: baseline 3.789 ms, candidate 4.643 ms
appsec:
crashtracking: baseline 1.195 ms, candidate 1.224 ms
BytebuddyAgent: baseline 658.282 ms, candidate 666.751 ms
AgentMeter: baseline 12.029 ms, candidate 12.121 ms
GlobalTracer: baseline 258.706 ms, candidate 261.076 ms
AppSec: baseline 168.147 ms, candidate 168.941 ms
Debugger: baseline 66.998 ms, candidate 67.089 ms
Remote Config: baseline 684.151 µs, candidate 710.071 µs
Telemetry: baseline 9.475 ms, candidate 9.318 ms
Flare Poller: baseline 3.696 ms, candidate 3.682 ms
IAST: baseline 25.415 ms, candidate 25.707 ms
iast:
crashtracking: baseline 1.191 ms, candidate 1.196 ms
BytebuddyAgent: baseline 794.443 ms, candidate 794.374 ms
AgentMeter: baseline 11.33 ms, candidate 11.311 ms
GlobalTracer: baseline 247.017 ms, candidate 246.942 ms
AppSec: baseline 35.083 ms, candidate 34.124 ms
Debugger: baseline 65.78 ms, candidate 66.499 ms
Remote Config: baseline 550.831 µs, candidate 529.664 µs
Telemetry: baseline 8.715 ms, candidate 8.639 ms
Flare Poller: baseline 3.45 ms, candidate 3.439 ms
IAST: baseline 27.203 ms, candidate 26.891 ms
profiling:
ProfilingAgent: baseline 99.685 ms, candidate 100.641 ms
crashtracking: baseline 1.21 ms, candidate 1.172 ms
BytebuddyAgent: baseline 683.535 ms, candidate 685.599 ms
AgentMeter: baseline 8.584 ms, candidate 8.654 ms
GlobalTracer: baseline 216.035 ms, candidate 216.85 ms
AppSec: baseline 32.663 ms, candidate 32.702 ms
Debugger: baseline 67.173 ms, candidate 66.882 ms
Remote Config: baseline 634.793 µs, candidate 639.285 µs
Telemetry: baseline 8.944 ms, candidate 9.835 ms
Flare Poller: baseline 3.727 ms, candidate 3.791 ms
Profiling: baseline 100.276 ms, candidate 101.22 ms
Load
Summary: Found 1 performance improvement and 2 performance regressions. Performance is the same for 16 metrics; 17 metrics are unstable.
Request duration reports for petclinic
petclinic - request duration [CI 0.99]: candidate=1.60.0-SNAPSHOT~80b454c348, baseline=1.60.0-SNAPSHOT~c6896b7cf7
Values in ms; bracketed ranges are the CI 0.99 bounds.
baseline:
no_agent: 18.174 [17.986, 18.363]
appsec: 19.402 [19.203, 19.601]
code_origins: 17.579 [17.402, 17.755]
iast: 18.89 [18.699, 19.082]
profiling: 18.746 [18.560, 18.931]
tracing: 17.616 [17.445, 17.787]
candidate:
no_agent: 18.898 [18.703, 19.093]
appsec: 18.674 [18.488, 18.859]
code_origins: 17.35 [17.179, 17.521]
iast: 18.025 [17.846, 18.204]
profiling: 18.718 [18.530, 18.906]
tracing: 17.701 [17.523, 17.879]
Request duration reports for insecure-bank
insecure-bank - request duration [CI 0.99]: candidate=1.60.0-SNAPSHOT~80b454c348, baseline=1.60.0-SNAPSHOT~c6896b7cf7
Values in ms; bracketed ranges are the CI 0.99 bounds.
baseline:
no_agent: 1.174 [1.162, 1.185]
iast: 3.214 [3.173, 3.256]
iast_FULL: 5.892 [5.832, 5.952]
iast_GLOBAL: 3.396 [3.354, 3.438]
profiling: 2.114 [2.095, 2.133]
tracing: 1.819 [1.804, 1.835]
candidate:
no_agent: 1.165 [1.154, 1.177]
iast: 3.217 [3.174, 3.259]
iast_FULL: 5.848 [5.789, 5.907]
iast_GLOBAL: 3.691 [3.630, 3.752]
profiling: 2.332 [2.310, 2.353]
tracing: 1.812 [1.797, 1.827]
DaCapo
Summary: Found 0 performance improvements and 0 performance regressions. Performance is the same for 11 metrics; 1 metric is unstable.
Execution time for tomcat
tomcat - execution time [CI 0.99]: candidate=1.60.0-SNAPSHOT~80b454c348, baseline=1.60.0-SNAPSHOT~c6896b7cf7
Values in ms; bracketed ranges are the CI 0.99 bounds.
baseline:
no_agent: 1.482 [1.470, 1.493]
appsec: 4.274 [4.018, 4.530]
iast: 2.653 [2.543, 2.762]
iast_GLOBAL: 2.706 [2.596, 2.816]
profiling: 2.46 [2.368, 2.553]
tracing: 2.406 [2.318, 2.494]
candidate:
no_agent: 1.485 [1.473, 1.497]
appsec: 4.188 [3.934, 4.442]
iast: 2.658 [2.548, 2.768]
iast_GLOBAL: 2.71 [2.600, 2.821]
profiling: 2.434 [2.344, 2.524]
tracing: 2.42 [2.331, 2.509]
Execution time for biojava
biojava - execution time [CI 0.99]: candidate=1.60.0-SNAPSHOT~80b454c348, baseline=1.60.0-SNAPSHOT~c6896b7cf7
baseline:
no_agent: 14.499 s
appsec: 14.336 s
iast: 17.162 s
iast_GLOBAL: 17.085 s
profiling: 14.214 s
tracing: 14.23 s
candidate:
no_agent: 15.08 s
appsec: 14.228 s
iast: 17.366 s
iast_GLOBAL: 17.148 s
profiling: 14.412 s
tracing: 14.298 s
These benchmarks exist to show why the APIs are forbidden
 * StringReplacementBenchmark.string_replaceAll thrpt 6 14611046.391 ± 4865682.875 ops/s
 * StringReplacementBenchmark.string_replaceAll:gc.alloc.rate thrpt 6 11391.346 ± 3790.917 MB/sec
 *
Can you also add String.replace(CharSequence, CharSequence) for completeness?
Good catch. I'd overlooked that replace actually does a replace all, so I didn't realize it was equivalent.
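The equivalence noted in this reply can be checked directly: String.replace(CharSequence, CharSequence) matches its target literally and replaces every occurrence, whereas String.replaceAll(String, String) compiles its first argument as a regex and treats $ and \ in the replacement specially. The class name below is only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class ReplaceSemantics {
  public static void main(String[] args) {
    String s = "a.b.c";

    // Literal match; every occurrence is replaced and no regex is involved.
    System.out.println(s.replace(".", "-")); // a-b-c

    // The first argument is a regex, so "." matches every character.
    System.out.println(s.replaceAll(".", "-")); // -----

    // replaceAll only behaves literally if both the pattern and the replacement are quoted.
    System.out.println(s.replaceAll(Pattern.quote("."), Matcher.quoteReplacement("$"))); // a$b$c
  }
}
```

This is why a literal String.replace (or a dedicated literal replaceAll) is the fairer baseline for a fixed target string than String.replaceAll.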
To my surprise, the Strings.replaceAll that I implemented is actually still 20% faster than String.replace.
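For context on how a literal replace-all avoids the regex machinery, here is a minimal sketch of the general technique (an indexOf scan feeding a single StringBuilder). It is not the PR's Strings.replaceAll; the class name LiteralReplace is hypothetical, and this version rejects an empty target instead of mimicking String.replace's behavior for that case.

```java
/** Hypothetical sketch of a literal (non-regex) replace-all; not the PR's implementation. */
public final class LiteralReplace {
  private LiteralReplace() {}

  public static String replaceAll(String str, String target, String replacement) {
    if (target.isEmpty()) {
      throw new IllegalArgumentException("target must not be empty");
    }
    int match = str.indexOf(target);
    if (match < 0) {
      return str; // nothing to replace: return the input without allocating
    }
    StringBuilder sb = new StringBuilder(str.length());
    int from = 0;
    while (match >= 0) {
      // Copy the text before the match, then the replacement verbatim ($ and \ are not special).
      sb.append(str, from, match).append(replacement);
      from = match + target.length(); // non-overlapping matches, scanned left to right
      match = str.indexOf(target, from);
    }
    return sb.append(str, from, str.length()).toString();
  }

  public static void main(String[] args) {
    System.out.println(replaceAll("a.b.c", ".", "-")); // a-b-c
    System.out.println(replaceAll("total: $X", "$X", "$1")); // total: $1
  }
}
```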
…race-java into dougqh/strings-improvements
This pull request has been marked as stale because it has not had activity over the past quarter. It will be closed in 7 days if no further activity occurs. Feel free to reopen the PR if you are still working on it.
bric3 left a comment
Nice tweak. However, I think a small Javadoc improvement would be worthwhile.
Approving nonetheless.
 * A <code>CharSequence</code> that is view into a sub-sequencce of a <code>String</code> Unlike
 * <code>String.subSequence</code>, this class doesn't allocate an additional <code>String</code>,
 * <code>char[]</code>, or <code>byte[]</code>
 */
suggestion: I believe it's worth adding the motivation for why not creating a new String matters, given that a SubSequence is allocated anyway, and when the benefit "disappears", e.g. with subSeq.hashCode(), etc.
String.subSequence(start, end) creates a String object with its own backing array and copies the characters into it (that said, the array copy should be pretty fast). I'm not sure about this, but I think SubSequence may benefit from scalar replacement in loops; if that's the case, it's worth mentioning in the Javadoc.
IIRC String.subSequence(start, end) creates a new String instance so that the larger original String has a better chance of being GCed, whereas a SubSequence keeps the original String reachable, so that might be something to take into account.
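The hashCode/equals and retention concerns can be illustrated with the JDK's own zero-copy view, CharBuffer.wrap(CharSequence, int, int), standing in for SubSequence (an assumption: SubSequence's actual equals/hashCode behavior is not shown in this thread). A view compares correctly against a String through String.contentEquals, but CharBuffer.equals only accepts other CharBuffers and its hashCode is not computed the way String.hashCode is, so looking up String keys forces a toString(), which copies and gives the allocation savings back. The view also keeps the whole original String reachable.

```java
import java.nio.CharBuffer;
import java.util.HashMap;
import java.util.Map;

public final class ViewPitfalls {
  public static void main(String[] args) {
    String line = "service=web,env=prod";
    // Zero-copy view of "web" (indices 8 to 11); it keeps all of `line` reachable while it lives.
    CharSequence view = CharBuffer.wrap(line, 8, 11);

    // Content comparison without allocating a String.
    System.out.println("web".contentEquals(view)); // true

    // A view is not a String: CharBuffer.equals only accepts other CharBuffers.
    System.out.println(view.equals("web")); // false

    Map<String, Integer> counts = new HashMap<>();
    counts.put("web", 1);
    System.out.println(counts.get(view)); // null: the lookup needs a real String key
    System.out.println(counts.get(view.toString())); // 1, but toString() copies the characters
  }
}
```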
…tion) Per bric3 review: the doc said what SubSequence avoids allocating but not why it matters. Explain the use case — allocation-free lightweight parsing: substring/ subSequence copy per call, so splitting a string into many pieces on a hot path allocates O(pieces) Strings; SubSequence is a zero-copy view. Also fixes a typo. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
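A zero-copy view of the kind this commit message describes can be sketched as follows, assuming it only needs the three CharSequence methods plus toString(). The class name StringView is hypothetical and this is not the PR's SubSequence; bounds checks are written by hand so the sketch also compiles on Java 8.

```java
/** Hypothetical zero-copy view over str[start, end); not the PR's SubSequence. */
public final class StringView implements CharSequence {
  private final String str;
  private final int start;
  private final int end;

  public StringView(String str, int start, int end) {
    if (start < 0 || end > str.length() || start > end) {
      throw new IndexOutOfBoundsException(
          "start=" + start + ", end=" + end + ", length=" + str.length());
    }
    this.str = str; // shares the original String; no char[] or byte[] copy
    this.start = start;
    this.end = end;
  }

  @Override
  public int length() {
    return end - start;
  }

  @Override
  public char charAt(int index) {
    if (index < 0 || index >= length()) {
      throw new IndexOutOfBoundsException("index=" + index + ", length=" + length());
    }
    return str.charAt(start + index);
  }

  @Override
  public CharSequence subSequence(int from, int to) {
    if (from < 0 || to > length() || from > to) {
      throw new IndexOutOfBoundsException("from=" + from + ", to=" + to + ", length=" + length());
    }
    return new StringView(str, start + from, start + to); // still no character copy
  }

  @Override
  public String toString() {
    return str.substring(start, end); // materializing a String is where a copy normally happens
  }

  public static void main(String[] args) {
    CharSequence world = new StringView("hello world", 6, 11);
    System.out.println(world.length()); // 5
    System.out.println(world.subSequence(1, 3)); // or
    System.out.println("world".contentEquals(world)); // true
  }
}
```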
🟢 Java Benchmark SLOs — All performance SLOs passed
PR vs. master results
Commit:
Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.
79b5a30 to db6ebcd
/merge
What Does This Do
Adds lighter String-processing helpers to Strings:
- SubSequence: a zero-copy CharSequence view into a String (no extra String/char[]/byte[]). The differentiated win: it lets parsing slice a string into many pieces without allocating a String per slice.
- Strings.split → Iterable<SubSequence>: allocation-free, regex-free lightweight parsing built on SubSequence.
- Strings.replaceAll: fast literal (non-regex) replaceAll, with ~3× the throughput of the regex methods (String.replaceAll/Matcher.replaceAll). Honest caveat: the JDK's own String.replace(CharSequence, CharSequence) is also literal/regex-free and already captures most of that gain; Strings.replaceAll is only ~1.2× faster than it (plus a clearer name), so its incremental value over String.replace is modest. See StringReplaceAllBenchmark (which includes a string_replace row).

Motivation
Reduce allocation from String processing on hot paths. The original target was QueryObfuscator, but that proved less compelling than hoped: obfuscation is inherently regex/pattern work (literal Strings.replaceAll doesn't fit it), and the regex path is better served by reusing a compiled Pattern (see #10630). The more promising use for these helpers is allocation-free parsing (propagation headers/PTags, tag lists, SQLCommenter, query strings), where per-piece substring allocation adds up. The value is realized once a real hot path adopts SubSequence/split.

Additional Notes
Contributor Checklist
- type: and (comp: or inst:) labels in addition to any other useful labels
- close, fix, or any linking keywords when referencing an issue. Use solves instead, and assign the PR milestone to the issue
Jira ticket: [PROJ-IDENT]
Note: Once your PR is ready to merge, add it to the merge queue by commenting /merge. /merge -c cancels the queue request. /merge -f --reason "reason" skips all merge queue checks; please use this judiciously, as some checks do not run at the PR level. For more information, see this doc.