
Commit 16a5297

dougqh, sarahchen6, and devflow.devflow-routing-intake authored
List iteration benchmark (#10888)
List iteration benchmark

Merge branch 'master' into dougqh/list-iteration-benchmark

Fixing silly oversight - commented out benchmarks to run one in isolation earlier

A bit of clean-up

Adding missing end ul to doc

Moving iterator benchmarks next to enhancedFor

Merge branch 'master' into dougqh/list-iteration-benchmark

spotless

Merge branch 'dougqh/list-iteration-benchmark' of github.com:DataDog/dd-trace-java into dougqh/list-iteration-benchmark

Update internal-api/src/jmh/java/datadog/trace/util/ListIterationBenchmark.java
Co-authored-by: Sarah Chen <sarah.chen@datadoghq.com>

Update internal-api/src/jmh/java/datadog/trace/util/ListIterationBenchmark.java
Co-authored-by: Sarah Chen <sarah.chen@datadoghq.com>

Merge branch 'master' into dougqh/list-iteration-benchmark

Update internal-api/src/jmh/java/datadog/trace/util/ListIterationBenchmark.java
Co-authored-by: Sarah Chen <sarah.chen@datadoghq.com>

Isolate per-thread collections in ListIterationBenchmark
Build each thread's list (and its Elements) in a Scope.Thread @Setup so the manipulate_* mutations stay thread-local. Previously the lists lived in enum constants shared across all 8 threads, so the benchmark measured cross-thread contention on Element.num rather than iteration cost. Also bump to @Fork(2) and fix a Javadoc typo.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/master' into dougqh/list-iteration-benchmark

Replace stale results in ListIterationBenchmark with Java 17 numbers
Drop the old (pre-per-thread-state) results table; add a condensed Java 17 block. For ArrayList the direct styles (cstyleFor/forEach/enhanced-for/iterator) cluster within ~10%; stream() is ~3.6x slower; parallelStream() is catastrophic for small lists (ForkJoinPool overhead) and erratic.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Merge branch 'master' into dougqh/list-iteration-benchmark

Co-authored-by: sarahchen6 <sarah.chen@datadoghq.com>
Co-authored-by: devflow.devflow-routing-intake <devflow.devflow-routing-intake@kubernetes.us1.ddbuild.io>
1 parent 42ae556 commit 16a5297
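The per-thread isolation described in the commit message can be illustrated without JMH. The following is a minimal sketch, not the benchmark's code: `PerThreadListSketch`, `SHARED`, `FACTORY`, and `run` are hypothetical names, and plain `Thread`s stand in for JMH worker threads. It contrasts one list shared by every worker with a factory that gives each worker its own list and elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration; none of these names come from the commit.
final class PerThreadListSketch {
  static final class Element {
    int num = 0;
  }

  static List<Element> newList(int size) {
    List<Element> list = new ArrayList<>(size);
    for (int i = 0; i < size; ++i) {
      list.add(new Element());
    }
    return list;
  }

  // Old shape: one prebuilt list, so every worker mutates the same Elements.
  static final List<Element> SHARED = newList(10);

  // New shape: a factory, so every call yields a fresh list with fresh Elements.
  static final Supplier<List<Element>> FACTORY = () -> newList(10);

  /**
   * Obtains one list per worker from {@code source} (in the calling thread, for simplicity), then
   * lets each worker increment every element of its list {@code rounds} times.
   */
  static List<List<Element>> run(int threads, int rounds, Supplier<List<Element>> source) {
    List<List<Element>> lists = new ArrayList<>();
    List<Thread> workers = new ArrayList<>();
    for (int t = 0; t < threads; ++t) {
      List<Element> list = source.get();
      lists.add(list);
      workers.add(
          new Thread(
              () -> {
                for (int r = 0; r < rounds; ++r) {
                  for (Element e : list) {
                    e.num += 1; // unsynchronized on purpose, like manipulate_*
                  }
                }
              }));
    }
    for (Thread worker : workers) {
      worker.start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException ex) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(ex);
      }
    }
    return lists;
  }

  public static void main(String[] args) {
    // Isolated: each worker owns its Elements, so every counter is exactly `rounds`.
    List<List<Element>> isolated = run(8, 1_000, FACTORY);
    System.out.println(isolated.get(0).get(0).num); // prints 1000

    // Shared: all workers receive the same list instance and race on Element.num.
    List<List<Element>> shared = run(8, 1_000, () -> SHARED);
    System.out.println(shared.get(0) == shared.get(1)); // prints true
  }
}
```

In the shared case the final counter values are unpredictable because the increments race, which is why the sketch prints only the identity check there.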

1 file changed

Lines changed: 204 additions & 0 deletions

@@ -0,0 +1,204 @@
package datadog.trace.util;

import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.CompilerControl;
import org.openjdk.jmh.annotations.CompilerControl.Mode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;

/**
 * Benchmark comparing different ways to iterate over lists of different types and sizes -- both
 * with simple loop bodies (inline case) and complicated loop bodies (don't-inline case).
 *
 * <p>Compares:
 *
 * <ul>
 *   <li>(RECOMMENDED) enhanced for loop / iterator - usually most performant, since escape analysis
 *       usually eliminates the iterator allocation
 *   <li>(SITUATIONAL) {@code List.forEach} - a good alternative with a non-capturing lambda when
 *       escape analysis fails to eliminate the iterator allocation
 *   <li>(SITUATIONAL) c-style {@code for (int i = 0; i < list.size(); ++i)} - usually worse than
 *       enhanced for - might be useful with a complicated loop body when escape analysis fails to
 *       eliminate the iterator
 *   <li>(DISCOURAGED) {@code List.stream} - always incurs allocation overhead - usually unnecessary
 *   <li>(DISCOURAGED) {@code List.parallelStream} - heavy allocation overhead - only beneficial
 *       when working with large data sets (uncommon in the Java agent)
 * </ul>
 *
 * <p>Java 17 results (Apple M1, {@code @Fork(2)}, {@code @Threads(8)}; {@code ArrayList};
 * throughput in M ops/s, i.e. millions of operations per second, shown at two sizes):
 *
 * <pre>
 * Iteration style   size 10   size 100   notes
 * cstyleFor            1050        165   fastest
 * forEach               995        163
 * enhancedFor           945        153
 * iterator              935        148   noisier run-to-run
 * streams               158         45   ~3.6x slower at size 100; allocates
 * parallelStreams        ~1       ~0.3   catastrophic at these sizes
 * </pre>
 *
 * <p>Key findings:
 *
 * <ul>
 *   <li>For {@code ArrayList}, the direct styles -- {@code cstyleFor}, {@code forEach},
 *       enhanced-for, and explicit {@code iterator} -- cluster within ~10% of each other; escape
 *       analysis eliminates the iterator allocation, so enhanced-for/iterator stay competitive
 *       while reading cleanest (the RECOMMENDED choice).
 *   <li>{@code stream()} is ~3.6x slower than direct iteration at size 100 (roughly 6x at size 10)
 *       and allocates per call -- avoid on hot paths.
 *   <li>{@code parallelStream()} is catastrophic for small collections (hundreds of times slower):
 *       ForkJoinPool split/coordinate overhead dwarfs the work, and it is erratic from run to run.
 *       Never use it for the small lists typical in the agent.
 *   <li>{@code _inline} vs {@code _dont_inline} loop bodies barely differ at these sizes -- the
 *       iteration mechanics dominate, not the body.
 * </ul>
 */
@Fork(2)
@Warmup(iterations = 2)
@Measurement(iterations = 3)
@Threads(8)
@State(Scope.Thread)
public class ListIterationBenchmark {
  public static final class Element {
    int num = 0;

    @CompilerControl(Mode.INLINE)
    void manipulate_inline() {
      this.num += 1;
    }

    @CompilerControl(Mode.DONT_INLINE)
    void manipulate_dont_inline() {
      this.num += 1;
    }
  }

  static ArrayList<Element> newArrayList(int size) {
    ArrayList<Element> newList = new ArrayList<>(size);
    for (int i = 0; i < size; ++i) {
      newList.add(new Element());
    }
    return newList;
  }

  /**
   * Describes the list under test as a factory rather than a prebuilt instance. Each benchmark
   * thread builds its own list (with its own {@link Element}s) in {@link
   * ListIterationBenchmark#setUp()}, so the {@code manipulate_*} mutations stay thread-local.
   * Otherwise, with {@code @Threads(8)} sharing one list held in an enum constant, the benchmark
   * would measure cross-thread contention on {@code Element.num} rather than iteration cost.
   */
  public enum ListSpec {
    COLLECTIONS_EMPTY_LIST(Collections::emptyList),
    EMPTY_ARRAY_LIST(ArrayList::new),
    SINGLETON_LIST(() -> Collections.singletonList(new Element())),
    ARRAY_LIST_1(() -> newArrayList(1)),
    ARRAY_LIST_5(() -> newArrayList(5)),
    ARRAY_LIST_10(() -> newArrayList(10)),
    ARRAY_LIST_100(() -> newArrayList(100));

    private final Supplier<List<Element>> factory;

    ListSpec(Supplier<List<Element>> factory) {
      this.factory = factory;
    }

    List<Element> build() {
      return factory.get();
    }
  }

  @Param ListSpec listSpec;

  List<Element> list;

  @Setup(Level.Trial)
  public void setUp() {
    // Built per thread (the class is @State(Scope.Thread)) so each thread owns its own Elements.
    this.list = this.listSpec.build();
  }

  @Benchmark
  public void forEach_inline() {
    this.list.forEach(Element::manipulate_inline);
  }

  @Benchmark
  public void forEach_dont_inline() {
    this.list.forEach(Element::manipulate_dont_inline);
  }

  @Benchmark
  public void enhancedFor_inline() {
    // Enhanced for-loop is just syntactic sugar for an Iterator
    for (Element e : this.list) {
      e.manipulate_inline();
    }
  }

  @Benchmark
  public void enhancedFor_dont_inline() {
    // Enhanced for-loop is just syntactic sugar for an Iterator
    for (Element e : this.list) {
      e.manipulate_dont_inline();
    }
  }

  @Benchmark
  public void iterator_inline() {
    for (Iterator<Element> iter = this.list.iterator(); iter.hasNext(); ) {
      iter.next().manipulate_inline();
    }
  }

  @Benchmark
  public void iterator_dont_inline() {
    for (Iterator<Element> iter = this.list.iterator(); iter.hasNext(); ) {
      iter.next().manipulate_dont_inline();
    }
  }

  @Benchmark
  public void cstyleFor_inline() {
    // Indexed loop: no Iterator is created; size() and get(i) are invoked on each pass
    for (int i = 0; i < this.list.size(); ++i) {
      this.list.get(i).manipulate_inline();
    }
  }

  @Benchmark
  public void cstyleFor_dont_inline() {
    for (int i = 0; i < this.list.size(); ++i) {
      this.list.get(i).manipulate_dont_inline();
    }
  }

  @Benchmark
  public void streams_inline() {
    // stream() sets up a new pipeline on every call
    this.list.stream().forEach(Element::manipulate_inline);
  }

  @Benchmark
  public void streams_dont_inline() {
    this.list.stream().forEach(Element::manipulate_dont_inline);
  }

  @Benchmark
  public void parallelStreams_inline() {
    // Each Element is visited by exactly one task, so the unsynchronized increment is safe here
    this.list.parallelStream().forEach(Element::manipulate_inline);
  }

  @Benchmark
  public void parallelStreams_dont_inline() {
    this.list.parallelStream().forEach(Element::manipulate_dont_inline);
  }
}
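
The six iteration styles measured above differ in cost but not in effect: each one calls the manipulate method once per element. A JMH-free sketch makes that equivalence easy to check before trusting a throughput comparison. `IterationStylesSketch`, its single `manipulate` method, and `sum` are hypothetical stand-ins, not part of the commit, and the sketch does not measure anything.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration; simplified from the benchmark's shapes, without JMH.
final class IterationStylesSketch {
  static final class Element {
    int num = 0;

    void manipulate() {
      this.num += 1;
    }
  }

  static List<Element> newArrayList(int size) {
    List<Element> list = new ArrayList<>(size);
    for (int i = 0; i < size; ++i) {
      list.add(new Element());
    }
    return list;
  }

  static void forEach(List<Element> list) {
    list.forEach(Element::manipulate);
  }

  static void enhancedFor(List<Element> list) {
    for (Element e : list) {
      e.manipulate();
    }
  }

  static void iterator(List<Element> list) {
    for (Iterator<Element> iter = list.iterator(); iter.hasNext(); ) {
      iter.next().manipulate();
    }
  }

  static void cstyleFor(List<Element> list) {
    for (int i = 0; i < list.size(); ++i) {
      list.get(i).manipulate();
    }
  }

  static void streams(List<Element> list) {
    list.stream().forEach(Element::manipulate);
  }

  static void parallelStreams(List<Element> list) {
    // Each element is handled by exactly one task, so no element is incremented concurrently.
    list.parallelStream().forEach(Element::manipulate);
  }

  /** Adds up every element's counter. */
  static int sum(List<Element> list) {
    int total = 0;
    for (Element e : list) {
      total += e.num;
    }
    return total;
  }

  public static void main(String[] args) {
    List<Element> list = newArrayList(10);
    forEach(list);
    enhancedFor(list);
    iterator(list);
    cstyleFor(list);
    streams(list);
    parallelStreams(list);
    System.out.println(sum(list)); // prints 60: six passes over ten elements
  }
}
```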
