Commit 343cd94

harp-intel and Copilot authored
Add tool usage scenarios (#13)
* Add tool usage scenarios
  Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>
* Add tool usage scenarios
  Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>
* addressing review comments and adjusting some scenarios
  Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>
* fix eBPF header
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* add expertise badges and a tools summary including capabilities on cloud
  Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>
* formatting
  Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>

---------

Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent f9c4572 commit 343cd94

1 file changed

Lines changed: 231 additions & 10 deletions

tools/README.md

@@ -2,19 +2,240 @@
22

33
This directory contains documentation for performance monitoring and profiling tools used in optimization work.
44

5-
## Tool Summaries
5+
## Contents
66

7-
**Intel® gProfiler** - A system-wide profiler that combines multiple sampling profilers to visualize CPU usage across native programs, Java, Python runtimes, and kernel routines. It also includes Intel® gProfiler Performance Studio, a self-hosted solution for aggregating results from multiple instances.
7+
- [Intel® Tools Reference](#intel-tools-reference)
8+
  - [Intel® PerfSpect](#intel-perfspect)
9+
  - [Intel® VTune™ Profiler](#intel-vtune-profiler)
10+
  - [Intel® Performance Counter Monitor (PCM)](#intel-performance-counter-monitor-pcm)
11+
  - [Intel® gProfiler](#intel-gprofiler)
12+
- [Other Tools Reference](#other-tools-reference)
13+
  - [Linux `perf`](#linux-perf)
14+
  - [Linux eBPF](#linux-ebpf-extended-berkeley-packet-filter)
15+
- [Environment Considerations: Baremetal vs. Cloud](#environment-considerations-baremetal-vs-cloud)
16+
- [Choosing the Right Tool](#choosing-the-right-tool)
817

9-
**Intel® Performance Counter Monitor (PCM)** - An API and toolset for monitoring performance and energy metrics of Intel processors. Provides real-time monitoring of key metrics including memory bandwidth, cache miss latencies, PCIe bandwidth, and energy states. Available on Linux, Windows, macOS, FreeBSD, and ChromeOS.
18+
## Intel® Tools Reference
1019

11-
**Intel® PerfSpect** - A comprehensive performance engineering toolkit that monitors CPU metrics, reports system configuration and health, collects system telemetry, generates flamegraphs from call-stacks, and modifies performance-related configuration settings.
20+
### Intel® [PerfSpect](perfspect/README.md)
1221

13-
**Intel® VTune™ Profiler** - An optimization tool for application and system performance analysis across AI, HPC, cloud, IoT, and storage workloads. Capabilities include identifying microarchitecture and memory bottlenecks, optimizing accelerators, analyzing parallelism, and multi-node analysis.
22+
**Easy to install and use.** Comprehensive performance engineering toolkit for system health reporting, configuration analysis, architectural metrics, flamegraph generation, telemetry collection, and tuning parameter modification. Provides quick insights across multiple dimensions without the learning curve or deep complexity of other tools.
1423

15-
## Individual Tool Documentation
24+
🎯 **Expertise Level:** 🟢 Beginner - 🟡 Intermediate
1625

17-
- [gProfiler](gprofiler/README.md)
18-
- [PCM](pcm/README.md)
19-
- [PerfSpect](perfspect/README.md)
20-
- [VTune](vtune/README.md)
26+
📊 **Best for:** System assessment, configuration validation, quick troubleshooting, health checks, getting started with performance analysis
27+
28+
**Key advantage:** Accessibility and speed of use, though with less depth than specialized tools
29+
30+
### Intel® [VTune™ Profiler](vtune/README.md)
31+
32+
In-depth application and system profiler with microarchitecture analysis, parallelism examination, multi-node analysis, and GPU/accelerator optimization capabilities.
33+
34+
🎯 **Expertise Level:** 🔴 Advanced — requires expertise in microarchitecture concepts and profiling methodology
35+
36+
📊 **Best for:** Deep application optimization, microarchitecture analysis, GPU optimization, HPC workloads, complex debugging
37+
38+
### Intel® [Performance Counter Monitor (PCM)](pcm/README.md)
39+
40+
API and toolset for monitoring performance and energy metrics of Intel processors including memory bandwidth, cache behavior, PCIe bandwidth, and energy states.
41+
42+
🎯 **Expertise Level:** 🟡 Intermediate - 🔴 Advanced — requires understanding of hardware performance counters
43+
44+
📊 **Best for:** Hardware-level metrics, memory analysis, power consumption, real-time dashboards
45+
46+
### Intel® [gProfiler](gprofiler/README.md)
47+
48+
System-wide profiler combining multiple sampling profilers across native programs, Java, Python runtimes, and kernel routines. Includes optional gProfiler Performance Studio for cluster-wide aggregation.
49+
50+
🎯 **Expertise Level:** 🟡 Intermediate (single node) - 🔴 Advanced (multi-node cluster)
51+
52+
📊 **Best for:** Production monitoring, multi-language environments, cluster analysis, low-overhead continuous profiling
53+
54+
## Other Tools Reference
55+
56+
### Linux `perf`
57+
58+
Powerful performance analysis tool for Linux systems, providing a wide range of profiling capabilities including CPU performance counters, tracepoints, and dynamic probes.
59+
60+
🎯 **Expertise Level:** 🔴 Advanced — requires familiarity with Linux internals and performance events
61+
62+
### Linux eBPF (extended Berkeley Packet Filter)
63+
64+
A powerful technology for tracing and monitoring kernel and user-space events with minimal overhead, allowing for custom performance analysis and observability.
65+
66+
🎯 **Expertise Level:** 🔴 Advanced — requires knowledge of kernel tracing, BPF programs, and Linux internals
67+
68+
## Environment Considerations: Baremetal vs. Cloud
69+
70+
Not all tools work equally well in every environment. The key factor is access to hardware configuration settings and hardware events, e.g., **PMU (Performance Monitoring Unit) counter access**, which varies significantly between baremetal and cloud deployments.
71+
72+
### Baremetal
73+
74+
Full PMU counter access is typically available, so all tools can operate at their full potential. This is the ideal environment for deep hardware-level analysis with PerfSpect, VTune, PCM, and perf.
75+
76+
### Cloud
77+
78+
Cloud vendors vary in PMU counter availability. Many instance types restrict or disable access to hardware performance counters, which limits the effectiveness of tools that depend on them.
79+
80+
- **Recommended starting point:** Use **PerfSpect** for quick performance insights. It does not depend on full PMU access and works across cloud environments, although some of its features are limited on standard VMs.
81+
- **VTune** depends on PMU counters. Check your cloud vendor's documentation for PMU support:
82+
  - Some vendors offer dedicated/metal instance types with full PMU access.
83+
  - Standard VM instances may have limited or no PMU counter availability.
84+
- **gProfiler** works well in cloud environments for software-level profiling and does not require PMU access.
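One rough way to see whether an instance is likely to expose hardware counters is to look at what the Linux kernel reports. The sketch below is an illustration and not part of any of the tools above: it reads `/proc/sys/kernel/perf_event_paranoid` and looks for a core PMU entry under `/sys/bus/event_source/devices`. The helper names `describe_paranoid` and `pmu_hints` are hypothetical, and the assumption that a core PMU is listed as `cpu` (or `cpu_core` on hybrid parts) is only a hint, since a hypervisor may expose a PMU with a reduced set of events.

```python
from pathlib import Path

PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")
PMU_DEVICES = Path("/sys/bus/event_source/devices")


def describe_paranoid(level: int) -> str:
    """Summarize what unprivileged users may do at a perf_event_paranoid level."""
    if level <= -1:
        return "almost all events allowed for all users"
    if level == 0:
        return "CPU events and kernel profiling allowed; raw tracepoints restricted"
    if level == 1:
        return "kernel profiling allowed; CPU event access restricted"
    return "at most user-space measurements for unprivileged users"


def pmu_hints(paranoid_path: Path = PARANOID_PATH,
              devices_dir: Path = PMU_DEVICES) -> dict:
    """Collect hints about PMU availability; unreadable files yield None/False."""
    try:
        level = int(paranoid_path.read_text().strip())
    except (OSError, ValueError):
        level = None
    # Assumption: a core PMU is listed as "cpu" (or "cpu_core" on hybrid parts).
    has_core_pmu = any((devices_dir / name).exists() for name in ("cpu", "cpu_core"))
    return {"perf_event_paranoid": level, "core_pmu_listed": has_core_pmu}


if __name__ == "__main__":
    hints = pmu_hints()
    print(hints)
    if hints["perf_event_paranoid"] is not None:
        print(describe_paranoid(hints["perf_event_paranoid"]))
```

If no core PMU is listed, the cloud vendor's documentation remains the authoritative source for which instance types support hardware counters.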
85+
86+
### Quick Reference by Environment
87+
88+
| Tool | Baremetal | Cloud (standard VM) | Cloud (metal/dedicated) |
89+
| ---- | --------- | ------------------- | ----------------------- |
90+
| PerfSpect | ✅ Full support | ⚠️ Some Features Limited | ✅ Full support |
91+
| gProfiler | ✅ Full support | ✅ Full support | ✅ Full support |
92+
| VTune | ✅ Full support | ⚠️ Limited | ✅ Full support |
93+
| PCM | ✅ Full support | ⚠️ Some Features Limited | ✅ Full support |
94+
| Linux perf | ✅ Full support | ⚠️ Some Features Limited | ✅ Full support |
95+
| Linux eBPF | ✅ Full support | ✅ Supported (needs kernel support) | ✅ Full support |
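The table above can also be kept as data, for example to filter tools by environment in a script. This is a minimal sketch that transcribes the rows above; the `SUPPORT` mapping, the environment keys, and the `tools_without_limits` helper are hypothetical names chosen for illustration.

```python
# Support levels transcribed from the Quick Reference table.
FULL = "full"
LIMITED = "limited"
SUPPORTED = "supported"  # eBPF on standard VMs: supported, needs kernel support

SUPPORT = {
    "PerfSpect": {"baremetal": FULL, "cloud_vm": LIMITED, "cloud_metal": FULL},
    "gProfiler": {"baremetal": FULL, "cloud_vm": FULL, "cloud_metal": FULL},
    "VTune": {"baremetal": FULL, "cloud_vm": LIMITED, "cloud_metal": FULL},
    "PCM": {"baremetal": FULL, "cloud_vm": LIMITED, "cloud_metal": FULL},
    "Linux perf": {"baremetal": FULL, "cloud_vm": LIMITED, "cloud_metal": FULL},
    "Linux eBPF": {"baremetal": FULL, "cloud_vm": SUPPORTED, "cloud_metal": FULL},
}


def tools_without_limits(environment: str) -> list[str]:
    """Return tools, in table order, that are not marked limited in an environment."""
    return [tool for tool, levels in SUPPORT.items()
            if levels[environment] != LIMITED]


if __name__ == "__main__":
    for env in ("baremetal", "cloud_vm", "cloud_metal"):
        print(env, tools_without_limits(env))
```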
96+
97+
## Choosing the Right Tool
98+
99+
Start with your primary goal or problem, then follow the decision path to find the best tool(s).
100+
101+
### Tool Summary
102+
103+
| Tool | Expertise Level | Baremetal | Cloud | Best Starting Point |
104+
| ---- | --------------- | --------- | ----- | ------------------- |
105+
| PerfSpect | 🟢 Beginner - 🟡 Intermediate | ✅ Yes | ⚠️ Limited | ✅ Yes |
106+
| gProfiler | 🟡 Intermediate - 🔴 Advanced | ✅ Yes | ✅ Yes | ✅ Sometimes |
107+
| VTune | 🔴 Advanced | ✅ Yes | ⚠️ Limited | No |
108+
| PCM | 🟡 Intermediate - 🔴 Advanced | ✅ Yes | ⚠️ Limited | ✅ Sometimes |
109+
| Linux perf | 🔴 Advanced | ✅ Yes | ⚠️ Limited | No |
110+
| Linux eBPF | 🔴 Advanced | ✅ Yes | ✅ Yes | No |
111+
112+
### START: What is your primary goal?
113+
114+
#### **"I need a quick system assessment" (Easy start)**
115+
116+
**Use: PerfSpect** ⭐ Easiest to install and use
117+
118+
- Validating system configuration before performance testing
119+
- Getting a health check and performance baseline
120+
- Quick automated system tuning recommendations
121+
- Pre-flight checks before running benchmarks
122+
- Understanding current system telemetry and state
123+
- **Start here if you're new to performance analysis** – no steep learning curve
124+
125+
---
126+
127+
#### **"My application/workload is slow - I need to find where time is spent"**
128+
129+
**→ Do you need to analyze multiple languages or continuous production monitoring?**
130+
131+
- **YES (multi-language or continuous monitoring)** → **Use: gProfiler**
132+
  - Multi-language environments (native, Java, Python) requiring unified profiling
133+
  - Finding performance bottlenecks in microservices architectures
134+
  - Analyzing resource utilization across production systems with low overhead
135+
  - Identifying hot functions and stack traces without code instrumentation
136+
  - Comparing performance patterns across multiple machines over time
137+
138+
- **NO (ad-hoc analysis)** → **Use: PerfSpect**
139+
  - Flamegraphs for quick visualization of call stacks and hot paths
140+
  - Simple setup for immediate insights during development
141+
  - Quick identification of performance bottlenecks without deep configuration
142+
  - System Telemetry collection for understanding overall system behavior during testing
143+
  - Architectural metrics for understanding how the application interacts with hardware resources
144+
145+
---
146+
147+
#### **"I want to correlate application performance with hardware performance metrics"**
148+
149+
**→ Do you have application source code?**
150+
151+
- **YES (have source code)** → **Use: VTune**
152+
  - Correlating application performance with microarchitecture metrics
153+
  - Analyzing cache behavior and memory bandwidth in relation to code execution
154+
  - Identifying specific code regions causing hardware bottlenecks
155+
  - GPU/accelerator optimization and analysis
156+
157+
- **NO (no source code)** → **Use: PerfSpect**
158+
  - System-wide performance analysis without needing source code
159+
  - Architectural metrics to understand hardware interactions
160+
  - Flamegraphs to visualize hot paths even without code instrumentation
161+
  - System Telemetry for overall system health and performance insights
162+
163+
---
164+
165+
#### **"I'm analyzing/optimizing distributed systems at scale"**
166+
167+
**→ Do you need to aggregate data from multiple machines?**
168+
169+
- **YES** → **Use: gProfiler + gProfiler Performance Studio**
170+
  - Cluster-wide performance analysis
171+
  - Comparing performance patterns across multiple machines or time periods
172+
  - Holistic view of what is happening on your entire cluster
173+
174+
- **NO (single machine analysis)** → **Use: gProfiler or VTune** (based on depth needed)
175+
176+
---
177+
178+
#### **"I'm experiencing memory or bandwidth issues"**
179+
180+
**→ Are you investigating processor-level metrics?**
181+
182+
- **YES** → **Use: PCM**
183+
  - Analyzing memory bandwidth utilization and DRAM behavior
184+
  - Identifying memory bandwidth bottlenecks in data-intensive workloads
185+
  - Detecting inefficient cache usage patterns
186+
  - Monitoring cache miss latencies and PCIe bandwidth
187+
  - Detailed microarchitecture analysis (cache efficiency, memory stalls)
188+
  - Real-time system performance dashboards
189+
190+
- **NO (need application-level insights)** → **Use: VTune**
191+
  - Identifying which parts of the code cause memory issues
192+
  - Detailed cache miss analysis at the instruction level
193+
194+
---
195+
196+
#### **"My parallel/multi-threaded application doesn't scale"**
197+
198+
**Use: VTune**
199+
200+
- Analyzing multi-threaded parallelism and scalability issues
201+
- Debugging poor thread scaling in parallel applications
202+
- Examining how effectively threads are utilized
203+
204+
---
205+
206+
#### **"I need to optimize GPU or accelerators"**
207+
208+
**Use: VTune**
209+
210+
- GPU/accelerator optimization and analysis
211+
- Analyzing GPU utilization and accelerator integration
212+
- Multi-node cluster performance analysis for HPC applications
213+
- AI/ML workload optimization and profiling
214+
215+
---
216+
217+
#### **"I need to monitor power consumption or energy efficiency"**
218+
219+
**Use: PCM**
220+
221+
- Tracking energy consumption and CPU sleep states
222+
- Power consumption analysis for cloud deployments
223+
- Integration with monitoring systems like Prometheus for continuous tracking
224+
225+
---
226+
227+
#### **"I need to visualize call stacks and hot code paths"**
228+
229+
**→ Do you want quick, shallow analysis or deep investigation?**
230+
231+
- **Quick and easy** → **Use: PerfSpect**
232+
  - Generating flamegraphs for visualization of call stacks
233+
  - Quick visualization of application hot paths
234+
  - Simple setup and immediate insights
235+
236+
- **Production-scale or deep analysis** → **Use: gProfiler**
237+
  - System-wide flamegraphs across all processes
238+
  - Continuous profiling with minimal overhead
239+
  - More sophisticated analysis capabilities
240+
241+
---
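The decision path in this section can be summarized as a small lookup. The sketch below only illustrates the branching described above; the goal keys, the `DECISION_PATH` mapping, and the `recommend` function are hypothetical names, and the yes/no flag stands for the follow-up question under each goal.

```python
# goal -> (tool if the follow-up answer is "yes", tool if it is "no").
# Goals without a follow-up question map both answers to the same tool.
DECISION_PATH = {
    "quick_assessment": ("PerfSpect", "PerfSpect"),
    # Multiple languages or continuous production monitoring?
    "find_time_spent": ("gProfiler", "PerfSpect"),
    # Do you have application source code?
    "correlate_with_hardware": ("VTune", "PerfSpect"),
    # Need to aggregate data from multiple machines?
    "distributed_at_scale": ("gProfiler + gProfiler Performance Studio",
                             "gProfiler or VTune"),
    # Investigating processor-level metrics?
    "memory_or_bandwidth": ("PCM", "VTune"),
    "thread_scaling": ("VTune", "VTune"),
    "gpu_or_accelerator": ("VTune", "VTune"),
    "power_or_energy": ("PCM", "PCM"),
    # Quick and easy ("yes") versus production-scale or deep analysis ("no").
    "call_stacks": ("PerfSpect", "gProfiler"),
}


def recommend(goal: str, answer_yes: bool = True) -> str:
    """Return the suggested tool for a goal and the answer to its follow-up question."""
    if_yes, if_no = DECISION_PATH[goal]
    return if_yes if answer_yes else if_no


if __name__ == "__main__":
    print(recommend("memory_or_bandwidth"))
    print(recommend("call_stacks", answer_yes=False))
```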
