This document describes how to benchmark Redpanda Connect connectors — the standard approach, the tools involved, and how to record and report results.
Each connector that needs benchmarking gets a self-contained bench/ directory inside its implementation package (e.g. internal/impl/<component>/bench/). The benchmark suite should be fully reproducible from a single task invocation and should measure throughput of the connector under realistic conditions.
The general approach:
- Stand up the external dependency (database, message broker, etc.) in Docker
- Generate a realistic dataset
- Run Redpanda Connect with the connector configured, using the built-in benchmark processor to measure throughput
- Record results (msg/sec, MB/sec) in the PR description
Place benchmarking files in internal/impl/<component>/bench/:
internal/impl/<component>/bench/
├── README.md # How to run, prerequisites, expected output
├── Taskfile.yaml # Task runner for orchestration
├── benchmark_config.yaml # Redpanda Connect pipeline config
├── docker-compose.yml # (optional) Multi-service setups
├── create.sql # (optional) Schema creation scripts
├── users.sql # (optional) Data generation scripts
└── main.go # (optional) Programmatic data seeding
Use Docker to run the service locally. Define tasks in Taskfile.yaml for starting, stopping, and managing the container. Use the same image that production would use — avoid "lite" or "local" variants unless that's the only option (e.g. DynamoDB Local), and document the limitation.
version: '3'

tasks:
  service:up:
    cmd: |
      docker run -d \
        --name <service-name> \
        -p <host-port>:<container-port> \
        -e <ENV_VARS> \
        <image>
  service:down:
    cmd: docker rm -fv <service-name>
  service:logs:
    cmd: docker logs -f <service-name>

For benchmarks involving multiple services (e.g. source and destination clusters), use a docker-compose.yml instead.
Reproducibility controls — For consistent results across runs, pin resources in your docker-compose:
- CPU pinning (cpuset): Prevents OS scheduling noise. Assign dedicated cores to each container so they don't compete.
- Memory limits (mem_limit): Prevents the OOM killer and keeps conditions consistent.
- Go runtime tuning: Set GOMAXPROCS and GOMEMLIMIT on Connect containers to control goroutine scheduling and GC pressure.
See the migrator benchmark for an example that pins source, destination, loader, and migrator to separate CPU sets:
migrator:
  environment:
    GOMAXPROCS: "3"
    GOMEMLIMIT: "3GiB"
  cpuset: "5,6,7"
  mem_limit: 3500M

Dataset design: Use multiple tables with different schemas (e.g. users, products, orders) rather than one giant table. This is more realistic and matters for CDC connectors where per-table parallelism is a factor. Use realistic row sizes (1-2KB is typical).
There are three approaches depending on the connector:
SQL scripts: For database connectors, write SQL scripts that generate bulk data. Use stored procedures with loops for large datasets:

-- Example: generate 500,000 rows
DECLARE @i INT = 0;
WHILE @i < 500000
BEGIN
    INSERT INTO users (name, email, created_at)
    VALUES (CONCAT('user-', @i), CONCAT('user', @i, '@example.com'), GETDATE());
    SET @i = @i + 1;
END

Add Taskfile entries for each data generation script:

data:users:
  cmd: task sqlcmd EXTRA_ARGS="-i users.sql"

Go seeder program: For services with native Go SDKs (e.g. DynamoDB), write a main.go that seeds data using concurrent workers. Use BatchWriteItem or equivalent bulk APIs for speed. See the DynamoDB benchmark for a reference implementation using 16 concurrent workers to insert 450k items.
Bloblang generate input: For benchmarks that just need raw message throughput (e.g. migrator benchmarks), use a Redpanda Connect config with generate input:

input:
  generate:
    interval: "" # As fast as possible
    count: 30_000_000
    batch_size: 1_000
    mapping: |
      root = "<your payload here>"

Create benchmark_config.yaml, a Redpanda Connect config that reads from the connector under test and sinks to drop: {} (discard output). The key element is the benchmark processor, which logs rolling throughput statistics:
http:
  debug_endpoints: true # Required for profiling

input:
  <your_connector>:
    # connector-specific config
    batching:
      count: 1000 # Tune batch size for throughput

output:
  processors:
    - benchmark:
        interval: 1s # How often to log stats
        count_bytes: true # Report MB/sec in addition to msg/sec
  drop: {} # Discard output; we only care about read throughput

logger:
  level: INFO

metrics:
  prometheus:
    add_process_metrics: true
    add_go_metrics: true

Key configuration points:
- http.debug_endpoints: true exposes pprof endpoints at localhost:4195 for CPU/memory/blocking profiling
- The benchmark processor logs msg/sec and bytes/sec at the configured interval
- drop: {} eliminates output overhead so you measure only input throughput
- The Prometheus metrics block enables process and Go runtime metrics for monitoring via Grafana
Batch size tuning — The batching.count parameter has a significant impact on throughput and varies widely across connectors. Existing benchmarks range from 1,000 (SQL Server, DynamoDB) to 140,000 (Oracle CDC). Experiment with this value — too small means excessive per-batch overhead, too large means memory pressure and latency spikes. Document what you tested and what worked best.
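To get a feel for the memory side of this trade-off, a back-of-envelope estimate is often enough before running anything. The sketch below multiplies batch count by an assumed average message size of 1,500 bytes. `batchPayloadBytes` is a hypothetical helper, and real memory use will be higher because of metadata, intermediate copies, and GC overhead.

```go
package main

import "fmt"

// batchPayloadBytes estimates the raw payload bytes held by one in-flight
// batch: message count times average message size. It is a lower bound.
func batchPayloadBytes(count, avgMsgBytes int) int {
	return count * avgMsgBytes
}

func main() {
	const avgMsgBytes = 1500 // assumed average row size in bytes
	for _, count := range []int{1_000, 10_000, 140_000} {
		mb := float64(batchPayloadBytes(count, avgMsgBytes)) / 1e6
		fmt.Printf("batching.count=%d -> ~%.1f MB per batch\n", count, mb)
	}
	// Output:
	// batching.count=1000 -> ~1.5 MB per batch
	// batching.count=10000 -> ~15.0 MB per batch
	// batching.count=140000 -> ~210.0 MB per batch
}
```

Comparing the largest figure against the container's memory limit gives a quick sanity check on whether a candidate batch size is worth testing.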
Docker image architecture — On Apple Silicon (ARM), make sure you're using the correct image architecture. The migrator benchmark explicitly uses redpandadata/connect:edge-arm64. Running an x86 image under Rosetta/QEMU emulation will tank throughput numbers and produce misleading results.
Wire everything together in the Taskfile so task (or task run) executes the full sequence:
run:
  cmds:
    - task: service:up
    - task: create
    - task: seed
    # The path assumes bench/ sits four levels below the repository root; add a ../ segment for deeper packages.
    - go run ../../../../cmd/redpanda-connect/main.go run ./benchmark_config.yaml

Or for manual step-by-step execution:

# Start the service
task service:up
# Create schema and seed data
task create
task seed
# Run the benchmark
go run ../../../../cmd/redpanda-connect/main.go run ./benchmark_config.yaml

You should see rolling throughput logs:
INFO rolling stats: 101000 msg/sec, 135 MB/sec @service=redpanda-connect ...
INFO rolling stats: 104000 msg/sec, 139 MB/sec @service=redpanda-connect ...
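When summarising a run for the PR description, it can help to average these lines rather than eyeball them. The sketch below assumes the log format matches the sample lines above exactly; the format may differ between versions, so treat the regular expression as an assumption. `Sample`, `parseStats`, and `mean` are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// statsRe matches lines shaped like the sample output, e.g.
// "rolling stats: 101000 msg/sec, 135 MB/sec".
var statsRe = regexp.MustCompile(`rolling stats: (\d+(?:\.\d+)?) msg/sec, (\d+(?:\.\d+)?) MB/sec`)

// Sample is one rolling-stats reading.
type Sample struct {
	MsgPerSec float64
	MBPerSec  float64
}

// parseStats extracts every rolling-stats reading from the log text and
// silently skips lines that do not match.
func parseStats(logs string) []Sample {
	var out []Sample
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		m := statsRe.FindStringSubmatch(sc.Text())
		if m == nil {
			continue
		}
		msgs, err1 := strconv.ParseFloat(m[1], 64)
		mb, err2 := strconv.ParseFloat(m[2], 64)
		if err1 != nil || err2 != nil {
			continue
		}
		out = append(out, Sample{MsgPerSec: msgs, MBPerSec: mb})
	}
	return out
}

// mean returns the arithmetic mean of the samples, or a zero Sample if empty.
func mean(samples []Sample) Sample {
	if len(samples) == 0 {
		return Sample{}
	}
	var sum Sample
	for _, s := range samples {
		sum.MsgPerSec += s.MsgPerSec
		sum.MBPerSec += s.MBPerSec
	}
	n := float64(len(samples))
	return Sample{MsgPerSec: sum.MsgPerSec / n, MBPerSec: sum.MBPerSec / n}
}

func main() {
	logs := "INFO rolling stats: 101000 msg/sec, 135 MB/sec @service=redpanda-connect\n" +
		"INFO rolling stats: 104000 msg/sec, 139 MB/sec @service=redpanda-connect\n"
	avg := mean(parseStats(logs))
	// prints "102500 msg/sec, 137.0 MB/sec"
	fmt.Printf("%.0f msg/sec, %.1f MB/sec\n", avg.MsgPerSec, avg.MBPerSec)
}
```

Dropping the first few samples before averaging is a reasonable refinement, since early readings usually include connection setup and warm-up.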
Most CDC connectors maintain a checkpoint/cursor. Add a task to clear it between runs:
drop-checkpoint:
  cmd: <command to drop checkpoint table/cache>

Every bench/ directory must have a README.md that includes:
- Prerequisites: tools to install (e.g. brew install sqlcmd, Docker)
- How to Run: step-by-step commands
- Expected Output: sample throughput logs so reviewers know what "good" looks like
- Notes: any caveats (e.g. single-shard limitations, container resource constraints, data retention windows)
For CDC connectors, benchmark both modes separately — they have very different performance characteristics:
- Snapshot mode: Reads the full current state of tables. Benefits from greater read concurrency and typically achieves higher throughput. Behaves similarly across SQL-based connectors since it's essentially a bulk SELECT. Oracle CDC snapshot hit ~140K msg/sec vs ~50K for streaming.
- Streaming mode: Reads change events from a log (CDC tables, LogMiner, DynamoDB Streams, etc.). Often single-threaded per table and constrained by the source system's change capture mechanism. This is the mode that matters most for production workloads.
Report both numbers. Snapshot throughput establishes a ceiling; streaming throughput is what customers will actually experience.
Some source systems have retention windows for change data:
- DynamoDB Streams: 24-hour retention. Insert data and run the benchmark promptly.
- Oracle LogMiner: SCN windows and redo log retention. Configure RMAN archive log policies appropriately (see rman_setup.rman in the Oracle benchmark).
- SQL Server CDC: Cleanup jobs may purge change tables. Disable or extend the retention period for benchmarking.
Document any retention-related constraints in the benchmark README so others don't waste time debugging "0 msg/sec" output.
For deeper investigation, use the profiling tools in resources/docker/profiling/:
# Start Prometheus + Grafana monitoring stack
cd resources/docker/profiling
task up
# Grafana: http://localhost:3000
# Prometheus: http://localhost:9090
# Capture profiles (requires debug_endpoints: true in your config)
task profile:cpu # 30s CPU profile
task profile:mem # Memory heap profile
task profile:block # Goroutine blocking profile
# View profiles in browser
task pprof:cpu
task pprof:mem
task pprof:block

For long-running profiling sessions, consider a streaming data generator that produces continuous load (see the migrator's loader-streaming.yaml, which generates ~100MB/s indefinitely).
Record benchmark results in the PR description. Include:
- Runtime environment — laptop/VM specs, OS, Docker resource limits
- Dataset — row count, approximate size (e.g. "1.4KB × 21M rows = 24GB")
- Throughput — rolling stats output showing msg/sec and MB/sec
- Profiling artifacts — screenshots of Grafana dashboards, Go runtime metrics, memory profiles
- Observations — bottleneck analysis, what was tried to improve performance, comparison with other tools if relevant
Good benchmark PRs don't just report numbers — they investigate where the bottleneck is. Techniques used in past benchmarks:
- Bypass the connector: Use sql_raw input or a simpler input to rule out the connector's own code as the bottleneck vs the source system (done in the SQL Server CDC benchmark).
- Check connection utilization: Log sql.DBStats to see how many connections are actually in use. The SQL Server benchmark revealed only 1 of 100 connections was active, proving the bottleneck was single-threaded reads, not Connect.
- Vary the environment: Test against local Docker, native installs, and cloud-hosted instances (e.g. Azure SQL Premium) to isolate whether containerization overhead matters.
- Parallelize across tables: If the source is single-connection-bound per table, test with multiple tables to see if throughput scales linearly with connections.
- Compare with competitors: A quick run with Debezium or an equivalent tool establishes whether throughput limits are inherent to the protocol (e.g. Oracle LogMiner) or specific to Connect's implementation.
For post-hoc analysis, you can write benchmark output to a file instead of (or in addition to) dropping it:
output:
  processors:
    - benchmark:
        interval: 1s
        count_bytes: true
  file:
    path: "./results.json"
    codec: lines

In addition to the PR description, add or update a results file in docs/benchmark-results/. Each connector gets its own file (e.g. mssqlserver-cdc.md). Append new runs as dated sections so we can track performance over time.
When adding a new result, include:
- Date and PR link
- Environment details (hardware, Docker config, resource limits)
- Dataset description (row count, row size, total size)
- Configuration highlights (batch size, parallelism, tuning parameters)
- Throughput table and raw log output
- Observations and bottleneck analysis
Example from the SQL Server CDC benchmark PR:
Runtime: ~4m 30s
Dataset: 1.4KB × 21,198,489 rows = 24.1GB

INFO rolling stats: 101000 msg/sec, 135 MB/sec
INFO rolling stats: 104000 msg/sec, 139 MB/sec
INFO rolling stats: 103000 msg/sec, 138 MB/sec
For a non-technical overview suitable for sales, marketing, and other non-engineering audiences, see the Performance Summary.
| Component | Bench Suite | Results | Throughput | Notes |
|---|---|---|---|---|
| Redpanda Migrator | internal/impl/redpanda/migrator/bench/ | results | 1 GB/s+, 1M msg/sec | Cluster-to-cluster, 30GB transfer |
| SQL Server CDC | internal/impl/mssqlserver/bench/ | results | ~135 MB/sec, 100K msg/sec | Single connection bottleneck |
| Oracle CDC | internal/impl/oracledb/bench/ | results | ~50K msg/sec (streaming) | LogMiner single-threaded limitation |
| DynamoDB CDC | internal/impl/aws/dynamodb/bench/ | results | ~200 MB/sec, 100K msg/sec | DynamoDB Local, 3 tables x 150K items |
Benchmark results go stale. Follow these practices to keep them current:
- When adding a new benchmark suite: Create a corresponding results file in docs/benchmark-results/, update the table in this document, and update docs/benchmark-results/SUMMARY.md.
- When modifying a connector's performance path: Re-run the benchmark and append a new dated section to the results file. This includes changes to batching, buffering, connection handling, serialization, or any code that sits in the hot path.
- When re-running an existing benchmark: Always append (don't replace) so we can track performance over time. Include the date, PR link, and what changed since the last run.
- During code review: The /review skill includes a benchmarking check. It will flag PRs that add or modify bench/ directories without updating results files, and PRs that include throughput numbers in the description without recording them in docs/benchmark-results/. It will also note when performance-critical connector changes may warrant a benchmark re-run.
For unit-level benchmarks of internal components (serialization, conversion, etc.), use standard Go testing.B benchmarks in *_test.go files. Use b.ReportMetric() to report domain-specific metrics (e.g. spans/sec) and b.ReportAllocs() for allocation tracking:
func BenchmarkConvert(b *testing.B) {
	// setup...
	b.ReportAllocs()
	for b.Loop() { // b.Loop requires Go 1.24 or later
		// operation under test
	}
	// itemCount must be the total number of items processed across all iterations.
	b.ReportMetric(float64(itemCount)/b.Elapsed().Seconds(), "items/sec")
}

These are complementary to the integration-level benchmarks described above and are useful for isolating performance of specific code paths.
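For reference, here is a fuller, self-contained version of the skeleton above. It is a sketch under a few assumptions: `convert` is a hypothetical operation under test, the loop uses the classic `b.N` form so it also builds on toolchains older than Go 1.24, and `main` calls `testing.Benchmark` only to make the example runnable on its own. In the repository the benchmark function would live in a `*_test.go` file and run via `go test -bench`.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// sink keeps the result alive so the compiler cannot discard the work.
var sink []string

// convert is a hypothetical operation under test: it renders ints as strings.
func convert(items []int) []string {
	out := make([]string, len(items))
	for i, v := range items {
		out[i] = strconv.Itoa(v)
	}
	return out
}

// BenchmarkConvert reports allocations plus a domain-specific items/sec rate.
func BenchmarkConvert(b *testing.B) {
	items := make([]int, 1000)
	for i := range items {
		items[i] = i
	}
	b.ReportAllocs()
	b.ResetTimer() // exclude setup from the measured time
	for i := 0; i < b.N; i++ {
		sink = convert(items)
	}
	// Total items processed is items per iteration times iterations.
	total := float64(len(items)) * float64(b.N)
	b.ReportMetric(total/b.Elapsed().Seconds(), "items/sec")
}

func main() {
	res := testing.Benchmark(BenchmarkConvert)
	fmt.Println(res.String(), res.MemString())
}
```

Multiplying by `b.N` matters: reporting only the per-iteration item count would understate the rate by the number of iterations.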