Bench's textfile-collector output carried only `concurrency` as a
label, so a Prometheus alert grouping by series couldn't tell a
genuine throughput regression apart from a model swap. The
fingerprint *was* recorded by the bench (--auto-fingerprint
already discovered + printed it to stderr) but never made it to
the prom labels.
Now every metric carries `concurrency="N",fingerprint="<hex>"`.
Empty fingerprint (--allow-empty-fingerprint) renders as
`fingerprint=""` rather than getting dropped, so the label set
stays scrape-stable whether or not enforcement is on.
Example output (iter 256, cognitum-v0):
ruvector_hailo_bench_throughput_per_second{concurrency="2",fingerprint="9c56e5965aea9afd99ad51826805f1be01bb0ea3301aafb74982e29e3b9cf3fa"} 70.712
Now `sum by (fingerprint) (rate(ruvector_hailo_bench_throughput_per_second[1h]))`
gives one series per model — a throughput drop on the 9c56... series is
a real regression, while a fingerprint change is a deploy event the
operator already knew about.
# What ships
- BenchSummary gains a `fingerprint: String` field, populated from
the resolved fingerprint (whatever --fingerprint or
--auto-fingerprint produced).
- write_prom_textfile renders it on every metric.
- bench_cli_prom_file_contains_throughput_metric updated to lock
the new label format so a future regression surfaces in CI.
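A minimal sketch of the rendering logic described above. The struct and
function names here are illustrative, not the actual crate API; only the
metric name and label format come from the example output above.

```rust
// Hypothetical trimmed-down BenchSummary; the real type carries more fields.
struct BenchSummary {
    concurrency: u32,
    // Resolved from --fingerprint / --auto-fingerprint; may be empty
    // when --allow-empty-fingerprint is set.
    fingerprint: String,
    throughput_per_second: f64,
}

fn render_prom_line(s: &BenchSummary) -> String {
    // The fingerprint label is always emitted, even when empty, so the
    // label set is identical whether or not enforcement is on.
    format!(
        "ruvector_hailo_bench_throughput_per_second{{concurrency=\"{}\",fingerprint=\"{}\"}} {}",
        s.concurrency, s.fingerprint, s.throughput_per_second
    )
}
```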
Local verification:
cargo test -p ruvector-hailo-cluster --test bench_cli (6 passed)
cargo clippy --all-targets -- -D warnings (clean)
Co-Authored-By: claude-flow <ruv@ruv.net>
…er 257) Surface the resolved RUVECTOR_NPU_POOL_SIZE through the gRPC
StatsResponse so cluster-side observability can differentiate
single-pipeline vs pool=N measurements.
# Proto change (backward-compatible)
StatsResponse gains `uint32 npu_pool_size = 10`. Old workers send 0
(proto3 default), which clients render as "unknown / pre-iter-257";
new workers send the resolved value (1, 2, 4, ...).
# Wire-through
- worker.rs: WorkerService.npu_pool_size populated from the env var at
startup, surfaced via the get_stats RPC.
- transport.rs: StatsSnapshot.npu_pool_size field with #[serde(default)]
so JSON consumers from old workers don't fail.
- grpc_transport.rs: populated from the proto response on the stats() RPC.
# ADR refresh (also in this commit)
- ADR-176 (HEF integration EPIC): added a P6 row covering the iter
234-237 pool measurement work plus the iter 256-257 observability layer.
- ADR-178 (gap analysis): bumped Status from Proposed to Closed with a
per-gap remediation table (8 gaps, 6 closed, 1 deferred, 2 tracked
separately).
Local verification:
cargo check -p ruvector-hailo-cluster --bins (clean)
cargo test -p ruvector-hailo-cluster --lib (114 passed)
Co-Authored-By: claude-flow <ruv@ruv.net>
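The "0 means unknown" convention on the consumer side can be sketched as
a small helper. The function name is hypothetical; the 0-vs-N behavior
is the proto3-default semantics the commit describes.

```rust
// Hypothetical consumer-side rendering of StatsResponse.npu_pool_size.
// Old workers never set the field, so proto3 delivers 0, which maps to
// "unknown" rather than being treated as a real pool size.
fn render_pool_size(npu_pool_size: u32) -> String {
    match npu_pool_size {
        0 => "unknown / pre-iter-257".to_string(),
        n => format!("pool={}", n),
    }
}
```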
Summary
Two iterations adding cluster-side observability for per-model + per-pool measurements, plus refreshing ADR-176/178 to record the iter-234..257 hailo work.
What ships
iter-256 — bench --prom fingerprint label
Bench's textfile-collector output carried only `concurrency` as a label,
so a Prometheus alert grouping by series couldn't tell a genuine
throughput regression apart from a model swap. Now every metric carries
`concurrency="N",fingerprint="<hex>"`. Empty fingerprint
(--allow-empty-fingerprint) renders as `fingerprint=""` rather than
getting dropped, so the label set stays scrape-stable.
`sum by (fingerprint) (rate(...))` now gives one series per model —
fingerprint changes are deploy events the operator already knew about,
not noise.
iter-257 — StatsResponse.npu_pool_size + ADR refresh
Backward-compatible proto3 add: `uint32 npu_pool_size = 10` on
StatsResponse. Old workers send 0 (proto3 default → "unknown /
pre-iter-257"); new workers send the resolved value. Wired through
worker → transport StatsSnapshot → grpc_transport.
ADR refresh: ADR-176 gains a P6 row for the iter-234..257 pool and
observability work; ADR-178 moves from Proposed to Closed with a
per-gap remediation table.
Test plan
cargo check -p ruvector-hailo-cluster --bins (clean)
cargo test -p ruvector-hailo-cluster --lib (114 passed)
cargo test -p ruvector-hailo-cluster --test bench_cli (6 passed) — locks the new fingerprint label
🤖 Generated with claude-flow