Commit 3d59d3a

ruvnet committed
feat(hailo): expose npu_pool_size via StatsResponse + ADR refresh (iter 257)
Surface the resolved RUVECTOR_NPU_POOL_SIZE through the gRPC StatsResponse so cluster-side observability can differentiate single-pipeline vs pool=N measurements.

# Proto change (backward-compatible)

StatsResponse gains `uint32 npu_pool_size = 10`. Old workers send 0 (proto3 default), which clients render as "unknown / pre-iter-257"; new workers send the resolved value (1, 2, 4, ...).

# Wire-through

- worker.rs: WorkerService.npu_pool_size populated from the env var at startup, surfaced via get_stats RPC.
- transport.rs: StatsSnapshot.npu_pool_size field with #[serde(default)] so JSON consumers from old workers don't fail.
- grpc_transport.rs: populated from proto resp on stats() RPC.

# ADR refresh (also in this commit)

- ADR-176 (HEF integration EPIC): added P6 row covering iter 234-237 pool measurement work + iter 256-257 observability layer.
- ADR-178 (gap analysis): bumped Status from Proposed to Closed with a per-gap remediation table (8 gaps, 6 closed, 1 deferred, 2 tracked separately).

Local verification:
  cargo check -p ruvector-hailo-cluster --bins (clean)
  cargo test -p ruvector-hailo-cluster --lib (114 passed)

Co-Authored-By: claude-flow <ruv@ruv.net>
1 parent 00c5978 commit 3d59d3a

6 files changed

Lines changed: 52 additions & 3 deletions

crates/ruvector-hailo-cluster/proto/embedding.proto

Lines changed: 8 additions & 0 deletions
@@ -96,4 +96,12 @@ message StatsResponse {
   // both as Prometheus gauges so a sudden spike in denials is grep-able.
   uint64 rate_limit_denials = 8; // ResourceExhausted returned since boot
   uint64 rate_limit_tracked_peers = 9; // distinct peers seen since boot
+  // Iter 257 — surface RUVECTOR_NPU_POOL_SIZE the worker resolved at
+  // startup. Lets the cluster-side stats CLI + bench --prom output
+  // differentiate "single-pipeline worker" vs "pool=N worker" measurements.
+  // 1 = single-pipeline default (iter-235 baseline); >=2 enables the
+  // iter-237 HefEmbedderPool. Backward-compatible proto3 add: old
+  // clients see this as 0 ("unknown"), new clients see the resolved
+  // value.
+  uint32 npu_pool_size = 10;
 }

crates/ruvector-hailo-cluster/src/bin/worker.rs

Lines changed: 11 additions & 0 deletions
@@ -213,6 +213,11 @@ struct WorkerService {
     /// affecting any legitimate caller (iter-179 streaming sweep
     /// peaked at b=16). Env: RUVECTOR_MAX_BATCH_SIZE.
     max_batch_size: usize,
+    /// Iter 257 — resolved NPU pool size (RUVECTOR_NPU_POOL_SIZE).
+    /// Surfaced via StatsResponse.npu_pool_size so cluster-side
+    /// observability can differentiate single-pipeline vs pool=N
+    /// measurements.
+    npu_pool_size: u32,
     /// Process start time, for uptime reporting in GetStats.
     start: Instant,
     /// Atomic counters surfaced via GetStats.
@@ -450,6 +455,8 @@ impl Embedding for WorkerService {
             uptime_seconds: self.start.elapsed().as_secs(),
             rate_limit_denials: self.rate_limit_denials.load(Ordering::Relaxed),
             rate_limit_tracked_peers: tracked_peers,
+            // Iter 257 — surface the resolved RUVECTOR_NPU_POOL_SIZE.
+            npu_pool_size: self.npu_pool_size,
         }))
     }
 }
@@ -695,6 +702,10 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
         rate_limiter: Arc::clone(&rate_limiter),
         rate_limit_denials: Arc::clone(&rate_limit_denials),
         max_batch_size,
+        // Iter 257 — surface the resolved pool size via gRPC StatsResponse.
+        // Cast usize → u32 is safe — pool sizes are bounded to single
+        // digits in practice (RAM cost; see iter-239 measurement table).
+        npu_pool_size: u32::try_from(npu_pool_size).unwrap_or(u32::MAX),
         start: Instant::now(),
         embed_ok: AtomicU64::new(0),
         embed_err: AtomicU64::new(0),
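
The `main` hunk above converts the resolved pool size with `u32::try_from(...).unwrap_or(u32::MAX)`, which saturates at `u32::MAX` rather than performing a truncating cast. A minimal sketch of that conversion follows, next to a hypothetical `resolve_pool_size` helper. The diff does not show how the worker parses `RUVECTOR_NPU_POOL_SIZE`, so the fallback to 1 for missing, unparsable, or zero values is an assumption based on the documented single-pipeline default, not the worker's confirmed behaviour.

```rust
use std::convert::TryFrom;

/// Hypothetical helper (not shown in the commit): turn the raw value of
/// RUVECTOR_NPU_POOL_SIZE into a pool size. Missing, unparsable, or zero
/// values fall back to 1, the assumed single-pipeline default.
fn resolve_pool_size(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&n| n >= 1)
        .unwrap_or(1)
}

/// Same expression as in the diff: convert usize to u32, saturating at
/// u32::MAX instead of silently truncating if the value does not fit.
fn pool_size_for_stats(n: usize) -> u32 {
    u32::try_from(n).unwrap_or(u32::MAX)
}
```

In a worker this might be called as `resolve_pool_size(std::env::var("RUVECTOR_NPU_POOL_SIZE").ok().as_deref())`. Saturating keeps an out-of-range value visibly wrong in the stats output, whereas `as u32` would wrap it into a plausible-looking small number.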

crates/ruvector-hailo-cluster/src/grpc_transport.rs

Lines changed: 4 additions & 0 deletions
@@ -343,6 +343,10 @@ impl EmbeddingTransport for GrpcTransport {
                 uptime: Duration::from_secs(resp.uptime_seconds),
                 rate_limit_denials: resp.rate_limit_denials,
                 rate_limit_tracked_peers: resp.rate_limit_tracked_peers,
+                // Iter 257 — populate from proto. Pre-iter-257 workers
+                // serialise this as 0 (proto3 default), which the
+                // consumer renders as "unknown pool size" / "old worker".
+                npu_pool_size: resp.npu_pool_size,
             })
         })
     }
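
The comment in this hunk says a consumer renders 0 as "unknown pool size" / "old worker". In proto3, a reader that knows the field decodes it as 0 when the sender omitted it, while a reader built before the field existed skips it as an unknown field. The sketch below shows one way a consumer could map the three documented cases (0, 1, >=2) to display text. The function name `describe_pool_size` and the exact strings are illustrative assumptions; the commit does not show the CLI's actual rendering code.

```rust
/// Hypothetical rendering helper for StatsSnapshot.npu_pool_size.
/// 0  = field absent (worker predates iter 257),
/// 1  = single-pipeline default,
/// >=2 = HefEmbedderPool with that many pipelines.
fn describe_pool_size(npu_pool_size: u32) -> String {
    match npu_pool_size {
        0 => "unknown (pre-iter-257 worker)".to_string(),
        1 => "single-pipeline".to_string(),
        n => format!("pool={}", n),
    }
}
```

Treating 0 as a sentinel works here only because 0 is not a meaningful pool size; a field where 0 is a legitimate value would need explicit presence tracking instead.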

crates/ruvector-hailo-cluster/src/transport.rs

Lines changed: 6 additions & 0 deletions
@@ -144,6 +144,12 @@ pub struct StatsSnapshot {
     /// since boot. 0 = limiter disabled.
     #[serde(default)]
     pub rate_limit_tracked_peers: u64,
+    /// Iter 257 — RUVECTOR_NPU_POOL_SIZE the worker resolved at startup.
+    /// 1 = single-pipeline default (iter-235 baseline); >=2 = pool=N
+    /// (iter-237 HefEmbedderPool). 0 = old worker without the field
+    /// populated (pre-iter-257).
+    #[serde(default)]
+    pub npu_pool_size: u32,
 }
 
 fn serialize_duration_us<S: serde::Serializer>(

docs/adr/ADR-176-hef-integration-epic.md

Lines changed: 2 additions & 0 deletions
@@ -27,6 +27,8 @@ phases shipped + hardware-validated end-to-end on cognitum-v0 (Pi 5
 | P5b | 168 | Cache + NPU bench — 100% hit ⇒ **15.86 M/sec** (226,000×) |
 | P5b | 169 | HEF release + `download-encoder-hef.sh` (adoption unblocked) |
 | P5b | 170 | Saturation test C=100 60s — **no OOM, tonic backpressure works** |
+| P6 | 234-237 | `HefEmbedderPool` (multi-pipeline) — **measured: NPU-bound 70 RPS ceiling holds across pool sizes** but pool=2 cuts p50 23% under multi-bridge concurrent load. iter-237 deploy default pool=2 |
+| P6 | 256-257 | bench `--prom` carries `fingerprint` label; StatsResponse exposes `npu_pool_size` for cluster-side observability |
 
 **Real Pi 5 measurements** (cluster-bench, concurrency=4, 15s,
 HEF worker on 50051 via systemd):

docs/adr/ADR-178-ruvector-ruview-hailo-integration-gap-analysis.md

Lines changed: 21 additions & 3 deletions
@@ -12,9 +12,27 @@ branch: hailo-backend
 
 ## Status
 
-**Proposed.** Planning ADR. No code lands here — output is a graded gap
-inventory plus a remediation plan sized to the existing iter cadence
-(213 iters across ~5 days).
+**Closed (iter 257).** All HIGH+MEDIUM gaps remediated; G (Pi 4
+measurement) deferred without a Pi 4 in lab; long-form C/D (CSI
+pose semantics + downstream cluster consumer) tracked as separate
+multi-month ADRs out of this branch's scope.
+
+| Gap | Severity | Status | Closed by |
+|-----|----------|--------|-----------|
+| A — ruvllm-bridge no deploy artifacts | HIGH | closed | iter 215 |
+| B — `EmbeddingProvider` not impl'd | HIGH | closed | iter 218 (path dep + impl) |
+| C — CSI bridge dropping I/Q (short) | MEDIUM | closed | iter 217 (doc-only) |
+| C — CSI bridge dropping I/Q (long) | MEDIUM | tracked separately | future ADR |
+| D — no downstream cluster consumer (short) | MEDIUM | closed | iter 221 (example) |
+| D — mcp-brain client (long) | MEDIUM | tracked separately | future ADR |
+| E — hailo crates excluded from workspace | MEDIUM | closed | iter 219 |
+| F — ADR-167 status stratigraphy | MEDIUM | closed | iter 217 |
+| G — Pi 4 throughput unmeasured | LOW | deferred | needs Pi 4 hardware |
+| H — `install-bridge.sh` misnamed | LOW | closed | iter 216 |
+
+Original (planning) text below; output is a graded gap inventory
+plus a remediation plan sized to the iter cadence (213 iters
+across ~5 days at the time the ADR was first written).
 
 ## 1. Context
 