fix(diagnostics): count all token types (input, output, cached, reasoning) (#213)
## Summary
The turn-diagnostics usage extractor was under-counting tokens for two
reasons:
1. The key-alias list only recognised
`input_tokens`/`output_tokens`/`total_tokens` style names, so the pi-ai
`AssistantMessage.usage` shape (`input`, `output`, `cacheRead`,
`cacheWrite`, `totalTokens`) was only matching on `totalTokens`.
Cache-read, cache-write, and reasoning tokens were dropped on the floor.
2. When a turn produced multiple assistant messages (tool calls →
another model call → final answer), the extractor used `.find((v) => v
!== undefined)` and took the **first** message's usage instead of
summing across the turn.
The Slack footer also computed total tokens as `inputTokens +
outputTokens` only, which missed cached/cache-creation/reasoning tokens
even when individual counters were available.
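A minimal sketch of the corrected extraction, under stated assumptions: `UsageSummary`, `ALIASES`, `readField`, and `sumUsage` are hypothetical names for illustration, not the actual contents of `logging.ts`. The alias lists include only key names mentioned in this description plus camelCase `inputTokens`/`outputTokens`, which are assumed. The key point is that each field is resolved per message and then summed, rather than taken from the first message that has a value.

```typescript
// Hypothetical summary shape; the real type lives in usage.ts.
type UsageSummary = {
  inputTokens?: number;
  outputTokens?: number;
  cachedInputTokens?: number;
  cacheCreationTokens?: number;
  reasoningTokens?: number;
  totalTokens?: number;
};

// Assumed alias table: provider-specific key names mapped to one canonical field.
const ALIASES: Record<keyof UsageSummary, string[]> = {
  inputTokens: ["input_tokens", "inputTokens", "input"],
  outputTokens: ["output_tokens", "outputTokens", "output"],
  cachedInputTokens: ["cacheRead"],
  cacheCreationTokens: ["cacheWrite"],
  reasoningTokens: ["reasoning_tokens", "reasoningTokens"],
  totalTokens: ["total_tokens", "totalTokens"],
};

// Return the first numeric value found under any of the given keys.
function readField(
  source: Record<string, unknown>,
  keys: string[],
): number | undefined {
  for (const key of keys) {
    const value = source[key];
    if (typeof value === "number" && Number.isFinite(value)) return value;
  }
  return undefined;
}

// Sum each field across every usage object in the turn.
// A field stays undefined if no source reported it.
function sumUsage(sources: Array<Record<string, unknown>>): UsageSummary {
  const summary: UsageSummary = {};
  for (const field of Object.keys(ALIASES) as Array<keyof UsageSummary>) {
    let total: number | undefined;
    for (const source of sources) {
      const value = readField(source, ALIASES[field]);
      if (value !== undefined) total = (total ?? 0) + value;
    }
    if (total !== undefined) summary[field] = total;
  }
  return summary;
}
```

With a pi-ai style message followed by a snake_case one, `sumUsage` reports aggregate input and output counts and keeps the cache-read count that the old alias list would have missed.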
### Changes
- `packages/junior/src/chat/usage.ts` — extend `AgentTurnUsage` with
`cachedInputTokens`, `cacheCreationTokens`, and `reasoningTokens`.
Diagnostics now carry every counter the provider reports as its own
field so renderers can choose how to present them.
- `packages/junior/src/chat/logging.ts` — `extractGenAiUsageSummary`
now:
- recognises pi-ai aliases (`input`, `output`, `cacheRead`,
`cacheWrite`) alongside the previous OpenAI/Anthropic/Gemini aliases;
- extracts each field per-source and **sums across sources**, so
multi-message turns report aggregate usage.
- `packages/junior/src/chat/slack/footer.ts` — render the `Tokens`
footer item as the sum of every reported component counter (`input +
output + cachedInput + cacheCreation + reasoning`). Falls back to
`totalTokens` only when no component counters were reported, since
providers disagree on whether `totalTokens` includes cached tokens.
- `packages/junior/src/chat/respond.ts` — detect "has usage" by checking
whether any usage field is set, instead of hard-coding the old three.
- New unit tests in
`tests/unit/logging/extract-gen-ai-usage-summary.test.ts` and additional
cases in `tests/unit/slack/footer.test.ts`.
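The footer and "has usage" rules above can be sketched as follows. This is an illustration only: `footerTokenTotal` and `hasUsage` are hypothetical helper names, and the `AgentTurnUsage` shape is assumed from the fields named in this description rather than copied from `usage.ts`.

```typescript
// Assumed shape: the three original counters plus the three added by this change.
type AgentTurnUsage = {
  inputTokens?: number;
  outputTokens?: number;
  cachedInputTokens?: number;
  cacheCreationTokens?: number;
  reasoningTokens?: number;
  totalTokens?: number;
};

// Prefer the sum of reported component counters; fall back to totalTokens
// only when no component counter was reported at all.
function footerTokenTotal(usage: AgentTurnUsage): number | undefined {
  const components = [
    usage.inputTokens,
    usage.outputTokens,
    usage.cachedInputTokens,
    usage.cacheCreationTokens,
    usage.reasoningTokens,
  ].filter((value): value is number => value !== undefined);

  if (components.length > 0) {
    return components.reduce((sum, value) => sum + value, 0);
  }
  return usage.totalTokens;
}

// A turn "has usage" if any field is set, not just the original three.
function hasUsage(usage: AgentTurnUsage): boolean {
  return Object.values(usage).some((value) => value !== undefined);
}
```

For example, a turn reporting 100 input, 20 output, and 50 cached-input tokens with a provider `totalTokens` of 120 renders as 170 under this rule, since the provider total is ignored once any component counter is present.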
## Review & Testing Checklist for Human
- [ ] Verify on a real Slack turn that the `Tokens` footer value now
reflects cached + cache-creation tokens (e.g. a turn against an
Anthropic model that hits prompt caching).
- [ ] Confirm downstream consumers of `AssistantReply.diagnostics.usage`
(logs, metrics, evals) handle the new optional fields correctly.
- [ ] Sanity-check that summing `totalTokens` across sources is
acceptable; if any call site currently expects `totalTokens` to be a
single-message value rather than a turn aggregate, that assumption
changes with this PR.
### Notes
- `totalTokens` is still preserved as an individual field. We prefer the
sum of component counters when any are present because pi-ai's provider
adapters disagree on whether their `totalTokens` already includes
`cacheRead` (openai-completions adds it, openai-responses passes the
provider value through). Summing components avoids both under- and
over-counting.
- Reasoning tokens are captured if a provider surfaces them as a
top-level `reasoning_tokens`/`reasoningTokens` key. pi-ai currently
folds reasoning tokens into `output` for the OpenAI completions path, so
`reasoningTokens` will often remain undefined there, which also means
reasoning tokens are not double-counted.
Link to Devin session:
https://app.devin.ai/sessions/dcea113d0cba43448157973f8f4b7105
Requested by: @dcramer
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Devin <devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: David Cramer <david@sentry.io>