Commit d3c3352

Merge pull request #231 from Deep-CodeAI/feat/2905-ledger-misbehaviour
feat(#2905): fold agent-misbehaviour into the Merkle-chained audit ledger
2 parents aad83e2 + 6f900f7 commit d3c3352

8 files changed

Lines changed: 178 additions & 11 deletions

CHANGELOG.md

Lines changed: 15 additions & 0 deletions
@@ -4,6 +4,21 @@ All notable changes to Agents.KT are documented here. The format follows [Keep a

 ## [Unreleased]

+### Added — audit-ledger now records cross-cutting agent misbehaviour (#2905, epic #2882)
+
+`agent.events.ledger(file)` previously chained only tool-action verdicts (`APPROVED` / `DENIED` /
+`HALLUCINATED`). It now folds in the misbehaviour signals that never flow through a tool body, so the **one
+tamper-evident Merkle chain** answers "what did agents try to do that they shouldn't, and what went wrong":
+a `PipelineEvent.BudgetThreshold` records a `BUDGET_EXCEEDED` row (the budget dimension + how much of the
+ceiling was used), and a `PipelineEvent.ErrorOccurred` records an `INFRA_ERROR` row (the exception **class**
+only — the message, which may carry secrets, is never stored). Two new `LedgerDecision` verdicts back these.
+Each row now exposes a derived `severity` (`INFO` / `WARN` / `CRITICAL`) and an `isMisbehaviour` flag —
+both a pure function of the verdict, so the hash schema is **unchanged and old ledgers still verify**. Read
+the misbehaviour rows back with `ToolAuditLedger.readMisbehaviour(path)`. An unrecognised verdict written by a
+newer version reads as misbehaviour at `WARN` rather than crashing the reader (forward-compatible, fail-safe).
+The writer stays unreachable through `ToolEnvironment` (#2883) — it only observes framework events, so a
+compromised tool cannot forge or rewrite its own row. 5 new tests.
+
 ### Changed — default transient-network retry across all HTTP model providers (#4560)

 The shared non-streaming transport (`HttpModelClientSupport.sendBounded`, used by Claude, OpenAI +
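
The changelog's claim that a derived `severity` leaves the hash schema unchanged can be illustrated with a small sketch. Everything below is hypothetical: `Row`, `hashOf`, `append`, `verify`, and `severityOf` are illustration-only names, and the chain is a plain linear SHA-256 hash chain with a made-up field layout, which may differ from how `ToolAuditLedger` actually builds its Merkle chain. The point is only that a value computed from the verdict at read time is never an input to the hash.

```kotlin
import java.security.MessageDigest

// Hypothetical, simplified row. NOT the real ToolLedgerEntry schema.
data class Row(val subject: String, val verdict: String, val prevHash: String, val entryHash: String)

fun sha256Hex(text: String): String =
    MessageDigest.getInstance("SHA-256")
        .digest(text.toByteArray(Charsets.UTF_8))
        .joinToString("") { "%02x".format(it) }

// The hash covers only persisted fields; severity is never an input.
// (The "|" separator is a simplification; a real format needs unambiguous encoding.)
fun hashOf(subject: String, verdict: String, prevHash: String): String =
    sha256Hex("$prevHash|$subject|$verdict")

fun append(chain: List<Row>, subject: String, verdict: String): List<Row> {
    val prev = chain.lastOrNull()?.entryHash ?: "GENESIS"
    return chain + Row(subject, verdict, prev, hashOf(subject, verdict, prev))
}

// Recomputes every link; any edited field or reordered row breaks the chain.
fun verify(chain: List<Row>): Boolean {
    var prev = "GENESIS"
    for (row in chain) {
        if (row.prevHash != prev) return false
        if (row.entryHash != hashOf(row.subject, row.verdict, prev)) return false
        prev = row.entryHash
    }
    return true
}

// Derived at read time, so introducing it cannot invalidate existing hashes.
fun severityOf(verdict: String): String = when (verdict) {
    "APPROVED" -> "INFO"
    "DENIED" -> "CRITICAL"
    else -> "WARN" // includes verdicts this reader does not know
}

fun main() {
    var chain = emptyList<Row>()
    chain = append(chain, "writeFile", "APPROVED")
    chain = append(chain, "MAX_TOOL_CALLS", "BUDGET_EXCEEDED")
    println(verify(chain))                          // true
    println(chain.map { severityOf(it.verdict) })   // [INFO, WARN]

    // Rewriting a persisted verdict without recomputing hashes is detected.
    val forged = chain.toMutableList().also { it[0] = it[0].copy(verdict = "DENIED") }
    println(verify(forged))                         // false
}
```

Because `severityOf` reads only the stored verdict, a ledger written before the severity concept existed would verify and triage the same way under this sketch.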

agents-kt-observability/src/main/kotlin/agents_engine/observability/AgentJsonlExports.kt

Lines changed: 31 additions & 7 deletions
@@ -16,14 +16,26 @@ class AgentJsonlExports internal constructor(private val agent: Agent<*, *>) {
     }

     /**
-     * #2886 — wire a tamper-evident [ToolAuditLedger] to this agent. Every tool action is
-     * auto-recorded to an append-only, Merkle-chained file: a [PipelineEvent.ToolCalled] as
-     * `APPROVED`, a [PipelineEvent.ToolDenied] as `DENIED` (with the reason), a
-     * [PipelineEvent.ToolHallucinated] as `HALLUCINATED`. PII-safe (the result is hashed,
-     * never stored). Returns the ledger so the caller can [ToolAuditLedger.verify] it later.
+     * #2886 / #2905 — wire a tamper-evident [ToolAuditLedger] to this agent. Tool actions
+     * AND cross-cutting agent-misbehaviour are auto-recorded to one append-only,
+     * Merkle-chained file, so the same chain answers "what did agents try to do that they
+     * shouldn't, and what went wrong":
+     * - [PipelineEvent.ToolCalled] → `APPROVED`
+     * - [PipelineEvent.ToolDenied] → `DENIED` (policy/interceptor block, with the reason)
+     * - [PipelineEvent.ToolHallucinated] → `HALLUCINATED` (tool outside the skill allowlist)
+     * - [PipelineEvent.BudgetThreshold] → `BUDGET_EXCEEDED` (a budget ceiling crossed) — #2905
+     * - [PipelineEvent.ErrorOccurred] → `INFRA_ERROR` (a failure surfaced via `onError`) — #2905
      *
-     * callId-keying of the denied/hallucinated rows lands once `PipelineEvent` carries the
-     * callId (the approved rows already join via the AgentEvent layer) — #2886 follow-up.
+     * PII-safe throughout: the tool result is hashed, never stored, and an error is recorded
+     * by its exception *class* only — the message (which may carry secrets) stays out of the
+     * row. Read the misbehaviour rows back with [ToolAuditLedger.readMisbehaviour]. Returns the
+     * ledger so the caller can [ToolAuditLedger.verify] it later.
+     *
+     * The ledger writer stays unreachable through `ToolEnvironment` (#2883) — it only ever
+     * observes framework events, so a compromised tool cannot forge or rewrite its own row.
+     *
+     * callId-keying of the non-tool rows lands once `PipelineEvent` carries the callId (the
+     * approved rows already join via the AgentEvent layer) — #2886 follow-up.
      */
     fun ledger(file: File): ToolAuditLedger {
         val ledger = ToolAuditLedger(file.toPath())
@@ -35,6 +47,18 @@ class AgentJsonlExports internal constructor(private val agent: Agent<*, *>) {
                     ledger.record(event.toolName, LedgerDecision.DENIED, denialReason = event.reason)
                 is PipelineEvent.ToolHallucinated ->
                     ledger.record(event.requestedName, LedgerDecision.HALLUCINATED)
+                is PipelineEvent.BudgetThreshold ->
+                    ledger.record(
+                        event.reason.name,
+                        LedgerDecision.BUDGET_EXCEEDED,
+                        denialReason = "${event.reason.name} budget at ${event.usedPercent} of limit",
+                    )
+                is PipelineEvent.ErrorOccurred ->
+                    ledger.record(
+                        event.error::class.simpleName ?: "Throwable",
+                        LedgerDecision.INFRA_ERROR,
+                        denialReason = event.error::class.qualifiedName, // class only — message may carry PII
+                    )
                 else -> Unit
             }
         }
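
The `ErrorOccurred` branch above records only the exception's class names. A minimal standalone sketch of that idea follows; `ErrorRow` and `describeError` are hypothetical names invented for illustration and are not part of the library.

```kotlin
// Hypothetical row shape for illustration; the real ledger entry has more fields.
data class ErrorRow(val subject: String, val detail: String?)

// Builds a row from the exception's class names only.
// error.message is deliberately never read, so secrets in it cannot reach the row.
fun describeError(error: Throwable): ErrorRow =
    ErrorRow(
        subject = error::class.simpleName ?: "Throwable", // anonymous classes have no simple name
        detail = error::class.qualifiedName,
    )

fun main() {
    val row = describeError(IllegalStateException("password=hunter2"))
    println(row.subject)                 // IllegalStateException
    println("hunter2" in row.toString()) // false
}
```

The trade-off is that the row is less useful for debugging than a full message would be; the class name is kept as a triage hint while the message stays in whatever log channel is allowed to hold it.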
Lines changed: 30 additions & 2 deletions
@@ -1,4 +1,32 @@
 package agents_engine.observability

-/** The decision recorded for a tool action in the [ToolAuditLedger]. */
-enum class LedgerDecision { APPROVED, DENIED, HALLUCINATED }
+/**
+ * The decision recorded for an action in the [ToolAuditLedger].
+ *
+ * #2886 seeded the tool-action verdicts ([APPROVED] / [DENIED] / [HALLUCINATED]).
+ * #2905 folds in the cross-cutting *agent-misbehaviour* signals that never flow
+ * through a tool body — [BUDGET_EXCEEDED] (a budget ceiling crossed) and
+ * [INFRA_ERROR] (a transport/parse/runtime failure surfaced via `onError`) — so the
+ * one tamper-evident Merkle chain answers "what did agents try to do that they
+ * shouldn't, and what went wrong." Each carries its [severity] and [isMisbehaviour]
+ * as a fixed function of the verdict, so no new persisted column is needed.
+ */
+enum class LedgerDecision {
+    APPROVED,
+    DENIED,
+    HALLUCINATED,
+    BUDGET_EXCEEDED,
+    INFRA_ERROR,
+    ;
+
+    /** True for every verdict except an authorized [APPROVED] tool call. */
+    val isMisbehaviour: Boolean get() = this != APPROVED
+
+    /** Triage level for this verdict — see [LedgerSeverity]. */
+    val severity: LedgerSeverity
+        get() = when (this) {
+            APPROVED -> LedgerSeverity.INFO
+            HALLUCINATED, BUDGET_EXCEEDED, INFRA_ERROR -> LedgerSeverity.WARN
+            DENIED -> LedgerSeverity.CRITICAL
+        }
+}
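
One way a reader might use the derived severity is to order misbehaviour worst-first. The sketch below is an assumption about usage, not code from this commit: `Severity` and `Decision` are local stand-ins that mirror the enums in the diff so the block compiles on its own, and `triage` is a hypothetical helper.

```kotlin
// Local stand-ins mirroring LedgerSeverity / LedgerDecision from the diff.
enum class Severity { INFO, WARN, CRITICAL }

enum class Decision {
    APPROVED, DENIED, HALLUCINATED, BUDGET_EXCEEDED, INFRA_ERROR;

    val severity: Severity
        get() = when (this) {
            APPROVED -> Severity.INFO
            HALLUCINATED, BUDGET_EXCEEDED, INFRA_ERROR -> Severity.WARN
            DENIED -> Severity.CRITICAL
        }
}

// Hypothetical triage helper: drops approved traffic, then sorts worst-first.
// The sort is stable, so rows of equal severity keep their ledger order.
fun triage(decisions: List<Decision>): List<Decision> =
    decisions.filter { it != Decision.APPROVED }.sortedByDescending { it.severity }

fun main() {
    val seen = listOf(Decision.APPROVED, Decision.INFRA_ERROR, Decision.DENIED, Decision.BUDGET_EXCEEDED)
    println(triage(seen)) // [DENIED, INFRA_ERROR, BUDGET_EXCEEDED]
}
```

Sorting relies on enum declaration order (`INFO` < `WARN` < `CRITICAL`), so reordering the severity constants would silently change the triage order.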
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+package agents_engine.observability
+
+/**
+ * #2905 — how alarming a ledger row is, for triage when reading back an audit trail.
+ * Orthogonal to [LedgerDecision]: severity is a fixed function of the decision (see
+ * [LedgerDecision.severity]), so it is *derived*, never a separate persisted column —
+ * the Merkle hash schema is unchanged and old ledgers still verify.
+ *
+ * - [INFO] — normal, authorized activity (an approved tool call).
+ * - [WARN] — contained misbehaviour or operational failure: the model overreached
+ * (hallucinated tool), hit a resource ceiling, or a call failed — recoverable.
+ * - [CRITICAL] — a guardrail actively blocked a forbidden action (a policy/interceptor
+ * denial). This is the strongest "an agent tried to do something it shouldn't" signal,
+ * which is exactly what the audit log exists to answer.
+ */
+enum class LedgerSeverity { INFO, WARN, CRITICAL }
Binary file not shown.

agents-kt-observability/src/main/kotlin/agents_engine/observability/ToolLedgerEntry.kt

Lines changed: 20 additions & 0 deletions
@@ -17,3 +17,23 @@ data class ToolLedgerEntry(
     val prevHash: String,
     val entryHash: String,
 )
+
+/**
+ * The persisted [decision] string parsed back to a [LedgerDecision], or `null` for a
+ * verdict written by a newer version than this reader knows (forward-compatible — an
+ * unknown verdict never throws when reading an audit file).
+ */
+val ToolLedgerEntry.decisionType: LedgerDecision?
+    get() = LedgerDecision.entries.firstOrNull { it.name == decision }
+
+/**
+ * #2905 — true when this row records agent misbehaviour (a denial, hallucinated call,
+ * budget breach, or infra error) rather than an authorized [LedgerDecision.APPROVED]
+ * action. An unrecognised verdict is treated as misbehaviour (fail-safe: surface it).
+ */
+val ToolLedgerEntry.isMisbehaviour: Boolean
+    get() = decisionType?.isMisbehaviour ?: true
+
+/** Triage level of this row, derived from its [decisionType] (unknown verdicts → [LedgerSeverity.WARN]). */
+val ToolLedgerEntry.severity: LedgerSeverity
+    get() = decisionType?.severity ?: LedgerSeverity.WARN
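
The forward-compatible parse in `decisionType` can be contrasted with `valueOf`, which throws on an unknown name. The sketch below uses a hypothetical two-value `Verdict` enum and hypothetical helpers `parseVerdict` / `isMisbehaviour` so it runs standalone; it assumes Kotlin 1.9 or later for `entries`, which the diff itself also uses.

```kotlin
// Hypothetical stand-in enum; the real LedgerDecision has more values.
enum class Verdict { APPROVED, DENIED }

// Lenient parse: an unknown persisted string becomes null instead of throwing.
fun parseVerdict(raw: String): Verdict? = Verdict.entries.firstOrNull { it.name == raw }

// Fail-safe default: anything this reader cannot classify is surfaced as misbehaviour.
fun isMisbehaviour(raw: String): Boolean =
    parseVerdict(raw)?.let { it != Verdict.APPROVED } ?: true

fun main() {
    println(parseVerdict("QUARANTINED_V2"))                           // null
    println(isMisbehaviour("QUARANTINED_V2"))                         // true
    println(isMisbehaviour("APPROVED"))                               // false
    // The strict alternative throws IllegalArgumentException on unknown names.
    println(runCatching { Verdict.valueOf("QUARANTINED_V2") }.isFailure) // true
}
```

Defaulting to `true` means a newer writer can over-report but never hide a row from an older reader, which is the fail-safe direction for an audit trail.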

agents-kt-observability/src/test/kotlin/agents_engine/observability/ToolAuditLedgerTest.kt

Lines changed: 64 additions & 0 deletions
@@ -97,6 +97,70 @@ class ToolAuditLedgerTest {
         assertEquals("syncTicket", parsed.toolName)
     }

+    @Test fun `misbehaviour verdicts chain alongside approved rows and the chain still verifies`() {
+        val ledger = ToolAuditLedger(path)
+        ledger.record("writeFile", LedgerDecision.APPROVED, result = "ok")
+        ledger.record("readSecret", LedgerDecision.DENIED, denialReason = "outside policy glob")
+        ledger.record("MAX_TOOL_CALLS", LedgerDecision.BUDGET_EXCEEDED, denialReason = "MAX_TOOL_CALLS at 1.0 of limit")
+        ledger.record("ConnectException", LedgerDecision.INFRA_ERROR, denialReason = "java.net.ConnectException")
+        assertTrue(ToolAuditLedger.verify(path).ok, "a chain mixing approved + misbehaviour rows must verify")
+    }
+
+    @Test fun `readMisbehaviour returns every non-approved row and skips approved traffic`() {
+        val ledger = ToolAuditLedger(path)
+        ledger.record("a", LedgerDecision.APPROVED)
+        ledger.record("b", LedgerDecision.DENIED, denialReason = "no")
+        ledger.record("c", LedgerDecision.APPROVED)
+        ledger.record("TOKENS", LedgerDecision.BUDGET_EXCEEDED)
+        ledger.record("IOException", LedgerDecision.INFRA_ERROR)
+        val mis = ToolAuditLedger.readMisbehaviour(path)
+        assertEquals(listOf("b", "TOKENS", "IOException"), mis.map { it.toolName })
+        assertTrue(mis.all { it.isMisbehaviour }, "every returned row is misbehaviour")
+    }
+
+    @Test fun `decision drives the derived severity and misbehaviour view`() {
+        val ledger = ToolAuditLedger(path)
+        val approved = ledger.record("ok", LedgerDecision.APPROVED)
+        val denied = ledger.record("blocked", LedgerDecision.DENIED, denialReason = "policy")
+        val budget = ledger.record("DURATION", LedgerDecision.BUDGET_EXCEEDED)
+        assertFalse(approved.isMisbehaviour)
+        assertEquals(LedgerSeverity.INFO, approved.severity)
+        assertEquals(LedgerSeverity.CRITICAL, denied.severity, "a guardrail block is the strongest audit signal")
+        assertEquals(LedgerSeverity.WARN, budget.severity)
+        assertEquals(LedgerDecision.BUDGET_EXCEEDED, budget.decisionType)
+    }
+
+    @Test fun `an unknown future verdict reads as misbehaviour and is never silently dropped`() {
+        // Forward-compat: a verdict written by a newer version must not crash the reader and,
+        // fail-safe, must surface rather than hide — so it counts as misbehaviour at WARN.
+        ToolAuditLedger(path).record("x", LedgerDecision.APPROVED)
+        val tampered = Files.readAllLines(path).map { it.replace("\"APPROVED\"", "\"QUARANTINED_V2\"") }
+        Files.write(path, tampered)
+        val entry = ToolAuditLedger.read(path).single()
+        assertEquals(null, entry.decisionType, "an unrecognised verdict parses to null, not an exception")
+        assertTrue(entry.isMisbehaviour, "unknown verdicts surface as misbehaviour")
+        assertEquals(LedgerSeverity.WARN, entry.severity)
+    }
+
+    @Test fun `events ledger auto-records an infra error by exception class without leaking the message`() {
+        val a = agent<String, String>("erroring") {
+            model {
+                ollama("llama3")
+                client = ModelClient { _ -> throw IllegalStateException("SECRET_IN_MESSAGE") }
+            }
+            skills { skill<String, String>("s", "stub") { } }
+        }
+        a.events.ledger(path.toFile())
+        runCatching { a("input") } // the error propagates; the ledger observes it on the way out
+
+        val text = Files.readString(path)
+        assertFalse("SECRET_IN_MESSAGE" in text, "the exception message (possible PII) must never be stored")
+        val infra = ToolAuditLedger.read(path).single { it.decision == "INFRA_ERROR" }
+        assertEquals("IllegalStateException", infra.toolName)
+        assertTrue(infra.isMisbehaviour)
+        assertTrue(ToolAuditLedger.verify(path).ok, "the auto-written misbehaviour row must verify")
+    }
+
     @Test fun `events ledger auto-records an approved tool call and verifies`() {
         val responses = ArrayDeque(
             listOf<LlmResponse>(

docs/threat-model.md

Lines changed: 2 additions & 2 deletions
@@ -155,7 +155,7 @@ val server = McpServer.from(agent) {

 **Gaps you close yourself (today):**
 - **TLS termination and rate limiting.** Keep those at the gateway.
-- **Audit log retention.** The framework emits the rows — `agent.events.exportJsonl(...)` (#1914) writes append-only JSONL with `requestId` / `sessionId` / `manifestHash`, and `agent.events.ledger(file)` adds a tamper-evident Merkle chain. Retention, rotation, and chain-of-custody of those files are yours; a gateway log with client identity remains the complement at the edge.
+- **Audit log retention.** The framework emits the rows — `agent.events.exportJsonl(...)` (#1914) writes append-only JSONL with `requestId` / `sessionId` / `manifestHash`, and `agent.events.ledger(file)` adds a tamper-evident Merkle chain. That chain records authorized tool calls **and** cross-cutting misbehaviour in one place (#2905): policy/interceptor denials, hallucinated tool calls, budget breaches, and infra errors (by exception class, never the message) — read them back with `ToolAuditLedger.readMisbehaviour(...)`, each row carrying a derived `severity`. Retention, rotation, and chain-of-custody of those files are yours; a gateway log with client identity remains the complement at the edge.

 **Verdict:** Agents.KT-as-shipped is the WRONG shape if your gateway can't take on these responsibilities. With a gateway that can, it works; without one, see anti-patterns below.

@@ -222,7 +222,7 @@ This is the canonical status table — README, `SECURITY.md`, and `production-ha
 | Prompt-injection filtering || Yours (mitigations: `maxToolArgsBytes`, `untrustedOutput`, output wrapping) |
 | PII redaction in tool I/O || Yours (`onToolUse` hook) |
 | Permission manifest / capability graph + CI verify | ✓ shipped | `agents-kt` CLI `generate` / `inspect` / `verify`; `manifestHash` on runtime events |
-| JSONL audit export + tamper-evident ledger | ✓ shipped | `exportJsonl` (#1914) + `ToolAuditLedger` (Merkle-chained, `verify()`) |
+| JSONL audit export + tamper-evident ledger | ✓ shipped | `exportJsonl` (#1914) + `ToolAuditLedger` (Merkle-chained, `verify()`); records tool actions + misbehaviour — denials/hallucinations/budget breaches/infra errors (#2905), `readMisbehaviour()` |
 | `onBefore*` interceptors (proceed / replace / deny / substitute) | ✓ shipped | Runtime (#1907) |
 | Human-in-the-loop approval + resume | ✓ shipped | `humanApproval { }``ApprovalRequest``resumeWith(HumanDecision)` (#2489), fail-closed timeout default |
 | Fail-loud ambiguous skill routing | ✓ shipped (0.7.21, #3087) | `SkillRoutingException` — no silent first-match |
