Merge pull request #231 from Deep-CodeAI/feat/2905-ledger-misbehaviour

Skobeltsyn · web-flow · commit d3c335232d7e · 2026-06-17T20:32:43.000+03:00
feat(#2905): fold agent-misbehaviour into the Merkle-chained audit ledger
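
A minimal, self-contained sketch of how a reader might triage rows using the derived `severity` and `isMisbehaviour` views this change introduces. The enums and extension properties mirror the ones in the diff, but `Row` and `triage` are hypothetical stand-ins for illustration only; the real `ToolLedgerEntry` carries more fields (for example `prevHash` and `entryHash`), and real rows would come from `ToolAuditLedger.readMisbehaviour(path)` rather than an in-memory list.

```kotlin
// Stand-in copies of the enums from the diff; the real ones live in agents_engine.observability.
enum class LedgerSeverity { INFO, WARN, CRITICAL }

enum class LedgerDecision {
    APPROVED, DENIED, HALLUCINATED, BUDGET_EXCEEDED, INFRA_ERROR;

    val isMisbehaviour: Boolean get() = this != APPROVED

    val severity: LedgerSeverity
        get() = when (this) {
            APPROVED -> LedgerSeverity.INFO
            HALLUCINATED, BUDGET_EXCEEDED, INFRA_ERROR -> LedgerSeverity.WARN
            DENIED -> LedgerSeverity.CRITICAL
        }
}

// Hypothetical, trimmed-down row (assumption: only the fields needed for triage).
data class Row(val toolName: String, val decision: String)

// Same fallback rules as the diff: an unknown verdict parses to null,
// counts as misbehaviour, and is triaged at WARN.
val Row.decisionType: LedgerDecision?
    get() = LedgerDecision.entries.firstOrNull { it.name == decision }
val Row.isMisbehaviour: Boolean get() = decisionType?.isMisbehaviour ?: true
val Row.severity: LedgerSeverity get() = decisionType?.severity ?: LedgerSeverity.WARN

// Hypothetical helper: keep misbehaviour rows, most severe first.
// The sort is stable, so rows of equal severity keep their ledger order.
fun triage(rows: List<Row>): List<Row> =
    rows.filter { it.isMisbehaviour }.sortedByDescending { it.severity }

fun main() {
    val rows = listOf(
        Row("writeFile", "APPROVED"),
        Row("MAX_TOOL_CALLS", "BUDGET_EXCEEDED"),
        Row("readSecret", "DENIED"),
        Row("x", "QUARANTINED_V2"), // verdict from a hypothetical newer writer
    )
    for (row in triage(rows)) println("${row.severity} ${row.decision} ${row.toolName}")
}
```

Because both views are computed from the persisted `decision` string, a sketch like this needs no extra column in the row, which is why the hash schema can stay unchanged.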
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,21 @@ All notable changes to Agents.KT are documented here. The format follows [Keep a
 
 ## [Unreleased]
 
+### Added — audit-ledger now records cross-cutting agent misbehaviour (#2905, epic #2882)
+
+`agent.events.ledger(file)` previously chained only tool-action verdicts (`APPROVED` / `DENIED` /
+`HALLUCINATED`). It now folds in the misbehaviour signals that never flow through a tool body, so the **one
+tamper-evident Merkle chain** answers "what did agents try to do that they shouldn't, and what went wrong":
+a `PipelineEvent.BudgetThreshold` records a `BUDGET_EXCEEDED` row (the budget dimension + how much of the
+ceiling was used), and a `PipelineEvent.ErrorOccurred` records an `INFRA_ERROR` row (the exception **class**
+only — the message, which may carry secrets, is never stored). Two new `LedgerDecision` verdicts back these.
+Each row now exposes a derived `severity` (`INFO` / `WARN` / `CRITICAL`) and an `isMisbehaviour` flag. Both
+are pure functions of the verdict, so the hash schema is **unchanged and old ledgers still verify**. Read
+the misbehaviour rows back with `ToolAuditLedger.readMisbehaviour(path)`. An unrecognised verdict written by a
+newer version reads as misbehaviour at `WARN` rather than crashing the reader (forward-compatible, fail-safe).
+The writer stays unreachable through `ToolEnvironment` (#2883) — it only observes framework events, so a
+compromised tool cannot forge or rewrite its own row. 5 new tests.
+
 ### Changed — default transient-network retry across all HTTP model providers (#4560)
 
 The shared non-streaming transport (`HttpModelClientSupport.sendBounded`, used by Claude, OpenAI +
diff --git a/agents-kt-observability/src/main/kotlin/agents_engine/observability/AgentJsonlExports.kt b/agents-kt-observability/src/main/kotlin/agents_engine/observability/AgentJsonlExports.kt
@@ -16,14 +16,26 @@ class AgentJsonlExports internal constructor(private val agent: Agent<*, *>) {
     }
 
     /**
-     * #2886 — wire a tamper-evident [ToolAuditLedger] to this agent. Every tool action is
-     * auto-recorded to an append-only, Merkle-chained file: a [PipelineEvent.ToolCalled] as
-     * `APPROVED`, a [PipelineEvent.ToolDenied] as `DENIED` (with the reason), a
-     * [PipelineEvent.ToolHallucinated] as `HALLUCINATED`. PII-safe (the result is hashed,
-     * never stored). Returns the ledger so the caller can [ToolAuditLedger.verify] it later.
+     * #2886 / #2905 — wire a tamper-evident [ToolAuditLedger] to this agent. Tool actions
+     * AND cross-cutting agent-misbehaviour are auto-recorded to one append-only,
+     * Merkle-chained file, so the same chain answers "what did agents try to do that they
+     * shouldn't, and what went wrong":
+     * - [PipelineEvent.ToolCalled] → `APPROVED`
+     * - [PipelineEvent.ToolDenied] → `DENIED` (policy/interceptor block, with the reason)
+     * - [PipelineEvent.ToolHallucinated] → `HALLUCINATED` (tool outside the skill allowlist)
+     * - [PipelineEvent.BudgetThreshold] → `BUDGET_EXCEEDED` (a budget ceiling crossed) — #2905
+     * - [PipelineEvent.ErrorOccurred] → `INFRA_ERROR` (a failure surfaced via `onError`) — #2905
      *
-     * callId-keying of the denied/hallucinated rows lands once `PipelineEvent` carries the
-     * callId (the approved rows already join via the AgentEvent layer) — #2886 follow-up.
+     * PII-safe throughout: the tool result is hashed, never stored, and an error is recorded
+     * by its exception *class* only — the message (which may carry secrets) stays out of the
+     * row. Read the misbehaviour rows back with [ToolAuditLedger.readMisbehaviour]. Returns the
+     * ledger so the caller can [ToolAuditLedger.verify] it later.
+     *
+     * The ledger writer stays unreachable through `ToolEnvironment` (#2883) — it only ever
+     * observes framework events, so a compromised tool cannot forge or rewrite its own row.
+     *
+     * callId-keying of the non-approved rows lands once `PipelineEvent` carries the callId (the
+     * approved rows already join via the AgentEvent layer) — #2886 follow-up.
      */
     fun ledger(file: File): ToolAuditLedger {
         val ledger = ToolAuditLedger(file.toPath())
@@ -35,6 +47,18 @@ class AgentJsonlExports internal constructor(private val agent: Agent<*, *>) {
                     ledger.record(event.toolName, LedgerDecision.DENIED, denialReason = event.reason)
                 is PipelineEvent.ToolHallucinated ->
                     ledger.record(event.requestedName, LedgerDecision.HALLUCINATED)
+                is PipelineEvent.BudgetThreshold ->
+                    ledger.record(
+                        event.reason.name,
+                        LedgerDecision.BUDGET_EXCEEDED,
+                        denialReason = "${event.reason.name} budget at ${event.usedPercent} of limit",
+                    )
+                is PipelineEvent.ErrorOccurred ->
+                    ledger.record(
+                        event.error::class.simpleName ?: "Throwable",
+                        LedgerDecision.INFRA_ERROR,
+                        denialReason = event.error::class.qualifiedName, // class only — message may carry PII
+                    )
                 else -> Unit
             }
         }
diff --git a/agents-kt-observability/src/main/kotlin/agents_engine/observability/LedgerDecision.kt b/agents-kt-observability/src/main/kotlin/agents_engine/observability/LedgerDecision.kt
@@ -1,4 +1,32 @@
 package agents_engine.observability
 
-/** The decision recorded for a tool action in the [ToolAuditLedger]. */
-enum class LedgerDecision { APPROVED, DENIED, HALLUCINATED }
+/**
+ * The decision recorded for an action in the [ToolAuditLedger].
+ *
+ * #2886 seeded the tool-action verdicts ([APPROVED] / [DENIED] / [HALLUCINATED]).
+ * #2905 folds in the cross-cutting *agent-misbehaviour* signals that never flow
+ * through a tool body — [BUDGET_EXCEEDED] (a budget ceiling crossed) and
+ * [INFRA_ERROR] (a transport/parse/runtime failure surfaced via `onError`) — so the
+ * one tamper-evident Merkle chain answers "what did agents try to do that they
+ * shouldn't, and what went wrong." Each carries its [severity] and [isMisbehaviour]
+ * as a fixed function of the verdict, so no new persisted column is needed.
+ */
+enum class LedgerDecision {
+    APPROVED,
+    DENIED,
+    HALLUCINATED,
+    BUDGET_EXCEEDED,
+    INFRA_ERROR,
+    ;
+
+    /** True for every verdict except an authorized [APPROVED] tool call. */
+    val isMisbehaviour: Boolean get() = this != APPROVED
+
+    /** Triage level for this verdict — see [LedgerSeverity]. */
+    val severity: LedgerSeverity
+        get() = when (this) {
+            APPROVED -> LedgerSeverity.INFO
+            HALLUCINATED, BUDGET_EXCEEDED, INFRA_ERROR -> LedgerSeverity.WARN
+            DENIED -> LedgerSeverity.CRITICAL
+        }
+}
diff --git a/agents-kt-observability/src/main/kotlin/agents_engine/observability/LedgerSeverity.kt b/agents-kt-observability/src/main/kotlin/agents_engine/observability/LedgerSeverity.kt
@@ -0,0 +1,16 @@
+package agents_engine.observability
+
+/**
+ * #2905 — how alarming a ledger row is, for triage when reading back an audit trail.
+ * Not independent of [LedgerDecision]: severity is a fixed function of the decision (see
+ * [LedgerDecision.severity]), so it is *derived*, never a separate persisted column —
+ * the Merkle hash schema is unchanged and old ledgers still verify.
+ *
+ * - [INFO] — normal, authorized activity (an approved tool call).
+ * - [WARN] — contained misbehaviour or operational failure: the model overreached
+ *   (hallucinated tool), hit a resource ceiling, or a call failed — recoverable.
+ * - [CRITICAL] — a guardrail actively blocked a forbidden action (a policy/interceptor
+ *   denial). This is the strongest "an agent tried to do something it shouldn't" signal,
+ *   which is exactly what the audit log exists to answer.
+ */
+enum class LedgerSeverity { INFO, WARN, CRITICAL }
diff --git a/agents-kt-observability/src/main/kotlin/agents_engine/observability/ToolAuditLedger.kt b/agents-kt-observability/src/main/kotlin/agents_engine/observability/ToolAuditLedger.kt
diff --git a/agents-kt-observability/src/main/kotlin/agents_engine/observability/ToolLedgerEntry.kt b/agents-kt-observability/src/main/kotlin/agents_engine/observability/ToolLedgerEntry.kt
@@ -17,3 +17,23 @@ data class ToolLedgerEntry(
     val prevHash: String,
     val entryHash: String,
 )
+
+/**
+ * The persisted [decision] string parsed back to a [LedgerDecision], or `null` for a
+ * verdict written by a newer version than this reader knows (forward-compatible — an
+ * unknown verdict never throws when reading an audit file).
+ */
+val ToolLedgerEntry.decisionType: LedgerDecision?
+    get() = LedgerDecision.entries.firstOrNull { it.name == decision }
+
+/**
+ * #2905 — true when this row records agent misbehaviour (a denial, hallucinated call,
+ * budget breach, or infra error) rather than an authorized [LedgerDecision.APPROVED]
+ * action. An unrecognised verdict is treated as misbehaviour (fail-safe: surface it).
+ */
+val ToolLedgerEntry.isMisbehaviour: Boolean
+    get() = decisionType?.isMisbehaviour ?: true
+
+/** Triage level of this row, derived from its [decisionType] (unknown verdicts → [LedgerSeverity.WARN]). */
+val ToolLedgerEntry.severity: LedgerSeverity
+    get() = decisionType?.severity ?: LedgerSeverity.WARN
diff --git a/agents-kt-observability/src/test/kotlin/agents_engine/observability/ToolAuditLedgerTest.kt b/agents-kt-observability/src/test/kotlin/agents_engine/observability/ToolAuditLedgerTest.kt
@@ -97,6 +97,70 @@ class ToolAuditLedgerTest {
         assertEquals("syncTicket", parsed.toolName)
     }
 
+    @Test fun `misbehaviour verdicts chain alongside approved rows and the chain still verifies`() {
+        val ledger = ToolAuditLedger(path)
+        ledger.record("writeFile", LedgerDecision.APPROVED, result = "ok")
+        ledger.record("readSecret", LedgerDecision.DENIED, denialReason = "outside policy glob")
+        ledger.record("MAX_TOOL_CALLS", LedgerDecision.BUDGET_EXCEEDED, denialReason = "MAX_TOOL_CALLS at 1.0 of limit")
+        ledger.record("ConnectException", LedgerDecision.INFRA_ERROR, denialReason = "java.net.ConnectException")
+        assertTrue(ToolAuditLedger.verify(path).ok, "a chain mixing approved + misbehaviour rows must verify")
+    }
+
+    @Test fun `readMisbehaviour returns every non-approved row and skips approved traffic`() {
+        val ledger = ToolAuditLedger(path)
+        ledger.record("a", LedgerDecision.APPROVED)
+        ledger.record("b", LedgerDecision.DENIED, denialReason = "no")
+        ledger.record("c", LedgerDecision.APPROVED)
+        ledger.record("TOKENS", LedgerDecision.BUDGET_EXCEEDED)
+        ledger.record("IOException", LedgerDecision.INFRA_ERROR)
+        val mis = ToolAuditLedger.readMisbehaviour(path)
+        assertEquals(listOf("b", "TOKENS", "IOException"), mis.map { it.toolName })
+        assertTrue(mis.all { it.isMisbehaviour }, "every returned row is misbehaviour")
+    }
+
+    @Test fun `decision drives the derived severity and misbehaviour view`() {
+        val ledger = ToolAuditLedger(path)
+        val approved = ledger.record("ok", LedgerDecision.APPROVED)
+        val denied = ledger.record("blocked", LedgerDecision.DENIED, denialReason = "policy")
+        val budget = ledger.record("DURATION", LedgerDecision.BUDGET_EXCEEDED)
+        assertFalse(approved.isMisbehaviour)
+        assertEquals(LedgerSeverity.INFO, approved.severity)
+        assertEquals(LedgerSeverity.CRITICAL, denied.severity, "a guardrail block is the strongest audit signal")
+        assertEquals(LedgerSeverity.WARN, budget.severity)
+        assertEquals(LedgerDecision.BUDGET_EXCEEDED, budget.decisionType)
+    }
+
+    @Test fun `an unknown future verdict reads as misbehaviour and is never silently dropped`() {
+        // Forward-compat: a verdict written by a newer version must not crash the reader and,
+        // fail-safe, must surface rather than hide — so it counts as misbehaviour at WARN.
+        ToolAuditLedger(path).record("x", LedgerDecision.APPROVED)
+        val tampered = Files.readAllLines(path).map { it.replace("\"APPROVED\"", "\"QUARANTINED_V2\"") }
+        Files.write(path, tampered)
+        val entry = ToolAuditLedger.read(path).single()
+        assertEquals(null, entry.decisionType, "an unrecognised verdict parses to null, not an exception")
+        assertTrue(entry.isMisbehaviour, "unknown verdicts surface as misbehaviour")
+        assertEquals(LedgerSeverity.WARN, entry.severity)
+    }
+
+    @Test fun `events ledger auto-records an infra error by exception class without leaking the message`() {
+        val a = agent<String, String>("erroring") {
+            model {
+                ollama("llama3")
+                client = ModelClient { _ -> throw IllegalStateException("SECRET_IN_MESSAGE") }
+            }
+            skills { skill<String, String>("s", "stub") { } }
+        }
+        a.events.ledger(path.toFile())
+        runCatching { a("input") } // the error propagates; the ledger observes it on the way out
+
+        val text = Files.readString(path)
+        assertFalse("SECRET_IN_MESSAGE" in text, "the exception message (possible PII) must never be stored")
+        val infra = ToolAuditLedger.read(path).single { it.decision == "INFRA_ERROR" }
+        assertEquals("IllegalStateException", infra.toolName)
+        assertTrue(infra.isMisbehaviour)
+        assertTrue(ToolAuditLedger.verify(path).ok, "the auto-written misbehaviour row must verify")
+    }
+
     @Test fun `events ledger auto-records an approved tool call and verifies`() {
         val responses = ArrayDeque(
             listOf<LlmResponse>(
diff --git a/docs/threat-model.md b/docs/threat-model.md
@@ -155,7 +155,7 @@ val server = McpServer.from(agent) {
 
 **Gaps you close yourself (today):**
 - **TLS termination and rate limiting.** Keep those at the gateway.
-- **Audit log retention.** The framework emits the rows — `agent.events.exportJsonl(...)` (#1914) writes append-only JSONL with `requestId` / `sessionId` / `manifestHash`, and `agent.events.ledger(file)` adds a tamper-evident Merkle chain. Retention, rotation, and chain-of-custody of those files are yours; a gateway log with client identity remains the complement at the edge.
+- **Audit log retention.** The framework emits the rows — `agent.events.exportJsonl(...)` (#1914) writes append-only JSONL with `requestId` / `sessionId` / `manifestHash`, and `agent.events.ledger(file)` adds a tamper-evident Merkle chain. That chain records authorized tool calls **and** cross-cutting misbehaviour in one place (#2905): policy/interceptor denials, hallucinated tool calls, budget breaches, and infra errors (by exception class, never the message) — read them back with `ToolAuditLedger.readMisbehaviour(...)`, each row carrying a derived `severity`. Retention, rotation, and chain-of-custody of those files are yours; a gateway log with client identity remains the complement at the edge.
 
 **Verdict:** Agents.KT-as-shipped is the WRONG shape if your gateway can't take on these responsibilities. With a gateway that can, it works; without one, see anti-patterns below.
 
@@ -222,7 +222,7 @@ This is the canonical status table — README, `SECURITY.md`, and `production-ha
 | Prompt-injection filtering | ✗ | Yours (mitigations: `maxToolArgsBytes`, `untrustedOutput`, output wrapping) |
 | PII redaction in tool I/O | ✗ | Yours (`onToolUse` hook) |
 | Permission manifest / capability graph + CI verify | ✓ shipped | `agents-kt` CLI `generate` / `inspect` / `verify`; `manifestHash` on runtime events |
-| JSONL audit export + tamper-evident ledger | ✓ shipped | `exportJsonl` (#1914) + `ToolAuditLedger` (Merkle-chained, `verify()`) |
+| JSONL audit export + tamper-evident ledger | ✓ shipped | `exportJsonl` (#1914) + `ToolAuditLedger` (Merkle-chained, `verify()`); records tool actions + misbehaviour — denials/hallucinations/budget breaches/infra errors (#2905), `readMisbehaviour()` |
 | `onBefore*` interceptors (proceed / replace / deny / substitute) | ✓ shipped | Runtime (#1907) |
 | Human-in-the-loop approval + resume | ✓ shipped | `humanApproval { }` → `ApprovalRequest` → `resumeWith(HumanDecision)` (#2489), fail-closed timeout default |
 | Fail-loud ambiguous skill routing | ✓ shipped (0.7.21, #3087) | `SkillRoutingException` — no silent first-match |