Commit 7f8f845

Authored by devin-ai-integration[bot] and cognition-team
feat(chat): reorder harness markers and split compaction buckets (#222)
## Summary

Reshapes the user-turn prompt wrapper and thread-background rendering so Claude Sonnet and GPT-5 treat the current instruction as authoritative and prior thread context as read-only reference material. Addresses the failure mode tracked in #221 and `getsentry/junior-prod#35`, where Junior drifts onto a narrowed-but-superseded ask from earlier in a thread.

Changes in `packages/junior/src/chat/respond-helpers.ts` (`buildUserTurnText`):

- Order (top → bottom): `<thread-background>`, `<session-context>`, `<turn-context>`, `<current-instruction priority="highest">` — `<current-instruction>` is always the final block, matching Anthropic's long-context guidance to place the active query last.
- Drops legacy `<current-message>` / `<thread-conversation-context>` wrappers.
- No explanatory prose inside markers — tag names carry the signal.

Changes in `packages/junior/src/chat/services/conversation-memory.ts`:

- `buildConversationContext` wraps each compaction in `<compaction index=… covered_messages=… created_at=…>` and each transcript entry in `<message index=… ts=… role=… author=… slack_ts=…>`, so each prior item is an individually addressable reference instead of a flat blob.
- `summarizeConversationChunk` prompt now produces three fixed sections — `<active-asks>`, `<superseded-or-completed-asks>`, `<facts>` — so stale or already-acted-on asks stop reading as live constraints after compaction.

Rationale and authoritative prior art (Anthropic long-context guide, OpenAI GPT-5 prompting guide, OpenAI Model Spec chain-of-command) are cited in #221.

## Review & Testing Checklist for Human

- [ ] Sanity-check the new `buildUserTurnText` output shape against a real thread turn (e.g. local dev or an eval snapshot) and confirm the final tag emitted is `</current-instruction>` and `<thread-background>` precedes it.
- [ ] Spot-check one compacted conversation in a real thread to confirm the summarizer is producing the three-bucket XML (active / superseded / facts) rather than a free-form paragraph. Because the summarizer is model-generated, the prompt change only shapes output — run against the production fast model to verify it complies.
- [ ] Decide whether this should be gated behind an eval sweep on both Sonnet and GPT-5 gateway models before relying on the new marker shape for production traffic. This PR does not add such an eval.

### Notes

- Intentionally preserved the `<thread-transcript>` / `<thread-compactions>` marker names; routing fixtures in `tests/unit/routing/subscribed-decision.test.ts` still reference them.
- No runtime behavior change beyond the emitted prompt text; no new dependencies, no schema changes. Compaction storage format (`summary: string`) is unchanged — only the prompt that generates it is updated.
- Pre-existing unit-test failure `tests/unit/services/turn-checkpoint.test.ts > reuses the latest stored transcript…` reproduces on `main` (requires `REDIS_URL`) and is unrelated to this PR.
- Follow-up candidates (not in this PR): add an eval that exercises narrow-then-broaden instruction drift across a compacted thread; consider also marking the assistant's own prior tool calls with an `executed` flag in `<message>` wrappers.

Link to Devin session: https://app.devin.ai/sessions/f46faf27a4354f7dab95abd8dfc50211

Requested by: @dcramer

---------

Co-authored-by: devin-ai-integration[bot] <158243448+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Devin <devin@cognition.ai>
Co-authored-by: Devin <devin-ai-integration[bot]@users.noreply.github.com>
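For illustration, a compaction summary in the new three-bucket shape might look like the following (the asks and facts here are invented, not real model output):

```xml
<active-asks>
- Audit all Slack channels for stale webhooks (original broad ask, never narrowed)
</active-asks>
<superseded-or-completed-asks>
- Check #eng-alerts first → completed in this segment; findings posted back to the thread
</superseded-or-completed-asks>
<facts>
- The workspace uses a single incoming-webhook integration owned by the ops team
</facts>
```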
1 parent 80673e4 commit 7f8f845

4 files changed

Lines changed: 73 additions & 33 deletions

packages/junior/src/chat/respond-helpers.ts

Lines changed: 24 additions & 19 deletions
```diff
@@ -143,7 +143,11 @@ export function summarizeMessageText(text: string): string {
       : normalized;
 }
 
-/** Wrap user input with conversation context and observability metadata XML tags. */
+/**
+ * Wrap the current user turn with self-describing marker blocks: background
+ * first, current instruction last. Ordering matches long-context attention
+ * guidance for Sonnet and GPT-5.
+ */
 export function buildUserTurnText(
   userInput: string,
   conversationContext?: string,
@@ -153,47 +157,48 @@ export function buildUserTurnText(
   },
 ): string {
   const trimmedContext = conversationContext?.trim();
-  const hasSessionContext = Boolean(metadata?.sessionContext?.conversationId);
-  const hasTurnContext = Boolean(metadata?.turnContext?.traceId);
+  const conversationId = metadata?.sessionContext?.conversationId;
+  const traceId = metadata?.turnContext?.traceId;
 
-  if (!trimmedContext && !hasSessionContext && !hasTurnContext) {
+  if (!trimmedContext && !conversationId && !traceId) {
     return userInput;
   }
 
-  const sections: string[] = [
-    "<current-message>",
-    userInput,
-    "</current-message>",
-  ];
+  const sections: string[] = [];
 
   if (trimmedContext) {
     sections.push(
-      "",
-      "<thread-conversation-context>",
-      "Use this context for continuity across prior thread turns.",
+      "<thread-background>",
       trimmedContext,
-      "</thread-conversation-context>",
+      "</thread-background>",
+      "",
     );
   }
 
-  if (metadata?.sessionContext?.conversationId) {
+  if (conversationId) {
     sections.push(
-      "",
       "<session-context>",
-      `- gen_ai.conversation.id: ${metadata.sessionContext.conversationId}`,
+      `- gen_ai.conversation.id: ${conversationId}`,
      "</session-context>",
+      "",
    );
  }
 
-  if (metadata?.turnContext?.traceId) {
+  if (traceId) {
    sections.push(
-      "",
      "<turn-context>",
-      `- trace_id: ${metadata.turnContext.traceId}`,
+      `- trace_id: ${traceId}`,
      "</turn-context>",
+      "",
    );
  }
 
+  sections.push(
+    '<current-instruction priority="highest">',
+    userInput,
+    "</current-instruction>",
+  );
+
   return sections.join("\n");
 }
```
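To make the new ordering concrete, here is a minimal standalone sketch of the wrapper logic from the diff above. The type and function names (`TurnMetadataSketch`, `buildUserTurnTextSketch`) are illustrative stand-ins, not the real module exports:

```typescript
type TurnMetadataSketch = {
  sessionContext?: { conversationId?: string };
  turnContext?: { traceId?: string };
};

// Sketch of the new ordering: optional read-only reference blocks first,
// the authoritative current instruction always emitted last.
function buildUserTurnTextSketch(
  userInput: string,
  conversationContext?: string,
  metadata?: TurnMetadataSketch,
): string {
  const trimmedContext = conversationContext?.trim();
  const conversationId = metadata?.sessionContext?.conversationId;
  const traceId = metadata?.turnContext?.traceId;

  // Bare turns pass through unwrapped.
  if (!trimmedContext && !conversationId && !traceId) {
    return userInput;
  }

  const sections: string[] = [];
  if (trimmedContext) {
    sections.push("<thread-background>", trimmedContext, "</thread-background>", "");
  }
  if (conversationId) {
    sections.push("<session-context>", `- gen_ai.conversation.id: ${conversationId}`, "</session-context>", "");
  }
  if (traceId) {
    sections.push("<turn-context>", `- trace_id: ${traceId}`, "</turn-context>", "");
  }
  // The current instruction is unconditionally the final block.
  sections.push('<current-instruction priority="highest">', userInput, "</current-instruction>");
  return sections.join("\n");
}
```

A reviewer can use a sketch like this to check the two invariants from the checklist: the emitted text ends with `</current-instruction>`, and `<thread-background>` (when present) comes first.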

packages/junior/src/chat/services/conversation-memory.ts

Lines changed: 30 additions & 13 deletions
```diff
@@ -9,6 +9,7 @@ import type {
 } from "@/chat/state/conversation";
 import { toOptionalString } from "@/chat/coerce";
 import { logWarn, setSpanAttributes } from "@/chat/logging";
+import { escapeXml } from "@/chat/xml";
 
 const CONTEXT_COMPACTION_TRIGGER_TOKENS = 9000;
 const CONTEXT_COMPACTION_TARGET_TOKENS = 7000;
@@ -152,6 +153,11 @@ export function markConversationMessage(
   updateConversationStats(conversation);
 }
 
+/**
+ * Render thread history as structured XML. Each compaction and message is
+ * wrapped with index/ts metadata so the model can reference prior items
+ * individually instead of treating the whole block as one flat narrative.
+ */
 export function buildConversationContext(
   conversation: ThreadConversationState,
   options: {
@@ -166,25 +172,31 @@ export function buildConversationContext(
   }
 
   const lines: string[] = [];
+
   if (conversation.compactions.length > 0) {
     lines.push("<thread-compactions>");
     for (const [index, compaction] of conversation.compactions.entries()) {
       lines.push(
-        [
-          `summary_${index + 1}:`,
-          compaction.summary,
-          `covered_messages: ${compaction.coveredMessageIds.length}`,
-          `created_at: ${new Date(compaction.createdAtMs).toISOString()}`,
-        ].join(" "),
+        `  <compaction index="${index + 1}" covered_messages="${compaction.coveredMessageIds.length}" created_at="${new Date(compaction.createdAtMs).toISOString()}">`,
+        compaction.summary,
+        "  </compaction>",
       );
     }
-    lines.push("</thread-compactions>");
-    lines.push("");
+    lines.push("</thread-compactions>", "");
   }
 
   lines.push("<thread-transcript>");
-  for (const message of messages) {
-    lines.push(renderConversationMessageLine(message, conversation));
+  for (const [index, message] of messages.entries()) {
+    const author = escapeXml(message.author?.userName ?? message.role);
+    const ts = new Date(message.createdAtMs).toISOString();
+    const slackTsAttr = message.meta?.slackTs
+      ? ` slack_ts="${escapeXml(message.meta.slackTs)}"`
+      : "";
+    lines.push(
+      `  <message index="${index + 1}" ts="${ts}" role="${message.role}" author="${author}"${slackTsAttr}>`,
+      renderConversationMessageLine(message, conversation),
+      "  </message>",
+    );
   }
   lines.push("</thread-transcript>");
   return lines.join("\n");
@@ -240,9 +252,14 @@ async function summarizeConversationChunk(
       role: "user",
       content: [
         "Summarize the following older Slack thread transcript segment for future assistant turns.",
-        "Keep the summary factual and concise.",
-        "Preserve decisions, commitments, constraints, locations, hiring criteria, and unresolved asks.",
-        "Do not invent details.",
+        "Keep the summary factual and concise. Do not invent details.",
+        "",
+        "Output exactly three XML sections in this order:",
+        "<active-asks> one bullet per outstanding user ask that has not been narrowed, answered, or superseded by a later turn. Omit the section body if none. </active-asks>",
+        "<superseded-or-completed-asks> one bullet per ask that has been rescoped, narrowed, answered, or already acted on in this segment. Include the replacement/outcome inline. Omit the section body if none. </superseded-or-completed-asks>",
+        "<facts> one bullet per durable fact useful regardless of scope: names, ids, URLs, decisions, locations, preferences, constraints that remain true. Omit the section body if none. </facts>",
+        "",
+        "Do not output any text outside the three sections.",
         "",
         transcript,
       ].join("\n"),
```
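The compaction-wrapping behavior can be sketched in isolation. `CompactionSketch` and `renderCompactions` below are illustrative stand-ins for the real conversation-state types and the relevant slice of `buildConversationContext`:

```typescript
// Hypothetical shape standing in for the real compaction record.
type CompactionSketch = {
  summary: string;
  coveredMessageIds: string[];
  createdAtMs: number;
};

// Each compaction becomes an individually addressable XML element with
// index / covered_messages / created_at attributes, instead of a flat blob.
function renderCompactions(compactions: CompactionSketch[]): string {
  const lines: string[] = ["<thread-compactions>"];
  for (const [index, compaction] of compactions.entries()) {
    lines.push(
      `  <compaction index="${index + 1}" covered_messages="${compaction.coveredMessageIds.length}" created_at="${new Date(compaction.createdAtMs).toISOString()}">`,
      compaction.summary,
      "  </compaction>",
    );
  }
  lines.push("</thread-compactions>");
  return lines.join("\n");
}
```

Note that, as in the diff, the summary text itself is emitted verbatim; only author names and Slack timestamps pass through `escapeXml` in the real code.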
Lines changed: 8 additions & 0 deletions
```diff
@@ -0,0 +1,8 @@
+import { describe, expect, it } from "vitest";
+import { buildUserTurnText } from "@/chat/respond-helpers";
+
+describe("buildUserTurnText", () => {
+  it("returns raw input when no context or metadata is provided", () => {
+    expect(buildUserTurnText("hello")).toBe("hello");
+  });
+});
```

packages/junior/tests/unit/services/conversation-memory.test.ts

Lines changed: 11 additions & 1 deletion
```diff
@@ -1,5 +1,8 @@
 import { describe, expect, it } from "vitest";
-import { getThreadTitleSourceMessage } from "@/chat/services/conversation-memory";
+import {
+  buildConversationContext,
+  getThreadTitleSourceMessage,
+} from "@/chat/services/conversation-memory";
 import { coerceThreadConversationState } from "@/chat/state/conversation";
 
 describe("conversation memory title source", () => {
@@ -58,3 +61,10 @@ describe("conversation memory title source", () => {
     );
   });
 });
+
+describe("buildConversationContext", () => {
+  it("returns undefined for an empty conversation", () => {
+    const conversation = coerceThreadConversationState({});
+    expect(buildConversationContext(conversation)).toBeUndefined();
+  });
+});
```
