src/oss/deepagents/context-engineering.mdx
+18 -2 (18 additions & 2 deletions)
@@ -381,6 +381,22 @@ Content offloading happens when tool call inputs or results exceed a token thres
+### Multimodal inputs
+
+Deep Agents supports multimodal inputs, such as images returned by `read_file` or provided in messages, but the built-in context management mechanisms are primarily text and message-history oriented. They do not resize images, lower image resolution, or generate reusable visual embeddings.
+
+For multimodal workloads, keep large media out of active message history when possible:
+
+- Store images, screenshots, and charts in a filesystem backend or external object store, then pass file paths or URLs through messages.
+- Prefer references over base64-encoded image blocks in long-running conversations.
+- If a tool produces an image, have the tool save the image and return a concise text description plus a path or URL.
+- Use subagents for image-heavy inspection work so the main agent receives a compact text result instead of every multimodal intermediate step.
+- Tune summarization thresholds or provide a custom token counter when your model provider charges many tokens for images.
+
+Offloading large tool inputs and results only measures text content. Non-text blocks, including images, are preserved in the replacement message rather than compressed. A message that contains only an image will not be offloaded because of image size alone.
+
+Summarization replaces older messages with a text summary once those messages fall outside the preserved recent context. Any images in the summarized partition are no longer sent as active image blocks after summarization. The conversation history file written to the backend is a textual record, not a media artifact store, so store important images separately if the agent needs to inspect them again later.
+
 ### Summarization
 
 :::js
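
The added guidance that a tool should "save the image and return a concise text description plus a path or URL" could be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the helper name `save_image_and_describe`, the content-addressed file naming, and the use of a local directory instead of a Deep Agents filesystem backend. It is not part of the Deep Agents API.

```python
import hashlib
from pathlib import Path


def save_image_and_describe(
    image_bytes: bytes,
    description: str,
    out_dir: Path,
    suffix: str = ".png",
) -> str:
    """Persist an image and return a short text result that references it.

    Hypothetical helper: a real tool would write to whatever backend or
    object store the agent is configured with.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    # Content-addressed name, so saving the same image twice reuses one file.
    digest = hashlib.sha256(image_bytes).hexdigest()[:16]
    path = out_dir / f"{digest}{suffix}"
    if not path.exists():
        path.write_bytes(image_bytes)
    # Only this compact string enters message history, not the image bytes.
    return f"{description} (saved to {path.as_posix()}, {len(image_bytes)} bytes)"
```

A tool that renders a chart would call this helper and return the resulting string, so the agent can re-open the file later if it needs to inspect the image again.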
@@ -394,9 +410,9 @@ When the context size crosses the model's context window limit (for example 85%
 This process has two components:
 
 - **In-context summary**: An LLM generates a structured summary of the conversation, including session intent, artifacts created, and next steps, which replaces the full conversation history in the agent's working memory.
-- **Filesystem preservation**: The complete, original conversation messages are written to the filesystem as a canonical record.
+- **Filesystem preservation**: A text rendering of the original conversation messages is written to the filesystem as a canonical record.
 
-This dual approach ensures the agent maintains awareness of its goals and progress (via the summary) while preserving the ability to recover specific details when needed (via filesystem search).
+This dual approach ensures the agent maintains awareness of its goals and progress (via the summary) while preserving the ability to recover text details when needed (via filesystem search).
 
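
Because summarization triggers on context size, the suggestion to "provide a custom token counter when your model provider charges many tokens for images" might look roughly like the Python sketch below. The assumptions are loud ones: messages are modeled as plain dicts with either string content or a list of content blocks, text is estimated at four characters per token, and each image is charged a hypothetical flat cost. How such a counter is passed to the summarization configuration is not shown in this diff and is not assumed here.

```python
from typing import Any, Iterable

# Hypothetical flat per-image charge; check your provider's actual pricing.
IMAGE_TOKEN_COST = 1_000
# Rough heuristic for text, not a real tokenizer.
CHARS_PER_TOKEN = 4


def count_tokens_with_images(messages: Iterable[dict[str, Any]]) -> int:
    """Estimate tokens for dict-shaped messages, charging a flat cost per image."""
    total = 0
    for message in messages:
        content = message.get("content", "")
        if isinstance(content, str):
            total += len(content) // CHARS_PER_TOKEN
            continue
        for block in content:
            if isinstance(block, str):
                total += len(block) // CHARS_PER_TOKEN
            elif block.get("type") == "text":
                total += len(block.get("text", "")) // CHARS_PER_TOKEN
            elif block.get("type") in ("image", "image_url"):
                # Images count by a fixed cost, not by payload length.
                total += IMAGE_TOKEN_COST
    return total
```

With a counter like this, a history holding a few screenshots reaches the summarization threshold much sooner than a text-only estimate would suggest, which is the behavior the guidance is after.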

src/oss/langchain/middleware/built-in.mdx
+4 (4 additions & 0 deletions)
@@ -60,6 +60,10 @@ Automatically summarize conversation history when approaching token limits, pres
 - Multi-turn dialogues with extensive history.
 - Applications where preserving full conversation context matters.
 
+<Note>
+Summarization is text-oriented context compression. It does not resize, downsample, or otherwise compress image/audio/video payloads. Recent messages retained by `keep` still include their original multimodal blocks, while older multimodal messages that are summarized are represented only by the generated text summary. For image-heavy applications, store media in a filesystem or object store and pass URLs or file references through message history.
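
One way to follow the note's advice to "store media in a filesystem or object store and pass URLs or file references through message history" is to rewrite messages before they accumulate. The Python sketch below assumes dict-shaped messages whose image blocks look like `{"type": "image", "base64": "..."}`; that block shape, the function name `externalize_images`, and the local `media_dir` are all illustrative assumptions, not the middleware's API.

```python
import base64
import hashlib
from pathlib import Path
from typing import Any


def externalize_images(
    messages: list[dict[str, Any]], media_dir: Path
) -> list[dict[str, Any]]:
    """Return a copy of messages with inline base64 images replaced by file references."""
    media_dir.mkdir(parents=True, exist_ok=True)
    rewritten: list[dict[str, Any]] = []
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            # Plain-text messages pass through unchanged.
            rewritten.append(message)
            continue
        new_blocks = []
        for block in content:
            is_inline_image = (
                isinstance(block, dict)
                and block.get("type") == "image"
                and "base64" in block
            )
            if is_inline_image:
                data = base64.b64decode(block["base64"])
                path = media_dir / (hashlib.sha256(data).hexdigest()[:16] + ".bin")
                path.write_bytes(data)
                # A short text reference replaces the heavy payload.
                new_blocks.append(
                    {"type": "text", "text": f"[image stored at {path.as_posix()}]"}
                )
            else:
                new_blocks.append(block)
        rewritten.append({**message, "content": new_blocks})
    return rewritten
```

The input list is not mutated, so the original multimodal messages remain available to any caller that still needs to send the image to the model for the current turn.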