fix: InMemoryDocumentStore.load_from_disk corrupts blob and sparse_embedding fields by adhavan18 · Pull Request #11634 · deepset-ai/haystack

adhavan18 · 2026-06-15T09:16:10Z

Problem

load_from_disk uses Document(**doc) to reconstruct documents. But save_to_disk serialises via Document.to_dict(flatten=False), which converts nested dataclass fields to plain dicts (blob → ByteStream.to_dict(), sparse_embedding → SparseEmbedding.to_dict()). The plain constructor doesn't reverse this, so those fields come back as raw dicts. Any access to document.blob.data or document.sparse_embedding.indices then raises an AttributeError, e.g. 'dict' object has no attribute 'data'.
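To make the failure concrete, here is a minimal, self-contained sketch of the same pattern. It does not import Haystack: MiniByteStream, MiniSparseEmbedding and MiniDocument are hypothetical, simplified stand-ins, and dataclasses.asdict plays the role of to_dict(flatten=False) by turning the nested dataclasses into plain dicts.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


# Hypothetical, simplified stand-ins for Haystack's ByteStream, SparseEmbedding
# and Document. They only reproduce the shape of the serialisation problem.
@dataclass
class MiniByteStream:
    data: bytes
    mime_type: Optional[str] = None


@dataclass
class MiniSparseEmbedding:
    indices: List[int]
    values: List[float]


@dataclass
class MiniDocument:
    content: Optional[str] = None
    blob: Optional[MiniByteStream] = None
    sparse_embedding: Optional[MiniSparseEmbedding] = None

    def to_dict(self) -> Dict[str, Any]:
        # asdict() recursively converts nested dataclasses into plain dicts.
        return asdict(self)


doc = MiniDocument(
    content="hello",
    blob=MiniByteStream(data=b"\x89PNG", mime_type="image/png"),
    sparse_embedding=MiniSparseEmbedding(indices=[0, 3], values=[0.5, 0.25]),
)

# Old loading strategy: pass the serialised dict straight to the constructor.
# Dataclass constructors do no type conversion, so blob stays a plain dict.
reloaded = MiniDocument(**doc.to_dict())

print(type(reloaded.blob).__name__)  # dict
try:
    reloaded.blob.data
except AttributeError as exc:
    print(exc)  # 'dict' object has no attribute 'data'
```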

Fix

Replace Document(**doc) with Document.from_dict(doc), which is the documented inverse of to_dict and correctly restores nested fields.
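The idea behind the fix can be sketched with the same kind of hypothetical stand-ins (not Haystack's actual classes, and not its real from_dict implementation): a from_dict classmethod rebuilds the nested dataclasses from their dict form before calling the constructor, which makes it the inverse of to_dict.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


# Hypothetical stand-ins, simplified for illustration only.
@dataclass
class MiniByteStream:
    data: bytes
    mime_type: Optional[str] = None


@dataclass
class MiniSparseEmbedding:
    indices: List[int]
    values: List[float]


@dataclass
class MiniDocument:
    content: Optional[str] = None
    blob: Optional[MiniByteStream] = None
    sparse_embedding: Optional[MiniSparseEmbedding] = None

    def to_dict(self) -> Dict[str, Any]:
        # Nested dataclasses become plain dicts here.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "MiniDocument":
        # Inverse of to_dict: turn the nested dicts back into dataclasses
        # before handing everything to the constructor.
        data = dict(data)  # shallow copy so the caller's dict is not mutated
        if (blob := data.get("blob")) is not None:
            data["blob"] = MiniByteStream(**blob)
        if (sparse := data.get("sparse_embedding")) is not None:
            data["sparse_embedding"] = MiniSparseEmbedding(**sparse)
        return cls(**data)


doc = MiniDocument(
    content="hello",
    blob=MiniByteStream(data=b"\x89PNG", mime_type="image/png"),
    sparse_embedding=MiniSparseEmbedding(indices=[0, 3], values=[0.5, 0.25]),
)

restored = MiniDocument.from_dict(doc.to_dict())
print(type(restored.blob).__name__)  # MiniByteStream
print(restored == doc)  # True
```

Copying the input dict before replacing the nested entries is a choice made in this sketch so that the caller's serialised data is left untouched.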

Test

Added a round-trip test: save documents with a blob and a sparse_embedding, reload them, and verify that both fields are restored as proper dataclass instances (ByteStream and SparseEmbedding).
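A rough, pytest-style sketch of such a round-trip test, under stated assumptions: save_docs and load_docs are hypothetical stand-ins for save_to_disk and the fixed load_from_disk, the Mini* classes stand in for Haystack's, and blob bytes are written to JSON as a list of ints (an assumption of this sketch). It saves a document with a blob and a sparse_embedding, reloads it, checks the restored types, and then saves the reloaded document a second time.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Any, Dict, List, Optional


# Hypothetical stand-ins for Haystack's ByteStream, SparseEmbedding and Document.
@dataclass
class MiniByteStream:
    data: bytes
    mime_type: Optional[str] = None


@dataclass
class MiniSparseEmbedding:
    indices: List[int]
    values: List[float]


@dataclass
class MiniDocument:
    id: str
    blob: Optional[MiniByteStream] = None
    sparse_embedding: Optional[MiniSparseEmbedding] = None

    def to_dict(self) -> Dict[str, Any]:
        data = asdict(self)
        if data["blob"] is not None:
            # Assumption of this sketch: bytes are stored as a list of ints for JSON.
            data["blob"]["data"] = list(data["blob"]["data"])
        return data

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "MiniDocument":
        data = dict(data)
        if (blob := data.get("blob")) is not None:
            data["blob"] = MiniByteStream(data=bytes(blob["data"]), mime_type=blob.get("mime_type"))
        if (sparse := data.get("sparse_embedding")) is not None:
            data["sparse_embedding"] = MiniSparseEmbedding(**sparse)
        return cls(**data)


def save_docs(docs: List[MiniDocument], path: Path) -> None:
    # Hypothetical stand-in for save_to_disk.
    path.write_text(json.dumps([d.to_dict() for d in docs]))


def load_docs(path: Path) -> List[MiniDocument]:
    # Hypothetical stand-in for the fixed load_from_disk: from_dict, not MiniDocument(**d).
    return [MiniDocument.from_dict(d) for d in json.loads(path.read_text())]


def test_round_trip_restores_nested_types() -> None:
    original = MiniDocument(
        id="doc-1",
        blob=MiniByteStream(data=b"\x00\x01\xff", mime_type="image/png"),
        sparse_embedding=MiniSparseEmbedding(indices=[2, 7], values=[0.1, 0.9]),
    )
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp) / "first.json"
        save_docs([original], first)
        (loaded,) = load_docs(first)

        # Nested fields come back as dataclass instances, not raw dicts.
        assert isinstance(loaded.blob, MiniByteStream)
        assert isinstance(loaded.sparse_embedding, MiniSparseEmbedding)
        assert loaded == original

        # Save -> load -> save: writing the reloaded documents again must work.
        save_docs([loaded], Path(tmp) / "second.json")


test_round_trip_restores_nested_types()
```

The PR's actual test targets InMemoryDocumentStore itself; this sketch only shows the shape of the assertions.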

… instead of Document constructor

save_to_disk serialises documents with Document.to_dict(flatten=False), which converts nested dataclass fields to plain dicts:
- blob -> ByteStream.to_dict() (a plain dict)
- sparse_embedding -> SparseEmbedding.to_dict() (a plain dict)

load_from_disk previously reconstructed documents with Document(**doc), which passes those plain dicts directly to the constructor without reversing the serialisation. The fields were loaded as raw dicts instead of the proper ByteStream / SparseEmbedding instances.

Downstream effects:
- AttributeError: 'dict' object has no attribute 'data' on any access to document.blob.data (e.g. DocumentToImageContent).
- doc.to_dict() / doc == other both fail with the same error.
- A save -> load -> save round-trip is impossible.

Fix: replace Document(**doc) with Document.from_dict(doc), which is the documented inverse of to_dict and correctly restores ByteStream, SparseEmbedding, and any other nested dataclass fields.

Adds a regression test that exercises the full save/load round-trip with both a blob and a sparse_embedding, asserts the correct types are restored, and verifies that save_to_disk on the loaded store works without error.

Fixes deepset-ai#11593

vercel · 2026-06-15T09:16:16Z

@adhavan18 is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

CLAassistant · 2026-06-15T09:16:19Z

All committers have signed the CLA.

julian-risch · 2026-06-15T19:43:44Z

@adhavan18 Thank you for opening this PR. Another PR addressing the same issue is already under review: #11594.

adhavan18 added 2 commits June 15, 2026 14:42

chore: add release note for load_from_disk blob/sparse_embedding fix

bf0c860

adhavan18 requested a review from a team as a code owner June 15, 2026 09:16

adhavan18 requested review from julian-risch and removed request for a team June 15, 2026 09:16

github-actions added the topic:tests label Jun 15, 2026

julian-risch closed this Jun 15, 2026

adhavan18 wants to merge 2 commits into deepset-ai:main from adhavan18:fix/load-from-disk-corrupts-blob-sparse-embedding
