Native serialization for MultiMutableVamana index #289

Merged
razdoburdin merged 5 commits into intel:dev/razdoburdin_streaming from
razdoburdin:streaming/multi_dynamic_vamana
Mar 23, 2026

Contributor

@razdoburdin razdoburdin commented Mar 16, 2026

This PR introduces native serialization for the MultiMutableVamana index.
Should be merged after: #286

Main changes are:

  1. A new overload of svs::index::vamana::auto_multi_dynamic_assemble, required for direct deserialization, accepts lazy loaders and calls them in a flexible order to cover legacy serialized models.
  2. Added related tests.
  3. The supports_saving flag is kept false because it isn't used.
  4. MultiMutableVamana doesn't have an orchestrator, so no changes are needed on that side.
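
To illustrate item 1, here is a minimal sketch of assembling an object from lazy loaders whose call order is chosen at load time. It is not the actual `auto_multi_dynamic_assemble` overload: `Parts`, `assemble_lazy`, and the `graph_first` flag are hypothetical names, and the components are reduced to strings.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>

// Hypothetical stand-in for the components an index is assembled from.
struct Parts {
    std::string data;
    std::string graph;
};

// Assemble from lazy loaders. Nothing is read until a loader is invoked,
// so the caller decides the order. This lets one routine consume streams
// whose sections were written in different orders (e.g. a legacy layout).
inline Parts assemble_lazy(
    const std::function<std::string()>& load_data,
    const std::function<std::string()>& load_graph,
    bool graph_first // hypothetical switch selecting the section order
) {
    Parts parts;
    if (graph_first) {
        parts.graph = load_graph();
        parts.data = load_data();
    } else {
        parts.data = load_data();
        parts.graph = load_graph();
    }
    return parts;
}
```

Because each loader only touches the stream when called, the same pair of loaders can serve both layouts; only the flag (or whatever format marker replaces it) changes.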

@razdoburdin razdoburdin marked this pull request as draft March 16, 2026 09:15
@razdoburdin razdoburdin marked this pull request as ready for review March 20, 2026 11:13
@razdoburdin razdoburdin requested a review from rfsaliev March 20, 2026 11:14
Member

@rfsaliev rfsaliev left a comment

With comment

Comment thread include/svs/index/vamana/multi.h Outdated
Comment on lines +633 to +653
consolidate();
compact();

// Since data is in order of external ids,
// convert a map of external ids to label types into a sorted vector of labels based
// on external ids.
std::vector<std::pair<external_id_type, label_type>> ext_lab_vec(
    external_to_label_.begin(), external_to_label_.end()
);
std::sort(ext_lab_vec.begin(), ext_lab_vec.end(), [](const auto& a, const auto& b) {
    return a.first < b.first;
});

size_t num_labels = ext_lab_vec.size();
std::vector<label_type> labels(num_labels);
std::transform(
    ext_lab_vec.begin(),
    ext_lab_vec.end(),
    labels.begin(),
    [](const auto& ext_lab) { return ext_lab.second; }
);
Member

Does it make sense to extract this preparation code into a separate method and reuse it both in this method and in save(const filesystem::path&, const filesystem::path&, const filesystem::path)?

Contributor Author

done
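
The extraction suggested in this thread could look roughly like the sketch below. The merged helper is not shown on this page, so the function name `labels_sorted_by_external_id` and the use of `std::unordered_map` for the external-id-to-label map are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: return the labels ordered by ascending external id,
// matching the order in which the data is stored after compaction.
template <typename ExternalId, typename Label>
std::vector<Label> labels_sorted_by_external_id(
    const std::unordered_map<ExternalId, Label>& external_to_label
) {
    // Copy the (external id, label) pairs so they can be sorted.
    std::vector<std::pair<ExternalId, Label>> pairs(
        external_to_label.begin(), external_to_label.end()
    );
    std::sort(pairs.begin(), pairs.end(), [](const auto& a, const auto& b) {
        return a.first < b.first;
    });

    // Keep only the labels, preserving the sorted order.
    std::vector<Label> labels(pairs.size());
    std::transform(
        pairs.begin(),
        pairs.end(),
        labels.begin(),
        [](const auto& pair) { return pair.second; }
    );
    return labels;
}
```

Both save paths would then call `consolidate()` and `compact()` followed by this one helper, rather than each repeating the sort-and-transform sequence.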

@razdoburdin razdoburdin merged commit 5e52e82 into intel:dev/razdoburdin_streaming Mar 23, 2026
16 of 19 checks passed
razdoburdin added a commit that referenced this pull request Apr 16, 2026
This PR adds native stream serialization to all SVS index types, as an
alternative to the existing (legacy) directory-based serialization. It
avoids filesystem round-trips of the data. The native serialization
does not require the stream to be seekable, so no additional
restrictions are introduced.
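
One common way to avoid any need for seeking is to prefix each section with its size and read strictly forward. The sketch below shows that pattern under stated assumptions: `write_blob` and `read_blob` are hypothetical names, and this is not necessarily the layout SVS uses.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Write a section as [u64 size][payload bytes].
inline void write_blob(std::ostream& os, const std::string& bytes) {
    const std::uint64_t size = bytes.size();
    os.write(reinterpret_cast<const char*>(&size), sizeof(size));
    os.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}

// Read one section back using only forward reads: no seekg or tellg,
// so this also works on pipes and sockets wrapped in a stream.
inline std::string read_blob(std::istream& is) {
    std::uint64_t size = 0;
    is.read(reinterpret_cast<char*>(&size), sizeof(size));
    if (!is) {
        throw std::runtime_error("truncated size prefix");
    }
    std::string bytes(size, '\0');
    is.read(bytes.data(), static_cast<std::streamsize>(size));
    if (!is) {
        throw std::runtime_error("truncated payload");
    }
    return bytes;
}
```

The size prefix is written in host byte order here for brevity; a portable format would fix the endianness.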

See the following PRs for details:
#280,
#281,
#285,
#286,
#289,
#292,
#294,
#296,
#299

Main changes are:
1. A CRTP base `Archiver` extracts binary I/O primitives (`write_size`,
`read_size`, `write_name`, `read_name`, `read_from_istream`) from
`DirectoryArchiver`. `DirectoryArchiver` and the new `StreamArchiver`
class both inherit from `Archiver`. `StreamArchiver` has its own magic
number ("SVS_STRM") to distinguish native streams from directory archives.
2. The monolithic `Writer` is split via CRTP into two derived classes:
`FileWriter` owns a `std::ofstream`, writes a header, and flushes on
destruction; `StreamWriter` wraps an external `std::ostream&` and does no
header or lifecycle management. This allows `io::save(data, os)` to write
vector data directly to any stream.
3. The `save(stream)` in orchestrator `Impl` classes no longer saves to a
temporary directory and packs it. Instead it directly calls
`impl().save(stream)`.
4. The dispatch between new (native) and old (legacy) deserialization
is done in the orchestrators. `Deserializer::build(is)` reads the magic
number and exposes `is_native()` to choose the path.
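
A rough sketch of items 1 and 4 follows: a CRTP base holding shared I/O primitives, and a magic-number check that selects the native path. The type names ending in `Sketch`, the free function `is_native`, the 8-byte character encoding of the magic, and the placeholder legacy magic are all assumptions; only the string "SVS_STRM" comes from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// CRTP base: binary I/O primitives are written once and shared, while each
// derived archiver supplies its own magic through Derived::magic.
template <typename Derived>
struct ArchiverBase {
    static void write_size(std::ostream& os, std::uint64_t n) {
        os.write(reinterpret_cast<const char*>(&n), sizeof(n));
    }
    static std::uint64_t read_size(std::istream& is) {
        std::uint64_t n = 0;
        is.read(reinterpret_cast<char*>(&n), sizeof(n));
        return n;
    }
    static void write_magic(std::ostream& os) {
        os.write(Derived::magic, 8);
    }
};

struct StreamArchiverSketch : ArchiverBase<StreamArchiverSketch> {
    // Value quoted in the commit message; storing it as 8 chars is assumed.
    static constexpr const char* magic = "SVS_STRM";
};

struct DirectoryArchiverSketch : ArchiverBase<DirectoryArchiverSketch> {
    // Placeholder only, not the real directory-archive magic.
    static constexpr const char* magic = "LEGACY__";
};

// Read the leading 8 bytes and report whether they mark a native stream.
// A caller would branch on the result to pick native or legacy loading.
inline bool is_native(std::istream& is) {
    char buf[8] = {};
    is.read(buf, 8);
    return is && std::string(buf, 8) == StreamArchiverSketch::magic;
}
```

Note that the check consumes the magic bytes. On a non-seekable stream they cannot be re-read, so the chosen path has to continue from the position right after the magic.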

---------

Co-authored-by: Dmitry Razdoburdin <drazdobu@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Rafik Saliev <rafik.f.saliev@intel.com>
Co-authored-by: ethanglaser <42726565+ethanglaser@users.noreply.github.com>