feat(server): Adds a tag to RDB entry#6926

Draft
abhijat wants to merge 10 commits into main from abhijat/feat/tagged-chunk-rdb-format

Conversation


@abhijat abhijat commented Mar 19, 2026

WIP

On flush, the serializer prefixes an envelope (opcode, tag, size) to the chunk for baseline (key, value) entries.

The tag for baseline data is a monotonically increasing 4-byte stream id. A new id is assigned per SaveEntry call (i.e. per key).
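As a rough illustration, the envelope could look like this. This is a minimal sketch: the opcode value, the struct name, and the PrefixEnvelope helper are all hypothetical, not the PR's actual code, and fields are written in host byte order for simplicity.

```cpp
#include <cstdint>
#include <string>

// Hypothetical placeholder opcode introducing a tagged chunk.
constexpr uint8_t kOpTaggedChunk = 0xF0;

struct ChunkEnvelope {
  uint8_t opcode;      // opcode introducing a tagged chunk
  uint32_t stream_id;  // one id per SaveEntry call (i.e. per key)
  uint32_t size;       // payload bytes that follow the envelope
};

// Prefix the envelope to a serialized payload (host byte order).
std::string PrefixEnvelope(uint32_t stream_id, const std::string& payload) {
  ChunkEnvelope env{kOpTaggedChunk, stream_id,
                    static_cast<uint32_t>(payload.size())};
  std::string out;
  out.push_back(static_cast<char>(env.opcode));
  out.append(reinterpret_cast<const char*>(&env.stream_id),
             sizeof(env.stream_id));
  out.append(reinterpret_cast<const char*>(&env.size), sizeof(env.size));
  out += payload;
  return out;
}
```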

Journal entries and other non-baseline types use a sentinel value; they are not tagged, but they are still stashed.

To guard against interleaved entries (baseline/journal or baseline/baseline) caused by, e.g., a yield during a large SaveEntry, a stash is added to the serializer along with a new piece of state: current_stream_id_.

  • WriteJournalEntry sets the current stream id to the sentinel value up front.
  • Before the yield (consume_fun_), we store the current stream id.
  • After the yield, if the current stream id changed (a journal entry was added while the fiber was yielded, or another key was serialized), the memory buffer contents are stashed under the correct stream id.
  • When flushing, the stash (which already contains tagged chunks) is prefixed to the current data in the memory buffer (which is also tagged during flush).
  • The flush decision (size threshold) takes both the memory buffer size and the stash size into account.
  • These serializer operations are gated by the boolean flag from the serialization_tagged_chunks setting.
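The stash-around-yield flow in the bullets above can be sketched roughly like this. All names (SerializerSketch, YieldSafely, PendingBytes) are hypothetical; the real serializer is structured differently, but the invariant is the same: if another fiber wrote while we yielded, the pre-yield buffer contents must be stashed under the old stream id.

```cpp
#include <cstdint>
#include <deque>
#include <string>
#include <utility>

struct SerializerSketch {
  uint32_t current_stream_id = 0;
  std::string mem_buf;                                 // in-progress chunk
  std::deque<std::pair<uint32_t, std::string>> stash;  // tagged chunks

  // Wraps the potential yield point (consume_fun_ in the description).
  template <typename YieldFn>
  void YieldSafely(YieldFn&& yield) {
    uint32_t id_before = current_stream_id;
    yield();  // another fiber may serialize here and change the stream id
    if (current_stream_id != id_before && !mem_buf.empty()) {
      // Buffer belongs to the pre-yield stream: stash it under the old id.
      stash.emplace_back(id_before, std::move(mem_buf));
      mem_buf.clear();
    }
  }

  // The flush threshold counts both the buffer and the stash.
  size_t PendingBytes() const {
    size_t n = mem_buf.size();
    for (const auto& [id, data] : stash) n += data.size();
    return n;
  }
};
```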

Compression of stashed entries is skipped. It might be possible to add it, but currently the number_of_chunks_ field is saved and restored across a stash, because the stash function uses PrepareFlush.

TODO in this PR:

  • tests

@abhijat abhijat force-pushed the abhijat/feat/tagged-chunk-rdb-format branch from 41cce0b to b002506 Compare March 19, 2026 08:50

abhijat commented Mar 19, 2026

this is one alternative approach to #6906


abhijat commented Mar 19, 2026

will add more details in #6831


romange commented Mar 19, 2026

It's not true that we need it only for replication with stream_journal.

We may have interleaved streams if, for example, two fibers try to write values during a backup: one is serializing a huge value while another writes a different entry, which can slip in between the chunks of the huge value's serialization if we lift bi_value_m_.

I wrote explicitly that the acceptance criterion is that rdb_tests (snapshotting) should pass with the new format.

@abhijat abhijat force-pushed the abhijat/feat/tagged-chunk-rdb-format branch from b002506 to d300a9c Compare March 19, 2026 10:49
@abhijat abhijat force-pushed the abhijat/feat/tagged-chunk-rdb-format branch 4 times, most recently from 541cb7e to cde5f4a Compare March 20, 2026 12:00

abhijat commented Mar 21, 2026

It might be better to tag on bucket ids (plus one fixed tag for journal entries) than to tag only baseline entries. For example:

  • Bucket X is being iterated by fiber A, which yields midway while saving a big entry.
  • An OnDbChange fires for bucket Y. Per https://github.com/dragonflydb/dragonfly/blob/main/docs/shard-serialization.md, this is a legal write to the serializer because bucket Y's state is NotCovered, so fiber B writes some data to the serializer and then yields mid-SaveEntry (e.g. a big hmap).
  • Fiber A resumes, writes more data to the serializer's memory buffer, and finishes.
  • Fiber B resumes, writes the rest of bucket Y to the serializer, and finishes.

Now the serializer data looks like [XXXXYYYXXX], which is corrupt even though all tags are correctly marked baseline. If we instead tag by bucket id (a monotonically increasing 4-byte int) and use the stashing system already in this PR, the receiver can reassemble the chunks for different buckets and apply them separately, as well as replay journal entries correctly. We would also need to mark the end of a bucket with something like a zero-length chunk tagged with the bucket id, written at the end of saving the bucket.

For this to work, the loader now has to maintain one loader per bucket (the maximum number of loaders at a time equals the maximum number of buckets in the stream at a time); the end marker tells when a bucket is done, so the loader can be torn down or reused.
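Receiver-side reassembly by tag, as described above, amounts to concatenating chunks per tag so that each bucket's bytes end up contiguous again. A minimal sketch with a hypothetical Reassemble helper:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Given (tag, payload) chunks in arrival order, concatenate per tag.
// For the [XXXXYYYXXX] example above, tag X's chunks rejoin into one
// contiguous byte sequence even though tag Y's chunk arrived in between.
std::map<uint32_t, std::string> Reassemble(
    const std::vector<std::pair<uint32_t, std::string>>& chunks) {
  std::map<uint32_t, std::string> by_tag;
  for (const auto& [tag, data] : chunks) by_tag[tag] += data;
  return by_tag;
}
```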


romange commented Mar 21, 2026

But how are bucket ids related to this? We just need to reassemble a single value, while a bucket holds multiple values. We can have a unique tag per key; a key itself can be such a tag, for example.


abhijat commented Mar 21, 2026

But how are bucket ids related to this? We just need to reassemble a single value, while a bucket holds multiple values. We can have a unique tag per key; a key itself can be such a tag, for example.

There are two reasons I thought of buckets:

  • In this PR I stash the contents of the serializer's memory buffer into a tagged chunk when the tag changes. So if the tag changes per bucket, and a bucket is serialized without yielding, it is sent as a single chunk with no stashing or tagging. With per-key tags there will be a chunk stashed for each key in each bucket, but it will still work.
  • I viewed the bucket commands produced by SerializeBucket as one stream sequence that is combined and applied together, but a key works the same way.

Looking at the loader side: because we already support incremental loading of keys via now_chunked_, it might be much easier to reassemble if the tag maps to the key (plus the db or any other information that might make it unique).

The main things that will change in the loader are:

  • Right now, LoadKeyValPair assumes that a key's content is contiguous in the stream.
  • With tagged chunks this is no longer true, so the LoadKeyValPair loop needs an additional stop condition: if the chunk size is exhausted, store the partially read key in the now_chunked_ map and process the rest of the stream.
  • Later, when a section of the stream for the same key is seen again, the key is picked from the map and appended to, which already happens in FromOpaque etc.

So with the key as the tag (or a counter that maps to the key), the end-chunk marker is also not needed; the incremental parsing in LoadConfig will already take care of that.


abhijat commented Mar 21, 2026

Also, if baseline entries are tagged by key, journal entries do not need to be tagged: they are already delimited by opcode and never split, so the loader will correctly treat them as distinct from key-tagged entries without a chunk header.


abhijat commented Mar 21, 2026

Rough steps for the loader changes to do next in this PR:

  • In the main loader loop, when RDB_OPCODE_TAGGED_CHUNK is seen, read the stream id and the payload size.
  • Fall through to loading the kv pair, checking the stream-state map (a new map keyed by stream id).
  • If the stream id is not in the stream-state map, load both the key and the value, cutting off the LoadKeyValPair loop at the payload size (updated using the mem buf input length?).
  • Store the partial entry state (pending bytes remaining, etc.) in the map.
  • If the stream id is in the state map, don't read the key; read the partial value (similar to LoadKeyValPair; extract a helper), using the same CreateObjectOnShard etc. for incremental parsing, with some changes needed to update the stream-state map.
  • Once the read finishes (after potentially many chunks), CreateObjectOnShard will save the entry; remove it from the stream state.
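The steps above can be sketched as follows. All names are hypothetical, and the is_last_chunk flag stands in for the incremental parser detecting that a value is complete (with key-derived tags there is no explicit end marker, as noted earlier).

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

struct PartialEntry {
  std::string key;           // parsed from the first chunk (elided here)
  std::string value_so_far;  // stands in for the opaque object state
};

struct LoaderSketch {
  std::map<uint32_t, PartialEntry> stream_state;

  // Returns the finished entry if this chunk completed it.
  std::optional<PartialEntry> OnTaggedChunk(uint32_t stream_id,
                                            const std::string& payload,
                                            bool is_last_chunk) {
    auto it = stream_state.find(stream_id);
    if (it == stream_state.end()) {
      // First chunk for this stream: real code would parse the key here,
      // then start reading the value, bounded by the payload size.
      PartialEntry e;
      e.value_so_far = payload;
      it = stream_state.emplace(stream_id, std::move(e)).first;
    } else {
      // Resume: append to the partially read value; no key to read.
      it->second.value_so_far += payload;
    }
    if (is_last_chunk) {
      PartialEntry done = std::move(it->second);
      stream_state.erase(it);
      return done;  // here CreateObjectOnShard would persist the entry
    }
    return std::nullopt;
  }
};
```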

@abhijat abhijat force-pushed the abhijat/feat/tagged-chunk-rdb-format branch 5 times, most recently from a5a81cb to 9efb38b Compare March 22, 2026 08:39
@abhijat abhijat force-pushed the abhijat/feat/tagged-chunk-rdb-format branch 5 times, most recently from cbc4da0 to b7d6925 Compare March 23, 2026 07:12

abhijat commented Mar 23, 2026

The lower-level read functions in rdb_load (ReadSet etc.) also need to stop at the payload size, not just after "n" items. We already have that condition in LoadKeyValPair etc.; the same condition needs to be pushed down into the lower layers too.
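Pushing the payload-size bound into a lower-level reader can be sketched like this. Names are hypothetical, and the real ReadSet works on the RDB stream rather than an in-memory vector; the point is the dual stop condition.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Reads up to n_items items, but never past *payload_budget bytes:
// stops when either the item count or the chunk's byte budget runs out.
// Returns the number of items read; decrements the budget in place.
size_t ReadItemsBounded(const std::vector<std::string>& stream_items,
                        size_t n_items, size_t* payload_budget,
                        std::vector<std::string>* out) {
  size_t read = 0;
  for (const auto& item : stream_items) {
    if (read == n_items || item.size() > *payload_budget) break;
    *payload_budget -= item.size();
    out->push_back(item);
    ++read;
  }
  return read;
}
```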

@abhijat abhijat force-pushed the abhijat/feat/tagged-chunk-rdb-format branch 3 times, most recently from bb5d11f to ebb14b3 Compare March 23, 2026 16:16
@abhijat abhijat force-pushed the abhijat/feat/tagged-chunk-rdb-format branch 5 times, most recently from 0bb5174 to eee8dc4 Compare March 25, 2026 09:58

abhijat commented Mar 25, 2026

At this point, the one crucial thing left to add to this PR is tests that actually interleave data across keys/buckets, i.e. preempt and write.

@abhijat abhijat force-pushed the abhijat/feat/tagged-chunk-rdb-format branch 3 times, most recently from b56a290 to 28a50d1 Compare March 26, 2026 11:14

abhijat commented Mar 26, 2026

In the latest version:

  • The stream id is now optional: it is only used for entries that split on Push -> Flush -> Yield.
  • The serializer self-assigns a stream id on a potential split; the caller does not need to.
  • Entries that fit in one chunk and finish without reaching the flush boundary are not tagged.
  • Journal entries and other non-data entries (like the journal offset) are not tagged.
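The lazy self-assignment described above can be sketched as follows. TagPolicy and its members are hypothetical names; 0 stands in for "untagged".

```cpp
#include <cstdint>

struct TagPolicy {
  uint32_t next_id = 1;
  uint32_t current = 0;  // 0 == untagged

  // Called when a Push -> Flush -> Yield could split the current entry:
  // assigns a stream id lazily, only on the first potential split.
  uint32_t OnPotentialSplit() {
    if (current == 0) current = next_id++;
    return current;
  }

  // Entry finished: the next small entry starts untagged again.
  void OnEntryDone() { current = 0; }

  bool Tagged() const { return current != 0; }
};
```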

@abhijat abhijat force-pushed the abhijat/feat/tagged-chunk-rdb-format branch from 1f3b918 to 1d38527 Compare March 27, 2026 07:51
@abhijat abhijat force-pushed the abhijat/feat/tagged-chunk-rdb-format branch from 1d38527 to bf1687f Compare March 27, 2026 08:01
abhijat added 2 commits March 27, 2026 15:36
@abhijat abhijat force-pushed the abhijat/feat/tagged-chunk-rdb-format branch from bf1687f to debd811 Compare March 27, 2026 10:06

abhijat commented Mar 27, 2026

The new test forcefully interleaves multiple keys and inserts journal entries and offset commands around each chunk of a key.

abhijat added 2 commits March 31, 2026 10:36
@abhijat abhijat force-pushed the abhijat/feat/tagged-chunk-rdb-format branch 4 times, most recently from ee4617d to d3e1760 Compare April 2, 2026 15:26
