How to use Mooncake-Conductor on vLLM #1967

yejj710 (Collaborator) · 2026-04-24T03:40:21Z

@mich45208 We are testing the performance of KV-aware routing for XpYd deployments on vllm-router, using the conductor together with vLLM, LMCache, and Mooncake.

yejj710 (Collaborator, Author) · 2026-04-24T04:02:50Z

Minimal usage of mooncake_conductor:

1. Compile the code: `cd mooncake-conductor/conductor-ctrl/ && go mod tidy && go build -o mooncake_conductor main.go`
2. Run mooncake-conductor: `./mooncake_conductor` (this starts an HTTP server on port 13333).
3. Register the KV publisher via the /register API. Details can be seen in "[RFC]: Mooncake KV-Store Indexer API Standardization #1403".
4. Query the KV-hit status via the /query API.
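
The register and query steps above could be driven from a small client along these lines. This is only a sketch: the port 13333 and the /register and /query paths come from the steps above, but the host, the use of POST with a JSON body, and every payload field (`endpoint`, `token_ids`) are placeholders assumed for illustration. The actual request schema is the one described in RFC #1403.

```python
import json
import urllib.request

# Port 13333 is taken from the steps above; "localhost" is an assumption.
CONDUCTOR_BASE = "http://localhost:13333"


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON request for the conductor."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        CONDUCTOR_BASE + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the thread does not state the HTTP method
    )


def send(req: urllib.request.Request, timeout: float = 5.0) -> str:
    """Send a prepared request and return the raw response body as text."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Hypothetical payloads: the field names below are placeholders, not the
# real schema. Consult RFC #1403 for the fields the conductor expects.
register_req = build_request("/register", {"endpoint": "tcp://127.0.0.1:5557"})
query_req = build_request("/query", {"token_ids": [1, 2, 3]})
```

With the conductor running, `send(register_req)` followed by `send(query_req)` would issue the two calls. Building the requests separately from sending them makes it easy to inspect the URL and body before anything goes over the wire.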


mich45208 · 2026-04-27T21:14:27Z

Thanks @yejj710.
I tried to recompile the whole Mooncake stack on this branch and got two compile errors:

/tmp/Mooncake/mooncake-store/src/master_service.cpp:1726:13: error:
 'MetadataAccessor' was not declared in this scope; did you mean 'MetadataAccessorRW'?
 1726 |             MetadataAccessor accessor(this, key);
      |             ^~~~~~~~~~~~~~~~
      |             MetadataAccessorRW

/tmp/Mooncake/mooncake-store/include/master_service.h:725:13: error:
 'metadata' was not declared in this scope
  725 |             metadata.VisitReplicas(
      |             ^~~~~~~~

I was able to fix both errors and compile with this patch:

diff --git a/mooncake-store/include/master_service.h b/mooncake-store/include/master_service.h
index 20c5dd5..b2f0e62 100644
--- a/mooncake-store/include/master_service.h
+++ b/mooncake-store/include/master_service.h
@@ -722,7 +722,7 @@ class MasterService {

             std::vector<Replica::Descriptor> replica_list;
             replica_list.reserve(replicas_.size());
-            metadata.VisitReplicas(
+            VisitReplicas(
                 &Replica::fn_is_completed, [&replica_list](const Replica& replica) {
                     replica_list.emplace_back(replica.get_descriptor());
                 });
diff --git a/mooncake-store/src/master_service.cpp b/mooncake-store/src/master_service.cpp
index 1adcb03..826c11b 100644
--- a/mooncake-store/src/master_service.cpp
+++ b/mooncake-store/src/master_service.cpp
@@ -1723,7 +1723,7 @@ auto MasterService::NotifyOffloadSuccess(
         auto res = AddReplica(client_id, key, replica);

         if (publisher) {
-            MetadataAccessor accessor(this, key);
+            MetadataAccessorRW accessor(this, key);
             if (!accessor.Exists()) {
                 LOG(ERROR) << "key=" << key << ", error=object_not_found";
                 return tl::make_unexpected(ErrorCode::OBJECT_NOT_FOUND);

Let me know if you want me to add that change.


mich45208 · 2026-04-28T00:58:35Z

Also, it looks like the conductor requires tokenized input, and the test uses a dedicated Python proxy that sends each request to a tokenizer before forwarding it to the conductor, which adds extra hops and overhead. Are there any plans to optimize this, or is a router with an internal tokenizer required to get the best performance with the conductor?
