Replies: 3 comments
Minimal usage of mooncake_conductor:
Thanks @yejj710, I was able to fix it and compile with a patch: Let me know if you want me to add that change.
Also, it looks like the conductor requires tokenized input, and the test uses a dedicated Python proxy that sends each request to a tokenizer before forwarding it to the conductor, which adds extra hops and overhead. Are there any plans to optimize this, or is a router with an internal tokenizer required to get the best performance with the conductor?
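To make the concern concrete, here is a minimal sketch of the two request paths. Everything in it is hypothetical: `toy_tokenize` and `toy_conductor_route` are stand-ins written for this illustration and are not the mooncake_conductor or vllm-router API, and the "hops" are only recorded as labels rather than real network calls.

```python
from dataclasses import dataclass, field


@dataclass
class HopCounter:
    """Records each (simulated) network hop a request makes."""
    hops: list = field(default_factory=list)

    def record(self, name: str) -> None:
        self.hops.append(name)


def toy_tokenize(prompt: str) -> list:
    # Hypothetical stand-in tokenizer: one stable integer id per word.
    return [sum(word.encode()) % 50_000 for word in prompt.split()]


def toy_conductor_route(token_ids: list, workers: list) -> str:
    # Hypothetical stand-in for a router that only accepts token ids:
    # pick a worker from the first few tokens of the prompt.
    return workers[sum(token_ids[:4]) % len(workers)]


def via_external_tokenizer(prompt: str, workers: list, counter: HopCounter) -> str:
    # Path described above: the proxy calls a tokenizer service first,
    # then forwards the token ids to the conductor (two hops).
    counter.record("proxy -> tokenizer service")
    token_ids = toy_tokenize(prompt)
    counter.record("proxy -> conductor")
    return toy_conductor_route(token_ids, workers)


def via_internal_tokenizer(prompt: str, workers: list, counter: HopCounter) -> str:
    # Alternative: the router tokenizes in-process, so only the
    # routing query leaves the process (one hop).
    token_ids = toy_tokenize(prompt)
    counter.record("router -> conductor")
    return toy_conductor_route(token_ids, workers)


if __name__ == "__main__":
    workers = ["worker-0", "worker-1", "worker-2"]
    external, internal = HopCounter(), HopCounter()
    a = via_external_tokenizer("hello kv cache", workers, external)
    b = via_internal_tokenizer("hello kv cache", workers, internal)
    print(a == b, len(external.hops), len(internal.hops))
```

Both paths pick the same worker because they produce the same token ids; the only difference in this sketch is the number of hops recorded per request.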
@mich45208 We are trying to test the performance of KV-aware routing for XpYd on vllm-router, based on the conductor with vLLM, LMCache, and Mooncake.