Problem
CAA steering vectors can be applied to a single-process model with an ordinary forward hook, but vLLM tensor-parallel inference keeps model layers inside worker-local model replicas that the driver process cannot reach directly. Unless the hook is installed inside each worker, a CAA activation-add hook is not reliably applied across all tensor-parallel workers.
Proposed Change
Add a lightweight runtime hook utility for applying already-computed CAA vectors to vLLM-loaded models, including tensor-parallel worker support through vLLM worker RPC.
The intended contribution is deliberately small:
- install CAA activation-add hooks on selected decoder layers inside each vLLM worker-local model
- clear installed hooks after generation
- read hook call/configuration stats from each worker
- provide focused hook-only tests and lightweight documentation
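The install, clear, and stats operations listed above can be sketched on a single worker-local model as follows. This is a minimal illustration, not the code in the PR. `CAAHookManager`, its method names, and the `multiplier` parameter are hypothetical. The only real API used is PyTorch's `register_forward_hook`. The sketch assumes that a layer returning a tuple carries its hidden states in the first element, which is model-dependent and should be checked against the actual vLLM decoder layer.

```python
import torch
from torch import nn


class CAAHookManager:
    """Hypothetical helper: manages activation-add hooks on one worker-local model."""

    def __init__(self):
        self.handles = []  # RemovableHandle objects, one per hooked layer
        self.calls = {}    # layer index -> number of hook invocations

    def install(self, layers, vectors, multiplier=1.0):
        # `vectors` maps layer index -> steering vector of shape (hidden_size,)
        for idx, vec in vectors.items():
            self.calls[idx] = 0
            hook = self._make_hook(idx, vec, multiplier)
            self.handles.append(layers[idx].register_forward_hook(hook))

    def _make_hook(self, idx, vec, multiplier):
        def hook(module, args, output):
            self.calls[idx] += 1
            # Assumption: if the layer returns a tuple, hidden states come first.
            hidden = output[0] if isinstance(output, tuple) else output
            delta = vec.to(device=hidden.device, dtype=hidden.dtype)
            steered = hidden + multiplier * delta
            if isinstance(output, tuple):
                return (steered,) + tuple(output[1:])
            return steered

        return hook

    def clear(self):
        # Remove every hook so later generations run unsteered.
        for handle in self.handles:
            handle.remove()
        self.handles.clear()

    def stats(self):
        return {"installed": len(self.handles), "calls": dict(self.calls)}


# Toy stand-in for a decoder stack: three identity layers, hidden size 4.
layers = nn.ModuleList([nn.Identity() for _ in range(3)])
manager = CAAHookManager()
manager.install(layers, {1: torch.ones(4)}, multiplier=2.0)

x = torch.zeros(2, 4)
for layer in layers:
    x = layer(x)
steered_out = x.clone()
stats_after_run = manager.stats()

manager.clear()
y = torch.zeros(2, 4)
for layer in layers:
    y = layer(y)
```

In a tensor-parallel run, each worker would hold its own manager instance and the driver would trigger `install`, `clear`, and `stats` on every worker through the worker RPC mechanism, so that all ranks add the same vector at the same layers.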
This is not proposed as a new EasyEdit editing method under easyeditor/models. It is runtime support for applying existing CAA vectors during vLLM inference.
Related PR
PR: #694
Validation Plan
- hook-only unit tests with fake vLLM-style workers
- syntax check for the PR surface
- optional real-GPU smoke command documented for local validation
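A hook-only test with fake workers might look like the sketch below. It needs neither a GPU nor a vLLM import. Every name in it (`FakeWorker`, `fake_rpc`, `install_hooks`, `clear_hooks`, `hook_stats`) is a hypothetical stand-in. `fake_rpc` only imitates the shape of a broadcast worker RPC, running one function on every worker and returning one result per rank. It is not the vLLM API.

```python
from dataclasses import dataclass, field


@dataclass
class FakeWorker:
    """Stand-in for one tensor-parallel worker."""
    rank: int
    installed_layers: set = field(default_factory=set)


def fake_rpc(workers, fn, *args):
    # Imitates a broadcast RPC: run `fn` on every worker, one result per rank.
    return [fn(worker, *args) for worker in workers]


def install_hooks(worker, layer_ids):
    worker.installed_layers.update(layer_ids)
    return len(worker.installed_layers)


def clear_hooks(worker):
    removed = len(worker.installed_layers)
    worker.installed_layers.clear()
    return removed


def hook_stats(worker):
    return {"rank": worker.rank, "layers": sorted(worker.installed_layers)}


def test_hooks_reach_every_worker():
    workers = [FakeWorker(rank=r) for r in range(2)]
    # Every rank must report the same number of installed hooks.
    assert fake_rpc(workers, install_hooks, [10, 12]) == [2, 2]
    stats = fake_rpc(workers, hook_stats)
    assert [s["layers"] for s in stats] == [[10, 12], [10, 12]]
    # Clearing must leave no hooks behind on any rank.
    assert fake_rpc(workers, clear_hooks) == [2, 2]
    assert all(s["layers"] == [] for s in fake_rpc(workers, hook_stats))


test_hooks_reach_every_worker()
```

The point of the fake is to check the per-worker contract: install and clear report on every rank, and stats stay consistent across ranks. The numerical effect of the hook itself is left to the optional real-GPU smoke run.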