Add multi-GPU vLLM support for CAA steering hooks #695

Description

@linmou

Problem

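CAA (Contrastive Activation Addition) steering vectors can be applied to a model running in a single process, but vLLM tensor-parallel inference keeps the model layers inside worker processes, each holding its own model instance with its shard of the weights. A hook registered from the driver process does not reach layers that live in other worker processes, so without installing the hook inside each worker, the CAA activation addition is not reliably applied across all tensor-parallel workers.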

Proposed Change

Add a lightweight runtime hook utility for applying already-computed CAA vectors to vLLM-loaded models, including tensor-parallel worker support through vLLM worker RPC.
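Recent vLLM releases expose `LLM.collective_rpc`, which runs a method or callable on every worker and returns the per-worker results; the exact signature differs between versions, so treat the shape used here as an assumption. The sketch below uses hypothetical stand-ins (`FakeWorker`, `FakeLLM`, `install_on_worker`) so it runs without GPUs. It only illustrates the fan-out pattern: the install function executes once inside each worker, so every worker's own model receives the hook.

```python
from typing import Any, Callable, List


class FakeWorker:
    """Hypothetical stand-in for one vLLM tensor-parallel worker process."""

    def __init__(self, rank: int) -> None:
        self.rank = rank
        # Worker-local state: in real vLLM this lives in a separate process,
        # so nothing the driver registers on its own objects can reach it.
        self.hooks: List[str] = []


class FakeLLM:
    """Hypothetical driver-side object with a collective_rpc-style fan-out.

    The callable runs once per worker, receives the worker as its first
    argument, and the per-worker results come back as a list in rank order.
    """

    def __init__(self, tensor_parallel_size: int) -> None:
        self._workers = [FakeWorker(rank) for rank in range(tensor_parallel_size)]

    def collective_rpc(self, method: Callable[..., Any], args: tuple = ()) -> List[Any]:
        return [method(worker, *args) for worker in self._workers]


def install_on_worker(worker: FakeWorker, layer_idx: int) -> int:
    # Executes "inside" each worker, so the hook is recorded on that worker's own model.
    worker.hooks.append(f"caa@layer{layer_idx}")
    return worker.rank


llm = FakeLLM(tensor_parallel_size=2)
ranks = llm.collective_rpc(install_on_worker, args=(12,))
hooks_per_worker = llm.collective_rpc(lambda worker: list(worker.hooks))
print(ranks)             # [0, 1]
print(hooks_per_worker)  # [['caa@layer12'], ['caa@layer12']]
```

The same fan-out can carry the clear and stats operations, since each is just another callable executed per worker.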

The intended contribution is deliberately small:

  • install CAA activation-add hooks on selected decoder layers inside each vLLM worker-local model
  • clear installed hooks after generation
  • collect per-worker hook statistics (call counts and hook configuration)
  • provide focused hook-only tests and lightweight documentation
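The per-worker operations listed above (install, clear, stats) can be sketched in plain Python. Everything here is hypothetical: `FakeDecoderLayer` only mimics the `register_forward_hook` / `remove()` pattern of PyTorch modules, and `install_caa_hooks`, `clear_caa_hooks` and `caa_hook_stats` are illustrative names, not the API of PR #694. In a real worker these would run against that worker's model and operate on tensors rather than lists.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class RemovableHandle:
    """Minimal stand-in for torch.utils.hooks.RemovableHandle."""

    def __init__(self, hooks: Dict[int, Callable], key: int) -> None:
        self._hooks, self._key = hooks, key

    def remove(self) -> None:
        self._hooks.pop(self._key, None)


class FakeDecoderLayer:
    """Hypothetical decoder layer: identity forward plus PyTorch-style forward hooks."""

    def __init__(self) -> None:
        self._forward_hooks: Dict[int, Callable] = {}
        self._next_id = 0

    def register_forward_hook(self, hook: Callable) -> RemovableHandle:
        key = self._next_id
        self._next_id += 1
        self._forward_hooks[key] = hook
        return RemovableHandle(self._forward_hooks, key)

    def __call__(self, hidden: List[float]) -> List[float]:
        output = list(hidden)
        for hook in list(self._forward_hooks.values()):
            result = hook(self, (hidden,), output)
            if result is not None:  # as in PyTorch, a non-None return replaces the output
                output = result
        return output


@dataclass
class CAAState:
    handles: list = field(default_factory=list)
    calls: Dict[int, int] = field(default_factory=dict)
    coeff: float = 0.0


def install_caa_hooks(layers, vectors: Dict[int, List[float]], coeff: float) -> CAAState:
    """Add coeff * vectors[i] to the output of layers[i] for each selected layer i."""
    state = CAAState(coeff=coeff)
    for idx, vec in vectors.items():
        state.calls[idx] = 0

        # Default arguments bind the current idx/vec, avoiding the late-binding closure bug.
        def hook(module, inputs, output, idx=idx, vec=vec):
            state.calls[idx] += 1
            return [h + coeff * v for h, v in zip(output, vec)]

        state.handles.append(layers[idx].register_forward_hook(hook))
    return state


def clear_caa_hooks(state: CAAState) -> None:
    """Remove every installed hook; call counts are kept for later inspection."""
    for handle in state.handles:
        handle.remove()
    state.handles.clear()


def caa_hook_stats(state: CAAState) -> dict:
    return {"layers": sorted(state.calls), "calls": dict(state.calls),
            "coeff": state.coeff, "active": bool(state.handles)}


layers = [FakeDecoderLayer() for _ in range(4)]
state = install_caa_hooks(layers, {2: [1.0, -1.0]}, coeff=0.5)
print(layers[2]([0.0, 0.0]))  # [0.5, -0.5]
clear_caa_hooks(state)
print(layers[2]([0.0, 0.0]))  # [0.0, 0.0]
print(caa_hook_stats(state))  # {'layers': [2], 'calls': {2: 1}, 'coeff': 0.5, 'active': False}
```

Real vLLM decoder layers may return a tuple (for example hidden states together with a residual) rather than a single tensor, so a real hook would need to choose which element to modify.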

This is not proposed as a new EasyEdit editing method under easyeditor/models. It is runtime support for applying existing CAA vectors during vLLM inference.

Related PR

PR: #694

Validation Plan

  • hook-only unit tests with fake vLLM-style workers
  • syntax check of the files changed by the PR
  • optional real-GPU smoke-test command, documented for local validation
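A hook-only unit test along the lines of the first validation item could look like the sketch below. It assumes hypothetical fake workers that each own a separate list of layers, plus a simple `collective_rpc` stand-in for the driver-to-worker fan-out; none of these names come from vLLM or from the PR. The test checks that every worker gets the hook, that stats come back per worker, and that clearing restores the unsteered output.

```python
import unittest
from typing import Any, Callable, Dict, List


class FakeLayer:
    """Hypothetical decoder layer: identity forward plus PyTorch-style forward hooks."""

    def __init__(self) -> None:
        self.hooks: List[Callable] = []

    def __call__(self, hidden: List[float]) -> List[float]:
        out = list(hidden)
        for hook in self.hooks:
            out = hook(self, (hidden,), out)
        return out


class FakeWorker:
    """Hypothetical vLLM-style worker that owns its own copy of the layers."""

    def __init__(self, rank: int, num_layers: int = 4) -> None:
        self.rank = rank
        self.layers = [FakeLayer() for _ in range(num_layers)]
        self.calls = 0


def install(worker: FakeWorker, layer_idx: int, vec: List[float], coeff: float) -> None:
    def hook(module, inputs, output):
        worker.calls += 1
        return [h + coeff * v for h, v in zip(output, vec)]

    worker.layers[layer_idx].hooks.append(hook)


def clear(worker: FakeWorker) -> None:
    for layer in worker.layers:
        layer.hooks.clear()


def stats(worker: FakeWorker) -> Dict[str, Any]:
    return {"rank": worker.rank, "calls": worker.calls,
            "hooked_layers": [i for i, layer in enumerate(worker.layers) if layer.hooks]}


def collective_rpc(workers: List[FakeWorker], fn: Callable, *args: Any) -> List[Any]:
    """Stand-in for a driver-to-worker RPC fan-out; results come back in rank order."""
    return [fn(worker, *args) for worker in workers]


class CAAHookTest(unittest.TestCase):
    def setUp(self) -> None:
        self.workers = [FakeWorker(rank) for rank in range(2)]  # tensor_parallel_size=2

    def test_hooks_installed_on_every_worker(self) -> None:
        collective_rpc(self.workers, install, 1, [1.0, 2.0], 0.5)
        for worker in self.workers:
            self.assertEqual(worker.layers[1]([0.0, 0.0]), [0.5, 1.0])
        per_worker = collective_rpc(self.workers, stats)
        self.assertEqual([s["hooked_layers"] for s in per_worker], [[1], [1]])
        self.assertEqual([s["calls"] for s in per_worker], [1, 1])

    def test_clear_restores_unsteered_output(self) -> None:
        collective_rpc(self.workers, install, 1, [1.0, 2.0], 0.5)
        collective_rpc(self.workers, clear)
        for worker in self.workers:
            self.assertEqual(worker.layers[1]([3.0, 4.0]), [3.0, 4.0])
        per_worker = collective_rpc(self.workers, stats)
        self.assertEqual([s["hooked_layers"] for s in per_worker], [[], []])
```

Saved as, for example, `test_caa_hooks.py`, it runs under `python -m unittest` without a GPU.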

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests