Add multi-GPU vLLM support for CAA steering hooks #695

## Problem

CAA (Contrastive Activation Addition) steering vectors are straightforward to apply in a single-process model, but under vLLM tensor-parallel inference the model layers live inside worker-local model replicas. Unless the hook is installed inside each worker, a CAA activation-add hook is not reliably applied across all tensor-parallel workers.
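
For reference, the activation-add step itself is small: a forward hook returns the layer output plus a scaled steering vector. Below is a minimal sketch, not the PR's implementation. It assumes the PyTorch forward-hook signature `hook(module, args, output)` and, as a further assumption, that a layer returning a tuple carries the hidden state in its first element (real vLLM decoder layers may need different handling). `make_caa_hook` and `scale` are hypothetical names.

```python
from typing import Any, Callable


def make_caa_hook(vector: Any, scale: float = 1.0) -> Callable[[Any, Any, Any], Any]:
    """Build a forward hook that adds `scale * vector` to a layer's output.

    Hypothetical helper, not part of EasyEdit or vLLM. It is duck-typed,
    so it works with tensors as well as plain floats in tests.
    """

    def hook(module: Any, args: Any, output: Any) -> Any:
        # Assumption: a tuple output carries the hidden state first.
        if isinstance(output, tuple):
            return (output[0] + scale * vector,) + output[1:]
        return output + scale * vector

    return hook
```

In a single process, `layer.register_forward_hook(make_caa_hook(v))` is enough. Under tensor parallelism the same registration has to run inside every worker process, because each worker holds its own model objects.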

## Proposed Change

Add a lightweight runtime hook utility for applying already-computed CAA vectors to vLLM-loaded models, including tensor-parallel worker support through vLLM worker RPC.

The intended contribution is deliberately small:

- install CAA activation-add hooks on selected decoder layers inside each vLLM worker-local model
- clear installed hooks after generation
- read hook call/configuration stats from each worker
- provide focused hook-only tests and lightweight documentation
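
The install, clear, and stats operations listed above could be grouped in one worker-local object. The following is a rough sketch under stated assumptions, not the code in the related PR: `CAAHookManager` and its method names are hypothetical, layers are assumed to expose PyTorch's `register_forward_hook` (which returns a handle with `remove()`), and tuple outputs are assumed to carry the hidden state first.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence


@dataclass
class CAAHookManager:
    """Hypothetical worker-local manager; all names are illustrative."""

    handles: List[Any] = field(default_factory=list)
    calls: Dict[int, int] = field(default_factory=dict)

    def install(self, layers: Sequence[Any], layer_ids: Sequence[int],
                vector: Any, scale: float = 1.0) -> None:
        """Register one activation-add hook per selected layer index."""
        for idx in layer_ids:
            self.calls[idx] = 0
            hook = self._make_hook(idx, vector, scale)
            self.handles.append(layers[idx].register_forward_hook(hook))

    def _make_hook(self, idx: int, vector: Any, scale: float):
        def hook(module: Any, args: Any, output: Any) -> Any:
            self.calls[idx] += 1  # per-layer call counter for stats()
            # Assumption: a tuple output carries the hidden state first.
            if isinstance(output, tuple):
                return (output[0] + scale * vector,) + output[1:]
            return output + scale * vector

        return hook

    def clear(self) -> None:
        """Remove every installed hook; call counts are kept for inspection."""
        for handle in self.handles:
            handle.remove()
        self.handles.clear()

    def stats(self) -> Dict[str, Any]:
        """Report how many hooks are installed and how often each fired."""
        return {"installed": len(self.handles), "calls": dict(self.calls)}
```

Keeping the call counts after `clear()` is a design choice in this sketch: it lets a caller verify after generation that the hooks actually fired on each worker.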

This is not proposed as a new EasyEdit editing method under `easyeditor/models`. It is runtime support for applying existing CAA vectors during vLLM inference.

## Related PR

PR: #694

## Validation Plan

- hook-only unit tests with fake vLLM-style workers
- syntax check for the PR surface
- optional real-GPU smoke command documented for local validation
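
As an illustration of the first item, a hook-only test with fake workers might look like the sketch below. Everything here is a hypothetical stand-in: `FakeLayer`, `FakeWorker`, and `FakeExecutor` only mimic the fan-out shape of a worker RPC and do not reproduce vLLM's actual API, and the `install`, `clear`, and `stats` functions are illustrative rather than the PR's code.

```python
from typing import Any, Callable, Dict, List


class FakeLayer:
    """Stand-in decoder layer: identity forward plus one optional hook."""

    def __init__(self) -> None:
        self._hook = None

    def register_forward_hook(self, hook: Callable) -> "FakeLayer":
        self._hook = hook
        return self  # the layer doubles as its own removable handle

    def remove(self) -> None:
        self._hook = None

    def __call__(self, x: float) -> float:
        return x if self._hook is None else self._hook(self, (x,), x)


class FakeWorker:
    def __init__(self, rank: int, num_layers: int) -> None:
        self.rank = rank
        self.layers = [FakeLayer() for _ in range(num_layers)]
        self.handles: List[Any] = []
        self.hook_calls = 0


class FakeExecutor:
    """Runs a function once per worker and collects the results."""

    def __init__(self, workers: List[FakeWorker]) -> None:
        self.workers = workers

    def rpc(self, fn: Callable, *args: Any) -> List[Any]:
        return [fn(worker, *args) for worker in self.workers]


def install(worker: FakeWorker, layer_ids: List[int], vector: float) -> int:
    def hook(module: Any, args: Any, output: float) -> float:
        worker.hook_calls += 1
        return output + vector

    worker.handles = [worker.layers[i].register_forward_hook(hook) for i in layer_ids]
    return len(worker.handles)


def clear(worker: FakeWorker) -> bool:
    for handle in worker.handles:
        handle.remove()
    worker.handles = []
    return True


def stats(worker: FakeWorker) -> Dict[str, int]:
    return {"rank": worker.rank, "calls": worker.hook_calls}


def run_all_layers(worker: FakeWorker, x: float) -> float:
    for layer in worker.layers:
        x = layer(x)
    return x


def test_hooks_reach_every_worker() -> None:
    executor = FakeExecutor([FakeWorker(rank, 4) for rank in range(2)])
    assert executor.rpc(install, [1, 2], 0.5) == [2, 2]
    # Two hooked layers each add 0.5 on every worker.
    assert all(run_all_layers(w, 1.0) == 2.0 for w in executor.workers)
    assert executor.rpc(stats) == [{"rank": 0, "calls": 2}, {"rank": 1, "calls": 2}]
    executor.rpc(clear)
    # After clearing, the forward pass is the identity again.
    assert all(run_all_layers(w, 1.0) == 1.0 for w in executor.workers)
```

A test of this shape needs neither a GPU nor a vLLM install, which is what keeps the hook-only suite cheap to run.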
