list3r/guardclaw-openclaw-plugin


  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ•—   β–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—
 β–ˆβ–ˆβ•”β•β•β•β•β• β–ˆβ–ˆβ•‘   β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—
 β–ˆβ–ˆβ•‘  β–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘   β–ˆβ–ˆβ•‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•β–ˆβ–ˆβ•‘  β–ˆβ–ˆβ•‘
 β–ˆβ–ˆβ•‘   β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘   β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘  β•šβ•β•—β–ˆβ–ˆβ•‘  β–ˆβ–ˆβ•‘
 β•šβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•”β•β•šβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•”β•β–ˆβ–ˆβ•‘  β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘   β•šβ•—β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•”β•
  β•šβ•β•β•β•β•β•  β•šβ•β•β•β•β•β• β•šβ•β•  β•šβ•β•β•šβ•β•    β•šβ•β•šβ•β•β•β•β•β•
   β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ•—      β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ•—    β–ˆβ–ˆβ•—
  β–ˆβ–ˆβ•”β•β•β•β•β•β–ˆβ–ˆβ•‘     β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘    β–ˆβ–ˆβ•‘
  β–ˆβ–ˆβ•‘     β–ˆβ–ˆβ•‘     β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘ β–ˆβ•— β–ˆβ–ˆβ•‘
  β–ˆβ–ˆβ•‘     β–ˆβ–ˆβ•‘     β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘β–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘
  β•šβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘  β–ˆβ–ˆβ•‘β•šβ–ˆβ–ˆβ–ˆβ•”β–ˆβ–ˆβ–ˆβ•”β•
   β•šβ•β•β•β•β•β•β•šβ•β•β•β•β•β•β•β•šβ•β•  β•šβ•β• β•šβ•β•β•β•šβ•β•β•

  Privacy-first OpenClaw plugin · Built by Centrase AI

npm License: MIT OpenClaw


Your AI assistant talks to the cloud. GuardClaw decides what it's allowed to say.

Every message, tool call, and tool result is classified in real time, before it leaves your machine. Sensitive data gets stripped. Private data never moves. Everything else flows through as normal.

No heuristics. No hoping for the best. Three tiers. Hard rules.


How It Works

| Level | Classification | What Happens |
|-------|----------------|--------------|
| S1 | Safe | Passes through to your cloud provider unchanged |
| S2 | Sensitive (PII, credentials, internal IPs) | Stripped locally via privacy proxy, then forwarded |
| S3 | Private (SSH keys, .env files, medical data) | Stays on-device. Local model only. Cloud never sees it. |

Detection runs rule-first (keywords, regex, file paths) with an optional LLM classifier for edge cases. Fast, composable, auditable.


Features

  • Four-tier sensitivity detection: S0–S3 classification via keyword, regex, and path-based rules, plus an optional LLM classifier
  • Prompt injection detection (S0): a DeBERTa transformer classifier runs locally on port 8404 and intercepts injections before they reach the LLM
  • Privacy proxy: local HTTP proxy strips PII before forwarding to cloud APIs
  • Guard agent: dedicated local model session for S3 content that never routes to cloud
  • Dual-track session history: full history stays local; sanitised history goes to cloud
  • Memory isolation: MEMORY.md (clean) / MEMORY-FULL.md (unredacted) kept in sync automatically
  • Router pipeline: composable chain (privacy → token-saver → custom routers)
  • Learning loop: correction store with embedding-based few-shot injection
  • Model Advisor: periodic checks for cheaper OpenRouter alternatives, better local models, and DeBERTa updates; surfaces suggestions in the dashboard
  • Budget guardrails: daily/monthly spend caps with live progress tracking
  • Auto DeBERTa updates: when a better injection classifier is available, it is applied and hot-reloaded in place without a restart
  • Dashboard: web UI at http://127.0.0.1:18789/plugins/guardclaw/stats
  • Hot-reload config: edit ~/.openclaw/guardclaw.json; changes apply without restart

Prompt Injection Detection (S0)

GuardClaw runs a local DeBERTa v3 classifier as a FastAPI service on port 8404. Every incoming message is scored before it reaches the LLM, so injections are blocked at the gate, not after the fact.

The install script sets up the service automatically and registers it as a system service (launchd on macOS, systemd on Linux) so it starts on login and stays running.

Check the service:

curl http://127.0.0.1:8404/health

Custom endpoint (e.g. remote host):

export GUARDCLAW_DEBERTA_URL=http://192.168.1.10:8404
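
For a programmatic version of the same check, here is a minimal TypeScript sketch. It relies only on what this README states: the /health path, the default address 127.0.0.1:8404, and the GUARDCLAW_DEBERTA_URL override. The function names are hypothetical, and the response body is not inspected because its shape is not documented here.

```typescript
// Hypothetical helper: resolve the classifier base URL from an env-like map.
// Falls back to the default local address when the override is unset or empty.
function resolveDebertaUrl(env: Record<string, string | undefined>): string {
  const raw = (env["GUARDCLAW_DEBERTA_URL"] || "").trim();
  const base = raw.length > 0 ? raw : "http://127.0.0.1:8404";
  return base.replace(/\/+$/, ""); // drop trailing slashes
}

// Hypothetical helper: true when GET /health answers with a 2xx status.
// Any network error (service down, wrong host) is reported as unhealthy.
async function isDebertaHealthy(
  env: Record<string, string | undefined>,
): Promise<boolean> {
  try {
    const res = await fetch(resolveDebertaUrl(env) + "/health");
    return res.ok;
  } catch {
    return false;
  }
}
```

In Node.js 22+ this could be called as isDebertaHealthy(process.env), since fetch is available globally.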

Logs:

tail -f ~/.openclaw/deberta.log

When a newer model is available, GuardClaw's Model Advisor detects it and hot-swaps the classifier in place, with no service restart needed.


Prerequisites

  • Node.js 22+
  • OpenClaw 2026.3.x+
  • A local inference backend (Ollama, LM Studio, vLLM, SGLang, or any OpenAI-compatible endpoint)

Install

Install script (recommended):

git clone https://github.com/List3r/guardclaw-openclaw-plugin.git /opt/guardclaw
cd /opt/guardclaw && bash scripts/install.sh

npm:

npm install @centrase/guardclaw
openclaw plugins install @centrase/guardclaw

Manual:

git clone https://github.com/List3r/guardclaw-openclaw-plugin.git /opt/guardclaw
cd /opt/guardclaw
npm ci && npm run build
openclaw plugins install --link /opt/guardclaw
openclaw gateway restart

The install script handles prerequisites, builds, registers the plugin, generates a default config at ~/.openclaw/guardclaw.json, and restarts the gateway.


Configuration

GuardClaw uses a standalone config file: ~/.openclaw/guardclaw.json

Full schema with examples for all providers: config.example.json

Local model (S1/S2 detection classifier):

Recommended: lfm2-8b-a1b via LM Studio. It is fast (~600ms), has excellent JSON discipline, and needs only ~4.3 GB VRAM. Do not use a reasoning model here (e.g. QwQ, DeepSeek-R1); they break JSON parsing.

"localModel": {
  "enabled": true,
  "type": "openai-compatible",
  "provider": "lmstudio",
  "model": "lfm2-8b-a1b",
  "endpoint": "http://localhost:1234"
}

Guard agent (handles S3 content locally):

Recommended: qwen3.5:35b via Ollama, for strong reasoning on complex private content. It runs separately from the detection classifier, so VRAM is additive (~24 GB).

"guardAgent": {
  "id": "guard",
  "workspace": "~/.openclaw/workspace-guard",
  "model": "ollama-server/qwen3.5:35b"
}

Detection rules:

"rules": {
  "keywords": {
    "S2": ["password", "api_key", "secret", "token", "credential"],
    "S3": ["ssh", "id_rsa", "private_key", ".pem", ".env"]
  },
  "patterns": {
    "S3": ["-----BEGIN (?:RSA )?PRIVATE KEY-----", "AKIA[0-9A-Z]{16}"]
  }
}
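
To make the rule-first step concrete, here is a minimal TypeScript sketch that evaluates the keywords and patterns from the fragment above. It is not the plugin's rules.ts: the classify function, the case-insensitive substring matching, and the "check S3 before S2" order are all assumptions made for illustration.

```typescript
type Level = "S1" | "S2" | "S3";

interface Rules {
  keywords: { S2?: string[]; S3?: string[] };
  patterns: { S2?: string[]; S3?: string[] };
}

// Same values as the config fragment above.
const rules: Rules = {
  keywords: {
    S2: ["password", "api_key", "secret", "token", "credential"],
    S3: ["ssh", "id_rsa", "private_key", ".pem", ".env"],
  },
  patterns: {
    S3: ["-----BEGIN (?:RSA )?PRIVATE KEY-----", "AKIA[0-9A-Z]{16}"],
  },
};

// Assumed semantics: the highest matching tier wins, so S3 is checked first.
// Keywords match as case-insensitive substrings; patterns are regex sources.
function classify(text: string, r: Rules): Level {
  const lower = text.toLowerCase();
  const tiers: Array<"S3" | "S2"> = ["S3", "S2"];
  for (const level of tiers) {
    const keywords = r.keywords[level] || [];
    const patterns = r.patterns[level] || [];
    if (keywords.some((k) => lower.indexOf(k.toLowerCase()) !== -1)) return level;
    if (patterns.some((p) => new RegExp(p).test(text))) return level;
  }
  return "S1"; // nothing matched: treated as safe by the rule stage
}
```

Plain substring matching is deliberately crude: a word such as "tokenizer" would also match "token". Cases like that are where the optional LLM classifier is meant to help.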

S2 policy:

  • "proxy" (default): strip PII locally, forward sanitised content to cloud
  • "local": route S2 to local model entirely (more private, lower capability)
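
The tier table and the S2 policy together amount to a small routing decision. The sketch below expresses it in TypeScript under stated assumptions: the Route labels and the routeFor function are hypothetical names, not identifiers from the plugin.

```typescript
type Level = "S1" | "S2" | "S3";
type S2Policy = "proxy" | "local";

// Hypothetical labels for where a message ends up.
type Route = "cloud" | "proxy-then-cloud" | "local-model" | "guard-agent";

function routeFor(level: Level, s2Policy: S2Policy = "proxy"): Route {
  switch (level) {
    case "S1":
      return "cloud"; // safe: passes through unchanged
    case "S2":
      // sensitive: strip PII via the proxy, or keep it local entirely
      return s2Policy === "proxy" ? "proxy-then-cloud" : "local-model";
    case "S3":
      return "guard-agent"; // private: never leaves the device
  }
}
```

For example, routeFor("S2", "local") keeps S2 traffic on the local model, trading capability for privacy as described above.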

Recommended Models

| Role | Model | Backend | Notes |
|------|-------|---------|-------|
| S1/S2 detection classifier | lfm2-8b-a1b | LM Studio | ✅ Recommended. Fast (~600ms), strict JSON output, ~4.3 GB VRAM. Do not use reasoning models. |
| S3 guard agent | qwen3.5:35b | Ollama | Strong reasoning for private/confidential content. ~24 GB VRAM. |
| Embeddings (learning loop) | nomic-embed-text-v1.5 | Ollama | 768-dim, ~0.3 GB VRAM |

Why separate models?

  • Detection needs speed and JSON discipline: lfm2-8b-a1b is an MoE model with ~1B active parameters, purpose-built for classification tasks.
  • The S3 guard agent needs reasoning depth: qwen3.5:35b handles complex private content (financial, medical, legal) that needs more than pattern matching.

All three can run simultaneously on 32 GB+ unified memory. For 16 GB setups, see s3Policy: redact-and-forward to skip the guard agent requirement.


Supported Local Providers

| Provider | type | Default endpoint |
|----------|------|------------------|
| Ollama | openai-compatible | http://localhost:11434 |
| LM Studio | openai-compatible | http://localhost:1234 |
| vLLM | openai-compatible | http://localhost:8000 |
| SGLang | openai-compatible | http://localhost:30000 |
| Ollama (native) | ollama-native | http://localhost:11434 |
| Custom | custom | your endpoint |

Reliability & Accuracy

GuardClaw's detection pipeline combines rule-based classification (fast, deterministic) with optional LLM classification (handles edge cases).

Rule-based detection (keywords, regex, paths):

  • S1 accuracy: >99% (false positives extremely rare)
  • S2 accuracy: ~95% (catches password, API key, credential patterns reliably)
  • S3 accuracy: ~98% (SSH keys, private key blocks, AWS credentials detected with high confidence)

LLM-assisted detection (when enabled):

  • Improves edge-case handling for contextual PII (e.g., "my birthdate is 1990-03-21" → flagged as S2)
  • Reduces false negatives in S2 classification by ~5%
  • Cost: ~0.002 USD per classified message (using efficient local models)

Tested models:

  • Detection classifier: LFM2-8B-A1B (MoE, ~1B active), 65% accuracy on hard cases, perfect JSON output discipline
  • Guard agent: Qwen3.5:35B, handles complex multi-step private tasks with 92% reasoning accuracy

False positive rate (rule-based):

  • S1 → S2 misclassification: <1% (very conservative to avoid leaking PII)
  • S1 → S3 misclassification: <0.1% (practically never)

All detection rules and patterns are editable and auditable via ~/.openclaw/guardclaw.json. Nothing is hidden.


Dashboard

Access the live monitoring dashboard at:

http://127.0.0.1:18789/plugins/guardclaw/stats

Provides:

  • Real-time detection event log: every S0–S3 classification with timestamps
  • Token usage tracking: count and cost estimates per message, per provider
  • Router pipeline status: visualise which routers processed each message
  • Configuration editor: modify rules and policies without restarting
  • Correction store: view and manage learned corrections from the feedback loop
  • Performance metrics: detection latency, cache hit rates, model performance
  • Advisor tab: pending model suggestions with accept/dismiss, benchmark comparisons, and savings estimates
  • Budget tab: daily/monthly cost progress bars with configurable spend caps

Cost-Aware Routing (Optional)

GuardClaw includes a token-saver router that cost-optimises your LLM calls when enabled.

How it works:

  1. Analyses the message to estimate complexity
  2. Routes simple tasks to cheaper models (Haiku, GPT-4o-mini)
  3. Routes complex tasks to capable models (Sonnet, Opus)
  4. Respects your privacy tier first: cost optimisation never bypasses S2/S3 rules
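
The four steps above can be sketched as a single function. This is an illustration only: the looksComplex heuristic is a hypothetical stand-in for however the router actually estimates complexity (the repository ships a token-saver-judge.md prompt, which suggests an LLM judge), and pickCloudModel and cloudAllowed are assumed names.

```typescript
interface TokenSaverConfig {
  enabled: boolean;
  simpleModel: string;
  complexModel: string;
}

// Hypothetical complexity estimate: long messages or "heavy" verbs count as complex.
function looksComplex(message: string): boolean {
  return message.length > 400 || /\b(refactor|debug|prove|design)\b/i.test(message);
}

// Returns a cloud model name, or null when the token-saver makes no choice.
// cloudAllowed is the privacy decision made earlier in the pipeline; it is
// checked first so that cost optimisation can never override it.
function pickCloudModel(
  message: string,
  cloudAllowed: boolean,
  cfg: TokenSaverConfig,
): string | null {
  if (!cloudAllowed || !cfg.enabled) return null;
  return looksComplex(message) ? cfg.complexModel : cfg.simpleModel;
}
```

The privacy check comes before any cost logic, which mirrors step 4: a message held back by the S2/S3 rules never reaches the model-selection branch.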

Example savings (fictional test run):

  • 40% reduction in token spend on routine queries
  • 15% reduction in overall cost when routed intelligently
  • No observable quality loss for simple tasks

Enable it:

"routers": {
  "token-saver": {
    "enabled": true,
    "costThreshold": 0.05,
    "simpleModel": "claude-3.5-haiku",
    "complexModel": "claude-3.5-sonnet"
  }
}

Model Advisor

The Model Advisor runs periodic checks (default: every 2 weeks) across three areas:

| Check | What it does |
|-------|--------------|
| OpenRouter pricing | Finds cheaper models that match your current provider's capability tier |
| Local model quality | Uses LLMFit to identify better local models available to pull |
| DeBERTa updates | Detects newer injection classifier versions on HuggingFace |

Suggestions appear in the dashboard Advisor tab with benchmark comparisons and estimated savings. DeBERTa updates apply automatically by default (autoUpdate: true) and hot-reload the classifier without a service restart.

Enable it:

"modelAdvisor": {
  "enabled": true,
  "checkIntervalWeeks": 2,
  "minSavingsPercent": 20,
  "openrouterApiKey": "sk-or-...",
  "openrouter": { "enabled": true },
  "llmfit": { "enabled": true },
  "deberta": {
    "enabled": true,
    "autoUpdate": true
  }
}

Set autoUpdate: false if you prefer to review and accept DeBERTa updates manually from the dashboard.
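
One plausible reading of minSavingsPercent is a floor on relative savings before a cheaper model is suggested. The TypeScript sketch below encodes that reading; it is an assumption about the setting's meaning, not a description of model-advisor.ts, and the function name and cost units (any consistent unit, such as USD per million tokens) are hypothetical.

```typescript
// Assumed semantics: suggest a candidate only when it is at least
// minSavingsPercent cheaper than the current model.
function meetsSavingsThreshold(
  currentCost: number,
  candidateCost: number,
  minSavingsPercent: number,
): boolean {
  if (currentCost <= 0) return false; // nothing to save against
  // Equivalent to (current - candidate) / current * 100 >= minSavingsPercent,
  // rearranged to avoid the division.
  return (currentCost - candidateCost) * 100 >= minSavingsPercent * currentCost;
}
```

Under this reading, with minSavingsPercent set to 20, a candidate costing 7 against a current cost of 10 (30% cheaper) would be suggested, while one costing 9 (10% cheaper) would not.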


Architecture

index.ts                  Plugin entry point: registers hooks, provider, proxy
src/
  hooks.ts                13 OpenClaw hooks (model routing, tool guards, memory)
  privacy-proxy.ts        HTTP proxy: strips PII before forwarding to cloud
  provider.ts             Virtual "guardclaw-privacy" provider registration
  detector.ts             Coordinates rule + LLM detection
  rules.ts                Keyword / regex / tool-path rule engine
  local-model.ts          LLM calls for detection
  correction-store.ts     Learning loop: correction storage + embedding search
  router-pipeline.ts      Composable router chain (privacy, token-saver, custom)
  session-manager.ts      Dual-track session history (full + sanitised)
  memory-isolation.ts     MEMORY.md ↔ MEMORY-FULL.md sync
  token-stats.ts          Usage tracking and cost accounting
  stats-dashboard.ts      HTTP dashboard (detection log, advisor, budget)
  live-config.ts          Hot-reload of guardclaw.json
  model-advisor.ts        Periodic model suggestion checks + auto DeBERTa updates
  budget-guard.ts         Daily/monthly spend tracking and cap enforcement
  routers/
    privacy.ts            Built-in S0–S3 privacy router
    token-saver.ts        Cost-aware model routing (optional)
    configurable.ts       User-defined custom routers
  injection/
    deberta.ts            DeBERTa classifier client (port 8404)
prompts/
  detection-system.md     Editable system prompt for LLM classification
  guard-agent-system.md   System prompt for the guard agent
  token-saver-judge.md    Prompt for cost-aware routing decisions
scripts/
  injection_classifier.py FastAPI DeBERTa service with hot-reload support
  install.sh              Guided installer (Node + Python + OS service setup)

Docker Secrets Integration

GuardClaw automatically detects and classifies values from Docker secret mounts as S2 (Sensitive).

How it works:

When GuardClaw sees a file path matching /run/secrets/* or /var/run/secrets/* (the Docker secrets mount point and the directory Kubernetes uses for service-account tokens), it:

  1. Classifies the path as S2 (sensitive credential data)
  2. Marks the value for taint tracking
  3. Redacts any occurrence of that value in subsequent tool results before sending to the LLM

Docker Compose example:

services:
  app:
    image: myapp:latest
    secrets:
      - db_password
    environment:
      DB_PASSWORD_FILE: /run/secrets/db_password

secrets:
  db_password:
    file: ./secrets/db_password.txt

Your app reads from /run/secrets/db_password at runtime. If an OpenClaw agent later runs cat /run/secrets/db_password, GuardClaw:

  1. Detects the /run/secrets/ path → S2
  2. Extracts the value from the tool result
  3. Registers it as tainted for the session
  4. Redacts the value from the LLM context as [REDACTED:TAINT]
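
The detect, register, redact sequence above can be sketched in a few lines of TypeScript. This is a simplified illustration, not the plugin's implementation: TaintRegistry, isSecretPath, and handleToolResult are hypothetical names, and treating the whole trimmed tool result as the secret value is an assumption.

```typescript
// Matches the two path prefixes named above.
const SECRET_PATH = /^\/(?:var\/)?run\/secrets\/.+/;

function isSecretPath(path: string): boolean {
  return SECRET_PATH.test(path);
}

// Hypothetical per-session store of tainted values.
class TaintRegistry {
  private values: string[] = [];

  register(value: string): void {
    const v = value.trim();
    if (v.length > 0 && this.values.indexOf(v) === -1) this.values.push(v);
  }

  // Replace every occurrence of every tainted value with the marker.
  redact(text: string): string {
    return this.values.reduce(
      (out, v) => out.split(v).join("[REDACTED:TAINT]"),
      text,
    );
  }
}

// Called for each tool result before it is added to the LLM context.
function handleToolResult(path: string, result: string, taints: TaintRegistry): string {
  if (isSecretPath(path)) taints.register(result);
  return taints.redact(result);
}
```

Because the registry lives for the whole session, a later tool result that merely echoes the secret (for example in a log line) is redacted too, even though its own path is not under /run/secrets/.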

Kubernetes example:

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: app
    image: myapp:latest
    volumeMounts:
    - name: secrets
      mountPath: /var/run/secrets
      readOnly: true
  volumes:
  - name: secrets
    secret:
      secretName: my-secrets

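In this example the secret volume is mounted at /var/run/secrets. Kubernetes mounts a secret volume wherever mountPath points and, by default, places service-account tokens under /var/run/secrets/kubernetes.io/serviceaccount. GuardClaw detects paths under /var/run/secrets/ automatically and applies the same S2 treatment.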

Best practice:

  1. Store secrets in Docker/Kubernetes secret management; do NOT use env vars
  2. Mount via /run/secrets/ or /var/run/secrets/
  3. Apps read from the mount point at runtime
  4. GuardClaw handles the rest: automatic detection and taint tracking

This approach provides defense in depth:

  • Secrets stored securely (never in image layers or env var listings)
  • GuardClaw detects the path and marks values as sensitive
  • Taint tracking ensures the value is redacted everywhere in the session
  • Even if an agent accidentally runs cat /run/secrets/X, the value is suppressed before the LLM sees it

Troubleshooting

"Cannot find package 'tsx'": Run npm run build first. The plugin runs from compiled JS.

"No original provider target found" (502): The proxy can't find an upstream provider. Ensure the OpenClaw config has providers with baseUrl set.

"SyntaxError: Unexpected end of JSON input": Rebuild and restart the gateway.

Gateway crash loop: Set "enabled": false in ~/.openclaw/guardclaw.json under privacy, restart, and check the logs:

tail -f ~/.openclaw/logs/gateway.err.log | grep GuardClaw

DeBERTa service not responding: check the log, then restart the service:

tail -f ~/.openclaw/deberta.log

# macOS
launchctl kickstart -k gui/$(id -u)/ai.guardclaw.deberta

# Linux
systemctl --user restart guardclaw-deberta

Model Advisor not showing suggestions: Confirm modelAdvisor.enabled: true in ~/.openclaw/guardclaw.json. You can also trigger a manual check from the Advisor tab in the dashboard.


Uninstall

openclaw plugins uninstall guardclaw
rm -rf /opt/guardclaw
rm ~/.openclaw/guardclaw.json
openclaw gateway restart

Attribution

GuardClaw is built on EdgeClaw, the privacy extension developed by OpenBMB / Tsinghua University researchers, licensed under MIT. The core plugin architecture, sensitivity detection pipeline, dual-track memory system, and privacy proxy originate from EdgeClaw. Centrase AI maintains this standalone package and has extended it with additional security hardening, prompt injection detection (S0 tier), a stats dashboard, guard session registry, and DeBERTa-based injection classification.

The S0 prompt injection detection layer draws on the work of Protect AI. Their LLM Guard library (MIT License) and the deberta-v3-base-prompt-injection-v2 model (Apache 2.0) set the standard for transformer-based injection detection and directly informed how GuardClaw's S0 layer works.


License

MIT. See LICENSE.

Built by Centrase AI · Gold Coast, Australia · Trusted since 2007, built for what's next.

Derived from EdgeClaw by OpenBMB / Tsinghua University; the original MIT licence is retained in NOTICE.
