A thin VS Code sidebar client for xAI's Grok Build CLI. It spawns grok agent stdio as a headless child and drives it over the Agent Client Protocol (ACP) — session state, MCP servers, memory, and tool execution all stay inside that CLI process. Not a terminal launcher and not a re-implementation. Install the grok CLI first; the extension is a UI shell over it.
Works with a SuperGrok subscription or an xAI API key. Not affiliated with xAI.
Install free from the VS Code Marketplace →
You get the things a terminal can't give you: VS Code's native diff editor on a proposed edit before you approve it, permission cards with Allow always / once / Reject instead of [y/N] prompts, your active editor and selection as first-class @file context, session history you can resume/rename/delete, inline images and video from /imagine, voice dictation, and side-by-side placement next to other AI tools. It's a UI shell — the trade-off is that it's useless without the grok CLI installed.
A short tour of how the extension is wired (and the one place it's deliberately not thin — Plan Mode) lives in docs/architecture.md.
- VS Code 1.90+ (or a compatible editor — Cursor, Windsurf, VSCodium).
- The Grok Build CLI (grok) on macOS, Linux, or Windows. The CLI ships a native Windows build, so the extension runs natively on all three; no WSL required (WSL2 + Remote-WSL still works if you prefer it).
- A login: either a SuperGrok subscription (grok /login) or an xAI API key. With a subscription you get Grok Build; with an API key you also get the grok-4.x models and grok-imagine.
- For voice input only (optional): ffmpeg on PATH, and a separate xAI API key for Speech-to-Text (pay-as-you-go, from $0.10/hr; your CLI login does not cover it). See Voice input under Features & capabilities.
1. Install the CLI and sign in.
macOS / Linux / WSL:
curl -fsSL https://x.ai/cli/install.sh | bash
grok /login
Windows (PowerShell):
irm https://x.ai/cli/install.ps1 | iex
grok /login
grok /login opens a browser and completes OAuth in one step. Prefer an API key? Get one at console.x.ai and set XAI_API_KEY in your shell or a workspace .env (the extension auto-loads it).
2. Install the extension.
From the Marketplace — search Grok Build by PawelHuryn, or:
code --install-extension PawelHuryn.grok-vscode-phuryn
Or build from source:
git clone https://github.com/phuryn/grok-build-vscode.git
cd grok-build-vscode
npm install
./scripts/install.sh # Windows: pwsh scripts\install.ps1
Reload VS Code (Ctrl+Shift+P → Developer: Reload Window) and click the Grok icon in the activity bar.
Tip: Right-click the Grok icon → Move To → Secondary Side Bar to park Grok on the right, next to other AI tools.
Uninstall: ./scripts/uninstall.sh (Windows: pwsh scripts\uninstall.ps1) or code --uninstall-extension PawelHuryn.grok-vscode-phuryn.
- Open the Grok sidebar (activity bar icon, or Ctrl/Cmd+;).
- Type a prompt and press Enter. Grok streams its answer; a Thinking… line resolves to Thought for Ns. Click it to expand the reasoning.
- Approve actions. When Grok wants to write a file or run a command it may raise a permission card — preview an edit in the native diff editor, then Allow once / always / Reject.
- Pick your mode (Agent / Plan / YOLO), model, and reasoning effort from the bottom toolbar and gear menu.
- Resume anytime — the clock icon lists past sessions for this project.
Click any feature to expand.
Permission cards with diff preview — see every edit in VS Code's native diff before you approve
For kind:"edit" tool calls the card shows a path — N → M lines summary and an open diff → button that opens VS Code's native diff editor against the proposed content. Approve with Allow once / always, or Reject. The actual write only happens after you approve, via fs/write_text_file — no surprise changes to your files.
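The "N → M lines" summary on the card can be derived from the old and new text alone. The following is a minimal sketch of that idea, not the extension's actual code; the `EditSummary` shape and both function names are hypothetical.

```typescript
// Hypothetical shape of the data behind an edit permission card.
interface EditSummary {
  path: string;
  oldLines: number;
  newLines: number;
}

// Count lines the way an editor would: an empty string has 0 lines,
// and a trailing newline does not start an extra line.
function countLines(text: string): number {
  if (text.length === 0) return 0;
  const parts = text.split(/\r?\n/);
  return text.endsWith("\n") ? parts.length - 1 : parts.length;
}

// Build the summary from the proposed old and new content only.
// Nothing here touches the file on disk.
function summarizeEdit(path: string, oldText: string, newText: string): EditSummary {
  return { path, oldLines: countLines(oldText), newLines: countLines(newText) };
}
```

Because the summary is computed from the proposed content, it can be shown before any write happens.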
Modes — Agent, Plan & YOLO
| Mode | Behaviour |
|---|---|
| Agent (default) | Grok acts directly and may ask permission for a write or shell action it judges sensitive — a card appears in chat. |
| Plan | Grok drafts a plan first and cannot write to the workspace or run anything outside a read-only allowlist until you approve. Approve / Reject / Cancel from the card, each with an optional comment. Plan Mode is enforced by the extension — see How it works. |
| YOLO | The extension auto-approves every permission request. The CLI session is untouched — no restart, just a flag flip. |
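The table above can be read as a small decision function over incoming permission requests. This sketch assumes a simplified request shape and a made-up read-only allowlist (`READ_ONLY_COMMANDS`); the real allowlist and matching rules live in the extension and are likely more careful than an exact string match.

```typescript
type Mode = "agent" | "plan" | "yolo";
type Decision = "ask" | "auto-allow" | "block";

// Simplified, hypothetical request shape for illustration.
interface PermissionRequest {
  kind: "edit" | "execute" | "read";
  command?: string;
}

// Hypothetical allowlist; the extension's real list is not shown in this README.
const READ_ONLY_COMMANDS = new Set(["ls", "cat", "grep", "git status"]);

function isReadOnly(req: PermissionRequest): boolean {
  if (req.kind === "read") return true;
  if (req.kind === "execute" && req.command !== undefined) {
    return READ_ONLY_COMMANDS.has(req.command.trim());
  }
  return false;
}

// "ask" means: show a permission card and wait for the user.
function decide(mode: Mode, req: PermissionRequest, planApproved: boolean): Decision {
  if (mode === "yolo") return "auto-allow"; // flag flip, no CLI restart
  if (mode === "plan" && !planApproved && !isReadOnly(req)) return "block";
  return "ask";
}
```

Keeping the decision in one pure function makes the YOLO "flag flip" cheap: only the `mode` argument changes, not the session.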
Image & video generation — /imagine renders right in the chat
Type /imagine <prompt> (or /imagine-video <prompt>) and the result renders inline — images as a compact thumbnail (capped at 320px; click to open the source file), videos with native playback controls. Hover either for Copy path / Open in VS Code icons. Both are subscription-only Grok features, both survive a session resume, and the file is streamed from disk so even a multi-MB video plays. (Editing a reference photo with /imagine works too, via Grok's image_edit tool.) Wire-format details: research/image-generation.md.
Voice input — hands-free dictation with live transcription
The microphone button in the composer dictates speech, transcribed by xAI's Speech-to-Text API. Click it, wait for the blue listening waves, and speak — words appear live as you talk. Say "grok send" to submit hands-free and keep listening for the next message (dictate while Grok responds; those messages queue and flush when it finishes). Click the mic to stop and keep any in-progress text.
The two-word send phrase is deliberate (it won't fire on a message that merely ends in "send") and is configurable via grok.voiceSendPhrase. Streaming is the default; set grok.voiceStreaming: false for one-shot batch mode.
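One way to implement the "fires only when the whole phrase ends the transcription" rule is to compare the trailing words of the transcript against the configured phrase. This is a sketch under that assumption; `checkSendPhrase` is a hypothetical name, and the normalization here is ASCII-only for brevity.

```typescript
interface SendCheck {
  submit: boolean; // true when the transcript ends with the full send phrase
  text: string;    // transcript with the send phrase removed (if matched)
}

// Lowercase and drop punctuation so "Send." matches "send".
const norm = (w: string): string => w.toLowerCase().replace(/[^a-z0-9]/g, "");

function checkSendPhrase(transcript: string, phrase: string): SendCheck {
  const phraseWords = phrase.split(/\s+/).map(norm).filter((w) => w.length > 0);
  const words = transcript.trim().split(/\s+/).filter((w) => w.length > 0);

  // An empty phrase disables hands-free sending.
  if (phraseWords.length === 0 || words.length < phraseWords.length) {
    return { submit: false, text: transcript.trim() };
  }

  const tail = words.slice(-phraseWords.length).map(norm);
  const matches = tail.every((w, i) => w === phraseWords[i]);
  return matches
    ? { submit: true, text: words.slice(0, -phraseWords.length).join(" ") }
    : { submit: false, text: transcript.trim() };
}
```

With the default phrase, "please press send" does not submit because the word before "send" is not "grok".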
Cost: Speech-to-Text is a separate, pay-as-you-go xAI product: $0.10/hr batch, $0.20/hr streaming, billed by audio duration. In practice ~500 words ≈ ½–1¢; a heavy 10,000-word day ≈ 10¢. It needs its own console.x.ai key (grok.voiceApiKey / GROK_VOICE_API_KEY / XAI_API_KEY); a SuperGrok subscription grants no API credit. Why it bypasses the CLI, and how the cost was measured end-to-end: research/voice-input.md.
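The three key sources can be resolved with a simple fallback chain. The sketch below assumes the precedence follows the order they are listed in (setting, then GROK_VOICE_API_KEY, then XAI_API_KEY) and uses a deliberately small .env parser; both function names are hypothetical and the keys in the usage example are placeholders.

```typescript
// Parse KEY=VALUE lines from .env text; skips blank lines and # comments.
function parseDotEnv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "="
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes.
    const first = value[0];
    if (value.length >= 2 && (first === '"' || first === "'") && value[value.length - 1] === first) {
      value = value.slice(1, -1);
    }
    out[key] = value;
  }
  return out;
}

// Assumed precedence: explicit setting, then the voice-specific variable,
// then the general xAI key. Returns undefined when nothing is configured.
function resolveVoiceKey(setting: string, env: Record<string, string>): string | undefined {
  const candidates: Array<string | undefined> = [
    setting,
    env["GROK_VOICE_API_KEY"],
    env["XAI_API_KEY"],
  ];
  return candidates.find((v) => v !== undefined && v.trim() !== "");
}
```

For example, `resolveVoiceKey("", parseDotEnv("XAI_API_KEY=placeholder"))` falls through to the general key.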
File chips — your editor and selection as @file context
The active editor is added as an implicit chip automatically (toggle with grok.includeActiveFileByDefault). Drag from the Explorer, right-click → Grok: Send File, press Alt+G, or use the + toolbar button to add explicit chips. Chips are sent as @/path/to/file references — the CLI resolves them, so content stays current and doesn't bloat chat history. Hold Shift while dragging to embed the file's contents inline as a fenced code block instead.
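The difference between a reference chip and a Shift-drag embed can be sketched as a single formatting function. The `FileChip` shape and the exact layout of the embedded block are assumptions for illustration; only the `@/path/to/file` reference form comes from the description above.

```typescript
// Hypothetical chip shape; content is only needed when embedding.
interface FileChip {
  path: string;
  content?: string;
  language?: string;
}

// Three backticks, built at runtime so this example nests cleanly in markdown.
const FENCE = "`".repeat(3);

function chipToPromptText(chip: FileChip, embed: boolean): string {
  if (embed && chip.content !== undefined) {
    // Inline snapshot: the file's contents travel with the prompt.
    return `${chip.path}\n${FENCE}${chip.language ?? ""}\n${chip.content}\n${FENCE}`;
  }
  // Reference: the CLI resolves the path at send time, so content stays current.
  return `@${chip.path}`;
}
```

The reference form keeps chat history small because only the path is stored; the embed form trades that for a fixed snapshot.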
Session history — resume, rename, or delete any past session
The clock icon lists every session the CLI saved for this project (~/.grok/sessions/<urlencoded-cwd>/). Click a row to resume — the extension calls session/load and Grok replays the conversation, with inline images, plans, and reasoning intact. Hover to rename (pencil) or delete (trash); names default to the first message. Renames live in VS Code's globalState and never touch Grok's own files.
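As an illustration of the layout described above, the per-project session folder and the rename overlay might be computed like this. It assumes `<urlencoded-cwd>` means `encodeURIComponent` of the workspace path, which this README does not confirm; both function names are hypothetical.

```typescript
// Assumption: the folder name is the percent-encoded working directory.
function sessionDirFor(cwd: string, home: string): string {
  return `${home}/.grok/sessions/${encodeURIComponent(cwd)}`;
}

// Renames are an overlay kept by the extension; Grok's own files are not modified.
// Without a rename, the name defaults to the session's first message.
function displayName(renames: Map<string, string>, sessionId: string, firstMessage: string): string {
  return renames.get(sessionId) ?? firstMessage;
}
```

Storing renames separately means deleting the overlay restores the default names without touching the CLI's session files.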
Tool calls — every read, edit & command, inline
Every action Grok takes appears in chat — a single flat row ("Read sidebar.ts lines 1–120", "Edit package.json", "Run npm test"), or a collapsed group ("Read, Edit +2") that expands on click.
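A collapsed label such as "Read, Edit +2" can be produced by showing the first few actions and counting the rest. This sketch assumes the "+N" counts the remaining tool calls rather than distinct kinds; `groupLabel` is a hypothetical name.

```typescript
// Show the first `shown` action names, then "+N" for the remainder.
function groupLabel(actions: string[], shown = 2): string {
  if (actions.length <= shown) return actions.join(", ");
  const head = actions.slice(0, shown).join(", ");
  return `${head} +${actions.length - shown}`;
}
```

For example, `groupLabel(["Read", "Edit", "Run", "Read"])` yields `"Read, Edit +2"`.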
Model picker — switch models live, no restart
Click the model name in the gear popover. The list comes from the CLI's session/new response; switching is live (session/set_model) with no restart when the target model belongs to the same agent.
Reasoning effort — trade tokens for depth
Gear icon → effort dots pick a level (none → xhigh), forwarded to the CLI as --reasoning-effort. Changing it restarts the session, with an optional Summarize & Restart to carry context forward. (Some subscription tiers may reject effort at the backend.)
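Since effort is passed as a launch flag, changing it means building a new argument list and respawning the child, which is why the session restarts. A minimal sketch, assuming the flag is appended after `agent stdio` (the exact position is not stated here); `buildAgentArgs` is a hypothetical helper.

```typescript
// Levels as listed in the settings reference.
const EFFORT_LEVELS = ["none", "minimal", "low", "medium", "high", "xhigh"] as const;
type Effort = (typeof EFFORT_LEVELS)[number];

function isEffort(value: string): value is Effort {
  return (EFFORT_LEVELS as readonly string[]).indexOf(value) !== -1;
}

// Build the argument list for the headless child process.
// An empty effort means "use the CLI default", so no flag is added.
function buildAgentArgs(effort: string): string[] {
  const args = ["agent", "stdio"];
  if (effort === "") return args;
  if (!isEffort(effort)) throw new Error(`Unknown reasoning effort: ${effort}`);
  return [...args, "--reasoning-effort", effort];
}
```

Validating the level before spawning gives a clear error in the UI instead of a failed child process.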
Cost control — token donut, /compact & effort
Stay on top of spend without leaving the sidebar: the bottom-toolbar context donut shows usedK/maxK tokens after each prompt; /compact (gear → Compact) compresses the conversation when it fills, or + starts fresh. Reasoning effort trades tokens for depth, and voice STT cost is called out above.
MCP servers — whatever the CLI loads
MCP servers are configured in the CLI (~/.grok/config.toml global, .grok/config.toml project) — the extension picks up whatever the CLI loads:
grok mcp add playwright --command npx --args @playwright/mcp@latest
Or edit the config via gear → Open global / project config, then click + to reload.
All grok.* settings (VS Code Settings → search "grok")
| Setting | Default | Notes |
|---|---|---|
| grok.cliPath | "" | Path to the grok binary. Empty = auto-discover (~/.grok/bin/grok → PATH). |
| grok.defaultModel | "" | Model ID for new sessions. Empty = CLI default. |
| grok.defaultEffort | "" | Reasoning effort forwarded as --reasoning-effort (none / minimal / low / medium / high / xhigh). Empty = CLI default. Changing it restarts the session. |
| grok.includeActiveFileByDefault | true | Auto-add the active editor as a context chip. |
| grok.useCtrlEnterToSend | false | When true, Enter inserts a newline and Ctrl/Cmd+Enter sends. |
| grok.voiceApiKey | "" | xAI API key for voice Speech-to-Text: a separate console.x.ai developer key, not the CLI login. Empty = fall back to GROK_VOICE_API_KEY / XAI_API_KEY in the workspace .env. |
| grok.ffmpegPath | "" | Path to ffmpeg for microphone recording. Empty = use ffmpeg from PATH. |
| grok.voiceInputDevice | "" | Microphone device override. Empty = system default (Windows auto-detects the first DirectShow audio device). |
| grok.voiceSendPhrase | "grok send" | Spoken phrase that auto-submits when it ends a transcription. Empty = disable hands-free sending. |
| grok.voiceStreaming | true | Stream transcription live as you speak. false = one-shot batch mode. Streaming costs $0.20/hr vs $0.10/hr batch. |
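The auto-discovery order for grok.cliPath (explicit setting, then ~/.grok/bin/grok, then PATH) can be sketched as below. The `exists` callback is injected so the logic stays testable without touching the filesystem; the function name is hypothetical, and a Windows build would presumably need to account for an .exe suffix, which this sketch ignores.

```typescript
// Resolve which grok binary to launch.
// `exists` is supplied by the caller (for example a wrapper around fs.existsSync).
function resolveCliPath(
  configured: string,
  home: string,
  exists: (path: string) => boolean,
): string {
  // 1. An explicit setting always wins.
  if (configured.trim() !== "") return configured;
  // 2. The CLI's default install location.
  const installed = `${home}/.grok/bin/grok`;
  if (exists(installed)) return installed;
  // 3. Fall back to the bare name and let the OS search PATH.
  return "grok";
}
```

Returning the bare name in the last step defers PATH lookup to the process spawner rather than reimplementing it.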
VS Code commands & keys (Ctrl/Cmd+Shift+P → "Grok")
VS Code commands (not Grok slash commands):
| Command | What it does |
|---|---|
| Grok: Open | Open the Grok sidebar |
| Grok: New Session | Start a fresh session |
| Grok: Pick Model | Open the model picker |
| Grok: Toggle Plan / Agent Mode | Open the mode picker (Agent / Plan / YOLO) |
| Grok: Send File | Add the selected file to context |
| Grok: Send Selection | Send the current text selection to Grok |
| Grok: Insert @-Mention | Insert an @-mention for the active file into the composer |
| Grok: Show Logs | Open the Grok output channel (ACP messages, errors) |
| Grok: Log Out | Sign out of the Grok CLI (grok logout) and return to the sign-in screen |
| Key | Action |
|---|---|
| Ctrl+; / Cmd+; | Open Grok sidebar |
| Alt+G | Insert @-mention for the active file (when the editor is focused) |
Grok's own slash commands (/imagine, /compact, …) autocomplete in the composer when you type /, sourced live from your installed CLI version. Reference snapshot: docs/SLASH-COMMANDS.md.
The extension is intentionally thin: it speaks JSON-RPC over grok agent stdio and renders the results. Grok owns sessions, memory, MCP, models, and tool execution; the extension mediates file reads/writes, terminal requests, diff previews, the webview UI — and Plan Mode.
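To make "speaks JSON-RPC over grok agent stdio" concrete, here is a minimal sketch of message framing, assuming one JSON object per line on the child's stdin/stdout. The class and function names are hypothetical, and the `cwd` parameter in the usage note is illustrative only.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

// Serialize one request as a single line, ready to write to the child's stdin.
function encodeRequest(id: number, method: string, params?: unknown): string {
  const msg: JsonRpcRequest = { jsonrpc: "2.0", id, method, params };
  return JSON.stringify(msg) + "\n";
}

// Reassemble complete messages from arbitrary stdout chunks.
class LineDecoder {
  private buffer = "";

  // Returns every complete message contained in the data seen so far.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // keep the unfinished tail for the next chunk
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line) as unknown);
  }
}
```

In the extension, a string such as `encodeRequest(1, "session/new", { cwd: "/path/to/project" })` would be written to the stdin of the spawned `grok agent stdio` process, and stdout chunks fed to `push`. Buffering matters because a pipe can deliver half a message or several at once.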
Plan Mode is the one place the extension is not thin. The CLI's exit_plan_mode is unreliable (it reports "approved" to any reply), so the extension enforces planning itself: a gate blocks workspace writes and non-read-only commands until you approve, and a hidden primer message teaches Grok to read your real verdict ([Plan approved] / [Plan rejected] / [Plan cancelled]) from your next message.
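The verdict markers can be treated as a tiny text protocol: the extension prefixes the user's next message with a tag, and the primer tells Grok how to read it. The sketch below shows one way to format and recognize those tags; how the optional comment is joined to the tag is an assumption, and both function names are hypothetical.

```typescript
type Verdict = "approved" | "rejected" | "cancelled";

// Produce the marker that leads the user's next message,
// followed by the optional comment from the plan card.
function formatVerdict(verdict: Verdict, comment = ""): string {
  const tag = `[Plan ${verdict}]`;
  const note = comment.trim();
  return note === "" ? tag : `${tag} ${note}`;
}

// Recognize a verdict marker at the start of a message.
function parseVerdict(message: string): Verdict | undefined {
  const match = /^\[Plan (approved|rejected|cancelled)\]/.exec(message.trim());
  return match ? (match[1] as Verdict) : undefined;
}
```

Carrying the verdict in the message text sidesteps the unreliable exit_plan_mode reply, since the model reads the decision directly.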
Full diagram, message flow, module map, and design notes: docs/architecture.md.
Build, test & repo conventions
npm install
npm test # grok-free unit/DOM/integration suite — exactly what CI runs
npm run package # → grok-vscode-phuryn-<version>.vsix
npm test is grok-free, so local ≡ CI: it never spawns the real binary. A separate, on-demand npm run test:live drives the actual grok end-to-end (handshake, restore, plan-mode, image/video gen) and is run before a release, not on every commit. Full test taxonomy and what's deferred to a future @vscode/test-electron suite: TESTS.md. Architecture and module map: docs/architecture.md.
Repo conventions: direct-to-main, no feature branches; commits explain the why; no speculative abstractions; the grok-free suite is the floor — every change keeps it green.
- Diff preview semantics. The diff editor compares the proposed old vs. new text against each other, not against the file on disk at preview time. The write happens via fs/write_text_file after approval. This is an ACP constraint: tool_call_update carries the diff before the file is touched.
- No worktree UI. Grok: New Worktree Session is planned but not yet implemented.
- View placement. The view defaults to the left activity bar; drag it to the secondary side bar manually if you want it on the right.
MIT


