phuryn/grok-build-vscode

Grok Build for VS Code

A thin VS Code sidebar client for xAI's Grok Build CLI. It spawns grok agent stdio as a headless child and drives it over the Agent Client Protocol (ACP) — session state, MCP servers, memory, and tool execution all stay inside that CLI process. Not a terminal launcher and not a re-implementation. Install the grok CLI first; the extension is a UI shell over it.

Works with a SuperGrok subscription or an xAI API key. Not affiliated with xAI.

Install free from the VS Code Marketplace →


Why an extension, not the CLI?

You get the things a terminal can't give you: VS Code's native diff editor on a proposed edit before you approve it, permission cards with Allow always / once / Reject instead of [y/N] prompts, your active editor and selection as first-class @file context, session history you can resume/rename/delete, inline images and video from /imagine, voice dictation, and side-by-side placement next to other AI tools. It's a UI shell — the trade-off is that it's useless without the grok CLI installed.

A short tour of how the extension is wired (and the one place it's deliberately not thin — Plan Mode) lives in docs/architecture.md.


Requirements

  • VS Code 1.90+ (or a compatible editor — Cursor, Windsurf, VSCodium).
  • The Grok Build CLI (grok) on macOS, Linux, or Windows. The CLI ships a native Windows build, so the extension runs natively on all three — no WSL required (WSL2 + Remote-WSL still works if you prefer it).
  • A login: either a SuperGrok subscription (grok /login) or an xAI API key. With a subscription you get Grok Build; with an API key you also get the grok-4.x models and grok-imagine.
  • For voice input only (optional): ffmpeg on PATH, and a separate xAI API key for Speech-to-Text (pay-as-you-go, $0.10/hr batch or $0.20/hr streaming; your CLI login does not cover it). See Voice input under Features & capabilities.

Install

1. Install the CLI and sign in.

macOS / Linux / WSL:

curl -fsSL https://x.ai/cli/install.sh | bash
grok /login

Windows (PowerShell):

irm https://x.ai/cli/install.ps1 | iex
grok /login

grok /login opens a browser and completes OAuth in one step. Prefer an API key? Get one at console.x.ai and set XAI_API_KEY in your shell or a workspace .env (the extension auto-loads it).

2. Install the extension.

From the Marketplace — search Grok Build by PawelHuryn, or:

code --install-extension PawelHuryn.grok-vscode-phuryn

Or build from source:

git clone https://github.com/phuryn/grok-build-vscode.git
cd grok-build-vscode
npm install
./scripts/install.sh        # Windows: pwsh scripts\install.ps1

Reload VS Code (Ctrl+Shift+P → Developer: Reload Window) and click the Grok icon in the activity bar.

Tip: Right-click the Grok icon → Move To → Secondary Side Bar to park Grok on the right, next to other AI tools.

Uninstall: ./scripts/uninstall.sh (Windows: pwsh scripts\uninstall.ps1) or code --uninstall-extension PawelHuryn.grok-vscode-phuryn.


Quick start

  1. Open the Grok sidebar (activity bar icon, or Ctrl/Cmd+;).
  2. Type a prompt and press Enter. Grok streams its answer; a Thinking… line resolves to Thought for Ns — click it to expand the reasoning.
  3. Approve actions. When Grok wants to write a file or run a command it may raise a permission card — preview an edit in the native diff editor, then Allow once / always / Reject.
  4. Pick your mode (Agent / Plan / YOLO), model, and reasoning effort from the bottom toolbar and gear menu.
  5. Resume anytime — the clock icon lists past sessions for this project.

Features & capabilities

Permission cards with diff preview — see every edit in VS Code's native diff before you approve

For kind:"edit" tool calls the card shows a path — N → M lines summary and an open diff → button that opens VS Code's native diff editor against the proposed content. Approve with Allow once / always, or Reject. The actual write only happens after you approve, via fs/write_text_file — no surprise changes to your files.
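The N → M lines summary on an edit card can be pictured as a simple line count of the old and proposed text. The sketch below is illustrative only: `countLines` and `summarizeEdit` are hypothetical names, and the rule of ignoring one trailing newline is an assumption, not necessarily how the extension counts.

```typescript
// Hypothetical helpers, not the extension's actual code.
// Assumption: a single trailing newline does not count as an extra line.
function countLines(text: string): number {
  if (text.length === 0) return 0;
  const trimmed = text.endsWith("\n") ? text.slice(0, -1) : text;
  return trimmed.split("\n").length;
}

interface EditSummary {
  path: string;
  before: number; // line count of the old text
  after: number;  // line count of the proposed text
  label: string;  // e.g. "1 → 3 lines"
}

function summarizeEdit(path: string, oldText: string, newText: string): EditSummary {
  const before = countLines(oldText);
  const after = countLines(newText);
  return { path, before, after, label: `${before} → ${after} lines` };
}
```

For example, `summarizeEdit("package.json", "a\n", "a\nb\nc\n").label` evaluates to `"1 → 3 lines"`.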

Modes — Agent, Plan & YOLO
| Mode | Behaviour |
|---|---|
| Agent (default) | Grok acts directly and may ask permission for a write or shell action it judges sensitive; a card appears in chat. |
| Plan | Grok drafts a plan first and cannot write to the workspace or run anything outside a read-only allowlist until you approve. Approve / Reject / Cancel from the card, each with an optional comment. Plan Mode is enforced by the extension; see How it works. |
| YOLO | The extension auto-approves every permission request. The CLI session is untouched: no restart, just a flag flip. |

Image & video generation: /imagine renders right in the chat

Type /imagine <prompt> (or /imagine-video <prompt>) and the result renders inline — images as a compact thumbnail (capped at 320px; click to open the source file), videos with native playback controls. Hover either for Copy path / Open in VS Code icons. Both are subscription-only Grok features, both survive a session resume, and the file is streamed from disk so even a multi-MB video plays. (Editing a reference photo with /imagine works too, via Grok's image_edit tool.) Wire-format details: research/image-generation.md.

Voice input — hands-free dictation with live transcription

The microphone button in the composer dictates speech, transcribed by xAI's Speech-to-Text API. Click it, wait for the blue listening waves, and speak — words appear live as you talk. Say "grok send" to submit hands-free and keep listening for the next message (dictate while Grok responds; those messages queue and flush when it finishes). Click the mic to stop and keep any in-progress text.

The two-word send phrase is deliberate (it won't fire on a message that merely ends in "send") and is configurable via grok.voiceSendPhrase. Streaming is the default; set grok.voiceStreaming: false for one-shot batch mode.
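One way to picture the send-phrase check is a word-level match against the tail of the transcript. This is a minimal sketch under assumptions: `checkSendPhrase` and `normalizeWords` are hypothetical names, and the case-insensitive, punctuation-tolerant matching is a guess at reasonable behaviour rather than the extension's actual rule.

```typescript
// Hypothetical sketch: does the transcript end with the configured send phrase?
interface SendCheck {
  submit: boolean; // true when the phrase closed the transcript
  text: string;    // transcript with the phrase stripped (whitespace collapsed)
}

// Lowercase and drop punctuation so "Grok send." still matches "grok send".
function normalizeWords(s: string): string[] {
  return s
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

function checkSendPhrase(transcript: string, phrase: string): SendCheck {
  const unchanged: SendCheck = { submit: false, text: transcript.trim() };
  const phraseWords = normalizeWords(phrase);
  if (phraseWords.length === 0) return unchanged; // empty phrase = feature disabled

  const words = transcript.trim().split(/\s+/).filter((w) => w.length > 0);
  if (words.length < phraseWords.length) return unchanged;

  const cut = words.length - phraseWords.length;
  const tailMatches = words
    .slice(cut)
    .every((w, i) => normalizeWords(w).join("") === phraseWords[i]);
  if (!tailMatches) return unchanged;

  return { submit: true, text: words.slice(0, cut).join(" ") };
}
```

With the default phrase, "fix the failing test. Grok send." submits "fix the failing test.", while "please press send" is left alone because only one of the two words matches.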

Cost: Speech-to-Text is a separate, pay-as-you-go xAI product — $0.10/hr batch, $0.20/hr streaming, billed by audio duration. In practice ~500 words ≈ ½–1¢; a heavy 10,000-word day ≈ 10¢. It needs its own console.x.ai key (grok.voiceApiKey / GROK_VOICE_API_KEY / XAI_API_KEY) — a SuperGrok subscription grants no API credit. Why it bypasses the CLI, and how the cost was measured end-to-end: research/voice-input.md.
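The per-word figures follow from billing by audio duration. The sketch below reproduces the arithmetic under one loud assumption: a speaking rate of about 150 words per minute, which is not stated in this README. `sttCostCents` is a hypothetical helper.

```typescript
// Hypothetical helper. Cost is billed by audio duration, so words are first
// converted to hours using an ASSUMED speaking rate (words per minute).
function sttCostCents(words: number, wordsPerMinute: number, dollarsPerHour: number): number {
  const hours = words / wordsPerMinute / 60;
  return hours * dollarsPerHour * 100; // dollars to cents
}

const ASSUMED_WPM = 150; // assumption, not a measured value

const batch500 = sttCostCents(500, ASSUMED_WPM, 0.10);     // about 0.56 cents
const streaming500 = sttCostCents(500, ASSUMED_WPM, 0.20); // about 1.11 cents
```

At that rate, 500 words is roughly 3.3 minutes of audio, which lands in the ½–1¢ range quoted above. Real cost depends on recorded duration, including pauses, so treat this as an estimate only.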

File chips — your editor and selection as @file context

The active editor is added as an implicit chip automatically (toggle with grok.includeActiveFileByDefault). Drag from the Explorer, right-click → Grok: Send File, press Alt+G, or use the + toolbar button to add explicit chips. Chips are sent as @/path/to/file references — the CLI resolves them, so content stays current and doesn't bloat chat history. Hold Shift while dragging to embed the file's contents inline as a fenced code block instead.
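The difference between a reference chip and an embedded chip can be sketched as two ways of rendering the same file into the outgoing prompt. The names `FileChip`, `renderChip`, and `buildPrompt` are hypothetical, and placing chips ahead of the typed text is an assumption about layout.

```typescript
// Hypothetical model of a context chip; not the extension's actual types.
interface FileChip {
  path: string;      // absolute path, sent as @/path/to/file
  embed: boolean;    // true when the file was Shift-dragged
  content?: string;  // file contents, only needed when embed is true
  language?: string; // fence info string, e.g. "ts"
}

function renderChip(chip: FileChip): string {
  if (!chip.embed) {
    // Reference only: the CLI resolves the path, so history stays small.
    return `@${chip.path}`;
  }
  const fence = "`".repeat(3);
  return `${fence}${chip.language ?? ""}\n${chip.content ?? ""}\n${fence}`;
}

// Assumption: chips are placed before the typed prompt, one per line.
function buildPrompt(text: string, chips: FileChip[]): string {
  return [...chips.map(renderChip), text].join("\n");
}
```

A reference chip costs a few characters per message regardless of file size, which is why it is the default; embedding freezes a snapshot of the contents into the chat history.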

Session history — resume, rename, or delete any past session

The clock icon lists every session the CLI saved for this project (~/.grok/sessions/<urlencoded-cwd>/). Click a row to resume — the extension calls session/load and Grok replays the conversation, with inline images, plans, and reasoning intact. Hover to rename (pencil) or delete (trash); names default to the first message. Renames live in VS Code's globalState and never touch Grok's own files.
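As a rough illustration of the layout described above, the session directory can be derived from the workspace path, and a display name resolved from a rename overlay. This sketch assumes the URL encoding matches JavaScript's `encodeURIComponent` and uses POSIX-style separators; `sessionDir` and `displayName` are hypothetical helpers, and the `Map` stands in for VS Code's globalState.

```typescript
// Hypothetical helper. ASSUMPTION: the CLI's "urlencoded-cwd" matches
// encodeURIComponent; the real encoding may differ in edge cases.
function sessionDir(homeDir: string, cwd: string): string {
  return `${homeDir}/.grok/sessions/${encodeURIComponent(cwd)}`;
}

// Renames are kept in a side table keyed by session id, so Grok's own files
// are never modified. Falls back to the first message when no rename exists.
function displayName(
  sessionId: string,
  firstMessage: string,
  renames: Map<string, string>
): string {
  return renames.get(sessionId) ?? firstMessage;
}
```

For instance, `sessionDir("/home/u", "/work/my app")` yields `/home/u/.grok/sessions/%2Fwork%2Fmy%20app`.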

Tool calls — every read, edit & command, inline

Every action Grok takes appears in chat — a single flat row ("Read sidebar.ts lines 1–120", "Edit package.json", "Run npm test"), or a collapsed group ("Read, Edit +2") that expands on click.
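A collapsed label such as "Read, Edit +2" can be produced by showing the first few verbs and counting the rest. This is only a guess at the rule; `groupLabel` is a hypothetical helper and the cutoff of two visible verbs is inferred from the example above.

```typescript
// Hypothetical helper: collapse a run of tool-call verbs into one label.
// ASSUMPTION: the first `shown` verbs are listed and the remainder is counted.
function groupLabel(verbs: string[], shown: number = 2): string {
  const head = verbs.slice(0, shown).join(", ");
  const rest = verbs.length - shown;
  return rest > 0 ? `${head} +${rest}` : head;
}
```

Under that rule, `groupLabel(["Read", "Edit", "Run", "Read"])` gives `"Read, Edit +2"`.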

Model picker — switch models live, no restart

Click the model name in the gear popover. The list comes from the CLI's session/new response; switching is live (session/set_model) with no restart when the target model belongs to the same agent.

Reasoning effort — trade tokens for depth

Gear icon → effort dots pick a level (none to xhigh), forwarded to the CLI as --reasoning-effort. Changing it restarts the session, with an optional Summarize & Restart to carry context forward. (Some subscription tiers may reject effort at the backend.)

Cost control — token donut, /compact & effort

Stay on top of spend without leaving the sidebar: the bottom-toolbar context donut shows usedK/maxK tokens after each prompt; /compact (gear → Compact) compresses the conversation when it fills, or + starts fresh. Reasoning effort trades tokens for depth, and voice STT cost is called out above.

MCP servers — whatever the CLI loads

MCP servers are configured in the CLI (~/.grok/config.toml global, .grok/config.toml project) — the extension picks up whatever the CLI loads:

grok mcp add playwright --command npx --args @playwright/mcp@latest

Or edit the config via gear → Open global / project config, then click + to reload.


Configuration

All grok.* settings (VS Code Settings → search "grok")
| Setting | Default | Notes |
|---|---|---|
| grok.cliPath | "" | Path to the grok binary. Empty = auto-discover (~/.grok/bin/grok → PATH). |
| grok.defaultModel | "" | Model ID for new sessions. Empty = CLI default. |
| grok.defaultEffort | "" | Reasoning effort forwarded as --reasoning-effort (none / minimal / low / medium / high / xhigh). Empty = CLI default. Changing it restarts the session. |
| grok.includeActiveFileByDefault | true | Auto-add the active editor as a context chip. |
| grok.useCtrlEnterToSend | false | When true, Enter inserts a newline and Ctrl/Cmd+Enter sends. |
| grok.voiceApiKey | "" | xAI API key for voice Speech-to-Text: a separate console.x.ai developer key, not the CLI login. Empty = fall back to GROK_VOICE_API_KEY / XAI_API_KEY in the workspace .env. |
| grok.ffmpegPath | "" | Path to ffmpeg for microphone recording. Empty = use ffmpeg from PATH. |
| grok.voiceInputDevice | "" | Microphone device override. Empty = system default (Windows auto-detects the first DirectShow audio device). |
| grok.voiceSendPhrase | "grok send" | Spoken phrase that auto-submits when it ends a transcription. Empty = disable hands-free sending. |
| grok.voiceStreaming | true | Stream transcription live as you speak. false = one-shot batch mode. Streaming costs $0.20/hr vs $0.10/hr batch. |
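Two of the settings above describe fallback chains for their empty defaults. The sketch below shows how such chains might be resolved; `resolveCliPath` and `resolveVoiceKey` are hypothetical names, the `exists` callback is injected so the logic stays testable without touching the filesystem, and POSIX-style paths are assumed (a Windows build would presumably look for grok.exe).

```typescript
// Hypothetical sketch of the grok.cliPath rule: explicit setting first,
// then ~/.grok/bin/grok, then each directory on PATH.
function resolveCliPath(
  setting: string,
  homeDir: string,
  pathDirs: string[],
  exists: (p: string) => boolean
): string | undefined {
  if (setting !== "") return setting;
  const candidates = [
    `${homeDir}/.grok/bin/grok`,
    ...pathDirs.map((dir) => `${dir}/grok`),
  ];
  return candidates.find(exists);
}

// Hypothetical sketch of the grok.voiceApiKey rule: setting first, then
// GROK_VOICE_API_KEY, then XAI_API_KEY. Empty strings count as unset.
function resolveVoiceKey(
  setting: string,
  env: Record<string, string | undefined>
): string | undefined {
  return [setting, env["GROK_VOICE_API_KEY"], env["XAI_API_KEY"]].find(
    (value) => value !== undefined && value !== ""
  );
}
```

Returning `undefined` rather than throwing leaves it to the caller to decide how to report a missing binary or key.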

Commands & keybindings

VS Code commands & keys (Ctrl/Cmd+Shift+P → "Grok")

VS Code commands (not Grok slash commands):

| Command | What it does |
|---|---|
| Grok: Open | Open the Grok sidebar |
| Grok: New Session | Start a fresh session |
| Grok: Pick Model | Open the model picker |
| Grok: Toggle Plan / Agent Mode | Open the mode picker (Agent / Plan / YOLO) |
| Grok: Send File | Add the selected file to context |
| Grok: Send Selection | Send the current text selection to Grok |
| Grok: Insert @-Mention | Insert an @-mention for the active file into the composer |
| Grok: Show Logs | Open the Grok output channel (ACP messages, errors) |
| Grok: Log Out | Sign out of the Grok CLI (grok logout) and return to the sign-in screen |

| Key | Action |
|---|---|
| Ctrl+; / Cmd+; | Open Grok sidebar |
| Alt+G | Insert @-mention for the active file (when the editor is focused) |

Grok's own slash commands (/imagine, /compact, …) autocomplete in the composer when you type /, sourced live from your installed CLI version. Reference snapshot: docs/SLASH-COMMANDS.md.


How it works

The extension is intentionally thin: it speaks JSON-RPC over grok agent stdio and renders the results. Grok owns sessions, memory, MCP, models, and tool execution; the extension mediates file reads/writes, terminal requests, diff previews, the webview UI — and Plan Mode.
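To make "JSON-RPC over stdio" concrete, here is a minimal framing sketch. It assumes newline-delimited JSON-RPC 2.0 messages, one per line, and the `cwd` parameter shown for session/new is illustrative; check the ACP specification for the real message shapes. `encodeMessage` and `LineDecoder` are hypothetical names, and nothing here spawns the CLI.

```typescript
// Hypothetical framing helpers. ASSUMPTION: one JSON-RPC 2.0 message per line.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

// Serialize a request as one line, ready to write to the child's stdin.
function encodeMessage(msg: JsonRpcRequest): string {
  return JSON.stringify(msg) + "\n";
}

// Stdout arrives in arbitrary chunks, so buffer until a full line is available.
class LineDecoder {
  private buffer = "";

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // keep the incomplete tail for next time
    return lines
      .filter((line) => line.trim().length > 0)
      .map((line) => JSON.parse(line) as unknown);
  }
}

// Illustrative request only; the params are an assumption.
const newSession = encodeMessage({
  jsonrpc: "2.0",
  id: 1,
  method: "session/new",
  params: { cwd: "/path/to/workspace" },
});
```

Buffering by line matters because a single read from the child's stdout may contain half a message or several messages at once.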

Plan Mode is the one place the extension is not thin. The CLI's exit_plan_mode is unreliable (it reports "approved" to any reply), so the extension enforces planning itself: a gate blocks workspace writes and non-read-only commands until you approve, and a hidden primer message teaches Grok to read your real verdict ([Plan approved] / [Plan rejected] / [Plan cancelled]) from your next message.
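The gate described above can be sketched as a pure decision function. Everything here is hypothetical: the `decide` and `parseVerdict` names, the contents of the allowlist, and the rule that rejects shell metacharacters are assumptions for illustration, not the extension's actual policy.

```typescript
type Mode = "agent" | "plan" | "yolo";
type Action =
  | { kind: "write"; path: string }
  | { kind: "command"; command: string };
type Decision = "allow" | "block" | "ask";

// ASSUMED allowlist; the real one lives in the extension and may differ.
const READ_ONLY_COMMANDS = ["ls", "cat", "grep", "git status", "git diff"];

function isReadOnly(command: string): boolean {
  const c = command.trim();
  // Reject redirects, pipes, chaining, and substitution outright, since
  // "cat a > b" starts with an allowed command but still writes a file.
  if (/[;&|<>`$]/.test(c)) return false;
  return READ_ONLY_COMMANDS.some((p) => c === p || c.startsWith(p + " "));
}

function decide(mode: Mode, planApproved: boolean, action: Action): Decision {
  if (mode === "plan" && !planApproved) {
    if (action.kind === "write") return "block";
    return isReadOnly(action.command) ? "allow" : "block";
  }
  if (mode === "yolo") return "allow"; // auto-approve every request
  return "ask"; // show a permission card
}

// Read the verdict tag that prefixes the user's next message.
function parseVerdict(message: string): "approved" | "rejected" | "cancelled" | undefined {
  const match = /^\[Plan (approved|rejected|cancelled)\]/.exec(message.trim());
  return match ? (match[1] as "approved" | "rejected" | "cancelled") : undefined;
}
```

Keeping the decision in one pure function means the gate does not depend on what the CLI reports about plan approval, which is the point of enforcing it in the extension.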

Full diagram, message flow, module map, and design notes: docs/architecture.md.


Development

Build, test & repo conventions
npm install
npm test         # grok-free unit/DOM/integration suite — exactly what CI runs
npm run package  # → grok-vscode-phuryn-<version>.vsix

npm test is grok-free, so local ≡ CI — it never spawns the real binary. A separate, on-demand npm run test:live drives the actual grok end-to-end (handshake, restore, plan-mode, image/video gen) and is run before a release, not on every commit. Full test taxonomy and what's deferred to a future @vscode/test-electron suite: TESTS.md. Architecture and module map: docs/architecture.md.

Repo conventions: direct-to-main, no feature branches; commits explain the why; no speculative abstractions; the grok-free suite is the floor — every change keeps it green.


Known limits

  • Diff preview semantics. The diff editor compares the proposed old vs. new text against each other, not against the file on disk at preview time. The write happens via fs/write_text_file after approval. This is an ACP constraint — tool_call_update carries the diff before the file is touched.
  • No worktree UI. Grok: New Worktree Session is planned but not yet implemented.
  • View placement. The view defaults to the left activity bar; drag it to the secondary side bar manually if you want it on the right.

License

MIT
