Local AI app and inference engine for agents. Run open-weight LLMs locally — private, on your machine.
Atomic Chat runs an OpenAI-compatible server at http://localhost:1337/v1, a drop-in replacement for the OpenAI API that works with the OpenAI SDKs unchanged. Load a model in the app, then point any client at it:
```bash
curl http://localhost:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-id-loaded-in-atomic-chat>",
    "messages": [{ "role": "user", "content": "Say hello in one word" }]
  }'
```

```python
from openai import OpenAI

# Atomic Chat is OpenAI API-compatible: only the base_url changes.
client = OpenAI(base_url="http://localhost:1337/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="<model-id-loaded-in-atomic-chat>",
    messages=[{"role": "user", "content": "Say hello in one word"}],
)
print(resp.choices[0].message.content)
```

Bound to `127.0.0.1` by default; set `host: 0.0.0.0` to expose it on your LAN. Works with any agent, CLI, or IDE plugin that speaks the OpenAI API (see Launch With below).
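The requests above need the ID of a loaded model. Assuming the server also implements the standard OpenAI `GET /v1/models` endpoint (an assumption; this README does not confirm it), a stdlib-only sketch for discovering model IDs could look like this. `model_ids` and `fetch_models` are hypothetical helper names:

```python
import json
import urllib.request

BASE_URL = "http://localhost:1337/v1"  # Atomic Chat's default local endpoint


def model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style GET /models response."""
    # OpenAI-format responses look like {"object": "list", "data": [{"id": ...}, ...]}
    return [m["id"] for m in payload.get("data", []) if "id" in m]


def fetch_models(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """Ask the local server which models it exposes (assumes /v1/models exists)."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

With the app running and a model loaded, `fetch_models()` returns a list of IDs you can pass as `model=`.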
Local models
- Run open-weight LLMs locally from Hugging Face: Llama, Gemma, Qwen, Mistral, Phi, and others
- Multi-Token Prediction (MTP) speculative decoding — 30–70% throughput boost on supported models, up to 3× on Gemma 4
- DFlash block-diffusion decoding — up to 6× faster on Qwen 3.6, Gemma 4, Kimi K2.5
- Flash Attention toggle (`on`/`off`/`auto`)
- Automatic reasoning-context tracking for chain-of-thought models
- Auto context-window expansion with overflow notifications
- EAGLE-3 speculative decoding for Gemma 4 on Apple Silicon (MLX)
- MTP on MLX for Qwen 3.5 / 3.6 and DeepSeek V4
- TurboQuant KV cache (`turbo3`/`turbo4`) on llama.cpp, now on Windows and Linux as well as macOS: up to ~4.3× smaller KV cache footprint, on CPU and GPU (CUDA / Vulkan)
- TurboQuant KV cache on MLX-VLM: smaller memory footprint via RHT-correct fast paths
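To put the "up to ~4.3× smaller" figure in concrete terms, the usual KV-cache size formula can be evaluated for a hypothetical model shape. The layer and head counts below are an assumption (roughly a Llama-3-8B-style model), and dividing by 4.3 just applies the README's best-case ratio; it is not a measured TurboQuant result:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2.0):
    """Size of a transformer KV cache: one K and one V tensor per layer."""
    # 2 (K and V) x layers x KV heads x head dimension x tokens x bytes per element
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


# Hypothetical shape: 32 layers, 8 KV heads (grouped-query attention), head_dim 128.
fp16 = kv_cache_bytes(32, 8, 128, 8192)  # F16 cache for an 8K-token context
turbo = fp16 / 4.3                        # README's best-case "~4.3x smaller"
print(f"F16: {fp16 / 2**30:.2f} GiB, TurboQuant (best case): {turbo / 2**30:.2f} GiB")
```

For this shape and an 8K context that is 1 GiB at F16 versus roughly 0.23 GiB at the best-case ratio. The cache grows linearly with context length, so the savings matter most for long chats.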
Cloud models
- Built-in providers: OpenAI, Anthropic, Mistral, Groq, MiniMax, Qwen, Moonshot
- Bring your own key, switch model per chat, mix local and cloud freely
Tools & integrations
- One-click agent launch: start coding agents such as Claude Code, Codex CLI, Cline, OpenCode, Droid, Goose, OpenHands, Copilot CLI, Kilo Code, and Zed from the Integrations tab
- Artifacts — live preview panel for HTML/CSS/JS code with copy, download and print
- Connect multiple MCP servers — bring your own tools, file access, web search
- Custom assistants with per-assistant system prompts
- Projects with conversation tree view in the sidebar
Local API
- OpenAI-compatible server at `http://localhost:1337/v1`, a drop-in replacement for the OpenAI API
- Works with any agent, CLI, or IDE plugin that speaks the OpenAI API
- Bound to `127.0.0.1` by default; set `host: 0.0.0.0` to expose it on your LAN
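The difference between the default bind address and `0.0.0.0` comes down to loopback versus unspecified ("all interfaces") addresses. A small sketch using Python's standard `ipaddress` module; the function name is hypothetical, and it expects an IP literal rather than a hostname such as `localhost`:

```python
import ipaddress


def reachable_from_lan(bind_host: str) -> bool:
    """Return True if a server bound to `bind_host` accepts non-loopback connections."""
    addr = ipaddress.ip_address(bind_host)  # raises ValueError for hostnames
    # 0.0.0.0 and :: mean "listen on every interface", so LAN peers can connect.
    if addr.is_unspecified:
        return True
    # 127.0.0.0/8 and ::1 are only reachable from this machine.
    return not addr.is_loopback
```

`reachable_from_lan("127.0.0.1")` is `False` (the default), while `reachable_from_lan("0.0.0.0")` is `True`, which is why changing `host` should be a deliberate choice.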
Privacy
- Everything runs locally when you want it to — local server is loopback-only by default
- Your conversations and keys stay on your machine
Three engines under the hood, all exposed through one OpenAI-compatible API at http://localhost:1337/v1:
- atomic-llama-cpp-turboquant: our `llama.cpp` fork with TurboQuant KV-cache optimizations (`turbo3`/`turbo4`) for faster, lower-memory quantized inference. It is now a selectable second provider ("Atomic Llama.cpp Turboquant") on all three desktop platforms (macOS, Windows, and Linux), on CPU and GPU (CUDA / Vulkan).
- Upstream llama.cpp: the official `ggml-org` build and the default engine on Windows and Linux, for the widest hardware coverage and MTP support.
- MLX-VLM: an Apple Silicon-native engine for vision-language models, running on the GPU (via Metal) with unified memory. On M-series chips it is faster than llama.cpp for supported models.
Speculative decoding and related inference features available across backends:
- MTP (Multi-Token Prediction): the model's multi-token-prediction layers act as a built-in draft that predicts several tokens ahead, and the full model verifies them in a single forward pass. Available on macOS and Windows.
- DFlash — block-diffusion speculative decoding for Qwen 3.6, Gemma 4, Kimi K2.5 and others. Apple Silicon only; can't be enabled together with MTP.
- Flash Attention: set `on`/`off`/`auto` in Settings.
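The draft-then-verify idea behind MTP and EAGLE-3 can be illustrated with a toy, greedy version of speculative decoding. This is a sketch under simplifying assumptions, not Atomic Chat's or llama.cpp's implementation: the "models" are hypothetical integer functions, acceptance is exact-match greedy (engines that sample use a probabilistic acceptance rule instead), and verification is a Python loop where a real engine runs one batched forward pass:

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]  # toy "model": context -> next token (greedy)


def greedy(target: NextToken, prompt: List[int], n: int) -> List[int]:
    """Plain autoregressive decoding: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(target(seq))
    return seq[len(prompt):]


def speculative(target: NextToken, draft: NextToken, prompt: List[int],
                n: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy draft-and-verify decoding; returns (tokens, verification passes)."""
    seq, out, passes = list(prompt), [], 0
    while len(out) < n:
        # 1. The cheap drafter proposes k tokens autoregressively.
        ctx, proposal = list(seq), []
        for _ in range(k):
            ctx.append(draft(ctx))
            proposal.append(ctx[-1])
        # 2. The target checks every proposed position. A real engine does this
        #    in one batched forward pass; here it is a loop for readability.
        passes += 1
        ctx, accepted = list(seq), []
        for tok in proposal:
            want = target(ctx)
            accepted.append(want)  # equals tok on a match, else the target's fix
            if want != tok:
                break
            ctx.append(tok)
        else:
            accepted.append(target(ctx))  # all k accepted: the pass yields a bonus token
        for tok in accepted[: n - len(out)]:
            out.append(tok)
            seq.append(tok)
    return out, passes
```

Every emitted token is one the target would have produced on its own, so the output matches plain greedy decoding exactly; the gain comes from needing fewer verification passes, up to k + 1 tokens per pass when the draft is always right.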
Tools talking to http://localhost:1337/v1 don't need to know which backend is running underneath — switch engines without reconfiguring clients.
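Because every engine sits behind the same endpoint, client code such as a streaming chat loop stays identical whichever backend is loaded. A minimal sketch with the OpenAI Python SDK, assuming Atomic Chat honors `stream=True` (standard in the OpenAI API, but not stated in this README); `collect_stream` and `stream_reply` are hypothetical helper names:

```python
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Print and join the text deltas from an OpenAI-style chat-completion stream."""
    parts = []
    for chunk in chunks:
        # Some chunks (e.g. a final usage chunk) may carry no choices at all.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # role-only and finish chunks have content=None
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)


def stream_reply(prompt: str, model: str) -> str:
    """Stream a reply from whichever engine Atomic Chat currently has loaded."""
    from openai import OpenAI  # pip install openai

    client = OpenAI(base_url="http://localhost:1337/v1", api_key="not-needed")
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return collect_stream(stream)
```

`stream_reply("Say hello in one word", "<model-id-loaded-in-atomic-chat>")` prints tokens as they arrive and returns the full reply; switching engines in the app requires no change here.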
Atomic Chat runs an OpenAI-compatible server at http://localhost:1337/v1, so any agent, CLI, IDE plugin, or app that speaks the OpenAI API can run on top of your local models — no extra glue needed. Just point its base URL at Atomic Chat and you're done.
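Many OpenAI-based tools can be redirected without editing their code, because the official OpenAI SDKs (Python and Node) read the `OPENAI_BASE_URL` and `OPENAI_API_KEY` environment variables. A sketch of launching a tool with those variables set; whether a particular tool honors them is an assumption (several integrations below use their own config files instead), and `openai_env` and `launch` are hypothetical names:

```python
import os
import subprocess

ATOMIC_CHAT_URL = "http://localhost:1337/v1"


def openai_env(base_url: str = ATOMIC_CHAT_URL, api_key: str = "not-needed") -> dict:
    """Copy of the current environment pointing OpenAI-style clients at Atomic Chat."""
    env = dict(os.environ)
    env["OPENAI_BASE_URL"] = base_url  # read by the official OpenAI SDKs
    env["OPENAI_API_KEY"] = api_key    # placeholder, as in the quickstart above
    return env


def launch(cmd: list[str]) -> int:
    """Run a CLI tool with the Atomic Chat environment; returns its exit code."""
    return subprocess.run(cmd, env=openai_env()).returncode
```

For example, `launch(["your-agent-cli"])`, where `your-agent-cli` stands in for any tool built on an OpenAI SDK.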
A few projects already ship first-class support with their own setup docs:
| Tool | What it is | Setup |
|---|---|---|
| OpenCode | Open-source TUI coding agent. Add Atomic Chat as a local provider in `opencode.json`. | Setup guide → |
| Goose | Open-source extensible AI agent (CLI, desktop, API). | Setup guide → |
| nanobot | Ultra-lightweight personal AI agent with chat channels, MCP, and WebUI. | Repo → |
| nanoclaw | Containerized agent runtime that calls Atomic Chat as an MCP tool. | Skill guide → |
| OpenClaude | Open-source coding-agent CLI for cloud and local models. Lists Atomic Chat as a supported provider. | Providers list → |
| Kilo Code | Open-source AI coding agent for VS Code, JetBrains, and CLI. Ships with first-class Atomic Chat provider support and auto-discovery. | Setup guide → |
| Hermes Desktop | Native desktop companion for Hermes Agent. Includes an Atomic Chat local preset at `http://localhost:1337/v1`. | Repo → |
| Hermes Workspace | Local-first agent workspace built on Nous Research's Hermes. Uses Atomic Chat as its inference backend. | Repo → |
Built something that runs on Atomic Chat? Open a PR and we'll add it here.
- Node.js ≥ 20.0.0
- Yarn ≥ 4.5.3
- Make ≥ 3.81
- Rust (for Tauri)
- (Apple Silicon) MetalToolchain: `xcodebuild -downloadComponent MetalToolchain`
```bash
git clone https://github.com/AtomicBot-ai/Atomic-Chat
cd Atomic-Chat
make dev
```

This handles everything: it installs dependencies, builds the core components, and launches the app.
Available make targets:
- `make dev`: full development setup and launch
- `make build`: production build
- `make test`: run tests and linting
- `make clean`: delete everything and start fresh
Or, step by step with Yarn:

```bash
yarn install
yarn build:tauri:plugin:api
yarn build:core
yarn build:extensions
yarn dev
```

System requirements

- macOS: 13.6+ (8GB RAM for 3B models, 16GB for 7B, 32GB for 13B)
- Windows: 10/11 x64 (same RAM recommendations as macOS)
- Linux: x86_64, glibc ≥ 2.35 (Ubuntu 22.04+, Debian 12+, Fedora 40+, Arch, Mint, Pop!_OS; same RAM recommendations as macOS). Optional, for GPU acceleration: a Vulkan loader (`vulkan-1` package) plus a Vulkan driver (`mesa-vulkan-drivers` or the proprietary NVIDIA driver).
- iOS: download from the App Store
- Android: download from Google Play
Atomic Chat ships as a single self-contained .AppImage — no installer, no root:
```bash
chmod +x Atomic.Chat_*_amd64.AppImage
./Atomic.Chat_*_amd64.AppImage
```

If prompted about FUSE on first launch, install it with `sudo apt install fuse libfuse2` (Debian/Ubuntu) or `sudo dnf install fuse fuse-libs` (Fedora). GPU acceleration (Vulkan) is auto-detected on first launch; only GGUF models run on Linux.
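Since only GGUF models run on Linux, it can help to check a downloaded file before loading it, and to see whether a Vulkan loader is visible at all. A quick manual sketch: it relies on the fact that GGUF files begin with the ASCII magic `GGUF`, and on `ctypes.util.find_library`; it is not how Atomic Chat itself detects formats or GPUs, and the helper names are hypothetical:

```python
import ctypes.util
from typing import Optional

GGUF_MAGIC = b"GGUF"  # every GGUF file starts with these four bytes


def is_gguf(path: str) -> bool:
    """Return True if the file at `path` has a GGUF header."""
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC


def vulkan_loader() -> Optional[str]:
    """Return the Vulkan loader library name if the system can find it, else None."""
    # On Linux this typically resolves to "libvulkan.so.1" when a loader is installed.
    return ctypes.util.find_library("vulkan")
```

If `vulkan_loader()` returns `None`, GPU acceleration will not be available until a loader and driver are installed.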
If something isn't working:
Atomic Chat is built by a small core team and 140+ contributors — including everyone who shaped the project from its earliest days. Pull requests welcome — see CONTRIBUTING.md for how to get started.
Apache 2.0 — see LICENSE for details.
Built on the shoulders of giants:
Atomic Chat began as a fork of Jan by Menlo Research — an excellent open-source local-AI app. We're grateful to the Jan team and its contributors for the foundation they built. Atomic Chat has since grown its own direction, engines, and roadmap, but we tip our hat to where it started. 🙏
© 2026 Atomic Chat · Built with ❤️ · atomic.chat

