Atomic Chat

Local AI app and inference engine for agents. Run open-weight LLMs locally — private, on your machine.

Getting Started · Hugging Face · Discord · X / Twitter · Bug Reports

📦 Download

Desktop

Mobile

🔌 Use It as an API

Atomic Chat runs an OpenAI-compatible server at http://localhost:1337/v1 — a drop-in replacement for the OpenAI SDK. Load a model in the app, then point any client at it:

curl http://localhost:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-id-loaded-in-atomic-chat>",
    "messages": [{ "role": "user", "content": "Say hello in one word" }]
  }'

from openai import OpenAI

# Atomic Chat is OpenAI API-compatible — only the base_url changes.
client = OpenAI(base_url="http://localhost:1337/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="<model-id-loaded-in-atomic-chat>",
    messages=[{"role": "user", "content": "Say hello in one word"}],
)
print(resp.choices[0].message.content)

Bound to 127.0.0.1 by default; set host: 0.0.0.0 to expose it on your LAN. Works with any agent, CLI, or IDE plugin that speaks the OpenAI API — see Launch With below.
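To fill in the `<model-id-loaded-in-atomic-chat>` placeholder, it can help to ask the server which models it knows about. The sketch below assumes Atomic Chat implements the standard OpenAI `GET /v1/models` endpoint, which this README does not confirm. The helper names and the sample model IDs are made up for illustration.

```python
import json
import urllib.request

BASE_URL = "http://localhost:1337/v1"  # default address from this README


def extract_model_ids(payload: dict) -> list[str]:
    """Pull model IDs out of an OpenAI-style list response."""
    return [item["id"] for item in payload.get("data", []) if "id" in item]


def list_model_ids(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """Fetch model IDs. Assumes the server exposes GET {base_url}/models."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return extract_model_ids(json.load(resp))


# Offline demonstration with a hand-written payload (the IDs are hypothetical).
sample = {
    "object": "list",
    "data": [{"id": "example-model-a"}, {"id": "example-model-b"}],
}
print(extract_model_ids(sample))
```

With the app running and a model loaded, `list_model_ids()` should return IDs that can be passed as `model` in the requests above, provided the endpoint assumption holds.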

✨ Features

Local models

Run open-weight LLMs locally from Hugging Face: Llama, Gemma, Qwen, Mistral, Phi, and others
Multi-Token Prediction (MTP) speculative decoding — 30–70% throughput boost on supported models, up to 3× on Gemma 4
DFlash block-diffusion decoding — up to 6× faster on Qwen 3.6, Gemma 4, Kimi K2.5
Flash Attention toggle (on / off / auto)
Automatic reasoning-context tracking for chain-of-thought models
Auto context-window expansion with overflow notifications
EAGLE-3 speculative decoding for Gemma 4 on Apple Silicon (MLX)
MTP on MLX for Qwen 3.5 / 3.6 and DeepSeek V4
TurboQuant KV cache (turbo3 / turbo4) on llama.cpp: up to ~4.3× smaller KV cache footprint, now on Windows and Linux as well as macOS, on both CPU and GPU (CUDA / Vulkan)
TurboQuant KV cache on MLX-VLM — smaller memory footprint via RHT-correct fast paths
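To put the "up to ~4.3× smaller" figure in perspective, here is a rough back-of-the-envelope sketch. It uses the standard KV cache size formula for transformer models. The model dimensions are hypothetical, and it assumes the 4.3× figure is measured against a 16-bit cache, which this README does not state.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_element: int) -> int:
    """Size of a KV cache: keys plus values for every layer and position."""
    # Factor of 2: one tensor for keys and one for values.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_element


# Hypothetical model shape, chosen only to make the arithmetic easy to follow.
baseline = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                          seq_len=8192, bytes_per_element=2)  # 16-bit cache
compressed = baseline / 4.3  # best-case ratio quoted in this README

print(f"16-bit KV cache:  {baseline / 2**20:.0f} MiB")    # 1024 MiB
print(f"at ~4.3x smaller: {compressed / 2**20:.0f} MiB")  # 238 MiB
```

The saving grows linearly with context length, so it matters most for long conversations and large context windows.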

Cloud models

Built-in providers: OpenAI, Anthropic, Mistral, Groq, MiniMax, Qwen, Moonshot
Bring your own key, switch model per chat, mix local and cloud freely

Tools & integrations

One-click agent launch: start coding agents such as Claude Code, Codex CLI, Cline, OpenCode, Droid, Goose, OpenHands, Copilot CLI, Kilo Code, and Zed from the Integrations tab
Artifacts — live preview panel for HTML/CSS/JS code with copy, download and print
Connect multiple MCP servers — bring your own tools, file access, web search
Custom assistants with per-assistant system prompts
Projects with conversation tree view in the sidebar

Local API

OpenAI-compatible server at http://localhost:1337/v1 — drop-in replacement for the OpenAI SDK
Works with any agent, CLI, or IDE plugin that speaks the OpenAI API
Bound to 127.0.0.1 by default; set host: 0.0.0.0 to expose on LAN
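The difference between the two bind addresses can be checked with Python's standard `ipaddress` module. This is only an illustration of what the addresses mean. The `exposure` helper is hypothetical and does not read Atomic Chat's configuration.

```python
import ipaddress


def exposure(host: str) -> str:
    """Describe who can reach a server bound to the given IP address."""
    addr = ipaddress.ip_address(host)  # raises ValueError for names like "localhost"
    if addr.is_loopback:
        return "loopback only: reachable from this machine"
    if addr.is_unspecified:
        return "all interfaces: reachable from the LAN"
    return "single interface: reachable via that address"


for host in ("127.0.0.1", "0.0.0.0"):
    print(f"{host:<10} {exposure(host)}")
```

Binding to `0.0.0.0` makes the server reachable by other devices on the network, so it is worth doing only on networks you trust.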

Privacy

Everything runs locally when you want it to — local server is loopback-only by default
Your conversations and keys stay on your machine

⚙️ Inference Engines

Three engines under the hood, all exposed through one OpenAI-compatible API at http://localhost:1337/v1:

atomic-llama-cpp-turboquant: our llama.cpp fork with TurboQuant KV-cache optimizations (turbo3 / turbo4) for faster, lower-memory quantized inference. It is now a selectable second provider ("Atomic Llama.cpp Turboquant") on macOS, Windows, and Linux, on both CPU and GPU (CUDA / Vulkan).
Upstream llama.cpp: the official ggml-org build and the default engine on Windows and Linux, chosen for the widest hardware coverage and MTP support.
MLX-VLM: Apple Silicon-native engine for vision-language models, running on the GPU with unified memory. Faster than llama.cpp on M-series chips for supported models.

Speculative-decoding features available across backends:

MTP (Multi-Token Prediction) — a draft model predicts ahead, the full model verifies in one pass. Available on macOS and Windows.
DFlash — block-diffusion speculative decoding for Qwen 3.6, Gemma 4, Kimi K2.5 and others. Apple Silicon only; can't be enabled together with MTP.
Flash Attention — Settings → on / off / auto.
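The draft-then-verify idea behind MTP can be illustrated with a toy greedy speculative decoder. This is a teaching sketch, not Atomic Chat's implementation: the two "models" are hypothetical deterministic functions over integer tokens, and verification is a Python loop where a real engine would score all drafted positions in a single batched forward pass.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def target_model(seq: List[int]) -> int:
    """Toy stand-in for the full model: next token depends on the last one."""
    return (seq[-1] * 3 + 1) % 7


def draft_model(seq: List[int]) -> int:
    """Toy draft: agrees with the target except at every 5th position."""
    token = target_model(seq)
    return (token + 1) % 7 if len(seq) % 5 == 0 else token


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Draft up to k tokens, keep the prefix the target agrees with."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(seq) < goal:
        # 1. The draft model guesses several tokens ahead.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2. The target checks each guess (one batched pass in a real engine).
        rounds += 1
        accepted: List[int] = []
        for token in proposal:
            expected = target(seq + accepted)
            accepted.append(expected)
            if token != expected:
                break  # first mismatch: keep the target's token, drop the rest
        seq.extend(accepted)
    return seq, rounds


spec, rounds = speculative_decode(target_model, draft_model, [1], n_new=12)
print(spec == greedy_decode(target_model, [1], 12), rounds)
```

The output matches plain greedy decoding token for token, while the number of verification rounds is smaller than the number of generated tokens. That gap is where the throughput gain comes from when the draft is accurate.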

Tools talking to http://localhost:1337/v1 don't need to know which backend is running underneath — switch engines without reconfiguring clients.

🚀 Launch With

Atomic Chat runs an OpenAI-compatible server at http://localhost:1337/v1, so any agent, CLI, IDE plugin, or app that speaks the OpenAI API can run on top of your local models — no extra glue needed. Just point its base URL at Atomic Chat and you're done.
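For Python tools built on the official OpenAI SDK, pointing the base URL at Atomic Chat can often be done through environment variables, since the SDK reads `OPENAI_BASE_URL` and `OPENAI_API_KEY`. Other tools may use different settings, so treat the sketch below as one possible pattern. The `resolve_endpoint` helper is hypothetical.

```python
import os
from typing import Mapping, Optional

DEFAULT_BASE_URL = "http://localhost:1337/v1"  # default address from this README


def resolve_endpoint(env: Optional[Mapping[str, str]] = None) -> dict:
    """Pick the API endpoint from the environment, defaulting to Atomic Chat."""
    env = os.environ if env is None else env
    return {
        "base_url": env.get("OPENAI_BASE_URL", DEFAULT_BASE_URL).rstrip("/"),
        # The README's example uses a placeholder key for the local server.
        "api_key": env.get("OPENAI_API_KEY", "not-needed"),
    }


print(resolve_endpoint({}))
```

The result can be passed straight to the SDK as `OpenAI(**resolve_endpoint())`, which keeps the same script usable against both the local server and a cloud provider.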

A few projects already ship first-class support with their own setup docs:

Tool	What it is	Setup
OpenCode	Open-source TUI coding agent. Add Atomic Chat as a local provider in `opencode.json`.	Setup guide →
Goose	Open-source extensible AI agent (CLI, desktop, API).	Setup guide →
nanobot	Ultra-lightweight personal AI agent with chat channels, MCP, and WebUI.	Repo →
nanoclaw	Containerized agent runtime that calls Atomic Chat as an MCP tool.	Skill guide →
OpenClaude	Open-source coding-agent CLI for cloud and local models. Lists Atomic Chat as a supported provider.	Providers list →
Kilo Code	Open-source AI coding agent for VS Code, JetBrains, and CLI. Ships with first-class Atomic Chat provider support and auto-discovery.	Setup guide →
Hermes Desktop	Native desktop companion for Hermes Agent. Includes an Atomic Chat local preset at `http://localhost:1337/v1`.	Repo →
Hermes Workspace	Local-first agent workspace built on Nous Research's Hermes. Uses Atomic Chat as its inference backend.	Repo →

Built something that runs on Atomic Chat? Open a PR and we'll add it here.

🛠️ Build from Source

Prerequisites

Node.js ≥ 20.0.0
Yarn ≥ 4.5.3
Make ≥ 3.81
Rust (for Tauri)
(Apple Silicon) Metal Toolchain, installed with `xcodebuild -downloadComponent MetalToolchain`
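Before running `make dev`, the version minimums above can be checked with a small script. This is a convenience sketch, not part of the repository. It assumes each tool prints its version in response to `--version`, and the helper names are hypothetical.

```python
import re
import shutil
import subprocess

# Minimum versions taken from the prerequisites list above.
MINIMUMS = {"node": (20, 0, 0), "yarn": (4, 5, 3), "make": (3, 81, 0)}


def parse_version(text: str) -> tuple:
    """Extract the first X.Y or X.Y.Z version number found in text."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(name: str, minimum: tuple) -> bool:
    """Return True if the tool is on PATH and reports a new enough version."""
    path = shutil.which(name)
    if path is None:
        return False
    result = subprocess.run([path, "--version"], capture_output=True,
                            text=True, check=False)
    try:
        return parse_version(result.stdout) >= minimum
    except ValueError:
        return False


if __name__ == "__main__":
    for tool, minimum in MINIMUMS.items():
        status = "ok" if meets_minimum(tool, minimum) else "missing or too old"
        print(f"{tool:<5} >= {'.'.join(map(str, minimum))}: {status}")
```

Rust and the Metal Toolchain are not covered here because the README gives no minimum version for them.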

Run with Make

git clone https://github.com/AtomicBot-ai/Atomic-Chat
cd Atomic-Chat
make dev

This handles everything: installs dependencies, builds core components, and launches the app.

Available make targets:

make dev — full development setup and launch
make build — production build
make test — run tests and linting
make clean — delete everything and start fresh

Manual Commands

yarn install
yarn build:tauri:plugin:api
yarn build:core
yarn build:extensions
yarn dev

💻 System Requirements

macOS: 13.6+ (8GB RAM for 3B models, 16GB for 7B, 32GB for 13B)
Windows: 10/11 x64 (same RAM recommendations as macOS)
Linux: x86_64, glibc ≥ 2.35 (Ubuntu 22.04+, Debian 12+, Fedora 40+, Arch, Mint, Pop!_OS — same RAM recommendations as macOS). Optional: a Vulkan loader (vulkan-1 package, or mesa-vulkan-drivers / proprietary NVIDIA driver) for GPU acceleration.
iOS: download from App Store
Android: download from Google Play
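The desktop RAM recommendations can be written as a small lookup. Rounding a model up to the next listed tier is an assumption of this sketch, and the README gives no guidance for models larger than 13B, so the function returns `None` there.

```python
from typing import Optional

# (max model size in billions of parameters, recommended RAM in GB),
# taken from the macOS line above.
RAM_TIERS_GB = [(3, 8), (7, 16), (13, 32)]


def recommended_ram_gb(params_billion: float) -> Optional[int]:
    """Return the recommended RAM for a model size, or None if not listed."""
    for max_params, ram_gb in RAM_TIERS_GB:
        if params_billion <= max_params:
            return ram_gb
    return None  # larger than any tier documented in this README


for size in (3, 7, 13):
    print(f"{size}B model: {recommended_ram_gb(size)} GB RAM")
```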

🐧 Running on Linux

Atomic Chat ships as a single self-contained .AppImage — no installer, no root:

chmod +x Atomic.Chat_*_amd64.AppImage
./Atomic.Chat_*_amd64.AppImage

If prompted about FUSE on first launch, run `sudo apt install fuse libfuse2` (Debian/Ubuntu) or `sudo dnf install fuse fuse-libs` (Fedora). GPU acceleration (Vulkan) is auto-detected on first launch. Only GGUF models run on Linux.

🧯 Troubleshooting

If something isn't working:

Copy your error logs and system specs
Open an issue on GitHub
Or ask for help in our Discord

👥 Contributors

Atomic Chat is built by a small core team and 140+ contributors — including everyone who shaped the project from its earliest days. Pull requests welcome — see CONTRIBUTING.md for how to get started.

⭐ Star History

📄 License

Apache 2.0 — see LICENSE for details.

🙏 Acknowledgements

Built on the shoulders of giants:

🌱 Heritage

Atomic Chat began as a fork of Jan by Menlo Research — an excellent open-source local-AI app. We're grateful to the Jan team and its contributors for the foundation they built. Atomic Chat has since grown its own direction, engines, and roadmap, but we tip our hat to where it started. 🙏
