Offline RAG assistant for studying with text-based PDF textbooks, powered by LFM2-2.6B on device. LFM2 is Liquid AI's hybrid architecture — efficient enough to run on RAM-only machines without a GPU.
This project is my first test of LFM2 in a real RAG pipeline, with plans to explore its limits on low-power local servers. More LFM2 experiments coming.

Upload a book, ask questions, get answers with page references. Everything runs locally via llama.cpp. Optionally switch to OpenAI or Gemini via API key.
This project is officially featured in the Liquid4All Cookbook as an end-to-end example of a complex RAG application built with Liquid AI models.
- Conversational Memory: a two-tier summary buffer. The last 12 messages stay "hot" (kept word-for-word), while older history is automatically compressed into a lightweight summary block that records discussed topics and pages. This prevents context overflow while preserving long-term session memory (a sketch follows this list).
- Smart Chunking: PyMuPDF loader with 1800-character chunks and 500-character overlap, sized so that word banks and exercise instructions land in the same chunk.
- Retrieve & Rank: a two-stage RAG pipeline — hybrid search first, then a cross-encoder reranker (MiniLM-L6) to push the most relevant chunks to the top (both stages are sketched after the architecture diagram below).
- Exercise Generator: automatically extracts grammar exercises from textbook pages. No hallucinations — exercises are taken verbatim from the book, together with all provided options/word banks (a toy extraction sketch also follows this list).
- Active Task Context: The assistant remembers the current exercise and can explain rules and check answers in context.
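
The two-tier memory logic is simple enough to show in a few lines. This is a minimal sketch of the idea — the names and the `summarize` helper are hypothetical, not this repo's actual code:

```python
# Sketch of a two-tier summary buffer (illustrative, not this repo's exact code).
HOT_WINDOW = 12  # messages kept verbatim

def build_context(history, summarize):
    """history: list of {'role': ..., 'content': ...} dicts, oldest first.
    summarize: callable that compresses old messages into a topic/page summary."""
    hot = history[-HOT_WINDOW:]   # tier 1: recent turns, word-for-word
    cold = history[:-HOT_WINDOW]  # tier 2: everything older
    parts = []
    if cold:
        # e.g. "Discussed: Present Perfect (pp. 14-15); exercise 3 on p. 15 ..."
        parts.append("Conversation summary:\n" + summarize(cold))
    parts += [f"{m['role']}: {m['content']}" for m in hot]
    return "\n".join(parts)
```

Summarizing only the cold tail keeps the prompt bounded no matter how long the session runs.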
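And a toy version of the exercise extraction — the real parser in `task_generator.py` handles more layouts, and these patterns are invented for illustration:

```python
import re

# Invented patterns for Murphy-style pages: numbered gap-fill lines plus a word bank.
BLANK_RE = re.compile(r"^\s*\d+[.)]\s.*_{3,}.*$", re.MULTILINE)  # "3) She ___ (go) home."
BANK_RE = re.compile(r"word bank:?\s*(.+)", re.IGNORECASE)

def extract_exercise(page_text: str) -> dict:
    items = BLANK_RE.findall(page_text)  # full matched lines (no capture groups)
    bank = BANK_RE.search(page_text)
    options = [w.strip() for w in bank.group(1).split(",")] if bank else []
    return {"items": items, "word_bank": options}
```
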
```mermaid
flowchart LR
    subgraph Ingestion["Ingestion (one-time per book)"]
        PDF["PDF"] --> Loader["PyMuPDFLoader"]
        Loader --> Split["Semantic Chunking\n1800 / 500"]
        Split --> Embed["MiniLM-L6-v2\n(ONNX, local)"]
        Split --> Keyword["BM25 Index"]
        Embed --> Store["Chroma DB"]
    end
    subgraph Query["Query"]
        Q["Question"] --> Search["Hybrid Search\n(EnsembleRetriever)"]
        Store --> Search
        Keyword --> Search
        Search --> Rerank["Cross-Encoder\nReranker (Top-5)"]
        Rerank --> Prompt["Prompt +\nTop-5 contexts"]
        Prompt --> LLM["LLM"]
        LLM --> Answer["Answer +\npage refs"]
    end
    subgraph Providers["LLM (choose one)"]
        L1["LFM2-2.6B\n(local, llama-server)"]
        L2["OpenAI API"]
        L3["Gemini API"]
    end
```
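
The pipeline maps closely onto stock LangChain components. Here is a condensed sketch under assumptions — the file name, retriever weights, and `k` values are illustrative, and `HuggingFaceEmbeddings` stands in for the local ONNX MiniLM the project actually uses:

```python
# Condensed sketch of ingestion + query (illustrative parameters throughout).
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.retrievers import EnsembleRetriever
from sentence_transformers import CrossEncoder

# Ingestion (one-time per book): load pages, chunk, build both indexes.
pages = PyMuPDFLoader("books/grammar.pdf").load()  # page numbers end up in metadata
splitter = RecursiveCharacterTextSplitter(chunk_size=1800, chunk_overlap=500)
chunks = splitter.split_documents(pages)

bm25 = BM25Retriever.from_documents(chunks)  # keyword side
bm25.k = 20
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
store = Chroma.from_documents(chunks, embeddings, persist_directory="chroma_db")

# Query: hybrid search, then cross-encoder reranking down to the top 5.
hybrid = EnsembleRetriever(
    retrievers=[bm25, store.as_retriever(search_kwargs={"k": 20})],
    weights=[0.4, 0.6],
)
question = "When do I use the present perfect?"
candidates = hybrid.invoke(question)

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(question, d.page_content) for d in candidates])
ranked = sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)
top5 = [doc for _, doc in ranked[:5]]

# Page references come straight from PyMuPDF's 0-based page metadata.
context = "\n\n".join(f"[p. {d.metadata['page'] + 1}] {d.page_content}" for d in top5)
```

The cross-encoder is what makes the top 5 trustworthy: BM25 and the vector index each over-fetch, and the reranker scores every (question, chunk) pair jointly before the prompt is assembled.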
- Python 3.10+
- CUDA 12.4 recommended for GPU inference; CPU works without it
`setup.bat` creates a venv, installs dependencies, and downloads the llama.cpp binaries.
During setup you'll be prompted to choose:
1. CUDA 12.4 (GPU, recommended for NVIDIA)
2. CPU only (no GPU required)
Download the GGUF and place it in the `models/` folder:
| Model | Description |
|---|---|
| LFM2-2.6B-GGUF | Liquid AI's hybrid model, optimized for edge/on-device inference |
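
If you'd rather script the download, something like this should work — the repo id matches the table above, but the exact filename depends on which quantization you pick from the model card:

```python
from huggingface_hub import hf_hub_download

# Filename is a guess; check the repo's file list for the quantization you want.
hf_hub_download(
    repo_id="LiquidAI/LFM2-2.6B-GGUF",
    filename="LFM2-2.6B-Q4_K_M.gguf",
    local_dir="models",
)
```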
`run.bat` auto-detects the first `.gguf` file in `models/`:
- If a `.gguf` model is in `models/` → starts llama-server + the Gradio app
- If no model is found → starts in cloud-only mode (use an API key in the UI)
Opens at http://127.0.0.1:7860
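
This works because llama-server speaks the OpenAI chat-completions protocol, so one client abstraction covers all three providers. A minimal sketch — the port assumes llama-server's default (8080), and the model name is arbitrary since the server answers for whichever GGUF it loaded:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local llama-server endpoint.
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="not-needed-locally")
resp = client.chat.completions.create(
    model="lfm2-2.6b",  # arbitrary label; the loaded GGUF answers regardless
    messages=[{"role": "user", "content": "Explain the present perfect in one sentence."}],
)
print(resp.choices[0].message.content)
```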
- Select a book from the dropdown (PDFs in `books/` are auto-detected)
- Choose an LLM provider:
  - Local — uses llama-server (requires a model in `models/`)
  - OpenAI — paste your API key; uses `gpt-4o-mini` by default
  - Gemini — paste your API key; uses `gemini-2.5-flash-lite` by default
- Ask away:
  - Question mode: ask any question about the book content.
  - Homework Check: include trigger words like `homework`, `check`, or `evaluate` in your message to get grammar feedback with rule citations (a toy router is sketched after this list).
  - Exercise mode: say "give me a task" or "give me an exercise" to get a real exercise extracted from the book, with blanks to fill in.
  - Follow-up: after getting an exercise, ask for hints, explanations, or corrections.
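
Mode selection is driven by those trigger words. A toy router — the trigger lists mirror the ones above, everything else is hypothetical:

```python
# Naive keyword router (illustrative; see prompt_builder.py for the real prompt logic).
HOMEWORK_TRIGGERS = ("homework", "check", "evaluate")
EXERCISE_TRIGGERS = ("give me a task", "give me an exercise")

def route(message: str) -> str:
    text = message.lower()
    if any(t in text for t in EXERCISE_TRIGGERS):
        return "exercise"   # extract a real exercise from the book
    if any(t in text for t in HOMEWORK_TRIGGERS):
        return "homework"   # grammar feedback with rule citations
    return "question"       # plain RAG Q&A
```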
Use the Upload PDF button in the sidebar, or drop PDFs into `books/`.
```
├── app.py             # Gradio web interface
├── rag_engine.py      # RAG pipeline (ingest, hybrid search, query)
├── prompt_builder.py  # LangChain prompts (question, homework, task)
├── task_generator.py  # Regex parser for exercise extraction
├── llm_client.py      # LLM provider connections (local/OpenAI/Gemini)
├── requirements.txt   # Python dependencies
├── setup.bat          # One-time setup
├── run.bat            # Launch script
├── books/             # PDF textbooks (gitignored)
├── models/            # .gguf model files (gitignored)
├── bin/               # llama.cpp binaries (gitignored)
└── chroma_db/         # Vector store (gitignored, auto-created)
```
| Component | Technology |
|---|---|
| Framework | LangChain |
| Vector DB | Chroma |
| Embeddings | all-MiniLM-L6-v2 (ONNX, local) |
| Local LLM | LFM2-2.6B via llama.cpp |
| Cloud LLM | OpenAI / Gemini (optional) |
| Reranker | ms-marco-MiniLM-L-6-v2 (local) |
| UI | Gradio |
| PDF Parser | PyMuPDF (fitz) |
- Math textbook support — detect equations and problem sets
- Scientific paper mode — Q&A over research papers (abstract, methodology, conclusions)
- Book type auto-detection — switch modes automatically based on content
- OCR fallback for scanned PDFs
- Answer verification — check student answers against the book's answer key
- Optimized for text-based PDFs. Scanned PDFs require OCR (not yet supported).
- Exercise extraction works best with grammar textbooks (Murphy-style format).
- Math and scientific PDF modes are in development.
MIT