An Autonomous Multimodal UI Navigator for Legal Research
Citadelle Vision is a next-generation AI agent built for the Gemini Live Agent Challenge. It uses Google Gemini Multimodal to "see" and navigate complex web interfaces (like CourtListener, Oyez, and YouTube), turning 5 hours of manual legal research into a 30-second exportable executive brief.
- Visual DOM Interaction: Uses Playwright to overlay bounding boxes on the UI so that Gemini can click, type, and extract data using computer vision.
- Autonomous Navigation: Solves complex multi-step tasks (e.g., finding a specific precedent case) without relying on rigid API endpoints.
- Multimodal Extraction: Reads embedded PDFs, transcribes YouTube videos, and scrapes complex HTML dynamically.
- Cloud Native: Fully containerized and deployed on Google Cloud Run.
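The Visual DOM Interaction feature can be pictured as a two-step mapping: number each interactive element's bounding box, then translate the number the model picks back into a click point. The sketch below illustrates that idea under stated assumptions and is not the project's actual code: `labelElements`, `resolveAction`, and the `{ action, target, text }` reply shape are hypothetical, and the rectangles stand in for what Playwright would report for each element.

```javascript
// Hypothetical sketch: function names and the reply shape are assumptions,
// not part of the repository or of any SDK.

// Assign a numeric label to each visible element rectangle.
// Each rect is { x, y, width, height } in page pixels.
function labelElements(rects) {
  return rects
    .filter((r) => r.width > 0 && r.height > 0) // drop zero-size (hidden) elements
    .map((r, index) => ({ id: index + 1, ...r }));
}

// Turn a model reply such as { action: "click", target: 2 } into a concrete
// step carrying the pixel coordinates of the labelled box's center.
function resolveAction(reply, labeled) {
  const box = labeled.find((b) => b.id === reply.target);
  if (!box) {
    throw new Error(`Unknown target label: ${reply.target}`);
  }
  const point = { x: box.x + box.width / 2, y: box.y + box.height / 2 };
  if (reply.action === "click") {
    return { type: "click", ...point };
  }
  if (reply.action === "type") {
    return { type: "type", ...point, text: reply.text ?? "" };
  }
  throw new Error(`Unsupported action: ${reply.action}`);
}

const labeled = labelElements([
  { x: 10, y: 20, width: 100, height: 40 },
  { x: 0, y: 0, width: 0, height: 0 }, // hidden, so it gets no label
  { x: 200, y: 50, width: 60, height: 20 },
]);
const step = resolveAction({ action: "click", target: 2 }, labeled);
console.log(step); // { type: 'click', x: 230, y: 60 }
```

In a real run, the resolved coordinates could be handed to Playwright, for example through `page.mouse.click(x, y)`.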
- AI Model: Google Gemini Multimodal (via @google/genai SDK)
- Agent Framework: Playwright (Headless Chromium)
- Backend: Node.js, Express, WebSockets
- Frontend: React, Vite, Tailwind CSS
- Infrastructure: Docker, Google Cloud Run
To run Citadelle Vision locally, make sure the following prerequisites are in place, then follow the steps below:
- Node.js (v18 or higher)
- npm or yarn
- A valid Google Gemini API Key
git clone https://github.com/SifraDev/citadelle-vision-agent.git
cd citadelle-vision-agent
npm install
# Ensure Playwright browser binaries are installed
npx playwright install chromium

Create a .env file in the root directory and add your Gemini API Key:
GEMINI_API_KEY=your_gemini_api_key_here

Then start the app in development mode:
npm run dev

This project is fully containerized and designed to run on Google Cloud Run. The included Dockerfile uses the official mcr.microsoft.com/playwright:jammy base image so that the headless browser has the system dependencies it needs to execute the agentic loop in a serverless environment.
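The agentic loop mentioned above can be thought of as a bounded observe, decide, act cycle. The sketch below is an assumption about its general shape, not the repository's implementation: `runAgentLoop`, `observe`, `decide`, `act`, and `maxSteps` are hypothetical names, and the three callbacks stand in for a Playwright screenshot, a Gemini call, and a Playwright action respectively. A step cap is one simple way to keep a run inside a serverless request timeout.

```javascript
// Hypothetical sketch of a bounded observe -> decide -> act loop.
// The callbacks are injected so the loop can be exercised without a browser.
async function runAgentLoop({ observe, decide, act, maxSteps = 10 }) {
  const history = [];
  for (let stepNumber = 1; stepNumber <= maxSteps; stepNumber++) {
    const observation = await observe(); // e.g. screenshot plus labelled boxes
    const decision = await decide(observation, history); // e.g. model picks the next action
    history.push(decision);
    if (decision.action === "done") {
      return { finished: true, steps: stepNumber, result: decision.result, history };
    }
    await act(decision); // e.g. click or type in the browser
  }
  // Step budget exhausted: stop and report instead of looping forever.
  return { finished: false, steps: maxSteps, result: null, history };
}

// Demo with stub callbacks that replay a fixed script of decisions.
const script = [
  { action: "click", target: 3 },
  { action: "type", target: 5, text: "example query" },
  { action: "done", result: "brief ready" },
];
const performed = [];
const demo = runAgentLoop({
  observe: async () => ({ url: "about:blank" }),
  decide: async (_observation, history) => script[history.length],
  act: async (decision) => {
    performed.push(decision.action);
  },
});
demo.then((outcome) => console.log(outcome.finished, outcome.steps, performed));
// logs: true 3 [ 'click', 'type' ]
```

Injecting the callbacks keeps the control flow testable on its own; the browser and model calls can be swapped in later without touching the loop.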
Deployment Steps:
- Connect this GitHub repository to Google Cloud Build.
- Select Dockerfile as the build type.
- Set the service to use 2 to 4 GiB of Memory and 2 CPUs (required for Playwright).
- Add the GEMINI_API_KEY to the Cloud Run Environment Variables.
- Deploy.
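Both the local .env file and the Cloud Run environment variable ultimately supply GEMINI_API_KEY through the process environment, so a fail-fast check at startup can surface a missing key early. The following is a minimal sketch under assumptions: `loadConfig` is a hypothetical helper that is not part of the repository, it does not load the .env file itself, and the fallback port of 8080 is only an illustrative default. Cloud Run tells the container which port to listen on through the `PORT` environment variable.

```javascript
// Hypothetical helper: validates the environment before the server starts.
// It accepts an explicit env object so it can be checked without touching process.env.
function loadConfig(env = process.env) {
  const apiKey = (env.GEMINI_API_KEY ?? "").trim();
  if (!apiKey) {
    throw new Error("GEMINI_API_KEY is not set; add it to .env or to the Cloud Run service");
  }
  // Cloud Run injects PORT; fall back to an assumed default for local runs.
  const port = Number.parseInt(env.PORT ?? "8080", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT value: ${env.PORT}`);
  }
  return { apiKey, port };
}

// Example with an explicit environment object instead of process.env.
const config = loadConfig({ GEMINI_API_KEY: "your_gemini_api_key_here", PORT: "3000" });
console.log(config.port); // 3000
```

Calling such a helper once at startup means a misconfigured deployment fails immediately with a readable message rather than on the first model request.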