Document Parser

A modern web application for extracting text and data from PDF, DOCX, and image files with advanced analytics and export capabilities. Built with Next.js, React, and Supabase.

⚠️ Note: Image OCR is powered by Supabase Edge Functions and requires additional backend setup (see below).

Demo

Nextjs.PDF.Parser.-.Google.Chrome.2026-01-21.17-16-28.1.mp4

Features

Core Functionality

Multi-format Support: Upload and parse PDF, DOCX, and image files
Text Extraction: Automatically extract text content from documents
Image Data Extraction: Extract structured data from images using AI-powered OCR
User Authentication: Secure login and signup with Supabase Auth
Real-time Processing: Live progress tracking for document uploads

Advanced Features

Analytics Dashboard:
- View total documents processed
- Track document types breakdown (PDF, DOCX, Images)
- Monitor processing status (Completed, Processing, Failed)
- Visual progress bars and statistics
Export Options:
- Export extracted data as JSON, CSV, or TXT
- Download with proper formatting
- Preserve data structure for images
Document Search:
- Search within extracted content
- Real-time text highlighting
- Clear and intuitive interface
Document Management:
- Browse all uploaded documents via sidebar
- Filter by document type
- View document history
- Quick access to previously processed files
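The export formats listed above could be produced by a small serializer along the following lines. This is a minimal sketch, not the project's actual export code: the `ExtractedDocument` shape and the `exportDocument` and `escapeCsvCell` names are assumptions made for illustration.

```typescript
// Hypothetical shape of an extracted document; the real schema may differ.
interface ExtractedDocument {
  fileName: string;
  text: string;
  // Structured key/value pairs, e.g. fields produced by image OCR.
  fields: Record<string, string>;
}

type ExportFormat = "json" | "csv" | "txt";

// Quote a CSV cell when it contains a comma, a double quote, or a line break,
// doubling any embedded quotes (the convention described in RFC 4180).
function escapeCsvCell(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Serialize one document into the requested format.
function exportDocument(doc: ExtractedDocument, format: ExportFormat): string {
  switch (format) {
    case "json":
      // Pretty-printed JSON keeps the nested structure intact.
      return JSON.stringify(doc, null, 2);
    case "csv": {
      // One row per structured field, preceded by a header row.
      const rows = Object.keys(doc.fields).map(
        (key) => `${escapeCsvCell(key)},${escapeCsvCell(doc.fields[key])}`
      );
      return ["field,value"].concat(rows).join("\n");
    }
    case "txt":
      // Plain text export is just the extracted text.
      return doc.text;
  }
}
```

In a browser, the returned string would typically be wrapped in a Blob and offered as a download; that part is left out to keep the sketch self-contained.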

UI/UX

Modern Interface: Clean, responsive design with gradient backgrounds
Dark Theme: Toggle between light and dark modes with persistent preference
Drag-and-Drop Upload: Easy file upload with visual feedback
File Size Limit: Supports files up to 8 MB
Loading States: Beautiful animated loaders during processing
Toast Notifications: Real-time feedback for all operations

Tech Stack

Framework: Next.js 16
Frontend: React 19, TypeScript
Styling: Tailwind CSS
UI Components: Radix UI, Lucide Icons
Authentication: Supabase Auth
Database: Supabase
Document Parsing:
- PDF: pdf2json
- DOCX: mammoth
- Images: AI-powered OCR via a Supabase Edge Function

Prerequisites

Node.js >= 18.17.0 (note that Next.js 16 itself requires Node.js 20.9 or later)
npm or yarn
Supabase account

Installation

Clone the repository:

git clone https://github.com/faheemjabbar/ocr-nextjs
cd ocr-nextjs

Install dependencies:

npm install

Set up environment variables:

Create a .env.local file in the root directory:

NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
SUPABASE_ANON_KEY=your_supabase_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_service_key
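
A missing variable tends to surface later as a confusing runtime error, so it can help to validate all three once at startup. The sketch below is one possible approach, not code from this repository; `readSupabaseEnv` and `SupabaseEnv` are hypothetical names.

```typescript
// The three variable names come from the .env.local example above.
const REQUIRED_VARS = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "SUPABASE_ANON_KEY",
  "SUPABASE_SERVICE_ROLE_KEY",
] as const;

type RequiredVar = (typeof REQUIRED_VARS)[number];
type SupabaseEnv = Record<RequiredVar, string>;

// Hypothetical helper: returns the variables, or throws an error that
// lists every missing (or empty) one at once.
function readSupabaseEnv(env: Record<string, string | undefined>): SupabaseEnv {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    NEXT_PUBLIC_SUPABASE_URL: env["NEXT_PUBLIC_SUPABASE_URL"] as string,
    SUPABASE_ANON_KEY: env["SUPABASE_ANON_KEY"] as string,
    SUPABASE_SERVICE_ROLE_KEY: env["SUPABASE_SERVICE_ROLE_KEY"] as string,
  };
}
```

On the server this would be called as `readSupabaseEnv(process.env)`. In Next.js, only variables prefixed with `NEXT_PUBLIC_` are exposed to browser code, and the service role key should stay server-side.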

Run the development server:

npm run dev

Open http://localhost:3000 in your browser.

Usage

Sign Up/Login: Create an account or log in with your credentials
Upload Document: Click "Upload Document" and select a file (PDF, DOCX, or image)
View Results: The extracted text or data will be displayed automatically
Supported Formats:
- PDF files (.pdf)
- Word documents (.docx)
- Images (.png, .jpg, .jpeg, .gif, .webp)
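
The supported formats and the 8 MB limit could be checked before upload with a helper like the one below. This is a sketch under stated assumptions: the `validateUpload` and `kindForExtension` names are hypothetical, and 8 MB is interpreted as 8 × 1024 × 1024 bytes, which the README does not specify.

```typescript
// Assumption: "8 MB" means 8 * 1024 * 1024 bytes.
const MAX_FILE_BYTES = 8 * 1024 * 1024;

type DocumentKind = "pdf" | "docx" | "image";

type ValidationResult =
  | { ok: true; kind: DocumentKind }
  | { ok: false; reason: string };

// Map a lowercase file extension to a document kind, or null if unsupported.
function kindForExtension(extension: string): DocumentKind | null {
  switch (extension) {
    case "pdf":
      return "pdf";
    case "docx":
      return "docx";
    case "png":
    case "jpg":
    case "jpeg":
    case "gif":
    case "webp":
      return "image";
    default:
      return null;
  }
}

// Accepts anything with a name and a size, such as a browser File object.
function validateUpload(file: { name: string; size: number }): ValidationResult {
  const dot = file.name.lastIndexOf(".");
  const extension = dot === -1 ? "" : file.name.slice(dot + 1).toLowerCase();
  const kind = kindForExtension(extension);
  if (kind === null) {
    return { ok: false, reason: `Unsupported file type: ${file.name}` };
  }
  if (file.size > MAX_FILE_BYTES) {
    return { ok: false, reason: "File exceeds the 8 MB limit" };
  }
  return { ok: true, kind };
}
```

Checking the extension is only a first-pass filter; a server-side check of the actual file contents would still be needed for untrusted uploads.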

Image OCR

Image processing is not handled directly in the Next.js API route. Instead, it runs in a Supabase Edge Function that the frontend calls after the upload.

How Image OCR Works

Images are uploaded to Supabase Storage
A document record is created in the database
The frontend explicitly calls a Supabase Edge Function
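
The three client-side steps above can be sketched as a single function. This is an illustration rather than the project's code: the `OcrBackend` interface, `submitImageForOcr`, and the `userId/fileName` storage path are all assumptions. With supabase-js, the three methods would roughly correspond to `storage.from(bucket).upload(...)`, `from(table).insert(...)`, and `functions.invoke(name, { body })`.

```typescript
// Minimal slice of a backend client, declared here so the sketch is
// self-contained. All three method names are hypothetical.
interface OcrBackend {
  uploadImage(path: string, data: Uint8Array): Promise<void>;
  createDocument(record: { path: string; status: "processing" }): Promise<{ id: string }>;
  invokeOcr(documentId: string): Promise<void>;
}

// Runs the three steps in order; each step starts only after the previous
// one has resolved, and any rejection propagates to the caller.
async function submitImageForOcr(
  backend: OcrBackend,
  userId: string,
  fileName: string,
  data: Uint8Array
): Promise<string> {
  const path = `${userId}/${fileName}`; // hypothetical storage layout
  await backend.uploadImage(path, data); // 1. upload to storage
  const { id } = await backend.createDocument({ path, status: "processing" }); // 2. create record
  await backend.invokeOcr(id); // 3. explicitly call the Edge Function
  return id;
}
```

Passing the backend in as a parameter keeps the ordering logic easy to test with a fake, independent of any real Supabase project.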

The Edge Function:

Downloads the image from storage
Sends it to an OCR provider
Stores structured results in the database
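
The Edge Function's three steps could be organized as shown below. This is a sketch, not the actual function: `OcrDeps` and `processImage` are hypothetical names, the OCR provider is left abstract, and marking a document as failed on error is an added assumption based on the Failed status shown in the analytics dashboard.

```typescript
// Hypothetical dependencies; a real function would back these with Supabase
// Storage, an OCR provider's API, and a database table.
interface OcrDeps {
  downloadImage(path: string): Promise<Uint8Array>;
  runOcr(image: Uint8Array): Promise<Record<string, string>>;
  saveResult(
    documentId: string,
    result: { status: "completed" | "failed"; data: Record<string, string> }
  ): Promise<void>;
}

// Download, recognize, store. If any step throws, a "failed" result is
// saved instead so the document does not stay in a processing state.
async function processImage(
  deps: OcrDeps,
  documentId: string,
  path: string
): Promise<"completed" | "failed"> {
  try {
    const image = await deps.downloadImage(path);
    const data = await deps.runOcr(image);
    await deps.saveResult(documentId, { status: "completed", data });
    return "completed";
  } catch {
    await deps.saveResult(documentId, { status: "failed", data: {} });
    return "failed";
  }
}
```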

Required Setup

To enable image OCR:

Create your own Supabase Edge Function (one is not included in this repository)
Verify the caller's authentication inside the function
Configure the function's environment variables in Supabase
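
As an illustration of the authentication step, the check inside the function might look like the sketch below. The names `authenticate` and `verifyToken` are hypothetical. In a Supabase Edge Function, `verifyToken` could wrap a call to `supabase.auth.getUser(token)`, but how tokens are validated is left as an assumption here.

```typescript
type AuthResult =
  | { ok: true; userId: string }
  | { ok: false; status: 401; message: string };

// `verifyToken` is a hypothetical callback that resolves to the user id for
// a valid token, or null for an invalid or expired one.
async function authenticate(
  authorizationHeader: string | null,
  verifyToken: (token: string) => Promise<string | null>
): Promise<AuthResult> {
  // Expect a header of the form "Bearer <token>".
  const match = /^Bearer\s+(\S+)$/i.exec(authorizationHeader || "");
  if (match === null) {
    return { ok: false, status: 401, message: "Missing or malformed Authorization header" };
  }
  const userId = await verifyToken(match[1]);
  if (userId === null) {
    return { ok: false, status: 401, message: "Invalid or expired token" };
  }
  return { ok: true, userId };
}
```

The function would call this before touching storage or the OCR provider and return a 401 response when `ok` is false.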

Project Structure

├── app/
│   ├── api/
│   │   └── parse-data/      # API route for document parsing
│   ├── layout.tsx            # Root layout
│   └── page.tsx              # Main page with auth logic
├── components/
│   ├── Auth.tsx              # Authentication component
│   ├── FileUploader.tsx      # File upload component
│   ├── HomePage.tsx          # Main application interface
│   └── ui/                   # Reusable UI components
├── lib/
│   ├── supabase.ts           # Supabase client configuration
│   └── utils.ts              # Utility functions
└── public/                   # Static assets

Scripts

npm run dev - Start development server
npm run build - Build for production
npm start - Start production server
npm run lint - Run ESLint

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
