Quick Start Guide

This guide walks you through running your first OCR task with Kiri OCR, using both the Command Line Interface (CLI) and the Python API.

1. Using the CLI

The CLI is the fastest way to test the model on an image without writing any code.

Basic Prediction

Run OCR on a single image file:

kiri-ocr predict path/to/document.jpg

What happens?

  1. Auto-Download: The model is automatically downloaded from Hugging Face (first run only).
  2. Detection: The text detector finds all text regions in the image.
  3. Recognition: The OCR model reads the text in each region.
  4. Output: The extracted text is printed to your terminal.

Saving Results & Visualization

To save the extracted text and visual reports, use the --output flag:

kiri-ocr predict document.jpg --output results/ --verbose

This creates a results/ directory containing:

  • extracted_text.txt: The plain text content of the document.
  • ocr_results.json: Detailed structured data with bounding boxes and confidence scores.
  • ocr_result.png: The input image with recognized text overlaid on top.
  • boxes.png: The input image with detected bounding boxes drawn.
  • report.html: An interactive HTML report showing the image and results side by side.
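
If you want to consume the structured output in code, you can load ocr_results.json with the Python standard library. This is a minimal sketch: the field names ('text', 'confidence', 'box') are assumed to mirror the result dictionaries shown in the Python API section below, so verify the exact schema against your own output file.

import json

# Load the structured results written by --output (field names assumed; check your file)
with open('results/ocr_results.json', encoding='utf-8') as f:
    regions = json.load(f)

for region in regions:
    # 'box' is assumed to be [x, y, width, height], matching the Python API results
    print(f"[{region['confidence']:.1%}] {region['text']} at {region['box']}")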

Advanced CLI Options

  • Use GPU: Add --device cuda for faster processing.
  • Word Mode: Use --mode words to detect individual words (better for sparse text). The default is --mode lines.
  • JSON Only: Add --no-render to skip generating image/HTML reports (faster). These flags can be combined, as shown below.
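
For example, the following invocation (composed only from the flags documented above) runs on the GPU with word-level detection and skips the image/HTML reports:

kiri-ocr predict document.jpg --output results/ --device cuda --mode words --no-render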

2. Using the Python API

For integration into your own Python applications, use the OCR class.

Minimal Example

from kiri_ocr import OCR

# Initialize (downloads model automatically)
ocr = OCR()

# Run inference
text, results = ocr.extract_text('document.jpg')

# Print extracted text
print(text)
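
If you want to persist the output, here is a minimal sketch using only the standard library. It assumes results is a plain list of dictionaries (as shown in the next example) and is therefore JSON-serializable:

import json

# Save the plain text and the structured results for later use
with open('extracted_text.txt', 'w', encoding='utf-8') as f:
    f.write(text)

with open('ocr_results.json', 'w', encoding='utf-8') as f:
    json.dump(results, f, ensure_ascii=False, indent=2)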

Advanced Usage

You can access detailed per-line information such as bounding boxes and confidence scores.

from kiri_ocr import OCR

# Initialize with GPU support and verbose logging
ocr = OCR(device='cuda', verbose=True)

# Process document
text, results = ocr.extract_text('document.jpg')

# Iterate through detailed results
print(f"Found {len(results)} text regions:")

for line in results:
    box = line['box']        # [x, y, width, height]
    text = line['text']      # Recognized text string
    conf = line['confidence'] # Confidence score (0.0 - 1.0)
    
    print(f"[{conf:.1%}] {text} at {box}")

Working with Single Line Images

If you already have cropped images of text lines (e.g., from a separate detection process), you can skip the detection step:

# Recognize a single cropped line image
text, confidence = ocr.recognize_single_line_image('line_crop.png')
print(f"Recognized: '{text}' with confidence {confidence:.2f}")

Using Custom Models

Load a model you trained yourself or downloaded separately:

# Load local model file
ocr = OCR(model_path="path/to/my_model.safetensors")

# Load from a different Hugging Face repo
ocr = OCR(model_path="my-username/my-custom-kiri-model")
