Quick Start Guide

This guide walks you through running your first OCR task with Kiri OCR, using both the Command Line Interface (CLI) and the Python API.

1. Using the CLI

The CLI is the fastest way to test the model on an image without writing any code.

Basic Prediction

Run OCR on a single image file:

kiri-ocr predict path/to/document.jpg

What happens?

  1. Auto-Download: The model is automatically downloaded from Hugging Face (first run only).
  2. Detection: The text detector finds all text regions in the image.
  3. Recognition: The OCR model reads the text in each region.
  4. Output: The extracted text is printed to your terminal.

Saving Results & Visualization

To save the extracted text and visual reports, use the --output flag:

kiri-ocr predict document.jpg --output results/ --verbose

This creates a results/ directory containing:

  • extracted_text.txt: The plain text content of the document.
  • ocr_results.json: Detailed structured data with bounding boxes and confidence scores.
  • ocr_result.png: The input image with recognized text overlaid on top.
  • boxes.png: The input image with detected bounding boxes drawn.
  • report.html: An interactive HTML report showing the image and results side by side.
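
If you want to consume the structured output in code, you can load ocr_results.json with the Python standard library. This is a minimal sketch: the field names ('text', 'confidence', 'box') are assumed to mirror the result dictionaries shown in the Python API section below, so verify the exact schema against your own output file.

import json

# Load the structured results written by --output (field names assumed; check your file)
with open('results/ocr_results.json', encoding='utf-8') as f:
    regions = json.load(f)

for region in regions:
    # 'box' is assumed to be [x, y, width, height], matching the Python API results
    print(f"[{region['confidence']:.1%}] {region['text']} at {region['box']}")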

Advanced CLI Options

  • Use GPU: Add --device cuda for faster processing.
  • Word Mode: Use --mode words to detect individual words (better for sparse text). The default is --mode lines.
  • JSON Only: Add --no-render to skip generating image/HTML reports (faster). These flags can be combined, as shown below.
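
For example, the following invocation (composed only from the flags documented above) runs on the GPU with word-level detection and skips the image/HTML reports:

kiri-ocr predict document.jpg --output results/ --device cuda --mode words --no-render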

2. Using the Python API

For integration into your own Python applications, use the OCR class.

Minimal Example

from kiri_ocr import OCR

# Initialize (downloads model automatically)
ocr = OCR()

# Run inference
text, results = ocr.extract_text('document.jpg')

# Print extracted text
print(text)
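
If you want to persist the output, here is a minimal sketch using only the standard library. It assumes results is a plain list of dictionaries (as shown in the next example) and is therefore JSON-serializable:

import json

# Save the plain text and the structured results for later use
with open('extracted_text.txt', 'w', encoding='utf-8') as f:
    f.write(text)

with open('ocr_results.json', 'w', encoding='utf-8') as f:
    json.dump(results, f, ensure_ascii=False, indent=2)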

Advanced Usage

You can access detailed per-line information such as bounding boxes and confidence scores.

from kiri_ocr import OCR

# Initialize with GPU support and verbose logging
ocr = OCR(device='cuda', verbose=True)

# Process document
text, results = ocr.extract_text('document.jpg')

# Iterate through detailed results
print(f"Found {len(results)} text regions:")

for line in results:
    box = line['box']        # [x, y, width, height]
    text = line['text']      # Recognized text string
    conf = line['confidence'] # Confidence score (0.0 - 1.0)
    
    print(f"[{conf:.1%}] {text} at {box}")

Working with Single Line Images

If you already have cropped images of text lines (e.g., from a separate detection process), you can skip the detection step:

# Recognize a single cropped line image
text, confidence = ocr.recognize_single_line_image('line_crop.png')
print(f"Recognized: '{text}' with confidence {confidence:.2f}")

Using Custom Models

Load a model you trained yourself or downloaded separately:

# Load local model file
ocr = OCR(model_path="path/to/my_model.safetensors")

# Load from a different Hugging Face repo
ocr = OCR(model_path="my-username/my-custom-kiri-model")
