Tmob edited this page Jan 28, 2026 · 2 revisions

Detector Training Guide

Kiri OCR supports training custom text detectors using either CRAFT (Character Region Awareness for Text Detection) or DB (Differentiable Binarization). While the pre-trained detector works well for general documents, training a custom detector is recommended for specific layouts, novel fonts, or challenging backgrounds.

1. Generate Training Data

Training a detector requires a dataset of images with ground truth bounding boxes. Kiri OCR provides a synthetic data generator to create this data from text files.

Prerequisites

  • Text Corpus: A data.txt file containing sample text lines (one per line).
  • Fonts: A directory of .ttf or .otf font files.
  • Backgrounds (Optional): A directory of background images to paste text onto.
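
The generator only assumes one sample per line in data.txt, so a raw corpus usually needs light cleanup first. A minimal sketch (the paths, function name, and minimum-length threshold are illustrative, not part of kiri-ocr):

```python
# Sketch: normalize a raw corpus into the one-line-per-sample format
# expected by --text-file. Strips whitespace and drops blank or
# very short lines. The min_chars threshold is an arbitrary choice.
from pathlib import Path

def build_corpus(src: str, dst: str, min_chars: int = 3) -> int:
    """Write cleaned lines from src to dst; return how many were kept."""
    lines = [
        line.strip()
        for line in Path(src).read_text(encoding="utf-8").splitlines()
    ]
    kept = [line for line in lines if len(line) >= min_chars]
    Path(dst).write_text("\n".join(kept) + "\n", encoding="utf-8")
    return len(kept)
```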

Command

kiri-ocr generate-detector \
    --text-file data.txt \
    --fonts-dir fonts/ \
    --output my_detector_data \
    --num-train 1000 \
    --num-val 200 \
    --min-lines 5 \
    --max-lines 15

Parameters:

  • --text-file: Source text file (one sample line per line).
  • --fonts-dir: Directory of .ttf/.otf font files to render text with.
  • --output: Output directory for the generated dataset.
  • --num-train: Number of training images to generate.
  • --num-val: Number of validation images.
  • --min-lines / --max-lines: Number of text lines per image. Randomly chosen between min and max.
  • --image-height: Height of the generated images (default 512). Width is calculated to maintain aspect ratio.

Output Structure

The command creates a my_detector_data directory laid out in YOLO style (the format the DB trainer consumes internally):

my_detector_data/
├── images/
│   ├── train/
│   │   ├── img_00001.jpg
│   │   └── ...
│   └── val/
│       ├── img_00001.jpg
│       └── ...
├── labels/
│   ├── train/
│   │   ├── img_00001.txt
│   │   └── ...
│   └── val/
│       ├── img_00001.txt
│       └── ...
└── data.yaml  # Configuration file pointing to paths
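
In the standard YOLO convention, each label file holds one box per line as `class x_center y_center width height`, with all coordinates normalized to [0, 1]. Assuming the generator follows that convention (verify against your own labels/ files), a sketch for decoding a label file back into pixel-space boxes:

```python
# Sketch: decode YOLO-format labels into pixel-space (x1, y1, x2, y2)
# corner boxes. Assumes the standard "class xc yc w h" normalized
# layout; check one of the generated labels/*.txt files to confirm.

def yolo_to_pixels(label_text: str, img_w: int, img_h: int):
    """Convert YOLO label lines to (class, x1, y1, x2, y2) pixel boxes."""
    boxes = []
    for line in label_text.strip().splitlines():
        cls, xc, yc, w, h = line.split()
        xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
        x1 = (xc - w / 2) * img_w
        y1 = (yc - h / 2) * img_h
        x2 = (xc + w / 2) * img_w
        y2 = (yc + h / 2) * img_h
        boxes.append((int(cls), round(x1), round(y1), round(x2), round(y2)))
    return boxes
```

This is handy for spot-checking the generated data, e.g. overlaying decoded boxes on a few training images before committing to a long training run.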

2. Train the Detector

Once the data is generated, use the train-detector command. Currently, this trains a DBNet model.

kiri-ocr train-detector \
    --data-yaml my_detector_data/data.yaml \
    --epochs 100 \
    --batch-size 8 \
    --image-size 640 \
    --name my_custom_detector

Parameters:

  • --data-yaml: Path to the data.yaml generated in step 1.
  • --epochs: Number of training epochs.
  • --batch-size: Batch size (reduce if you run out of GPU memory).
  • --image-size: Input image size for training. Must be a multiple of 32.
  • --model-size: Size of the backbone. Options: n (nano), s (small), m (medium), l (large). Default is n.
  • --name: Name of the training run. Checkpoints will be saved in runs/detect/{name}/.
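
Since --image-size must be a multiple of 32, it can help to snap an arbitrary target resolution to the nearest valid value before launching a run. A small helper (illustrative, not part of the CLI):

```python
# Sketch: round an arbitrary target size to the nearest multiple of
# the detector's stride (32), so it is valid for --image-size.

def snap_to_stride(size: int, stride: int = 32) -> int:
    """Round size to the nearest multiple of stride (minimum one stride)."""
    return max(stride, round(size / stride) * stride)
```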

3. Monitor Training

Training progress is printed to the console. You will see metrics for:

  • Box Loss: How accurately the bounding boxes are predicted.
  • Class Loss: Classification loss (usually 0 for text detection, since there is only one class).
  • DFL Loss: Distribution Focal Loss.
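
Per-batch loss values printed to the console can be noisy. One generic way to watch the trend (this is a plain smoothing trick applied after the fact, not something the trainer prints itself) is an exponential moving average:

```python
# Sketch: exponentially smooth a sequence of per-batch loss values so
# the overall trend is visible through batch-to-batch noise.

def ema(values, alpha: float = 0.1):
    """Return the exponential moving average of a sequence of losses."""
    smoothed, avg = [], None
    for v in values:
        avg = v if avg is None else alpha * v + (1 - alpha) * avg
        smoothed.append(avg)
    return smoothed
```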

4. Use Your Custom Detector

After training, the best model weights are saved to runs/detect/my_custom_detector/weights/best.pt.

To use this model in your application:

from kiri_ocr import OCR

# Initialize OCR with your custom detector
ocr = OCR(
    det_model_path="runs/detect/my_custom_detector/weights/best.pt",
    det_method="db"
)

# Run prediction
results = ocr.process_document("test_image.jpg")

Tips for Better Detection

  1. Diverse Backgrounds: If your real-world data has complex backgrounds (receipts, street scenes), ensure your training data reflects this. You can modify the generator to use background images.
  2. Image Size: Use a larger --image-size (e.g., 1024 or 1280) if you need to detect very small text.
  3. Augmentation: The trainer applies standard augmentations (flip, scale, color jitter). Ensure these are appropriate for your text (e.g., vertical flip might not be good for text).
