Detector Training
Kiri OCR supports training custom text detectors using either CRAFT (Character Region Awareness for Text Detection) or DB (Differentiable Binarization). While the pre-trained detector works well for general documents, training a custom detector is recommended for specific layouts, novel fonts, or challenging backgrounds.
Training a detector requires a dataset of images with ground truth bounding boxes. Kiri OCR provides a synthetic data generator to create this data from text files.
- Text Corpus: A `data.txt` file containing sample text lines (one per line).
- Fonts: A directory of `.ttf` or `.otf` font files.
- Backgrounds (Optional): A directory of background images to paste text onto.
```shell
kiri-ocr generate-detector \
    --text-file data.txt \
    --fonts-dir fonts/ \
    --output my_detector_data \
    --num-train 1000 \
    --num-val 200 \
    --min-lines 5 \
    --max-lines 15
```

Parameters:
- `--text-file`: Source text.
- `--output`: Output directory.
- `--num-train`: Number of training images to generate.
- `--num-val`: Number of validation images.
- `--min-lines` / `--max-lines`: Number of text lines per image, chosen randomly between min and max.
- `--image-height`: Height of the generated images (default 512). Width is calculated to maintain aspect ratio.
The command creates a directory `my_detector_data` with the following YOLO-style structure (used internally by the DB trainer):
```text
my_detector_data/
├── images/
│   ├── train/
│   │   ├── img_00001.jpg
│   │   └── ...
│   └── val/
│       ├── img_00001.jpg
│       └── ...
├── labels/
│   ├── train/
│   │   ├── img_00001.txt
│   │   └── ...
│   └── val/
│       ├── img_00001.txt
│       └── ...
└── data.yaml   # Configuration file pointing to the image/label paths
```
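Each label file contains one line per text box in YOLO format: a class index followed by the box centre and size, normalized to the image dimensions. A sketch of that conversion (`to_yolo_label` is a hypothetical helper; the generator's exact formatting may differ):

```python
def to_yolo_label(box, img_w, img_h, class_id=0):
    """Convert a pixel box (x, y, w, h) into a YOLO label line:
    "<class> <cx> <cy> <w> <h>", all coordinates normalized to [0, 1].
    Text detection has a single class, so class_id stays 0.
    """
    x, y, w, h = box
    cx = (x + w / 2) / img_w   # box centre, normalized
    cy = (y + h / 2) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# A 100x20 pixel text line at (50, 200) in a 512x512 image:
label = to_yolo_label((50, 200, 100, 20), 512, 512)
```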
Once the data is generated, use the `train-detector` command. Currently, this trains a DBNet model.
```shell
kiri-ocr train-detector \
    --data-yaml my_detector_data/data.yaml \
    --epochs 100 \
    --batch-size 8 \
    --image-size 640 \
    --name my_custom_detector
```

Parameters:
- `--data-yaml`: Path to the `data.yaml` generated in step 1.
- `--epochs`: Number of training epochs.
- `--batch-size`: Batch size (reduce if you run out of GPU memory).
- `--image-size`: Input image size for training. Must be a multiple of 32.
- `--model-size`: Size of the backbone. Options: `n` (nano), `s` (small), `m` (medium), `l` (large). Default is `n`.
- `--name`: Name of the training run. Checkpoints will be saved in `runs/detect/{name}/`.
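The multiple-of-32 requirement exists because detection backbones downsample the input by a factor of 32, so other sizes would misalign the feature maps. A quick pre-flight check (a convenience sketch, not part of the `kiri-ocr` CLI):

```python
def valid_image_size(size, stride=32):
    """Return True if `size` is compatible with a backbone that
    downsamples by `stride` (32 for typical detection backbones)."""
    return size > 0 and size % stride == 0

assert valid_image_size(640)        # 640 = 32 * 20
assert not valid_image_size(1000)   # round to 992 or 1024 instead
```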
Training progress is printed to the console. You will see metrics for:
- Box Loss: How accurately the bounding boxes are predicted.
- Class Loss: Usually 0 for text detection, as there is only one class.
- DFL Loss: Distribution Focal Loss, which measures box boundary regression quality.
After training, the best model weights are saved to `runs/detect/my_custom_detector/weights/best.pt`.
To use this model in your application:

```python
from kiri_ocr import OCR

# Initialize OCR with your custom detector
ocr = OCR(
    det_model_path="runs/detect/my_custom_detector/weights/best.pt",
    det_method="db"
)

# Run prediction
results = ocr.process_document("test_image.jpg")
```

Tips for best results:

- Diverse Backgrounds: If your real-world data has complex backgrounds (receipts, street scenes), ensure your training data reflects this. You can modify the generator to use background images.
- Image Size: Use a larger `--image-size` (e.g., 1024 or 1280) if you need to detect very small text.
- Augmentation: The trainer applies standard augmentations (flip, scale, color jitter). Ensure these are appropriate for your text (e.g., vertical flip is usually unsuitable for text).
© 2026 Kiri OCR. Released under the Apache 2.0 License.