Requested feature
Currently, high-accuracy table extraction in the opendataloader-pdf library only works if the hybrid server has been started by hand (opendataloader-pdf-hybrid --port 5002) and is left running in a separate terminal. This extra manual step is awkward for production scripts, CI/CD pipelines, and local development.
Goal: Allow the Python wrapper to automatically detect, start, and shut down the hybrid server process as part of the convert() execution.
The "Ideal" Future Code
import opendataloader_pdf

opendataloader_pdf.convert(
    input_path="my_doc.pdf",
    output_dir="output",
    hybrid="docling-fast",
    auto_start_server=True,  # 👈 The new feature (proposed, does not exist yet)
)
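Until something like this exists, the detect / start / shut down behaviour can be approximated on the caller's side. The following is only a rough sketch, not how the library works or should be implemented: `port_is_open` and `managed_server` are hypothetical helper names, the readiness check is a plain TCP connect (the real server may need a proper health check), and the command line is the one quoted above.

```python
import contextlib
import socket
import subprocess
import time


def port_is_open(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0


@contextlib.contextmanager
def managed_server(command, port, startup_timeout=60.0):
    """Start `command` unless the port is already served; stop it on exit."""
    if port_is_open(port):
        # Detect: a server is already running (e.g. started by hand).
        # Reuse it and do not shut it down afterwards.
        yield None
        return

    # Start: launch the server as a child process.
    proc = subprocess.Popen(command)
    try:
        deadline = time.monotonic() + startup_timeout
        while not port_is_open(port):
            if proc.poll() is not None:
                raise RuntimeError(f"server exited early with code {proc.returncode}")
            if time.monotonic() > deadline:
                raise TimeoutError(f"server did not open port {port} in time")
            time.sleep(0.2)
        yield proc
    finally:
        # Shut down: ask politely first, then force-kill if it hangs.
        proc.terminate()
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()


# Intended usage (assumes the hybrid CLI is on PATH and listens on 5002):
#
# with managed_server(["opendataloader-pdf-hybrid", "--port", "5002"], port=5002):
#     opendataloader_pdf.convert(
#         input_path="my_doc.pdf",
#         output_dir="output",
#         hybrid="docling-fast",
#     )
```

With auto_start_server=True, convert() could do the equivalent internally, so callers would not need to wrap every call themselves.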