NEW Feature: Both Simple PDF Image Extractor and Enhanced PDF Extractor v09 now support automatic batch processing of entire folders!
Simply provide a folder path instead of a file path, and the node will automatically detect and process all PDFs in the folder.
The nodes automatically detect whether you've provided a file or folder path:
# Single PDF (original behavior)
pdf_path = "C:/Documents/portfolio.pdf"
→ Extracts images from one PDF
# Folder of PDFs (NEW - auto-detected)
pdf_path = "C:/Documents/portfolios/"
→ Processes all PDFs in the folder
No configuration needed - just change the path!
Example folder:
my_portfolios/
├─ designer1.pdf
├─ designer2.pdf
├─ designer3.pdf
└─ artist_work.pdf
In the node's pdf_path parameter, enter the folder path:
C:/Documents/my_portfolios/
or
C:/Documents/my_portfolios
(Both work; the trailing slash is optional.)
The node will:
- ✅ Detect it's a folder
- 🔍 Find all .pdf files
- 📊 Report how many PDFs were found
- 🔄 Process each PDF sequentially
- 💾 Save results in organized subfolders
- 📄 Create a batch summary JSON
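Here is a minimal Python sketch of the detection and discovery steps above, using standard-library file checks. It is not the nodes' source. The helper name `collect_pdfs`, the case-insensitive `.pdf` match, and the alphabetical processing order are assumptions made for illustration.

```python
import os
from typing import List


def collect_pdfs(pdf_path: str) -> List[str]:
    """Hypothetical helper: return the PDFs a node would process for pdf_path."""
    # normpath drops a trailing slash, so "C:/x/" and "C:/x" behave the same
    path = os.path.normpath(pdf_path)
    if os.path.isfile(path):
        # Single PDF (original behavior)
        return [path]
    if os.path.isdir(path):
        # Folder: top-level entries only, subfolders are NOT searched.
        # Assumption: the ".pdf" match is case-insensitive.
        names = sorted(
            name
            for name in os.listdir(path)
            if name.lower().endswith(".pdf")
            and os.path.isfile(os.path.join(path, name))
        )
        return [os.path.join(path, name) for name in names]
    raise FileNotFoundError(f"Not a file or folder: {pdf_path}")
```

For example, `collect_pdfs("C:/Documents/my_portfolios/")` and `collect_pdfs("C:/Documents/my_portfolios")` return the same list because the trailing slash is normalized away.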
Output structure:
output/simple_pdf_extraction/ (or output/pdf_extraction for v09)
├─ designer1_20251005_120000/
│ ├─ page_001_image_01.png
│ ├─ page_001_image_02.png
│ └─ designer1_all_text.txt
│
├─ designer2_20251005_120015/
│ ├─ page_001_image_01.png
│ └─ designer2_all_text.txt
│
├─ designer3_20251005_120030/
│ └─ page_001_image_01.png
│
└─ batch_summary_20251005_120000.json ← Batch statistics
Each PDF gets its own timestamped subfolder, plus a batch summary file.
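The naming pattern in the tree above can be reproduced with `datetime` formatting. This sketch assumes each subfolder is stamped when that PDF starts processing and the summary file is stamped with the batch start time, which is what the example timestamps suggest. `pdf_output_dir` and `batch_summary_name` are hypothetical helpers, not node functions.

```python
import os
from datetime import datetime


def pdf_output_dir(base_dir: str, pdf_path: str, started: datetime) -> str:
    """Hypothetical helper: <base_dir>/<pdf stem>_<YYYYMMDD_HHMMSS>"""
    stem = os.path.splitext(os.path.basename(pdf_path))[0]
    return os.path.join(base_dir, f"{stem}_{started:%Y%m%d_%H%M%S}")


def batch_summary_name(started: datetime, v09: bool = False) -> str:
    """Hypothetical helper: batch_summary[_v09]_<YYYYMMDD_HHMMSS>.json"""
    prefix = "batch_summary_v09" if v09 else "batch_summary"
    return f"{prefix}_{started:%Y%m%d_%H%M%S}.json"
```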
Simple Extractor creates batch_summary_YYYYMMDD_HHMMSS.json:
{
"total_pdfs": 4,
"processed": 3,
"skipped": 1,
"total_images": 47,
"layered_pdfs": 2,
"total_time_seconds": 15.3,
"avg_time_per_pdf": 3.8,
"processing_times": [2.1, 3.4, 5.2, 4.6],
"results": [
{
"pdf": "designer1.pdf",
"images": 12,
"has_layers": true,
"status": "processed"
},
{
"pdf": "designer2.pdf",
"images": 0,
"has_layers": false,
"status": "skipped"
}
]
}
v09 creates batch_summary_v09_YYYYMMDD_HHMMSS.json:
{
"total_pdfs": 4,
"processed": 3,
"skipped": 1,
"total_images": 47,
"total_enhanced": 45,
"layered_pdfs": 2,
"total_time_seconds": 480.5,
"avg_time_per_pdf": 120.1,
"processing_times": [95.2, 180.4, 120.3, 84.6],
"results": [
{
"pdf": "designer1.pdf",
"images": 12,
"enhanced": 12,
"has_layers": true,
"time": 95.2,
"status": "processed"
}
]
}
Console output (Simple Extractor):
🗂️ BATCH MODE: Processing folder
📁 Folder: C:/Documents/my_portfolios
🔍 Searching for PDF files...
📚 Found 4 PDF files
🏃 Mode: All PDFs
============================================================
[1/4] Processing: designer1.pdf
------------------------------------------------------------
🚀 Simple PDF Image Extractor (Layer-Aware)
📄 PDF: C:/Documents/my_portfolios/designer1.pdf
✨ PDF has 2 layers!
📋 Layer: 'Images' (ON)
📋 Layer: 'Text' (ON)
🚀 Using super-fast layer-based extraction
📖 Processing 8 pages...
📄 Page 1: 2 images
✅ Extracted: 2000×1500 → page_001_image_01.png
[... extraction continues ...]
[2/4] Processing: designer2.pdf
------------------------------------------------------------
[... continues for all PDFs ...]
============================================================
✅ Batch Extraction Complete
📁 Folder: my_portfolios
📚 Total PDFs: 4
✅ Processed: 3
⏭️ Skipped: 1
🖼️ Total images: 47
✨ Layered PDFs: 2
⏱️ Total time: 15.3s
📊 Avg per PDF: 3.8s
📄 Summary: output/simple_pdf_extraction/batch_summary_20251005_120000.json
============================================================
Console output (v09):
🗂️ BATCH MODE: Enhanced PDF Extractor v09
📁 Folder: C:/Documents/my_portfolios
🔍 Searching for PDF files...
📚 Found 4 PDF files
🏃 Mode: All PDFs
============================================================
============================================================
[1/4] Processing: designer1.pdf
============================================================
🚀 Enhanced PDF Extractor v09 - Layer Detection + Analysis Engine
🔍 Enhanced PDF Extraction Started:
📄 PDF: C:/Documents/my_portfolios/designer1.pdf
[... full v09 extraction process ...]
[2/4] Processing: designer2.pdf
[... continues ...]
============================================================
✅ Batch Extraction Complete (v09)
📁 Folder: my_portfolios
📚 Total PDFs: 4
✅ Processed: 3
⏭️ Skipped: 1
🖼️ Total images: 47
✨ Enhanced: 45
📋 Layered PDFs: 2
⏱️ Total time: 480.5s
📊 Avg per PDF: 120.1s
📄 Summary: output/pdf_extraction/batch_summary_v09_20251005_120000.json
============================================================
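The relationship between the batch-level numbers and the per-PDF entries can be sketched in plain Python. This is a reconstruction based on assumptions, not the extractors' code. `aggregate` is a hypothetical helper. The average is taken over every PDF in the folder because the example figures imply it (15.3 s / 4 PDFs ≈ 3.8 s).

```python
from typing import Dict, List


def aggregate(results: List[Dict], processing_times: List[float]) -> Dict:
    """Hypothetical sketch: derive batch-level numbers from per-PDF entries."""
    total_time = round(sum(processing_times), 1)
    avg = round(total_time / len(processing_times), 1) if processing_times else 0.0
    return {
        "total_pdfs": len(results),
        "processed": sum(r["status"] == "processed" for r in results),
        "skipped": sum(r["status"] == "skipped" for r in results),
        "total_images": sum(r["images"] for r in results),  # also the image_count output
        "layered_pdfs": sum(bool(r["has_layers"]) for r in results),
        "total_time_seconds": total_time,
        "avg_time_per_pdf": avg,  # averaged over all PDFs, skipped ones included
        # Not a summary field: mirrors the has_layers output (True if ANY PDF had layers)
        "any_layers": any(r["has_layers"] for r in results),
    }
```

With the example's `processing_times` of `[2.1, 3.4, 5.2, 4.6]`, this gives a total of 15.3 s and an average of 3.8 s, matching the Simple Extractor summary.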
Return values (Simple Extractor):
images, summary, image_count, has_layers = node.extract_images(folder_path, ...)
# images: Combined tensor of ALL images from ALL PDFs
# summary: Batch summary text (see example above)
# image_count: Total images from all PDFs
# has_layers: True if ANY PDF had layers
Return values (v09):
extracted, enhanced, analysis, text, stats, output_path, has_layers = node.extract_enhanced(folder_path, ...)
# extracted: List of all extracted images from all PDFs
# enhanced: List of all enhanced images
# analysis: List of all page analyses
# text: Combined text from all PDFs
# stats: Batch statistics dict
# output_path: Base output directory
# has_layers: True if ANY PDF had layers
All node parameters work identically in batch mode:
Simple Extractor:
- min_width, min_height - Applied to all PDFs
- extract_text - Text extracted from all PDFs
- layers_only_mode - Skips non-layered PDFs across the entire batch
- dpi - Used for all PDFs
v09:
- All v09 parameters apply to entire batch
- join_spreads, enable_image_enhancement, etc. - Each PDF gets the full v09 treatment
In batch mode, layers_only_mode is a convenient way to process only the layered PDFs in a folder:
# Only process PDFs with layers, skip the rest
layers_only_mode = True
# Result: Non-layered PDFs show "skipped" status in batch summary
Console output:
[1/4] Processing: layered_portfolio.pdf
✨ PDF has 2 layers!
🚀 Using super-fast layer-based extraction
[... extraction ...]
[2/4] Processing: scanned_doc.pdf
📄 No layers detected - using standard extraction
⚠️ Layers-only mode enabled, but PDF has no layers. Skipping extraction.
[... continues ...]
Processed: 2 (only layered PDFs)
Skipped: 2 (non-layered PDFs)
If a PDF fails to process:
- ❌ Error logged to console
- ⏭️ Batch continues with next PDF
- 📊 Error recorded in batch summary
- ✅ Other PDFs process normally
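The continue-on-error behavior and the layers_only_mode skip can be implemented as a per-PDF `try`/`except` inside the batch loop. This is a minimal sketch in which hypothetical `detect_layers` and `extract` callables stand in for the real extraction code. The status values and the `error` field mirror the summary examples.

```python
import os
from typing import Callable, Dict, List


def run_batch(
    pdfs: List[str],
    detect_layers: Callable[[str], bool],  # hypothetical: True if the PDF has layers
    extract: Callable[[str], int],  # hypothetical: returns number of images saved
    layers_only_mode: bool = False,
) -> List[Dict]:
    """Process PDFs one by one; a failure is recorded and the batch moves on."""
    results: List[Dict] = []
    for i, pdf in enumerate(pdfs, start=1):
        name = os.path.basename(pdf)
        print(f"[{i}/{len(pdfs)}] Processing: {name}")
        entry: Dict = {"pdf": name, "images": 0, "has_layers": False}
        try:
            entry["has_layers"] = detect_layers(pdf)
            if layers_only_mode and not entry["has_layers"]:
                # Non-layered PDF in layers-only mode: record it, extract nothing
                entry["status"] = "skipped"
            else:
                entry["images"] = extract(pdf)
                entry["status"] = "processed"
        except Exception as exc:  # deliberately broad: one bad PDF must not stop the batch
            print(f"❌ Error processing {name}: {exc}")
            entry["status"] = "error"
            entry["error"] = str(exc)
        results.append(entry)
    return results
```

The broad `except Exception` is intentional here. Any failure in a single file is turned into an `"error"` entry, so the remaining PDFs still run.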
Batch summary for failed PDF:
{
"pdf": "corrupted.pdf",
"images": 0,
"enhanced": 0,
"has_layers": false,
"status": "error",
"error": "PDF file is corrupted or encrypted"
}
Simple Extractor timing (approximate):
Layered PDFs (best case):
- Per PDF: 2-5 seconds
- 10 PDFs: ~30 seconds
- 50 PDFs: ~150 seconds (2.5 minutes)
Non-layered PDFs:
- Per PDF: 5-10 seconds
- 10 PDFs: ~75 seconds
- 50 PDFs: up to ~500 seconds (~8 minutes)
Enhanced PDF Extractor v09 timing (approximate):
Layered PDFs (with layer detection):
- Per PDF: 2-5 seconds (layer extraction)
- Plus AI analysis if needed: +30-120 seconds
Non-layered PDFs (full AI analysis):
- Per PDF: 50-250 seconds depending on pages
- 10 PDFs: 500-2500 seconds (8-42 minutes)
- Consider using Simple Extractor for large batches
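Because PDFs are processed sequentially, you can get a rough batch estimate by multiplying the per-PDF range by the number of PDFs. Here is a small sketch. The helper names are hypothetical.

```python
from typing import Tuple


def estimate_batch_seconds(n_pdfs: int, low_s: float, high_s: float) -> Tuple[float, float]:
    """Scale a per-PDF time range to a sequential batch (rough estimate only)."""
    return n_pdfs * low_s, n_pdfs * high_s


def to_minutes(seconds: float) -> float:
    """Convert seconds to minutes, rounded to one decimal place."""
    return round(seconds / 60, 1)
```

For example, `estimate_batch_seconds(10, 50, 250)` returns `(500, 2500)`, or roughly 8 to 42 minutes. This matches the v09 figure for 10 non-layered PDFs.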
Tips:
- Use Simple Extractor for large batches; it is much faster if you don't need v09 features
- Enable layers_only_mode if you only care about layered PDFs
- Check batch summary to identify slow PDFs
- Process small batches first to estimate total time
Input: Folder with 20 portfolio PDFs
Mode: Simple Extractor with layers_only_mode=True
Result: Only extracts from professional layered portfolios
Time: 1-2 minutes
Input: Folder with 100 magazine PDFs
Mode: Simple Extractor (standard)
Result: All images extracted from all magazines
Time: 10-15 minutes for layered, 1-2 hours for non-layered
Input: Folder with mixed project PDFs
Mode: v09 with full analysis
Result: Enhanced images with layout analysis
Time: Varies by PDF complexity
Input: Folder with new submissions
Mode: Simple Extractor
Result: Quick preview of all images
Review: Check batch_summary.json for statistics
| Feature | Single PDF Mode | Batch/Folder Mode |
|---|---|---|
| Input | File path | Folder path |
| Detection | N/A | Automatic |
| Output | One subfolder | Multiple subfolders |
| Summary | Text summary | JSON + Text summary |
| Statistics | Per-PDF stats | Aggregate + per-PDF stats |
| Return Values | Single PDF results | Combined results |
| Progress | Single progress | Per-PDF progress |
| Error Handling | Fails immediately | Continues on error |
Put all PDFs you want to process in one folder. The node only processes PDFs in the top-level folder; it does not search subfolders.
The batch summary JSON contains detailed statistics:
- Which PDFs were processed successfully
- Which were skipped (and why)
- Processing time per PDF
- Total images found
- Layer detection results
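The following short sketch reviews a summary file with Python's `json` module. `review_summary` is a hypothetical helper. It relies only on the `results`, `pdf`, `status`, `error`, and `time` fields shown in the summary examples. Only the v09 example includes a per-PDF `time`, so the slowest-PDF list will be empty for Simple Extractor summaries.

```python
import json
from typing import Dict, List


def review_summary(path: str, top_n: int = 3) -> Dict[str, object]:
    """Group PDFs by status, collect error messages, and list the slowest PDFs."""
    with open(path, encoding="utf-8") as f:
        summary = json.load(f)
    results = summary.get("results", [])
    by_status: Dict[str, List[str]] = {}
    for entry in results:
        by_status.setdefault(entry.get("status", "unknown"), []).append(entry["pdf"])
    errors = {e["pdf"]: e.get("error", "") for e in results if e.get("status") == "error"}
    # Only entries that carry a per-PDF "time" field can be ranked
    timed = [e for e in results if "time" in e]
    slowest = [e["pdf"] for e in sorted(timed, key=lambda e: e["time"], reverse=True)[:top_n]]
    return {"by_status": by_status, "errors": errors, "slowest": slowest}
```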
Choosing an extractor:
- Simple Extractor: Fast, simple extraction, great for batches
- v09: Slow but thorough; use it for smaller batches or when you need advanced features
Watch the console output for:
- PDFs being skipped (layers_only_mode)
- Errors processing specific files
- Layer detection results
- Processing time per PDF
Before processing 100 PDFs, test with 2-3 PDFs first to:
- Verify settings
- Check output quality
- Estimate total processing time
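You can turn a 2-3 PDF pilot run into an estimate for the full batch by scaling up the pilot's summary file. This sketch assumes the `avg_time_per_pdf` and `processing_times` fields from the summary examples. `extrapolate_from_pilot` is a hypothetical helper. Using the slowest pilot PDF as a pessimistic bound is a choice made by this sketch, not something the nodes report.

```python
import json
from typing import Tuple


def extrapolate_from_pilot(summary_path: str, n_pdfs: int) -> Tuple[float, float]:
    """Return (typical, pessimistic) seconds for n_pdfs, based on a pilot batch summary."""
    with open(summary_path, encoding="utf-8") as f:
        pilot = json.load(f)
    times = pilot.get("processing_times", [])
    # Prefer the reported average; fall back to computing it from the raw times
    avg = pilot.get("avg_time_per_pdf")
    if avg is None:
        avg = sum(times) / len(times) if times else 0.0
    # Pessimistic bound: assume every PDF is as slow as the slowest pilot PDF
    worst = max(times, default=avg)
    return avg * n_pdfs, worst * n_pdfs
```

Small pilots are noisy, so treat the two numbers as a range rather than a prediction.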
Q: "No images extracted from batch"
- Check if PDFs actually have images
- Look at batch summary JSON to see per-PDF results
- Try single PDF mode on one file to debug
Q: "Batch taking forever"
- v09 is slow for large batches - use Simple Extractor
- Check if PDFs have layers (faster processing)
- Consider enabling layers_only_mode to skip non-layered PDFs, which are slower to process
Q: "Some PDFs skipped"
- Check batch summary JSON for skip reasons
- If layers_only_mode=True, non-layered PDFs are skipped
- Errors are logged in the console and in the batch summary
Q: "Can I process subfolders?"
- Not currently - only PDFs in the root folder are processed
- Move all PDFs to a single folder first
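Subfolders are not searched, so nested PDFs must be gathered into one folder first. Here is a minimal sketch using `pathlib` and `shutil.copy2`. `flatten_pdfs` is a hypothetical helper. Prefixing each file with its subfolder path avoids name collisions; this naming is an assumption made here. The helper copies rather than moves, so the originals stay in place.

```python
import shutil
from pathlib import Path


def flatten_pdfs(src: str, dest: str) -> int:
    """Copy every PDF under src (recursively) into the flat folder dest.

    Nested files are renamed <subfolder>__<name>.pdf so that same-named
    PDFs from different subfolders do not overwrite each other.
    """
    src_path, dest_path = Path(src), Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    count = 0
    # sorted() materializes the list first, so copies made below are not revisited
    for pdf in sorted(src_path.rglob("*")):
        if not (pdf.is_file() and pdf.suffix.lower() == ".pdf"):
            continue
        flat_name = "__".join(pdf.relative_to(src_path).parts)  # e.g. "2024__a.pdf"
        shutil.copy2(pdf, dest_path / flat_name)
        count += 1
    return count
```

Keep `dest` outside `src`. Otherwise a second run would pick up the copies from the first run. Then point the node's pdf_path at `dest`.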
Q: "How to cancel batch?"
- Stop ComfyUI execution (standard cancel)
- Already processed PDFs will remain in output folder
- Partial results are saved
# In ComfyUI node:
pdf_path = "C:/Submissions/portfolios/"
output_directory = "output/portfolio_review"
min_width = 200
min_height = 200
extract_text = False
layers_only_mode = True # Only layered portfolios
# Result: Fast extraction of professional work only
# In ComfyUI node:
pdf_path = "C:/Archives/magazines_2024/"
output_directory = "output/magazine_archive"
min_width = 100
min_height = 100
extract_text = True
layers_only_mode = False # Get everything
# Result: All images + text from all magazines
# In ComfyUI node (v09):
pdf_path = "C:/Client/deliverables/"
output_directory = "output/client_analysis"
enable_image_enhancement = True
join_spreads = True
debug_mode = True
# Result: Full analysis with enhanced images
# Warning: Will be slow for many PDFs
✅ Auto-detection: Just provide folder path, node handles the rest
✅ Organized output: Each PDF gets its own subfolder + batch summary
✅ Statistics: Detailed JSON with per-PDF and aggregate stats
✅ Error resilient: Continues processing even if one PDF fails
✅ All features work: Every parameter works in batch mode
✅ Performance aware: Simple Extractor recommended for large batches
Getting Started:
- Put PDFs in a folder
- Enter folder path in node
- Run - node auto-detects and processes all PDFs
- Check batch_summary.json for results