The process command allows you to:
- Upload a DICOM file to the configured GCS bucket
- Automatically trigger processing via the deployed CloudRun service
- Poll BigQuery for the results
- Display an overview of the processed results
This is useful for testing the deployed infrastructure and retrieving results programmatically.
For archive file support (.zip, .tar.gz, .tgz), see ARCHIVE_SUPPORT.md.
- A deployed dcm2bq infrastructure (via Terraform)
- A deployment configuration file containing GCS and BigQuery information
- Google Cloud credentials configured (via GOOGLE_APPLICATION_CREDENTIALS or ADC)
- The @google-cloud/storage and @google-cloud/bigquery npm packages installed
Run the provided helper script to automatically extract Terraform outputs and create a config file:
    node helpers/create-deployment-config.js --terraform-dir ./tf --output deployment-config.json

This creates a JSON file with the required configuration from your Terraform deployment.
Create a JSON file (e.g., deployment-config.json) with the following structure:
    {
      "gcpConfig": {
        "projectId": "my-gcp-project",
        "gcs_bucket_name": "dcm2bq-dicom-bucket-abc12345",
        "bigQuery": {
          "datasetId": "dicom",
          "instancesTableId": "instances"
        },
        "embedding": {
          "input": {
            "gcsBucketPath": "gs://dcm2bq-processed-data-abc12345"
          }
        }
      }
    }

Get these values from:
- Terraform: terraform output in the tf/ directory
- GCP Console: Cloud Run, BigQuery, Cloud Storage
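
Because the command exits when the config file is missing required fields, it can help to check a hand-written config before running it. The following is a minimal sketch, not the tool's own validation: the helper name `findMissingFields` is hypothetical, and the list of required paths is an assumption based on the example structure above (the `embedding` section is deliberately left out because it is not clear whether it is required).

```javascript
// Hypothetical helper: checks that a parsed deployment config has the fields
// shown in the example structure. The required list is an assumption, not
// taken from the dcm2bq source.
const REQUIRED_PATHS = [
  "gcpConfig.projectId",
  "gcpConfig.gcs_bucket_name",
  "gcpConfig.bigQuery.datasetId",
  "gcpConfig.bigQuery.instancesTableId",
];

// Returns the dotted paths that are missing or empty in `config`.
function findMissingFields(config, requiredPaths = REQUIRED_PATHS) {
  return requiredPaths.filter((path) => {
    let value = config;
    for (const key of path.split(".")) {
      // Stop as soon as an intermediate object or key is absent.
      if (value === null || typeof value !== "object" || !(key in value)) {
        return true;
      }
      value = value[key];
    }
    return value === "" || value === null || value === undefined;
  });
}
```

For example, passing the result of `JSON.parse(fs.readFileSync("deployment-config.json", "utf8"))` returns an empty array when all of the listed fields are present.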
The command automatically uses test/testconfig.json if available, so the simplest usage is:
    node src/index.js process <input-dicom-file>

Or provide a custom config file:

    node src/index.js process <input-dicom-file> --config <deployment-config-file>

Basic example:

    node src/index.js process test/files/dcm/ct.dcm

This uses the default test/testconfig.json if it exists (created by deployment).

With an explicit config file:

    node src/index.js process test/files/dcm/ct.dcm --config deployment-config.json

With custom polling options:

    node src/index.js process test/files/dcm/ct.dcm \
      --poll-interval 1000 \
      --poll-timeout 30000 \
      --poll-timeout-per-mb 5000

- --config <deploymentConfig> (optional)
  - Path to the deployment configuration file containing GCS bucket and BigQuery table information
  - If not provided, defaults to test/testconfig.json if it exists
  - Otherwise, use the helper script to generate one: node helpers/create-deployment-config.js --terraform-dir ./tf
- --poll-interval <ms> (default: 2000)
  - Interval between BigQuery polling attempts in milliseconds
  - Shorter intervals = more frequent checks but higher API costs
  - Longer intervals = less frequent checks but longer wait times
- --poll-timeout <ms> (default: 60000)
  - Base maximum time to wait for results in milliseconds
  - This is the timeout for small files
- --poll-timeout-per-mb <ms> (default: 10000)
  - Additional timeout per MB of input file size
  - Total timeout = --poll-timeout + (file_size_MB × --poll-timeout-per-mb)
  - Allows larger files more time to process
- Validation: Checks that the input file exists and the config file is valid
- Upload: Uploads the DICOM file to the configured GCS bucket with a timestamped, hashed filename
- Notification: GCS triggers a Pub/Sub event that notifies CloudRun
- Processing: The deployed CloudRun service processes the file and writes results to BigQuery
- Polling: Repeatedly queries BigQuery for the result row until found or timeout
- Results: Displays an overview of the processing results including metadata, patient info, and embedding details
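
The polling step above can be sketched roughly as follows. This is an illustration under stated assumptions, not the command's actual code: `pollForResult` and `fetchRow` are hypothetical names, `fetchRow` stands in for whatever BigQuery lookup is used, and treating a failed query as "retry on the next poll" is inferred from the error-handling notes in this document.

```javascript
// Hypothetical polling helper. `fetchRow` is an injected async function that
// resolves to the result row, or null/undefined if it is not there yet.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function pollForResult(fetchRow, { intervalMs = 2000, timeoutMs = 60000 } = {}) {
  const start = Date.now();
  let polls = 0;
  while (true) {
    polls += 1;
    let row = null;
    try {
      row = await fetchRow();
    } catch (err) {
      // Assumption: a failed query is retried on the next poll, not fatal.
    }
    if (row) {
      return { row, polls, elapsedMs: Date.now() - start };
    }
    // Give up if the next poll would start after the deadline.
    if (Date.now() - start + intervalMs > timeoutMs) {
      throw new Error(`No result after ${polls} polls (${timeoutMs} ms timeout)`);
    }
    await sleep(intervalMs);
  }
}
```

Injecting `fetchRow` keeps the loop independent of the BigQuery client, so the interval and timeout behavior can be exercised with a stub.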
Processing file: test/files/dcm/ct.dcm
File size: 1234567 bytes (1.18 MB)
✓ Loaded deployment config from: test/testconfig.json
Polling for results (interval: 2000ms, max time: 69000ms)...
✓ Found result in BigQuery after 5 polls (8234ms)
=== Processing Result Overview ===
Path: gs://dcm2bq-dicom-bucket-abc12345/uploads/1705089600000_a1b2c3d4_ct.dcm
Timestamp: 2024-01-12 15:40:00.000000 UTC
Version: 0
Event: OBJECT_FINALIZE
Input Size: 1234567 bytes
Input Type: GCS
Embedding Model: multimodalembedding@001
Embedding Input Path: gs://dcm2bq-processed-data-abc12345/embeddings/...
Size: 45678 bytes
MIME Type: image/jpeg
Patient Name: John^Doe
Patient ID: 12345678
Study Date: 20240112
Modality: CT
===================================
For single DICOM files, timeout is calculated as:
total_timeout = poll_timeout + (file_size_in_mb × poll_timeout_per_mb)
Examples:
- 1 MB file: 60000 + (1 × 10000) = 70000 ms (70 seconds)
- 10 MB file: 60000 + (10 × 10000) = 160000 ms (160 seconds)
- 50 MB file: 60000 + (50 × 10000) = 560000 ms (560 seconds)
Note: Archive files receive an additional +30 seconds for extraction. See ARCHIVE_SUPPORT.md for details.
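
As a quick check of the arithmetic, the single-file formula can be written as a small function. This is a sketch only: the name `computePollTimeoutMs` is hypothetical, and the conversion of 1 MB = 1024 × 1024 bytes is an assumption (it matches the sample output, where 1234567 bytes is shown as 1.18 MB). Rounding to whole milliseconds is also an assumption, and the archive extraction allowance is not included.

```javascript
// Mirrors the documented formula for single DICOM files. The function name,
// the bytes-to-MB conversion, and the rounding are assumptions.
const BYTES_PER_MB = 1024 * 1024;

function computePollTimeoutMs(fileSizeBytes, pollTimeoutMs = 60000, pollTimeoutPerMbMs = 10000) {
  const fileSizeMb = fileSizeBytes / BYTES_PER_MB;
  return Math.round(pollTimeoutMs + fileSizeMb * pollTimeoutPerMbMs);
}

console.log(computePollTimeoutMs(1 * BYTES_PER_MB));  // 70000
console.log(computePollTimeoutMs(10 * BYTES_PER_MB)); // 160000
console.log(computePollTimeoutMs(50 * BYTES_PER_MB)); // 560000
```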
The command will exit with an error message if:
- Input file doesn't exist
- Config file not found or invalid
- Config file missing required fields
- GCS upload fails
- BigQuery query fails (though it will retry)
Config file not found:
- Check the path to your config file
- Ensure it exists and is readable
- Run the helper script to generate it from Terraform outputs

Polling times out:
- The file may still be processing
- Increase --poll-timeout-per-mb to wait longer
- Check BigQuery manually to verify the result was created
- Check CloudRun logs for processing errors

GCS upload fails:
- Check GCP credentials are configured
- Verify the GCS bucket name in the config file
- Ensure your service account has the storage.objectAdmin role

BigQuery query fails:
- Check GCP credentials and project ID
- Verify the dataset and table IDs in the config file
- Ensure the service account has the bigquery.dataEditor role

If the polling times out, you can manually query BigQuery to check the result:
SELECT *
FROM `PROJECT_ID.DATASET_ID.TABLE_ID`
WHERE path = 'gs://BUCKET/uploads/...'
ORDER BY timestamp DESC
LIMIT 1

Replace the placeholders with actual values from your deployment config.
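
The same lookup can also be scripted from Node. The sketch below assumes the config structure shown earlier in this document; `buildResultQuery` and `fetchResultRow` are hypothetical helper names, not part of dcm2bq. It passes the path as a named query parameter rather than interpolating it into the SQL string.

```javascript
// Hypothetical helper: builds the manual lookup query from a deployment
// config. The table name comes from the config; the path is a named parameter.
function buildResultQuery(config, gcsPath) {
  const { projectId, bigQuery } = config.gcpConfig;
  const table = `${projectId}.${bigQuery.datasetId}.${bigQuery.instancesTableId}`;
  return {
    query: `SELECT * FROM \`${table}\` WHERE path = @path ORDER BY timestamp DESC LIMIT 1`,
    params: { path: gcsPath },
  };
}

// Runs the query with the @google-cloud/bigquery client. This needs the
// package and valid credentials, so the require is deferred until called.
async function fetchResultRow(config, gcsPath) {
  const { BigQuery } = require("@google-cloud/bigquery");
  const bigquery = new BigQuery({ projectId: config.gcpConfig.projectId });
  const [rows] = await bigquery.query(buildResultQuery(config, gcsPath));
  return rows[0] || null;
}
```

`fetchResultRow` resolves to the most recent matching row, or `null` if none has been written yet.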
- Archive Support - Processing ZIP, TAR.GZ, and TGZ archives
- Implementation Details - Technical architecture
- Test Coverage - Test suite documentation