PatientGenerator facilitates the creation of synthetic test datasets
for the OMOP Common Data Model (CDM) using two complementary approaches:
- patientChat: Generates structured patient JSON files using Large Language Models (LLMs).
- patientDesigner: Provides a D3-based Shiny interface for reviewing and editing CDM test sets.
The package also includes support for Hecate-powered concept lookups to ensure valid OMOP concept codes.
# install.packages("remotes")
remotes::install_github("mi-erasmusmc/PatientGenerator")

- Generate an initial synthetic cohort using patientChat.
- Save JSON test sets to the local filesystem.
- Refine patients using patientDesigner().
- Utilize built-in concept search (powered by hecateSearch) during table editing.
Set an OPENAI_API_KEY environment variable (e.g., via
usethis::edit_r_environ()) to enable LLM access.
Available models can be listed using
PatientGenerator::availableModels().
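Before creating a chat instance, it can help to confirm that the key is actually visible to the current R session. The snippet below is a minimal sketch using base R only; `hasApiKey()` is a hypothetical helper written for this example and is not part of PatientGenerator.

```r
# Hypothetical helper (not part of PatientGenerator):
# returns TRUE when the environment variable is set and non-empty.
hasApiKey <- function(var = "OPENAI_API_KEY") {
  nzchar(Sys.getenv(var, unset = ""))
}

if (!hasApiKey()) {
  message("OPENAI_API_KEY is not set; add it to .Renviron and restart R.")
}
```

Changes made through `usethis::edit_r_environ()` only take effect after R is restarted, so a check like this is most useful at the top of a script.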
library(PatientGenerator)
patientGenerator <- patientChat$new(
model = "gpt-5.4",
echo = "none"
)
Provide detailed prompts, including specific concept sets, for optimal results.
patientGenerator$prompt(
"Population (person table):
- 10 adult patients
- 5 female
- 5 male
Observation Period:
- Start date between date of birth and 2025-12-31
Condition Occurrence:
- All patients must have Diabetes (condition_concept_id: 201826)
- Start date between 2015-01-01 and 2020-12-31
Drug Exposure:
- All patients must have Semaglutide (drug_concept_id: 19079450)
- Exposure within 30 days post-index date
Measurement:
- All patients must have Fasting glucose (measurement_concept_id: 3018251)
Procedure Occurrence:
- 50% of patients must have Amputation of toe (procedure_concept_id: 4159766)
Output Requirements:
- Populate only the tables specified in this prompt"
)
Save the generated dataset as a JSON file, then use
TestGenerator::patientsCDM to instantiate a CDM reference.
patientGenerator$save(name = "diabetes-patients")
cdm <- TestGenerator::patientsCDM(
testName = "diabetes-patients",
cdmVersion = "5.4"
)
cdm$person |>
dplyr::collect() |>
print()
#> cdm$person |> collect() |> head(5)
#> person_id gender_concept_id year_of_birth person_source_value
#> <int> <int> <int> <char>
#> 1: 1 8532 1965 SYN001
#> 2: 2 8532 1972 SYN002
#> 3: 3 8532 1958 SYN003
#> 4: 4 8532 1981 SYN004
#> 5: 5 8532 1949 SYN005
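Because detailed prompts with explicit concept IDs give the best results, it may be convenient to assemble the prompt text from a small table of requirements rather than typing it by hand. The following is a sketch under stated assumptions: `requirements` and `buildPrompt()` are illustrative names invented for this example, not PatientGenerator functions, and the concept IDs are the ones used in the prompt above.

```r
# Hypothetical requirement table; concept IDs are taken from the prompt above.
requirements <- data.frame(
  section = c("Condition Occurrence", "Drug Exposure", "Measurement"),
  label = c("Diabetes", "Semaglutide", "Fasting glucose"),
  field = c("condition_concept_id", "drug_concept_id", "measurement_concept_id"),
  concept_id = c(201826L, 19079450L, 3018251L),
  stringsAsFactors = FALSE
)

# Illustrative helper (not a PatientGenerator function):
# turns each row into a prompt section in the same layout as above.
buildPrompt <- function(requirements, nPatients = 10L) {
  header <- sprintf("Population (person table):\n- %d adult patients", nPatients)
  sections <- sprintf(
    "%s:\n- All patients must have %s (%s: %d)",
    requirements$section, requirements$label,
    requirements$field, requirements$concept_id
  )
  footer <- "Output Requirements:\n- Populate only the tables specified in this prompt"
  paste(c(header, sections, footer), collapse = "\n")
}

promptText <- buildPrompt(requirements)
cat(promptText)
```

The resulting string can then be passed to `patientGenerator$prompt(promptText)` in the same way as the hand-written prompt.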
The LLM can be instructed to modify the current test set within the same
patientChat instance.
patientGenerator$prompt("Remove all male patients")
#> cdm$person |> collect() |> head(5)
#> person_id gender_concept_id year_of_birth person_source_value
#> <int> <int> <int> <char>
#> 1: 1 8532 1965 SYN001
#> 2: 2 8532 1972 SYN002
#> 3: 3 8532 1958 SYN003
#> 4: 4 8532 1981 SYN004
#> 5: 5 8532 1949 SYN005
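LLM output is not guaranteed to follow an instruction exactly, so it is worth checking the result of a modification rather than trusting it. The sketch below uses a stand-in data frame that mirrors the output shown above (in practice this would be the collected `cdm$person` table) and relies on the standard OMOP gender concepts, 8507 for male and 8532 for female.

```r
# Stand-in for the collected person table; values mirror the output above.
person <- data.frame(
  person_id = 1:5,
  gender_concept_id = c(8532L, 8532L, 8532L, 8532L, 8532L),
  year_of_birth = c(1965L, 1972L, 1958L, 1981L, 1949L)
)

# Standard OMOP gender concepts: 8507 = MALE, 8532 = FEMALE.
maleConceptId <- 8507L

# Count remaining male patients and fail loudly if any are left.
nMale <- sum(person$gender_concept_id == maleConceptId)
stopifnot(nMale == 0L)

# Summary of patients per gender concept.
table(person$gender_concept_id)
```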
Launch the interactive editor to review and refine datasets:
PatientGenerator::patientDesigner()
The interface supports:
- Loading existing JSON test sets.
- Interactive CRUD operations (Create, Read, Update, Delete) on CDM tables.
- Visual timeline inspection and table previews.
- Exporting updated test sets to JSON.
patientDesigner integrates a concept search module powered by
hecateSearch(). This allows users to search for and insert valid OMOP
concept IDs directly into the CDM tables.
Configure Hecate globally via environment variables:
Sys.setenv(
HECATE_BASE_URL = "https://your-hecate-server/api",
HECATE_API_KEY = "your-api-key"
)
Or via package options:
options(PatientGenerator.hecate = list(
base_url = "https://your-hecate-server/api",
timeout_ms = 15000,
api_key = "your-api-key"
))
- Vignette: vignette("shiny-integration", package = "PatientGenerator")
- Reference: Detailed API documentation and benchmarks are available on the GitHub Pages site.
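To check which Hecate settings are visible in the current session, the two configuration routes described above can be inspected with base R. This is a minimal sketch: `hecateSetting()` is a hypothetical helper, and the precedence it applies (package option first, environment variable second) is an assumption made for illustration, not documented package behaviour.

```r
# Illustrative helper (not part of PatientGenerator). The lookup order used
# here, option first and environment variable second, is an assumption.
hecateSetting <- function(name, envVar) {
  opts <- getOption("PatientGenerator.hecate", default = list())
  value <- opts[[name]]
  if (is.null(value) || !nzchar(value)) {
    value <- Sys.getenv(envVar, unset = "")
  }
  value
}

baseUrl <- hecateSetting("base_url", "HECATE_BASE_URL")
if (!nzchar(baseUrl)) {
  message("No Hecate base URL configured.")
}
```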