If you have a table marked as ignore: true and vocabulary: true, then any foreign key constraints referencing this table are turned off during data generation.
For example, suppose you have a table of hospitals marked as vocabulary. You don't want to reproduce this table in the synthetic output, but you do want the idea of hospitals to be present. So perhaps a Visit table has a hospital_id column. You want this column to be populated with reasonable values, but the hospital table itself will not be present.
Datafaker permits this. For example, you can choose a dist_gen.choice generator; Datafaker will then query the database for all the possible values in the Visit.hospital_id column, and the generator will produce foreign keys from this set of valid hospital IDs. Even though this is a foreign key, the constraint is turned off and the generation will work.
However, if the user does not configure a generator for the Visit.hospital_id column themselves, Datafaker will be forced to produce its own default generator, and, as there is no Hospital table for it to pull IDs from, it will be forced to produce NULLs in this column.
The user is not warned during configure-generators that, although Visit.hospital_id is a foreign key, it is not safe to leave it with a default generator. Nor is the user warned during create-generators (or create-data, post refactor) that this lack of explicit configuration is a problem.
The same is true if the referenced table is marked as empty (num_rows_per_pass: 0), which gives the same (potentially worse?) issues.
So, please:
- In GeneratorCmd.do_info, warn the user if a foreign key points to an ignored or empty table that they will need to configure something (currently it says "You do not need a generator if you just want a uniform choice over the referenced table's rows" regardless of whether the foreign table is ignored or not).
- In GeneratorCmd.set_prompt, provide a nice indication that the foreign key needs to be configured.
- In create-data, provide at least a warning if such a foreign key has not been configured.
- In create-data and configure-generators, at least warn if there are foreign keys in nonempty tables that point to empty tables, unless they are explicitly set to null generators.
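
To make the requested check concrete, here is a rough sketch of the detection logic the warnings above could share. It is not Datafaker's actual code: the `ForeignKey` tuple, the `unconfigured_foreign_keys` function, the plain-dict table configuration and the `ward_id` column are all hypothetical, and only the keys `ignore`, `vocabulary` and `num_rows_per_pass` come from the description above.

```python
from typing import NamedTuple


class ForeignKey(NamedTuple):
    """A foreign key from table.column to a referenced table (hypothetical shape)."""
    table: str
    column: str
    referenced_table: str


def is_unavailable(table_config: dict) -> bool:
    """True if the table is ignored or empty, so it has no rows to draw keys from."""
    ignored = bool(table_config.get("ignore", False))
    empty = table_config.get("num_rows_per_pass") == 0
    return ignored or empty


def unconfigured_foreign_keys(tables, foreign_keys, configured_columns):
    """Return foreign keys that point to an ignored or empty table
    and have no explicitly configured generator."""
    problems = []
    for fk in foreign_keys:
        if is_unavailable(tables.get(fk.table, {})):
            continue  # the source table produces no rows, so nothing needs a value
        if not is_unavailable(tables.get(fk.referenced_table, {})):
            continue  # the default generator can pull IDs from the referenced table
        if (fk.table, fk.column) in configured_columns:
            continue  # the user configured a generator (possibly a null generator)
        problems.append(fk)
    return problems


# Assumed example data: Hospital is an ignored vocabulary table, Ward is empty.
tables = {
    "Hospital": {"ignore": True, "vocabulary": True},
    "Ward": {"num_rows_per_pass": 0},
    "Visit": {},
}
foreign_keys = [
    ForeignKey("Visit", "hospital_id", "Hospital"),
    ForeignKey("Visit", "ward_id", "Ward"),
]
configured = {("Visit", "hospital_id")}

for fk in unconfigured_foreign_keys(tables, foreign_keys, configured):
    print(
        f"Warning: {fk.table}.{fk.column} references {fk.referenced_table}, "
        "which is ignored or empty; configure a generator for it."
    )
```

With this example data only Visit.ward_id is reported, because Visit.hospital_id already has a generator configured. The same list could drive the message in do_info, the prompt indication in set_prompt, and the warning in create-data.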