Foreign keys to ignored vocabulary tables are hard to configure correctly #92

@tim-band

If a table is marked as both ignore: true and vocabulary: true, any foreign key constraints that reference this table are turned off during data generation.

For example, suppose you have a table of hospitals marked as vocabulary. You don't want to reproduce this table in the synthetic output, but you do want the idea of hospitals to be present. So perhaps a Visit table has a hospital_id column. You want this column to be populated with reasonable values, but the hospital table itself will not be present.

Datafaker permits this. You can choose a dist_gen.choice generator, for example; Datafaker will then query the database for all the values present in the Visit.hospital_id column, and the generator will produce foreign keys from this set of valid hospital IDs. Even though the column is a foreign key, the constraint is turned off, so generation works.
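To make the behaviour described above concrete, here is a minimal sketch of a "choose from the observed values" generator. It is not Datafaker's implementation: the helper names `observed_values` and `make_choice_generator` are hypothetical, the toy data lives in an in-memory SQLite database, and uniform sampling is an assumption made for illustration.

```python
import random
import sqlite3


def observed_values(conn, table, column):
    """Return the distinct non-NULL values currently stored in table.column."""
    # Identifiers cannot be bound as SQL parameters; this sketch trusts its inputs.
    rows = conn.execute(
        f'SELECT DISTINCT "{column}" FROM "{table}" WHERE "{column}" IS NOT NULL'
    ).fetchall()
    return [row[0] for row in rows]


def make_choice_generator(values, rng):
    """Return a zero-argument callable that picks one of the observed values."""
    if not values:
        raise ValueError("no observed values to choose from")
    return lambda: rng.choice(values)


# Toy source database: a visit table whose hospital table is deliberately absent.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visit (id INTEGER PRIMARY KEY, hospital_id INTEGER)")
conn.executemany(
    "INSERT INTO visit (hospital_id) VALUES (?)",
    [(3,), (3,), (7,), (None,), (12,)],
)

hospital_ids = observed_values(conn, "visit", "hospital_id")
generate_hospital_id = make_choice_generator(hospital_ids, random.Random(0))
synthetic_ids = [generate_hospital_id() for _ in range(20)]
```

Every generated value is a hospital ID that already appears in the source column, so the synthetic Visit rows stay plausible even though no hospital table is written.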

However, if the user does not configure a generator for the Visit.hospital_id column themselves, Datafaker falls back to its own default generator and, as there is no Hospital table for it to pull IDs from, it is forced to produce NULLs in this column.

The user is not warned during configure-generators that, although Visit.hospital_id is a foreign key, it is not safe to leave it with the default generator. Nor is the user warned during create-generators (or create-data post refactor) that this lack of explicit configuration is a problem.

The same is true if the table is marked as empty (num_rows_per_pass: 0), which gives the same (potentially worse?) issues.

So, please:

  1. In GeneratorCmd.do_info, warn the user that they will need to configure something if a foreign key points to an ignored or empty table (currently it says "You do not need a generator if you just want a uniform choice over the referenced table's rows" regardless of whether the foreign table is ignored or not).
  2. In GeneratorCmd.set_prompt, provide a nice indication that the foreign key needs to be configured.
  3. In create-data, provide at least a warning if such a foreign key has not been configured.
  4. In create-data and configure-generators, at least warn if there are foreign keys in nonempty tables that point to empty tables, unless they are explicitly set to null generators.
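
The check requested in the list above could look roughly like the following sketch. Everything here is hypothetical: `TableInfo`, `ForeignKey` and `unconfigured_foreign_key_warnings` are illustrative names, not Datafaker classes or functions, and the real implementation would read this information from Datafaker's own configuration and schema metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableInfo:
    """Hypothetical summary of one table's configuration."""
    name: str
    ignored: bool = False
    num_rows_per_pass: int = 1


@dataclass(frozen=True)
class ForeignKey:
    """Hypothetical summary of one foreign key column."""
    table: str
    column: str
    target_table: str
    # True if the user configured any generator, including a null generator.
    has_explicit_generator: bool = False


def unconfigured_foreign_key_warnings(tables, foreign_keys):
    """List warnings for foreign keys that rely on a default generator
    but point to a table that is ignored or empty."""
    by_name = {t.name: t for t in tables}
    messages = []
    for fk in foreign_keys:
        source = by_name[fk.table]
        target = by_name[fk.target_table]
        # No rows are generated for ignored or empty source tables.
        if source.ignored or source.num_rows_per_pass == 0:
            continue
        if fk.has_explicit_generator:
            continue
        if target.ignored:
            reason = "ignored"
        elif target.num_rows_per_pass == 0:
            reason = "empty"
        else:
            continue
        messages.append(
            f"{fk.table}.{fk.column} references {reason} table "
            f"{fk.target_table} but has no explicit generator; "
            "the default generator has no rows to draw from"
        )
    return messages


tables = [
    TableInfo("hospital", ignored=True),
    TableInfo("ward", num_rows_per_pass=0),
    TableInfo("visit"),
]
foreign_keys = [
    ForeignKey("visit", "hospital_id", "hospital"),
    ForeignKey("visit", "ward_id", "ward", has_explicit_generator=True),
]
messages = unconfigured_foreign_key_warnings(tables, foreign_keys)
```

In this example only visit.hospital_id is reported; visit.ward_id points to an empty table but is skipped because it has an explicit generator, which matches the exemption in point 4.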
