synthetic-data-evaluation-framework

This repository evaluates synthetic tabular datasets against a real reference dataset using statistically grounded and model-based checks.

It is designed as an end-to-end workflow:

- schema inference and alignment (strict or intersection mode)
- per-synthetic-dataset metric pipeline (similarity, detectability, privacy, and optional utility)
- normalization and weighted composite scoring
- ranked reporting (metrics.json, scores.csv, plots/, logs.txt)
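The difference between the two alignment modes can be illustrated with a minimal pandas sketch. `align_columns` is a hypothetical helper, not the sdeval API. The choice to cast synthetic columns to the real column's dtype, and the `(column, from, to, status)` shape of the conversion log, are assumptions made for illustration.

```python
import pandas as pd


def align_columns(
    real: pd.DataFrame, synth: pd.DataFrame, mode: str = "strict"
) -> tuple[pd.DataFrame, pd.DataFrame, list[tuple[str, str, str, str]]]:
    """Hypothetical helper: align synthetic columns to the real schema."""
    real_cols = list(real.columns)
    synth_cols = set(synth.columns)
    if mode == "strict":
        # Strict: both tables must have exactly the same column set.
        missing = [c for c in real_cols if c not in synth_cols]
        extra = sorted(synth_cols - set(real_cols))
        if missing or extra:
            raise ValueError(f"schema mismatch: missing={missing}, extra={extra}")
        shared = real_cols
    elif mode == "intersection":
        # Intersection: keep only shared columns, in the real table's order.
        shared = [c for c in real_cols if c in synth_cols]
        if not shared:
            raise ValueError("real and synthetic data share no columns")
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    aligned = synth[shared].copy()
    conversions = []  # assumed log format: (column, from_dtype, to_dtype, status)
    for col in shared:
        src, dst = aligned[col].dtype, real[col].dtype
        if src == dst:
            continue
        try:
            aligned[col] = aligned[col].astype(dst)
            conversions.append((col, str(src), str(dst), "ok"))
        except (TypeError, ValueError):
            # Leave the column unchanged but record the failure for the report.
            conversions.append((col, str(src), str(dst), "failed"))
    return real[shared].copy(), aligned, conversions
```

Strict mode fails fast on any schema drift, while intersection mode keeps evaluating on the shared columns; in both cases the conversion log makes silent type coercion visible.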

Architecture

The architecture diagrams (Mermaid source: docs/architecture.md) cover:

- End-to-End Runtime Flow
- Schema and Alignment Decision Logic
- Per-Synthetic Metric Pipeline
- Normalization and Composite Scoring

Quickstart (uv)

Install the locked dependencies (uv.lock) into a local virtual environment:

uv sync

Run

Basic evaluation (paths use Windows separators; on macOS or Linux, replace \ with /):

uv run sdeval --real raw\sample_dataset.csv --synthetic synthetic\ctgan_1x.csv synthetic\tvae_1x.csv --out reports\demo

With utility enabled via a config file (the example config supplies a target column):

uv run sdeval --real raw\sample_dataset.csv --synthetic synthetic\ctgan_1x.csv synthetic\tvae_1x.csv --config configs\examples\example_with_target.yaml --out reports\demo_with_target

Quality

Format, lint, and run the test suite:

uv run ruff format .
uv run ruff check .
uv run pytest -q

Core Logic (Short)

- infer column types and, by default, exclude low-signal columns (constant or ID-like)
- align real and synthetic columns, tracking every type conversion explicitly
- compute similarity (KS, Wasserstein, JSD, correlation drift)
- train a detectability classifier to separate real from synthetic rows and report its ROC-AUC
- compute privacy indicators (exact matches, nearest-neighbor (NN) ratio, quasi-identifier (QID) collisions)
- when a target column is provided, compute utility as TSTR (train on synthetic, test on real) against a baseline
- normalize each metric family's score to [0, 1], apply weights, and rank the synthetic datasets
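The low-signal exclusion in the first step of the list above can be sketched with a simple uniqueness heuristic. `low_signal_columns` and the `id_ratio` threshold are hypothetical; the real rules in sdeval may differ. Floats are deliberately not flagged as ID-like, because continuous columns are naturally near-unique.

```python
import pandas as pd
from pandas.api import types as ptypes


def low_signal_columns(df: pd.DataFrame, id_ratio: float = 0.95) -> dict[str, str]:
    """Hypothetical heuristic: map excluded column names to a reason."""
    reasons: dict[str, str] = {}
    n = len(df)
    for col in df.columns:
        s = df[col]
        uniq = s.nunique(dropna=False)
        if uniq <= 1:
            # A single value (or all missing) carries no distributional signal.
            reasons[col] = "constant"
        elif (
            n > 0
            and uniq / n >= id_ratio
            # Only integer or text columns are treated as identifier candidates.
            and (
                ptypes.is_integer_dtype(s)
                or ptypes.is_object_dtype(s)
                or ptypes.is_string_dtype(s)
            )
        ):
            reasons[col] = "id_like"
    return reasons
```

Returning a reason per column, rather than just dropping columns, keeps the decision auditable in the run log.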
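The similarity step in the list above names KS, Wasserstein, JSD, and correlation drift. The following is a minimal sketch using SciPy's implementations of the first three; how sdeval bins categories, handles missing values, or defines correlation drift is not stated, so the per-function choices here (including "mean absolute difference of pairwise Pearson correlations" for drift) are assumptions.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import jensenshannon
from scipy.stats import ks_2samp, wasserstein_distance


def numeric_similarity(real: pd.Series, synth: pd.Series) -> dict[str, float]:
    """KS statistic and Wasserstein-1 distance between two numeric columns."""
    r = real.dropna().to_numpy(dtype=float)
    s = synth.dropna().to_numpy(dtype=float)
    return {
        # Largest gap between the two empirical CDFs, in [0, 1].
        "ks": float(ks_2samp(r, s).statistic),
        # Earth mover's distance, expressed in the column's own units.
        "wasserstein": float(wasserstein_distance(r, s)),
    }


def categorical_jsd(real: pd.Series, synth: pd.Series) -> float:
    """Jensen-Shannon divergence (base 2, so in [0, 1]) of category frequencies."""
    cats = sorted(set(real.dropna()) | set(synth.dropna()))
    p = real.value_counts(normalize=True).reindex(cats, fill_value=0.0).to_numpy()
    q = synth.value_counts(normalize=True).reindex(cats, fill_value=0.0).to_numpy()
    # SciPy returns the JS distance (square root of the divergence), so square it.
    return float(jensenshannon(p, q, base=2) ** 2)


def correlation_drift(real: pd.DataFrame, synth: pd.DataFrame) -> float:
    """Mean absolute difference between the two Pearson correlation matrices."""
    cols = real.select_dtypes("number").columns
    diff = (real[cols].corr() - synth[cols].corr()).abs().to_numpy()
    upper = np.triu_indices_from(diff, k=1)  # each column pair once, no diagonal
    return float(diff[upper].mean())
```

KS and JSD are already bounded in [0, 1], but Wasserstein is in data units, so it needs some scaling (for example by the real column's spread) before it can be combined with the others.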
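The detectability step in the list above reduces to a classifier two-sample test: label real rows 0 and synthetic rows 1, then measure how well a model separates them. This sketch assumes numeric, already-encoded features and uses a scikit-learn random forest with out-of-fold predictions; the model family and fold count are assumptions, since the README does not name them.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def detectability_auc(real: np.ndarray, synth: np.ndarray, seed: int = 0) -> float:
    """Out-of-fold ROC-AUC of a real (0) vs. synthetic (1) classifier."""
    X = np.vstack([real, synth])
    y = np.concatenate([np.zeros(len(real), dtype=int), np.ones(len(synth), dtype=int)])
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    # Each row is scored by a model that never saw it during training.
    proba = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]
    return float(roc_auc_score(y, proba))
```

An AUC near 0.5 means the classifier cannot tell the datasets apart (the desired outcome); values approaching 1.0 mean the synthetic rows are easy to spot.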
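The three privacy indicators in the list above can be sketched as follows. The README does not define "NN ratio" precisely; this sketch assumes it compares how close synthetic rows sit to real rows with how close real rows sit to each other. The function names and the `qids` argument are hypothetical.

```python
import numpy as np
import pandas as pd


def exact_match_rate(real: pd.DataFrame, synth: pd.DataFrame) -> float:
    """Share of synthetic rows that copy some real row exactly (all columns)."""
    real_rows = set(real.itertuples(index=False, name=None))
    hits = sum(row in real_rows for row in synth.itertuples(index=False, name=None))
    return hits / len(synth)


def nn_distance_ratio(real: np.ndarray, synth: np.ndarray) -> float:
    """Assumed definition: median synth-to-real NN distance / median real-to-real."""
    # Brute-force pairwise Euclidean distances; fine for small, scaled numeric data.
    d_sr = np.linalg.norm(synth[:, None, :] - real[None, :, :], axis=-1).min(axis=1)
    d_rr = np.linalg.norm(real[:, None, :] - real[None, :, :], axis=-1)
    np.fill_diagonal(d_rr, np.inf)  # a real row is not its own neighbor
    denom = np.median(d_rr.min(axis=1))
    return float(np.median(d_sr) / denom) if denom > 0 else float("inf")


def qid_collision_rate(real: pd.DataFrame, synth: pd.DataFrame, qids: list[str]) -> float:
    """Share of synthetic rows whose quasi-identifier combination occurs in real data."""
    real_keys = set(real[qids].itertuples(index=False, name=None))
    hits = sum(key in real_keys for key in synth[qids].itertuples(index=False, name=None))
    return hits / len(synth)
```

Under this definition, a ratio well below 1 suggests synthetic rows hug real records more tightly than real records hug each other, which is a memorization warning sign. For larger tables, a KD-tree or similar index would replace the quadratic distance matrix.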
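The optional utility step in the list above compares TSTR against a baseline. A common baseline is TRTR (train on real, test on real), sketched here with scikit-learn. The classifier, the accuracy metric, and the 30% holdout are assumptions; sdeval may use different ones.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def tstr_vs_trtr(
    X_real: np.ndarray,
    y_real: np.ndarray,
    X_synth: np.ndarray,
    y_synth: np.ndarray,
    seed: int = 0,
) -> dict[str, float]:
    """Compare train-on-synthetic/test-on-real with a real-only baseline."""
    # Both models are scored on the same held-out slice of real data.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_real, y_real, test_size=0.3, random_state=seed, stratify=y_real
    )
    baseline = RandomForestClassifier(n_estimators=100, random_state=seed)
    baseline.fit(X_tr, y_tr)
    tstr_model = RandomForestClassifier(n_estimators=100, random_state=seed)
    tstr_model.fit(X_synth, y_synth)
    trtr = float(accuracy_score(y_te, baseline.predict(X_te)))
    tstr = float(accuracy_score(y_te, tstr_model.predict(X_te)))
    return {"trtr": trtr, "tstr": tstr, "ratio": tstr / trtr if trtr > 0 else 0.0}
```

A ratio close to 1 means a model trained on synthetic data performs about as well as one trained on real data. For a fair comparison, the synthetic data should be generated from the real training split only, so the real holdout never influenced the generator.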
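The final step in the list above (normalize to [0, 1], weight, rank) is sketched below. The specific mappings (1 - KS, 1 - 2|AUC - 0.5|, 1 - exact-match rate) and the weights are illustrative assumptions, not sdeval's configured defaults, and the input numbers in the usage are made up.

```python
def _clip01(value: float) -> float:
    return min(1.0, max(0.0, value))


def family_scores(ks_mean: float, auc: float, exact_match: float) -> dict[str, float]:
    """Hypothetical normalization: map raw metrics to [0, 1], where 1 is best."""
    return {
        "similarity": _clip01(1.0 - ks_mean),  # KS is already in [0, 1]
        "detectability": _clip01(1.0 - 2.0 * abs(auc - 0.5)),  # 0.5 = indistinguishable
        "privacy": _clip01(1.0 - exact_match),  # fewer copied rows is better
    }


def rank(
    candidates: dict[str, dict[str, float]], weights: dict[str, float]
) -> list[tuple[str, float]]:
    """Weighted mean of the available family scores per dataset, best first."""
    scored = {}
    for name, scores in candidates.items():
        # Renormalize over the families actually present (e.g. no utility without a target).
        total = sum(weights[f] for f in scores)
        scored[name] = sum(weights[f] * s for f, s in scores.items()) / total
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


weights = {"similarity": 0.4, "detectability": 0.3, "privacy": 0.3}
ranking = rank(
    {
        "ctgan_1x": family_scores(0.10, 0.60, 0.00),
        "tvae_1x": family_scores(0.20, 0.90, 0.05),
    },
    weights,
)
```

Dividing by the sum of weights for the families present keeps composite scores comparable when an optional family such as utility is skipped.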

Key Outputs

- Metrics and raw details: reports/<run>/metrics.json
- Ranking table: reports/<run>/scores.csv
- Plots (drift, radar, ranking): reports/<run>/plots/
- Run log and warnings: reports/<run>/logs.txt

Optional Export

To generate a pinned requirements.txt for pip-based environments:

uv pip compile pyproject.toml -o requirements.txt
