[OPIK-6246] [SDK] feat: add 'opik copy dataset' CLI command #6555
JetoPistola wants to merge 4 commits into main
Conversation
Adds `opik copy WORKSPACE dataset NAME --destination-project NAME [...]`, a same-instance copy command that lifts a dataset (and by default its experiments + traces + spans) into a destination project on the same Opik instance. Built as a thin orchestration layer over the existing `opik export` and `opik import` flows, with `MigrationManifest`-backed resumability and a persistent run dir at `~/.opik/copy-runs/`.

The only modification to existing import code is a new optional `destination_project` kwarg threaded through `import_datasets_from_directory`, `_build_dataset_item_id_map`, `recreate_experiments`, `_import_traces_from_projects_directory`, and `import_experiments_from_directory`. Default `None` preserves today's `opik import` behaviour exactly; the existing 258-test suite passes unchanged.

Implements OPIK-6246: Self-serve v1 → v2 workspace migration via SDK.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
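The `destination_project` override described above can be pictured with a minimal sketch. This is not the SDK's code: `resolve_project_name` and `retarget_traces` are hypothetical helpers, and the trace dict shape (`project_name` key) is an assumption made for illustration. The point is only that `None` leaves exported project names untouched, while any other value redirects every record.

```python
from typing import Dict, List, Optional


def resolve_project_name(
    exported_project: str, destination_project: Optional[str] = None
) -> str:
    """Hypothetical helper: choose the project an imported record lands in."""
    # None means "no override": keep the project recorded in the export.
    if destination_project is None:
        return exported_project
    return destination_project


def retarget_traces(
    traces: List[Dict[str, str]], destination_project: Optional[str] = None
) -> List[Dict[str, str]]:
    """Return copies of the trace records with the project override applied."""
    return [
        {
            **trace,
            "project_name": resolve_project_name(
                trace["project_name"], destination_project
            ),
        }
        for trace in traces
    ]
```

Because the default is `None`, existing callers that never pass the kwarg see the same project names as before, which is the property the regression tests are meant to guard.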
Six fixes from PR review:

- Stable resumable run dir: replace datetime-based path with a fingerprint hash of (workspace, dataset, destination, source, exclude_experiments) so re-running the same command reuses the prior `MigrationManifest`.
- `_filter_experiments_by_source_project`: build a trace→project index from disk and keep experiments whose project can't be resolved (defensive default) instead of silently unlinking them.
- `_verify_destination_counts`: stop swallowing SDK errors into a generic "counts didn't match" exit; let auth/network/permission failures propagate so the real cause is visible.
- `copy_dataset_command` pre-flight: drop the broad `except Exception` catch on `get_dataset`; only `DatasetNotFound` is handled with a friendly message, everything else surfaces as itself.
- `import_datasets_from_directory` get-or-create: narrow the broad except to `exceptions.DatasetNotFound` so non-NotFound errors don't silently trigger a create.
- Extract the duplicated Click `format_commands` override into `cli/_group_helpers.bind_items_format_commands` and reuse it across the export/import/copy groups.

4 new unit tests guard the new behaviours; full SDK CLI suite (279 tests) passes. Pre-commit (ruff, ruff-format, mypy) clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
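The first two fixes can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `fingerprint_run_dir` and `filter_experiments` are hypothetical names, the directory naming scheme and the 12-character SHA-256 prefix are guesses, and the experiment records are assumed to carry a `trace_ids` list.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List, Optional


def fingerprint_run_dir(
    base: Path,
    workspace: str,
    dataset: str,
    destination: str,
    source: Optional[str],
    exclude_experiments: bool,
) -> Path:
    """Derive a run dir that is stable for a given set of command inputs."""
    # JSON gives a deterministic serialisation of the tuple of inputs.
    payload = json.dumps(
        [workspace, dataset, destination, source, exclude_experiments]
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    # Assumed naming scheme; names containing path separators would need
    # sanitising in real code.
    return base / f"{workspace}-{dataset}-{digest}"


def filter_experiments(
    experiments: List[dict],
    trace_to_project: Dict[str, str],
    source_project: str,
) -> List[dict]:
    """Keep experiments in source_project, plus any that cannot be resolved."""
    kept = []
    for experiment in experiments:
        projects = {
            trace_to_project[trace_id]
            for trace_id in experiment["trace_ids"]
            if trace_id in trace_to_project
        }
        # Defensive default: no resolvable project means keep, not drop.
        if not projects or source_project in projects:
            kept.append(experiment)
    return kept
```

Hashing the inputs rather than the clock means a second invocation with identical arguments lands in the same directory and can pick up the earlier manifest, while changing any argument yields a fresh directory.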
- Hoist the `get_dataset_experiments` cap to a named `EXPERIMENT_LIST_CAP` constant (10,000) used both during the source-side enumeration and in `_verify_destination_counts`. When either side hits the cap, the verifier now warns explicitly that the result is best-effort; prior behaviour silently passed when source and destination both saturated at the cap with potentially different real counts.
- Swap the manual `get_dataset`/`DatasetNotFound`/`create_dataset` dance in `import_datasets_from_directory` for `client.get_or_create_dataset`, the stable SDK API that already encapsulates this. Drops the local exceptions import; SDK errors (auth/network/permissions) propagate to the per-file error handler as themselves.

Tests updated to assert on the new SDK call shape; full suite (279 tests) passes. Pre-commit clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
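The cap-aware check in the first bullet might look roughly like the sketch below. Only the constant name and its value come from the commit message; `compare_experiment_counts` and its return shape are hypothetical, chosen to show why two saturated counts cannot be treated as a clean match.

```python
from typing import Tuple

# Value stated in the commit message.
EXPERIMENT_LIST_CAP = 10_000


def compare_experiment_counts(
    source_count: int,
    destination_count: int,
    cap: int = EXPERIMENT_LIST_CAP,
) -> Tuple[bool, bool]:
    """Return (counts_match, best_effort) for a source/destination pair.

    Hypothetical helper: best_effort is True when either listing reached
    the cap, because the real totals may be larger than what was listed.
    """
    best_effort = source_count >= cap or destination_count >= cap
    return source_count == destination_count, best_effort
```

A caller would still treat a mismatch as a failure, but would print a warning whenever `best_effort` is true instead of reporting an unqualified pass.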
🔄 Test environment deployment process has started. Phase 1: Deploying base version.

✅ Test environment is now available! To configure additional environment variables for your environment, run the [Deploy Opik AdHoc Environment workflow](https://github.com/comet-ml/comet-deployment/actions/workflows/deploy_opik_adhoc_env.yaml). The deployment has completed successfully and the version has been verified.
Details
Adds `opik copy WORKSPACE dataset NAME --destination-project NAME [...]`, a same-instance copy command that lifts a dataset (and by default its experiments + traces + spans) into a destination project on the same Opik instance. Implemented as a thin orchestration layer over the existing `opik export` / `opik import` flows per Andrés's design comment.

- `copy` Click group at `sdks/python/src/opik/cli/copy/`, mirroring `exports/` and `imports/`. Structure leaves room for `copy prompt` / `copy experiment` later.
- Run dir at `~/.opik/copy-runs/<workspace>-<dataset>-<timestamp>/`; the command prints a Rich pre-flight summary + confirmation, then runs the existing importers with a new `destination_project` override threaded through. Post-copy count diff verifies source vs destination. `MigrationManifest` makes the run resumable; cleanup happens on success unless `--debug`.
- `--destination-project` (required), `--source-project`, `--exclude-experiments`, `--dry-run`, `--debug`, `--force`, `--yes` for scripted use.
- `destination_project=None` kwarg in `import_datasets_from_directory`, `_build_dataset_item_id_map`, `recreate_experiments`, `_import_traces_from_projects_directory`, and `import_experiments_from_directory`. Defaults preserve today's `opik import` behaviour exactly; the existing 258-test suite passes unchanged.

Change checklist
Issues
AI-WATERMARK
AI-WATERMARK: yes
localhost:5174
Testing
Unit tests (run from `sdks/python`):

```
python -m pytest tests/unit/cli/test_copy_dataset.py tests/unit/cli/test_import_experiment.py tests/unit/test_export_import_all.py
```
`tests/unit/cli/test_copy_dataset.py` covering:

- Click surface (required flags, help, missing ITEM)
- `destination_project` plumbing into `import_datasets_from_directory` and `_import_traces_from_projects_directory`
- regression guards proving `destination_project=None` preserves today's `opik import` behaviour
- orchestrator helpers (`_make_run_dir`, `_scan_run_dir`, `_filter_experiments_by_source_project`)
- end-to-end orchestration (missing source dataset → exit 1, `--dry-run` skips import, `--exclude-experiments` skips experiment import, full copy threads `destination_project` into both imports, count-diff mismatch → exit 1, user declines confirmation → no import)

Manual smoke test against `localhost:5174` is pending; it is flagged as the last todo on the ticket and intended for follow-up before un-drafting.

Documentation
N/A for this PR. CLI help text (`opik copy --help`, `opik copy WS dataset --help`) is the user-facing surface for v1; broader docs/migration-guide updates will land alongside the wider OPIK-5859 epic.