[OPIK-6246] [SDK] feat: add 'opik copy dataset' CLI command #6555

Draft
JetoPistola wants to merge 4 commits into main from danield/OPIK-6246-add-opik-copy-dataset-cli

@JetoPistola (Contributor) commented Apr 29, 2026

Details

Adds opik copy WORKSPACE dataset NAME --destination-project NAME [...] — a same-instance copy command that lifts a dataset (and by default its experiments + traces + spans) into a destination project on the same Opik instance. Implemented as a thin orchestration layer over the existing opik export / opik import flows per Andrés's design comment.

  • New copy Click group at sdks/python/src/opik/cli/copy/ mirroring exports/ and imports/. Structure leaves room for copy prompt / copy experiment later.
  • The orchestrator runs the existing exporters into a persistent run dir at ~/.opik/copy-runs/<workspace>-<dataset>-<timestamp>/, prints a Rich pre-flight summary + confirmation, then runs the existing importers with a new destination_project override threaded through. Post-copy count diff verifies source vs destination. MigrationManifest makes the run resumable; cleanup happens on success unless --debug.
  • Flags: --destination-project (required), --source-project, --exclude-experiments, --dry-run, --debug, --force, --yes for scripted use.
  • The only change to existing import code is a new optional destination_project=None kwarg in import_datasets_from_directory, _build_dataset_item_id_map, recreate_experiments, _import_traces_from_projects_directory, and import_experiments_from_directory. Defaults preserve today's opik import behaviour exactly — the existing 258-test suite passes unchanged.
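
The control flow described in the bullets above (export, confirm, import, with `--dry-run`, `--exclude-experiments`, and `--yes` short-circuiting parts of it) can be sketched as follows. This is a minimal illustration, not the PR's code: `CopyPlan`, `run_copy`, and the callables passed in are hypothetical names, and returning exit code 1 when the user declines is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CopyPlan:
    """Hypothetical container for the options the CLI flags would set."""

    destination_project: str
    dry_run: bool = False
    exclude_experiments: bool = False
    assume_yes: bool = False  # stands in for --yes
    steps: List[str] = field(default_factory=list)  # records which phases ran


def run_copy(
    plan: CopyPlan,
    export: Callable[[], None],
    import_datasets: Callable[[str], None],
    import_experiments: Callable[[str], None],
    confirm: Callable[[], bool],
) -> int:
    """Run export, then (unless skipped) the imports; return an exit code."""
    export()
    plan.steps.append("export")

    # --dry-run stops after the export phase: nothing is written.
    if plan.dry_run:
        return 0

    # Without --yes, ask before importing. Exit code 1 here is an assumption.
    if not plan.assume_yes and not confirm():
        return 1

    # The destination project override is passed to every import phase.
    import_datasets(plan.destination_project)
    plan.steps.append("import_datasets")

    if not plan.exclude_experiments:
        import_experiments(plan.destination_project)
        plan.steps.append("import_experiments")
    return 0
```

Passing the phases in as callables keeps the sketch testable without an Opik instance; the real orchestrator also handles the manifest, count verification, and cleanup, which are omitted here.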

Change checklist

  • User facing
  • Documentation update

Issues

  • OPIK-6246
  • Parent epic: OPIK-5859 (Self-serve v1 → v2 workspace migration via SDK)

AI-WATERMARK

AI-WATERMARK: yes

  • Tools: Claude Code
  • Model(s): Claude Opus 4.7 (1M context)
  • Scope: full implementation (orchestrator, plumbing, tests)
  • Human verification: code review pending; manual end-to-end smoke test pending against localhost:5174

Testing

Unit tests (run from sdks/python):

```
python -m pytest tests/unit/cli/test_copy_dataset.py tests/unit/cli/test_import_experiment.py tests/unit/test_export_import_all.py
```

  • 17 new tests in tests/unit/cli/test_copy_dataset.py covering:
      • Click surface: required flags, help, missing ITEM.
      • destination_project plumbing into import_datasets_from_directory and _import_traces_from_projects_directory.
      • Regression guards proving destination_project=None preserves today's opik import behaviour.
      • Orchestrator helpers: _make_run_dir, _scan_run_dir, _filter_experiments_by_source_project.
      • End-to-end orchestration: missing source dataset → exit 1; --dry-run skips import; --exclude-experiments skips experiment import; full copy threads destination_project into both imports; count-diff mismatch → exit 1; user declines confirmation → no import.
  • All 258 pre-existing CLI/import/export unit tests still pass.
  • Pre-commit (ruff, ruff-format, mypy) clean.

Manual smoke test against localhost:5174 is pending — flagged as the last todo on the ticket and intended for follow-up before un-drafting.

Documentation

N/A for this PR. CLI help text (opik copy --help, opik copy WS dataset --help) is the user-facing surface for v1; broader docs/migration-guide updates will land alongside the wider OPIK-5859 epic.

Adds `opik copy WORKSPACE dataset NAME --destination-project NAME [...]`,
a same-instance copy command that lifts a dataset (and by default its
experiments + traces + spans) into a destination project on the same Opik
instance. Built as a thin orchestration layer over the existing
`opik export` and `opik import` flows, with `MigrationManifest`-backed
resumability and a persistent run dir at `~/.opik/copy-runs/`.

The only modification to existing import code is a new optional
`destination_project` kwarg threaded through `import_datasets_from_directory`,
`_build_dataset_item_id_map`, `recreate_experiments`,
`_import_traces_from_projects_directory`, and `import_experiments_from_directory`.
Default `None` preserves today's `opik import` behaviour exactly — the
existing 258-test suite passes unchanged.
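
To illustrate why a `None` default keeps existing callers unaffected, here is a minimal sketch of the override pattern. `resolve_project_name` is a hypothetical helper written for this example; the actual import functions are not shown in this PR page and may apply the override differently.

```python
from typing import Optional


def resolve_project_name(
    exported_project: str,
    destination_project: Optional[str] = None,
) -> str:
    """Pick the project an imported item should land in.

    Hypothetical helper: with the default of None the exported project
    name is used unchanged, which mirrors the pre-existing behaviour.
    """
    if destination_project is None:
        return exported_project
    return destination_project
```

Callers that never pass `destination_project` get the same result as before, which is the property the regression tests in this PR are described as guarding.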

Implements OPIK-6246 (parent epic OPIK-5859: Self-serve v1 → v2 workspace migration via SDK).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions (bot) added the python, tests, and Python SDK labels Apr 29, 2026
Comment thread sdks/python/tests/unit/cli/test_copy_dataset.py
Comment thread sdks/python/tests/unit/cli/test_copy_dataset.py
Comment thread sdks/python/src/opik/cli/copy/dataset.py Outdated
Comment thread sdks/python/src/opik/cli/copy/dataset.py Outdated
Comment thread sdks/python/src/opik/cli/copy/dataset.py Outdated
Comment thread sdks/python/src/opik/cli/copy/dataset.py
Comment thread sdks/python/src/opik/cli/copy/dataset.py Outdated
Comment thread sdks/python/src/opik/cli/copy/__init__.py Outdated
Comment thread sdks/python/src/opik/cli/imports/dataset.py Outdated
Six fixes from PR review:

- Stable resumable run dir: replace datetime-based path with a
  fingerprint hash of (workspace, dataset, destination, source,
  exclude_experiments) so re-running the same command reuses the
  prior MigrationManifest.
- _filter_experiments_by_source_project: build a trace→project index
  from disk and keep experiments whose project can't be resolved
  (defensive default) instead of silently unlinking them.
- _verify_destination_counts: stop swallowing SDK errors into a
  generic "counts didn't match" exit; let auth/network/permission
  failures propagate so the real cause is visible.
- copy_dataset_command pre-flight: drop the broad except Exception
  catch on get_dataset; only DatasetNotFound is handled with a
  friendly message, everything else surfaces as itself.
- import_datasets_from_directory get-or-create: narrow the broad
  except to exceptions.DatasetNotFound so non-NotFound errors don't
  silently trigger a create.
- Extract the duplicated Click format_commands override into
  cli/_group_helpers.bind_items_format_commands and reuse it across
  the export/import/copy groups.
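
A sketch of the stable run dir idea from the first fix above: derive the directory name from a hash of the command's inputs, so an identical invocation maps to the same path. The function name, the use of SHA-256 over a JSON encoding, and the 16-character truncation are assumptions made for illustration; the PR does not state which hash or encoding it uses.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


def run_dir_for(
    workspace: str,
    dataset: str,
    destination_project: str,
    source_project: Optional[str],
    exclude_experiments: bool,
    base: Optional[Path] = None,
) -> Path:
    """Return a deterministic run directory for one set of copy inputs."""
    root = base if base is not None else Path.home() / ".opik" / "copy-runs"

    # JSON-encode the tuple so field boundaries are unambiguous
    # (("a", "bc") and ("ab", "c") must not collide).
    payload = json.dumps(
        [workspace, dataset, destination_project, source_project, exclude_experiments],
        separators=(",", ":"),
    )
    # Hash choice and truncation length are illustrative assumptions.
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    return root / digest
```

Because no timestamp enters the hash, re-running the same command resolves to the same directory, where a previous manifest can be found; changing any input yields a different directory.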

4 new unit tests guard the new behaviours; full SDK CLI suite
(279 tests) passes. Pre-commit (ruff, ruff-format, mypy) clean.
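
The defensive default described for `_filter_experiments_by_source_project` can be sketched as below. The data shapes are hypothetical (experiments as dicts with a `trace_ids` list, and a plain trace-to-project dict); the on-disk export format is not shown here.

```python
from typing import Dict, Iterable, List


def filter_experiments_by_source_project(
    experiments: Iterable[dict],
    trace_to_project: Dict[str, str],
    source_project: str,
) -> List[dict]:
    """Keep experiments in source_project, plus any that cannot be resolved."""
    kept = []
    for experiment in experiments:
        # Collect the projects of every trace we can resolve from the index.
        projects = {
            trace_to_project[trace_id]
            for trace_id in experiment.get("trace_ids", [])
            if trace_id in trace_to_project
        }
        # Defensive default: an empty set means "unknown", so keep it
        # rather than silently dropping the experiment.
        if not projects or source_project in projects:
            kept.append(experiment)
    return kept
```

Only experiments whose traces resolve exclusively to other projects are filtered out.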

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread sdks/python/src/opik/cli/copy/dataset.py
Comment thread sdks/python/src/opik/cli/copy/dataset.py
Comment thread sdks/python/src/opik/cli/copy/dataset.py
Comment thread sdks/python/src/opik/cli/imports/dataset.py Outdated
- Hoist the get_dataset_experiments cap to a named EXPERIMENT_LIST_CAP
  constant (10,000) used both during the source-side enumeration and
  in _verify_destination_counts. When either side hits the cap, the
  verifier now warns explicitly that the result is best-effort —
  prior behaviour silently passed when source and destination both
  saturated at the cap with potentially different real counts.
- Swap the manual get_dataset/DatasetNotFound/create_dataset dance in
  import_datasets_from_directory for client.get_or_create_dataset,
  the stable SDK API that already encapsulates this. Drops the local
  exceptions import; SDK errors (auth/network/permissions) propagate
  to the per-file error handler as themselves.
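
The cap-aware comparison from the first item above can be sketched as follows. `EXPERIMENT_LIST_CAP` and its value come from the commit message; `compare_experiment_counts` and its return shape are hypothetical, and the real `_verify_destination_counts` may be structured differently.

```python
from typing import Tuple

EXPERIMENT_LIST_CAP = 10_000  # value stated in the commit message


def compare_experiment_counts(
    source_count: int,
    destination_count: int,
    cap: int = EXPERIMENT_LIST_CAP,
) -> Tuple[bool, bool]:
    """Return (counts_match, best_effort).

    best_effort is True when either listing reached the cap, meaning the
    true count may be higher and equality proves nothing.
    """
    best_effort = source_count >= cap or destination_count >= cap
    return source_count == destination_count, best_effort
```

A caller would warn when `best_effort` is true, even if the counts match, since two listings that both saturate at the cap can hide different real totals.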

Tests updated to assert on the new SDK call shape; full suite
(279 tests) passes. Pre-commit clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JetoPistola added the test-environment label Apr 30, 2026
@github-actions
Contributor

🔄 Test environment deployment process has started

Phase 1: Deploying base version 2.0.19-5202 (from main branch) if environment doesn't exist
Phase 2: Building new images from PR branch danield/OPIK-6246-add-opik-copy-dataset-cli
Phase 3: Will deploy newly built version after build completes

You can monitor the progress here.

@CometActions
Collaborator

Test environment is now available!

To configure additional environment variables for your environment, run the [Deploy Opik AdHoc Environment workflow](https://github.com/comet-ml/comet-deployment/actions/workflows/deploy_opik_adhoc_env.yaml).

Access Information

The deployment has completed successfully and the version has been verified.

@JetoPistola added test-environment and removed test-environment labels Apr 30, 2026
@github-actions
Contributor

🔄 Test environment deployment process has started

Phase 1: Deploying base version 2.0.19-5206 (from main branch) if environment doesn't exist
Phase 2: Building new images from PR branch danield/OPIK-6246-add-opik-copy-dataset-cli
Phase 3: Will deploy newly built version after build completes

You can monitor the progress here.

@CometActions
Collaborator

Test environment is now available!

To configure additional environment variables for your environment, run the [Deploy Opik AdHoc Environment workflow](https://github.com/comet-ml/comet-deployment/actions/workflows/deploy_opik_adhoc_env.yaml).

Access Information

The deployment has completed successfully and the version has been verified.

@CometActions
Collaborator

🌙 Nightly cleanup: The test environment for this PR (pr-6555) has been cleaned up to free cluster resources. PVCs are preserved — re-deploy to restore the environment.

@CometActions removed the test-environment label May 1, 2026