[BUG] GraphWorkflow checkpoints can collide across workflows and loops #1537

@shaun0927

Describe the bug
GraphWorkflow checkpoint filenames are currently derived from the task hash and layer index only. When multiple workflows share the same checkpoint_dir and task string, they can reuse each other's checkpoint files.

The same naming scheme also ignores the loop index, so multi-loop runs can pick up stale data from a previous loop.

To Reproduce

  1. Create two different GraphWorkflow instances that share the same checkpoint_dir.
  2. Run the first workflow on task "shared task" so it writes checkpoints.
  3. Run the second workflow on the same task string.
  4. The second workflow can load the first workflow's checkpoint files because the filename namespace is the same.

At HEAD the filename pattern is effectively:

f"{sha256(task)[:16]}_layer_{layer_idx}.json"

so workflow identity and loop identity are both missing from the checkpoint path.
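The collision can be illustrated without the library itself. The sketch below is not the GraphWorkflow implementation: `checkpoint_filename` is a hypothetical helper that only mirrors the pattern quoted above, assuming the hash is the hex digest of the UTF-8 encoded task string.

```python
import hashlib


def checkpoint_filename(task: str, layer_idx: int) -> str:
    """Hypothetical helper mirroring the quoted pattern (task hash + layer index)."""
    # Assumption: the hash is a hex digest of the UTF-8 encoded task string.
    task_hash = hashlib.sha256(task.encode("utf-8")).hexdigest()[:16]
    return f"{task_hash}_layer_{layer_idx}.json"


# Two unrelated workflows that share a checkpoint_dir and a task string
# have nothing else feeding into the filename, so the names are identical.
name_for_workflow_a = checkpoint_filename("shared task", 0)
name_for_workflow_b = checkpoint_filename("shared task", 0)
collides = name_for_workflow_a == name_for_workflow_b

# The loop index is not an input either, so loop 0 and loop 1 of the
# same workflow would also resolve to this same filename.
print(collides)
```

Because neither the workflow nor the loop appears among the inputs, no choice of task or layer index can separate the two cases.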

Expected behavior
Checkpoint filenames should include enough stable workflow identity (for example, a namespace derived from the workflow's name or topology) plus the loop index, so that cross-workflow and cross-loop collisions are avoided while the same logical workflow can still resume across restarts.
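One possible shape for such a scheme is sketched below. Everything here is an assumption for illustration: `workflow_namespace`, `namespaced_checkpoint_filename`, and the idea of describing the topology as a list of `(source, target)` edge pairs are hypothetical and not taken from the GraphWorkflow code.

```python
import hashlib
import json


def _short_hash(text: str) -> str:
    # 16 hex characters, matching the length used in the current pattern.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def workflow_namespace(name: str, edges: list[tuple[str, str]]) -> str:
    """Hypothetical namespace built only from the workflow name and topology.

    No random IDs or timestamps are used, so the value is the same after
    a restart of the same logical workflow.
    """
    # Sorting the edges makes the result independent of insertion order.
    payload = json.dumps({"name": name, "edges": sorted(edges)}, sort_keys=True)
    return _short_hash(payload)


def namespaced_checkpoint_filename(
    namespace: str, task: str, loop_idx: int, layer_idx: int
) -> str:
    """Hypothetical filename including namespace, task hash, loop, and layer."""
    return (
        f"{namespace}_{_short_hash(task)}"
        f"_loop_{loop_idx}_layer_{layer_idx}.json"
    )


ns_a = workflow_namespace("workflow_a", [("agent1", "agent2")])
ns_b = workflow_namespace("workflow_b", [("agent1", "agent3")])
print(namespaced_checkpoint_filename(ns_a, "shared task", 0, 0))
print(namespaced_checkpoint_filename(ns_b, "shared task", 0, 0))
```

Under these assumptions, two workflows with different names or edges get different prefixes, and each loop gets its own files. Whether the namespace should come from the name, the topology, or both is a design choice left to the maintainers.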

Additional context
