fix(build): prevent partial-chunk overwrite with build_merge() and node-count guard (#479) #494

Open
harshitanand wants to merge 1 commit into safishamsi:main from harshitanand:fix/issue-479-build-merge-node-count-guard

Linked Issue

Closes #479

Description

--update (and multi-session incremental workflows) had a silent data-loss failure mode: calling build() with a partial chunk list replaced the existing graph.json with a smaller graph, without any warning. This PR closes that failure class with four changes:

1. build_merge() helper (build.py)

A new build_merge(new_extractions, existing_graph_path, *, force=False) function that:

  • Loads the existing on-disk graph
  • Merges new extractions via NetworkX union semantics (the graph only grows, it is never replaced)
  • Raises ValueError if the merged graph would have fewer nodes than the existing graph (bypass with force=True)
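
To make the merge contract above concrete, here is a minimal sketch of the load, union, and guard steps. It is not the code from this PR: it uses only the standard library and plain dicts in place of NetworkX, and the function name `build_merge_sketch`, the JSON shape (`nodes` with an `id` key, `edges` with `source`/`target` keys), and the error text are all assumptions made for illustration.

```python
import json
from pathlib import Path


def build_merge_sketch(new_extractions, existing_graph_path, *, force=False):
    """Union new extractions into the on-disk graph (illustrative only).

    Assumed shape: {"nodes": [{"id": ...}], "edges": [{"source": ..., "target": ...}]}.
    """
    path = Path(existing_graph_path)
    if path.exists():
        existing = json.loads(path.read_text())
    else:
        existing = {"nodes": [], "edges": []}

    # Key nodes by id and edges by (source, target) so a union never drops
    # an entry that is already on disk.
    nodes = {n["id"]: n for n in existing["nodes"]}
    edges = {(e["source"], e["target"]): e for e in existing["edges"]}

    for extraction in new_extractions:
        for n in extraction.get("nodes", []):
            # New attributes are layered on top of any existing ones.
            nodes[n["id"]] = {**nodes.get(n["id"], {}), **n}
        for e in extraction.get("edges", []):
            edges[(e["source"], e["target"])] = e

    merged = {"nodes": list(nodes.values()), "edges": list(edges.values())}

    # Defensive guard: in this sketch a union can only shrink the count if
    # the file on disk held duplicate ids, but the check mirrors the
    # described behaviour.
    if not force and len(merged["nodes"]) < len(existing["nodes"]):
        raise ValueError(
            f"merged graph has {len(merged['nodes'])} nodes "
            f"but existing has {len(existing['nodes'])}"
        )
    return merged
```

Calling `build_merge_sketch([chunk], "graph.json")` with a chunk that repeats one existing node and adds one new node returns a graph with exactly one more node than before; the repeated node is updated rather than duplicated.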

2. Node-count safety check in _rebuild_code() (watch.py)

Before writing graph.json, _rebuild_code() now checks that the new graph has at least as many nodes as the existing one. If it does not, it raises ValueError with an actionable message:

[graphify] Refusing to write graph.json: new graph has 55 nodes but existing has 123.
You may be missing chunk files from a previous session. Pass force=True to override.
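
The guard can be sketched as a small standalone check. This is an illustration under assumptions, not the code in watch.py: the helper name `check_node_count` is hypothetical, and it assumes graph.json stores its nodes in a top-level `nodes` list. The message text is copied from the example above.

```python
import json
from pathlib import Path


def check_node_count(new_node_count, graph_path, *, force=False):
    """Raise ValueError if writing would shrink the on-disk graph.

    Hypothetical helper; assumes a top-level "nodes" list in the JSON file.
    """
    path = Path(graph_path)
    if force or not path.exists():
        return  # nothing to protect, or the caller explicitly opted out

    existing_count = len(json.loads(path.read_text()).get("nodes", []))
    if new_node_count < existing_count:
        raise ValueError(
            f"[graphify] Refusing to write graph.json: new graph has "
            f"{new_node_count} nodes but existing has {existing_count}.\n"
            "You may be missing chunk files from a previous session. "
            "Pass force=True to override."
        )
```

Running the check before the write, rather than after, means a rejected build leaves the existing file untouched.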

3. Louder field-name mismatch warning (build.py)

When nodes use the legacy "source" field instead of "source_file", the warning now reports the renamed node count and the number of affected edges.
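
A rough sketch of how such a warning could be produced follows. Several details are assumptions: the helper name `normalize_source_field`, the warning wording, the `source`/`target` edge keys, and in particular the definition of an "affected" edge as one that touches a renamed node. The commit message notes that the real code patches a copy and never mutates the caller's dicts, which the sketch also does.

```python
import sys


def normalize_source_field(nodes, edges):
    """Rename legacy node field "source" to "source_file" on copies.

    Returns (patched_nodes, renamed_count, affected_edge_count).
    """
    renamed_ids = set()
    patched = []
    for node in nodes:
        if "source" in node and "source_file" not in node:
            node = dict(node)  # shallow copy so the caller's dict is untouched
            node["source_file"] = node.pop("source")
            renamed_ids.add(node["id"])
        patched.append(node)

    # Assumption: an edge counts as affected if either endpoint was renamed.
    # Note that the edge key "source" is an endpoint id, unrelated to the
    # legacy node field of the same name.
    affected = sum(
        1 for e in edges
        if e["source"] in renamed_ids or e["target"] in renamed_ids
    )

    if renamed_ids:
        print(
            f"[graphify] WARNING: renamed legacy 'source' field on "
            f"{len(renamed_ids)} node(s); {affected} edge(s) affected.",
            file=sys.stderr,
        )
    return patched, len(renamed_ids), affected
```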

4. Watch loop crash fix (watch.py)

The watch() event loop now catches the node-count ValueError, logs it to stderr, and continues instead of crashing the watcher process.
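
The control flow can be sketched as below. The commit message states that the ValueError escapes a broad `except Exception` block through an explicit `except ValueError: raise`, and that watch() then catches it. Everything else here is hypothetical: the function names, the `build_fn` callback, the iterable of events, and the log text stand in for the real watcher.

```python
import sys


def rebuild_code_sketch(build_fn):
    """Run one rebuild; let the guard's ValueError propagate to the caller."""
    try:
        return build_fn()
    except ValueError:
        raise  # listed first so the broad handler below cannot swallow it
    except Exception as exc:
        print(f"[graphify] rebuild failed: {exc}", file=sys.stderr)
        return None


def watch_sketch(events, build_fn):
    """Process events; return how many rebuilds the guard rejected."""
    rejected = 0
    for _event in events:
        try:
            rebuild_code_sketch(build_fn)
        except ValueError as exc:
            # Log and keep watching instead of crashing the process.
            print(exc, file=sys.stderr)
            rejected += 1
    return rejected
```

The order of the `except` clauses matters: Python tries handlers top to bottom, so the narrower `ValueError` clause must precede `Exception`.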

Type of Change

  • Bug fix (non-breaking — build_merge() is additive, guard only triggers on shrinkage)

Breaking Changes

N/A for normal usage. Users relying on --update overwriting with a smaller graph would need force=True.

Checklist

  • Code follows project style guidelines
  • Self-review performed
  • All existing tests pass locally

…warning (safishamsi#479)

- build_from_json(): detect legacy 'source' field on nodes, emit a loud stderr
  warning with renamed-node count AND affected-edge count, then patch a copy
  (never mutates caller's dicts)
- build_from_json() / build(): add directed= kwarg to support DiGraph output
- New build_merge(): load existing graph.json, merge new extractions in, raise
  ValueError if merged node count drops below the existing count (force=True bypasses)
- _rebuild_code() in watch.py: check graph.json node count before overwriting;
  raise ValueError if the new build would shrink the graph
- ValueError escapes the broad except Exception block via explicit 'except ValueError: raise'
- watch() loop: catch ValueError from _rebuild_code(), log to stderr, continue

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>