fix(build): prevent partial-chunk overwrite with build_merge() and node-count guard (#479) #494
Open
harshitanand wants to merge 1 commit into safishamsi:main
…warning (safishamsi#479)

- build_from_json(): detect legacy 'source' field on nodes, emit a loud stderr warning with renamed-node count AND affected-edge count, then patch a copy (never mutates caller's dicts)
- build_from_json() / build(): add directed= kwarg to support DiGraph output
- New build_merge(): load existing graph.json, merge new extractions in, raise ValueError if merged node count drops below the existing count (force=True bypasses)
- _rebuild_code() in watch.py: check graph.json node count before overwriting; raise ValueError if the new build would shrink the graph
- ValueError escapes the broad except Exception block via explicit 'except ValueError: raise'
- watch() loop: catch ValueError from _rebuild_code(), log to stderr, continue

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
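
For readers skimming the commit message, the node-count guard and the watcher's catch-and-continue behaviour might look roughly like the sketch below. This is not the code from the PR: `count_nodes`, `write_graph_guarded`, and `rebuild_once` are hypothetical names, the top-level `"nodes"` list in graph.json is an assumed layout, and the error text is invented for illustration.

```python
import json
import sys
import tempfile
from pathlib import Path


def count_nodes(graph_path):
    """Return the node count stored at graph_path, or 0 if the file is missing."""
    path = Path(graph_path)
    if not path.exists():
        return 0
    # Assumption: graph.json keeps its nodes in a top-level "nodes" list.
    return len(json.loads(path.read_text()).get("nodes", []))


def write_graph_guarded(new_graph, graph_path, *, force=False):
    """Write new_graph to graph_path unless doing so would shrink the graph."""
    existing = count_nodes(graph_path)
    incoming = len(new_graph.get("nodes", []))
    if incoming < existing and not force:
        # Hypothetical message; the PR's actual wording is not shown.
        raise ValueError(
            f"refusing to overwrite {graph_path}: new graph has {incoming} "
            f"nodes, existing graph has {existing}; pass force=True to override"
        )
    Path(graph_path).write_text(json.dumps(new_graph))
    return incoming


def rebuild_once(new_graph, graph_path):
    """One watcher iteration: log a rejected rebuild to stderr and carry on."""
    try:
        write_graph_guarded(new_graph, graph_path)
        return True
    except ValueError as exc:
        print(f"[watch] rebuild skipped: {exc}", file=sys.stderr)
        return False


def demo():
    """Write a full graph, try a partial overwrite, then force it."""
    with tempfile.TemporaryDirectory() as tmp:
        graph_path = Path(tmp) / "graph.json"
        full = {"nodes": [{"id": "a"}, {"id": "b"}, {"id": "c"}], "links": []}
        partial = {"nodes": [{"id": "a"}], "links": []}

        write_graph_guarded(full, graph_path)
        rejected = not rebuild_once(partial, graph_path)  # guard fires
        kept = count_nodes(graph_path)                    # file untouched
        write_graph_guarded(partial, graph_path, force=True)
        forced = count_nodes(graph_path)                  # explicit override
        return rejected, kept, forced
```

Calling `demo()` exercises all three paths: a normal write, a rejected shrinking write that leaves the file untouched, and a forced overwrite.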
Linked Issue
Closes #479
Description
`--update` (and multi-session incremental workflows) had a silent data-loss failure mode: calling `build()` with a partial chunk list silently replaced the existing `graph.json` with a smaller graph. This PR closes that failure class with four changes:

1. `build_merge()` helper (`build.py`)
A new `build_merge(new_extractions, existing_graph_path, *, force=False)` function that loads the existing `graph.json`, merges the new extractions in, and raises `ValueError` if the merged result would be smaller than the existing graph (bypass with `force=True`).

2. Node-count safety check in `_rebuild_code()` (`watch.py`)
Before writing `graph.json`, `_rebuild_code()` now checks that the new graph has at least as many nodes as the existing one. If not, it raises `ValueError` with a clear, actionable message.

3. Louder field-name mismatch warning (`build.py`)
When nodes use the legacy `"source"` field instead of `"source_file"`, the warning now reports the renamed node count and the number of affected edges.

4. Watch loop crash fix (`watch.py`)
The node-count `ValueError` is caught in the `watch()` event loop, which logs it and continues rather than crashing the watcher process.

Type of Change
(`build_merge()` is additive, guard only triggers on shrinkage)

Breaking Changes
N/A for normal usage. Users relying on `--update` overwriting with a smaller graph would need `force=True`.

Checklist