feat(extract): add dynamic import() extraction for JS/TS#579

Open
Yalkowni wants to merge 205 commits into safishamsi:main from Yalkowni:feat/dynamic-import-extraction
Conversation

@Yalkowni
Contributor

What

Adds _dynamic_import_js() — a 65-line helper that detects import() call expressions in JS/TS files and emits imports_from edges from the enclosing function.

Before this change, any module loaded via dynamic import was invisible to the graph. This is a significant blind spot for codebases that use dynamic imports for code splitting, lazy loading, or container isolation (e.g. await import('./mayaEngine.js')).

How

  • Detects call_expression nodes whose function child is the import keyword (how tree-sitter-typescript represents import(...))
  • Resolves the module path using the same logic as static imports: relative path normalization, .js→.ts / .jsx→.tsx suffix remapping, and tsconfig path aliases
  • Emits imports_from edges with confidence: EXTRACTED (deterministic string literals)
  • Edge source is the enclosing function, not the file — so the graph reflects where the dynamic load actually happens
  • Hooked into walk_calls for JS/TS configs; continues recursing into children so import().then(...) chains still resolve their calls
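
The resolution step above can be sketched roughly as follows. This is not the code from the PR: `resolve_specifier`, `SUFFIX_REMAP`, and the flat prefix-to-directory `aliases` mapping are hypothetical names chosen for illustration, and real tsconfig `paths` entries use wildcard patterns that this sketch simplifies to plain prefixes.

```python
import posixpath

# Assumed remap table: emitted-JS suffix -> TypeScript source suffix.
SUFFIX_REMAP = {".js": ".ts", ".jsx": ".tsx"}


def resolve_specifier(specifier, importer, aliases=None):
    """Return a project-relative module path, or None for external packages.

    specifier: the string passed to import(), e.g. './foo.js'
    importer:  project-relative path of the file containing the import()
    aliases:   hypothetical simplified alias map, e.g. {'@/': 'src'}
    """
    aliases = aliases or {}
    if specifier.startswith("."):
        # Relative import: normalize against the importing file's directory.
        base = posixpath.dirname(importer)
        path = posixpath.normpath(posixpath.join(base, specifier))
    else:
        for prefix, target in aliases.items():
            if specifier.startswith(prefix):
                rest = specifier[len(prefix):]
                path = posixpath.normpath(posixpath.join(target, rest))
                break
        else:
            return None  # bare package name: treated as external
    root, ext = posixpath.splitext(path)
    return root + SUFFIX_REMAP.get(ext, ext)


print(resolve_specifier("./foo.js", "src/app/main.ts"))
print(resolve_specifier("@/utils/helpers", "src/app/main.ts", {"@/": "src"}))
```

Returning `None` for unmatched bare specifiers keeps external packages out of the edge list in this sketch; the actual helper may handle that fallback differently.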

Supported patterns

const { foo } = await import('./foo.js');       // .js → .ts remapping
import('./queue.js').then(m => m.bar());         // chained .then still resolves
const m = await import(`./template`);           // template literal
await import('@/utils/helpers');                 // tsconfig alias

Tests

  • tests/fixtures/dynamic_import.ts — fixture with static import, two dynamic imports in separate functions, one clean sync function
  • 5 new tests in tests/test_languages.py:
    • no error on parse
    • all three modules (static + both dynamic) produce imports_from edges
    • dynamic edges carry EXTRACTED confidence
    • edge source is the enclosing function (processInbound), not the file
    • sync function gets no spurious imports_from edges
  • All 5 pass alongside the existing 110 tests

- Add GitHub Actions CI workflow (Python 3.10 and 3.12)
- Add CI badge to README
- Add ARCHITECTURE.md: pipeline overview, module table, schema, how to
  add a language extractor, security summary
- Move eval reports from tests/ to worked/httpx/ and worked/mixed-corpus/
- Fix README: test count 163→212, language table (13 languages via
  tree-sitter), extract.py description, worked examples links

benchmark: 8.8x token reduction on nanoGPT + minGPT + micrograd

- Run AST extraction on 29 Python files across 3 Karpathy repos
- 177 nodes, 246 edges, 17 communities (Leiden)
- 8.8x avg token reduction vs naive full-corpus context stuffing
- Notable: micrograd cleanly splits into engine/nn communities;
  nanoGPT model vs training loop correctly separated
- Honest: stdlib import noise flagged, config isolates documented

benchmark: 71.5x token reduction on mixed corpus (code+papers+images)

Full run: nanoGPT+minGPT+micrograd + 5 research papers + 4 images
285 nodes, 340 edges, 53 communities
Average BFS query: 1,726 tokens vs 123,488 naive (71.5x)
Code-only (AST) sub-benchmark: 8.8x on 13k-word corpus
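
For reference, the 71.5x figure is the ratio of the two averages quoted above. A quick check (variable names are illustrative only):

```python
# Averages quoted in the commit message above.
naive_tokens = 123_488      # naive full-corpus context
bfs_query_tokens = 1_726    # average BFS query over the graph

reduction = naive_tokens / bfs_query_tokens
print(f"{reduction:.1f}x")  # prints "71.5x"
```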
style: replace all em dashes with hyphens

fix: explain hidden .graphify/ folder in skill output and README

fix: rename .graphify/ to graphify-out/ so output is visible by default
- Replace pyvis with custom vis.js renderer: node size by degree,
  click-to-inspect panel with clickable neighbors, search box,
  community filter, physics clustering by community
- HTML graph generated by default on every run (no --html flag needed)
- Token reduction benchmark auto-runs after every /graphify on corpora >5k words
- Fix 292 edge warnings: silently skip stdlib/external edges in build.py
- Fix build() to merge extractions before building (cross-extraction edges were dropped)
- Add 5 HTML renderer tests (223 total)
- Remove unnecessary files: lib/, tests/eval_attention.py, misplaced eval reports
- Add graphify-out/ and .graphify_*.json to .gitignore
- Bump version to 0.1.4, remove pyvis dependency
- README: token reduction as top-level selling point, vis.js in tech stack,
  graph.html in output listing, correct test count and install command
Covers detect → extract → build → cluster → analyze → report → export
using existing fixtures. AST-only (no LLM calls), catches regressions
in how modules connect, not just individual module behaviour.
- Semantic extraction chunks: 12-15 → 20-25 files (fewer subagent round trips)
- Code-only corpora skip semantic dispatch entirely (AST covers it)
- Print estimated time before extraction so the wait feels intentional
…hecks, no-viz clarity

- Add --graphml to Usage table (was implemented but undocumented there)
- Remove early manifest save from --update merge step (Step 9 owns it; saving early meant failed pipelines left manifest ahead of graph)
- query/path/explain now check graph.json exists before running, with clear "run /graphify first" message
- --no-viz: clarify it skips both Obsidian vault and HTML (was contradictory)
…laude Code hooks

- confidence_score required on every edge (INFERRED: 0.4-0.9, EXTRACTED: 1.0, AMBIGUOUS: 0.1-0.3)
- semantically_similar_to edges for non-obvious cross-file conceptual links
- hyperedges for 3+ node group relationships - fixed cache and merge pipeline that was silently dropping them
- check_semantic_cache returns 4-tuple including cached_hyperedges
- extract.py: mine the "why" - module/class/function docstrings and rationale comments (# NOTE: # IMPORTANT: # HACK: # WHY: # RATIONALE: # TODO: # FIXME:) as rationale_for nodes
- skill.md: rationale_for in relation schema, doc files extract design rationale
- obsidian output opt-in (--obsidian flag) - default output is graph.html + graph.json + GRAPH_REPORT.md only
- hooks.py: post-checkout hook added alongside post-commit - graph rebuilds on branch switch
- claude install: writes .claude/settings.json PreToolUse hook on Glob/Grep - Claude checks graph before searching raw files
- README updated with all v2 features
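
The confidence ranges listed above could be enforced with a check along these lines. This is a sketch only: the edge is assumed to be a dict with `confidence` and `confidence_score` keys, and `validate_edge` and `CONFIDENCE_RANGES` are hypothetical names, not part of the project.

```python
# Ranges quoted in the commit message; the dict layout is an assumption.
CONFIDENCE_RANGES = {
    "EXTRACTED": (1.0, 1.0),
    "INFERRED": (0.4, 0.9),
    "AMBIGUOUS": (0.1, 0.3),
}


def validate_edge(edge):
    """Return a list of problems for one edge dict (empty list means valid)."""
    label = edge.get("confidence")
    score = edge.get("confidence_score")
    if label not in CONFIDENCE_RANGES:
        return [f"unknown confidence label: {label!r}"]
    if score is None:
        return ["missing confidence_score"]
    low, high = CONFIDENCE_RANGES[label]
    if not low <= score <= high:
        return [f"{label} score {score} outside [{low}, {high}]"]
    return []


print(validate_edge({"confidence": "INFERRED", "confidence_score": 0.95}))
```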
safishamsi and others added 29 commits April 21, 2026 22:21
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- cache: skip directory source_file in save_cached to prevent IsADirectoryError (safishamsi#444)
- report: skip structural-only communities with no real nodes (safishamsi#443)
- hooks: allow @ in python path allowlist for Homebrew paths (safishamsi#474)
- watch: keep source_file paths project-relative after rebuild (safishamsi#434)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-community gaps; add graph-query CLI rules to install sections

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…, readme gitignore docs

- wiki.py: add encoding="utf-8" to all write_text() calls (fixes Windows cp1252 crash safishamsi#496)
- wiki.py: deduplicate filenames with _unique_slug() to prevent silent article overwrites (safishamsi#497)
- hooks.py: skip post-commit/post-checkout during rebase/merge/cherry-pick (safishamsi#485)
- detect.py: resolve root path at detect() entry so .graphifyignore patterns match consistently (safishamsi#495)
- README.md: document manifest.json, cost.json gitignore and .graphifyignore platform file examples (safishamsi#369)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
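
The filename deduplication described for wiki.py might look something like the sketch below. The real `_unique_slug()` is not shown in this PR, so the slug rules and the `-2`, `-3` suffix scheme here are assumptions.

```python
import re


def unique_slug(title, used):
    """Slugify `title`, appending -2, -3, ... until it is not in `used`.

    `used` is a set of slugs already taken; the chosen slug is added to it,
    so two articles with the same title never map to the same filename.
    """
    base = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "untitled"
    slug, n = base, 2
    while slug in used:
        slug = f"{base}-{n}"
        n += 1
    used.add(slug)
    return slug


used = set()
print(unique_slug("Auth Module", used))   # auth-module
print(unique_slug("auth module", used))   # auth-module-2
```

When writing the resulting files, passing `encoding="utf-8"` to `Path.write_text()` avoids depending on the platform default encoding, which is the Windows cp1252 crash the commit refers to.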
…, correct common-root inference

- analyze.py: add seed=42 to betweenness_centrality() — eliminates non-deterministic GRAPH_REPORT.md diffs on graphs >1000 nodes (safishamsi#499)
- extract.py: fix common-root inference to stop at first diverging segment not sum of all matches (safishamsi#502)
- extract.py: resolve root to absolute path; post-process file node IDs to project-relative after extraction so graph.json edge endpoints are stable across machines (safishamsi#502)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
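
To illustrate the common-root fix: the root should be the shared leading segments only, stopping at the first position where paths diverge, even if a later segment happens to match again. A minimal sketch, with `common_root` as a hypothetical name and project-relative POSIX paths assumed:

```python
from pathlib import PurePosixPath


def common_root(paths):
    """Return the deepest directory shared by every file in `paths`."""
    parents = [PurePosixPath(p).parent.parts for p in paths]
    root = []
    for segments in zip(*parents):
        if len(set(segments)) != 1:
            break  # first diverging segment: stop, ignore later matches
        root.append(segments[0])
    return PurePosixPath(*root) if root else PurePosixPath(".")


# The second "a" matches again further down, but must not count.
print(common_root(["src/a/x.py", "src/b/a/y.py"]))  # src
```

The standard library's `os.path.commonpath` gives the same answer for directory lists, but raises `ValueError` on an empty sequence or when absolute and relative paths are mixed.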
… legacy schema canonicalization, Java inheritance, aggregated HTML viz, check-update subcommand
…ard, label dedup, chunk-suffix prompt block
…th, fix gitignore inline comments, nosec on write sinks
…ontrols, desync guard, rationale prompt

- safishamsi#550: _file_stem() includes parent dir to prevent node ID collisions for same-named files
- safishamsi#555: extract() relativizes source_file paths before returning for cross-machine portability
- safishamsi#562: to_json() returns bool; _rebuild_code() writes report/html only if json succeeded
- safishamsi#563: skill prompts store rationale as node attribute, not separate node; enforce calls direction
- safishamsi#566: Show All / Hide All buttons added to HTML community panel
- safishamsi#575: _import_js() resolves tsconfig.json compilerOptions.paths aliases before external fallback

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
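
The node ID collision fix in #550 can be pictured like this. The exact ID format used by `_file_stem()` is not shown here, so the `parent.stem` layout below is an assumption for illustration.

```python
from pathlib import PurePosixPath


def file_stem(path):
    """Build a stem that includes the parent directory name.

    Two files named index.ts in different directories would otherwise
    both reduce to 'index' and collide as node IDs.
    """
    p = PurePosixPath(path)
    parent = p.parent.name  # empty string for a top-level file
    return f"{parent}.{p.stem}" if parent else p.stem


print(file_stem("src/auth/index.ts"))     # auth.index
print(file_stem("src/billing/index.ts"))  # billing.index
```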
Adds _dynamic_import_js() helper (65 lines) that detects import() call
expressions in JS/TS, resolves the module path (same logic as static
imports including .js→.ts mapping and tsconfig aliases), and emits
imports_from edges from the enclosing function. Hooked into walk_calls
for JS/TS configs.

Also adds tests/fixtures/dynamic_import.ts fixture and 5 new tests
in tests/test_languages.py (all passing alongside 110 existing tests).
import(`./handlers/${name}`) previously produced a garbage edge to a
path containing the unresolved ${name} expression. Now detects
template_substitution child nodes and breaks without emitting an edge.
Static template literals (no interpolation) still resolve correctly.

Adds 2 new tests: one asserting dynamic templates produce no edge,
one asserting static templates resolve like plain strings.
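
The interpolation guard can be sketched without tree-sitter by using a small stand-in node type. `Node` and `static_specifier` are hypothetical; only the node type names (`string`, `template_string`, `template_substitution`) follow the tree-sitter JavaScript/TypeScript grammar.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal stand-in for a tree-sitter node: type, source text, children."""
    type: str
    text: str = ""
    children: list = field(default_factory=list)


def static_specifier(arg):
    """Return the literal module specifier, or None if it is not static."""
    if arg.type == "string":
        return arg.text.strip("'\"")
    if arg.type == "template_string":
        if any(c.type == "template_substitution" for c in arg.children):
            return None  # interpolated template: emit no edge
        return arg.text.strip("`")
    return None  # identifiers, calls, etc. cannot be resolved statically


dynamic = Node("template_string", "`./handlers/${name}`",
               [Node("template_substitution", "${name}")])
print(static_specifier(dynamic))                                   # None
print(static_specifier(Node("template_string", "`./template`")))   # ./template
```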
safishamsi added a commit that referenced this pull request May 2, 2026
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

7 participants