This document turns Heddle's broad framework goal into a concrete execution path.
The immediate proving ground is not "general agents" in the abstract. It is a conversational coding agent runtime that becomes useful enough to help build Heddle itself.
Coding is the first hard reference workload because it demands most of the host-framework capabilities that matter elsewhere:
- tool use
- environment access
- safety boundaries
- approval flows
- traceability
- memory and context handling
- verification
- eventually delegation
If Heddle becomes credible for coding work, it becomes much easier to reason about how to adapt the same host architecture to other domains later.
Heddle should become a conversational coding agent runtime that is good enough to help build Heddle itself.
This does not require competing with existing commercial products feature-for-feature. It does require building a system that is actually useful in real repository work.
Goal:
- move from one-shot prompt execution to a conversational terminal interface
Scope:
- terminal UI using Ink
- visible multi-turn conversation
- per-turn tool activity visibility
- session state that carries prior turns into the next run
- basic trace persistence for each turn
Out of scope:
- streaming token rendering
- patch approval workflows
- file editing UX
- subagents
Exit criteria:
- a user can open Heddle in the terminal, ask multiple questions in sequence, and keep the conversation grounded across turns
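The session-state and per-turn trace items in the scope above could be modeled roughly as follows. This is a minimal sketch, not Heddle's actual data model: `ChatMessage`, `TurnTrace`, `Session`, `RunAgent`, and `runTurn` are all hypothetical names introduced for illustration.

```typescript
// Hypothetical shapes; none of these names come from the Heddle codebase.
type ChatMessage = { role: "user" | "assistant"; content: string };
type TurnTrace = { turn: number; toolCalls: string[]; recordedAt: string };
type Session = { messages: ChatMessage[]; traces: TurnTrace[] };

// Stand-in for the real agent loop: it receives the prior history plus the new prompt.
type RunAgent = (
  history: ChatMessage[],
  prompt: string,
) => { reply: string; toolCalls: string[] };

function runTurn(session: Session, prompt: string, runAgent: RunAgent): Session {
  const result = runAgent(session.messages, prompt);
  return {
    // Prior turns are carried forward so the next run stays grounded.
    messages: [
      ...session.messages,
      { role: "user", content: prompt },
      { role: "assistant", content: result.reply },
    ],
    // One trace record per turn, kept alongside the conversation.
    traces: [
      ...session.traces,
      {
        turn: session.traces.length + 1,
        toolCalls: result.toolCalls,
        recordedAt: new Date().toISOString(),
      },
    ],
  };
}
```

Returning a new `Session` instead of mutating the old one keeps each turn's input easy to inspect after the fact, which fits the trace-persistence goal.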
Goal:
- make Heddle feel like a usable coding assistant instead of a repo Q&A demo
Scope:
- stronger shell and file-action surface
- explicit risky-action approval boundaries
- git-aware repo context
- clearer progress and trace rendering
- interrupt and continue semantics
- better verification-first behavior after edits
Exit criteria:
- Heddle can inspect, edit, verify, and explain bounded changes in a real repository with a usable operator experience
Current progress:
- conversational terminal is working and usable enough for short coding-agent sessions
- shell capability is split into `run_shell_inspect` and `run_shell_mutate`
- file creation and editing now have a first-class `edit_file` tool instead of relying only on shell-based file-writing workarounds
- `run_shell_mutate` is approval-gated in chat mode
- shell tools now classify allowed commands by bounded workspace/inspect policy rules and return scope/risk/capability metadata
- unclassified mutate commands now fall back to explicit approval with `unknown` risk metadata instead of immediate rejection
- known external CLIs such as `gh`, `aws`, and `kubectl` now classify under explicit external-system scope instead of looking like generic workspace commands
- workspace-changing mutate commands trigger host-side pressure to inspect repo state with concrete git evidence and run verification before final answer
- workspace-changing mutate runs now also require a short operator-style final answer with explicit `Changed`, `Verified`, and `Remaining uncertainty` sections, naming the exact review and verification commands used
- chat mode now supports interrupt via `Esc` and resume via `/continue`
- carried-over session history is sanitized before the next run so interrupted tool calls do not poison later turns with missing tool-output API errors
- approval prompts now surface scope/capability/risk metadata, and exact per-project approvals apply immediately after being remembered
- project approval memory is now backward-compatible with older saved rules, supports low-risk workspace verification command families such as `yarn test ...`, and can remember `edit_file` at the project level
- the chat view is now more stable for multi-turn use: conversation stays the anchor, active run state is rendered inline with that flow, and basic response formatting now makes lists/code easier to read
- the local control plane now has a workstation-style browser shell instead of only a read-only dashboard: sessions render as sidebar + conversation + review inspector, heartbeat renders as task list + detail + run history, and the web client is owned by `src/web-v2`
- the control-plane server now exposes session detail and trace-derived turn review procedures over tRPC so the browser can inspect full saved session messages, turn summaries, and review evidence without loading only list projections
- core runtime modules have been reorganized under `src/core`, including agent loop, runtime heartbeat, llm, tools, prompts, trace, and shared utils, so the public surface can keep growing without the old flat `src/` layout
- the heartbeat/control-plane path is now cleaner at the package surface: CLI heartbeat imports from the package entrypoint, package scripts expose `server:dev`, `client:dev`, and `client:build`, and example imports follow the new core layout
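The history sanitization mentioned in the list above can be illustrated with a short sketch. It assumes a message shape loosely modeled on common chat-completion APIs, where every assistant tool call must be followed by a matching tool output. The `Message` and `ToolCall` types and the `sanitizeHistory` function are hypothetical and are not taken from Heddle's source.

```typescript
// Hypothetical message shapes, assumed for illustration only.
type ToolCall = { id: string; name: string };
type Message =
  | { role: "user"; content: string }
  | { role: "assistant"; content: string; toolCalls?: ToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };

function sanitizeHistory(history: Message[]): Message[] {
  // Collect the ids of tool calls that actually produced an output.
  const answered = new Set<string>();
  for (const m of history) {
    if (m.role === "tool") answered.add(m.toolCallId);
  }

  const cleaned: Message[] = [];
  for (const m of history) {
    if (m.role !== "assistant" || !m.toolCalls) {
      cleaned.push(m);
      continue;
    }
    // Keep only the tool calls that have a matching tool output.
    const kept = m.toolCalls.filter((call) => answered.has(call.id));
    if (kept.length > 0) {
      cleaned.push({ role: "assistant", content: m.content, toolCalls: kept });
    } else if (m.content.trim() !== "") {
      // Preserve the assistant's text even when every call was interrupted.
      cleaned.push({ role: "assistant", content: m.content });
    }
    // An assistant message with no text and no answered calls is dropped.
  }
  return cleaned;
}
```

This sketch only handles tool calls that never received an output. Tool outputs whose originating call is missing would need a separate pass.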
Remaining priority:
- stronger git-native review flow after changes
- better use of concrete diff/status evidence in final-answer summaries
- begin evolving shell policy from a narrow command-prefix allowlist toward a real execution-policy model based on risk, scope, approval, and auditability
- reduce reliance on shell-syntax blocking as a safety mechanism; serious workflows will need broader command expressiveness than the current heredoc/redirect restrictions allow
- strengthen host-side follow-through so when the agent discovers a safe path for a bounded change, it executes it instead of stopping at explanation
- finish wiring the browser workstation shell into real `send message` and `continue` session mutations so the web control plane can drive chat sessions directly instead of primarily inspecting them
Current concrete next step:
- finish the "self-closing coding loop" for Phase 1
- make the conversation an append-only journal of reasoning, file edits, plan updates, verification, and final outcome instead of a surface that is rebuilt or reordered after the run
- prevent premature plan completion when post-mutation review or verification is still pending
- guarantee that every non-successful run still leaves a visible terminal summary in the conversation, not only a status badge
- use this as the main acceptance bar for asking Heddle to implement the next bounded product improvement itself
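One way to picture the journal and completion rules above is the sketch below. It is an assumption-laden illustration: the `JournalEntry` kinds, the `RunJournal` class, and the rule that only a passing verification clears a pending edit are all hypothetical choices, not Heddle's actual event model.

```typescript
// Hypothetical event model for an append-only run journal.
type RunStatus = "success" | "failed" | "interrupted";
type JournalEntry =
  | { kind: "reasoning"; text: string }
  | { kind: "edit"; path: string }
  | { kind: "verification"; command: string; passed: boolean }
  | { kind: "summary"; status: RunStatus; text: string };

class RunJournal {
  // No method updates, removes, or reorders entries: the journal only grows.
  private readonly entries: JournalEntry[] = [];

  append(entry: JournalEntry): void {
    this.entries.push(entry);
  }

  // Returns a copy so callers cannot rewrite history.
  list(): JournalEntry[] {
    return [...this.entries];
  }

  // The plan may complete only if no edit is still waiting for a passing
  // verification (assumed rule: a failed verification does not clear it).
  canCompletePlan(): boolean {
    let pending = false;
    for (const entry of this.entries) {
      if (entry.kind === "edit") pending = true;
      if (entry.kind === "verification" && entry.passed) pending = false;
    }
    return !pending;
  }

  // Every run, successful or not, ends with a visible summary entry.
  finish(status: RunStatus, text: string): void {
    this.append({ kind: "summary", status, text });
  }
}
```

Because the summary is just another entry, a failed or interrupted run leaves the same kind of visible record in the conversation as a successful one.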
Direction for shell evolution:
- the current allowlist-based `run_shell_mutate` is a bootstrap, not the intended end state
- serious usefulness requires a bounded general execution surface, not an ever-growing list of specific commands
- future shell policy should classify actions by risk and scope rather than by enumerating all allowed CLIs
Target direction:
- keep `run_shell_inspect` as the low-risk evidence-gathering surface
- evolve mutation/execution into a host-governed policy surface that can eventually support real commands such as project-local scripts, file operations, `aws`, `kubectl`, `gh`, or similar tools when the current environment allows them
- make decisions based on:
  - workspace scope
  - external-system scope
  - destructive risk
  - approval requirement
  - trace/audit requirements
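A host-side decision over the factors listed above might look like the following sketch. The types, the `decide` function, and the specific thresholds (destructive and unknown risk always ask, external-system actions are always audited) are assumptions made for illustration, not a description of Heddle's current policy.

```typescript
// Hypothetical policy inputs and outputs.
type Scope = "workspace" | "external-system";
type Risk = "low" | "destructive" | "unknown";

interface ActionClassification {
  scope: Scope;
  risk: Risk;
}

interface PolicyDecision {
  action: "allow" | "ask";
  audit: boolean;
  reason: string;
}

function decide(c: ActionClassification, hasRememberedApproval: boolean): PolicyDecision {
  // Assumed rule: anything outside the workspace, or not clearly low-risk, is audited.
  const audit = c.scope === "external-system" || c.risk !== "low";

  // Assumed rule: a remembered approval never covers destructive or unclassified actions.
  if (c.risk === "destructive" || c.risk === "unknown") {
    return { action: "ask", audit, reason: `${c.risk} risk always needs explicit approval` };
  }
  if (hasRememberedApproval) {
    return { action: "allow", audit, reason: "covered by a remembered approval" };
  }
  return { action: "ask", audit, reason: "no remembered approval for this action" };
}
```

The point of the shape is that the decision depends on classified properties of the action, not on which CLI happens to be invoked.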
Near-term implication:
- the next shell work should not be "add more prefixes forever"
- it should move toward capability classes and host-side execution policy
- early steps in that direction are now live: mutate policy can classify `yarn run ...` as a project-script capability, and known external CLIs now classify under explicit external-system scope
- but Phase 1 should not chase broader execution until the agent can reliably finish a bounded coding task and leave behind a trustworthy journal of what happened
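The move from prefixes toward capability classes can be sketched as a small classifier. Only the `yarn run ...` and `gh`/`aws`/`kubectl` cases come from the notes above. The type names, the `classifyMutateCommand` function, and the specific risk values are hypothetical and chosen for illustration.

```typescript
// Hypothetical metadata shape; the risk values are illustrative, not Heddle's.
type Capability = "project-script" | "external-cli" | "unclassified";

interface ShellClassification {
  capability: Capability;
  scope: "workspace" | "external-system" | "unknown";
  risk: "medium" | "high" | "unknown";
}

const EXTERNAL_CLIS = new Set(["gh", "aws", "kubectl"]);

function classifyMutateCommand(command: string): ShellClassification {
  // Naive whitespace tokenization; a real classifier would need proper shell parsing.
  const [program, subcommand] = command.trim().split(/\s+/);

  if (program === "yarn" && subcommand === "run") {
    return { capability: "project-script", scope: "workspace", risk: "medium" };
  }
  if (program !== undefined && EXTERNAL_CLIS.has(program)) {
    return { capability: "external-cli", scope: "external-system", risk: "high" };
  }
  // Unclassified commands are not rejected; they carry unknown-risk metadata
  // so the host can fall back to explicit approval.
  return { capability: "unclassified", scope: "unknown", risk: "unknown" };
}
```

For example, `classifyMutateCommand("kubectl get pods")` yields the external-system scope, while an unrecognized command comes back as `unclassified` with `unknown` risk.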
Goal:
- make longer interactive sessions stay coherent and recoverable
Scope:
- memory and context compaction
- better blocked-state handling
- improved evidence routing across repo docs and notes
- stronger eval workflow for conversational tasks
- better recovery from tool misuse and ambiguous prompts
Exit criteria:
- multi-turn tool-heavy sessions remain legible and useful instead of degrading quickly
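For the memory and context compaction item in the scope above, one common approach is to keep the most recent turns verbatim and fold older ones into a summary. The sketch below assumes that strategy; `Turn`, `CompactedContext`, `Summarize`, and `compactContext` are hypothetical names, and the summarizer is a stand-in for whatever the runtime would actually use.

```typescript
// Hypothetical shapes for a compacted conversation context.
type Turn = { user: string; assistant: string };
type CompactedContext = { summary: string | null; recent: Turn[] };

// Stand-in summarizer; a real one would likely call the model.
type Summarize = (olderTurns: Turn[]) => string;

function compactContext(
  turns: Turn[],
  keepRecent: number,
  summarize: Summarize,
): CompactedContext {
  const keep = Math.max(0, keepRecent);
  // Nothing to compact while the session still fits in the recent window.
  if (turns.length <= keep) {
    return { summary: null, recent: [...turns] };
  }
  const cut = turns.length - keep;
  return {
    // Older turns are replaced by a single summary string.
    summary: summarize(turns.slice(0, cut)),
    // The newest turns are carried over unchanged.
    recent: turns.slice(cut),
  };
}
```

Keeping the summarizer as a parameter makes the cut-off logic testable without a model call.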
Goal:
- reach the first credible "Claude Code class" workflow shape
Scope:
- background or queued runs
- longer-running command handling
- change summaries and diff-aware review support
- test-failure digestion
- resumable sessions
- richer approval and audit model
- broader policy-based execution surfaces beyond the current narrow mutate allowlist
Exit criteria:
- Heddle is useful as a regular interface for bounded coding work, not only as an experiment harness
Goal:
- add subagents only after the single-agent workflow is already useful
Scope:
- handoffs or agents-as-tools
- bounded child-agent task scoping
- parent and child trace integration
- delegation policy
- user-facing visibility into delegated work
Why this is still high effort even with SDK support:
- provider SDKs can expose the primitive
- Heddle still has to decide when to delegate, how to scope the child, what tools it gets, how results are integrated, and how the user understands what happened
Exit criteria:
- delegation improves real tasks instead of only increasing complexity
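The host-side decisions described above (how to scope the child, what tools it gets, how results are integrated) can be made concrete with a small sketch. Everything here is hypothetical: `ChildTask`, `ChildResult`, `scopeChildTask`, and `integrateChildResult` are illustrative names, and the rule that a child can never hold a tool its parent lacks is an assumed policy.

```typescript
// Hypothetical delegation shapes.
interface ChildTask {
  goal: string;
  allowedTools: string[];
  maxSteps: number;
}

interface ChildResult {
  goal: string;
  outcome: string;
  toolsUsed: string[];
}

function scopeChildTask(
  goal: string,
  parentTools: string[],
  requestedTools: string[],
  maxSteps: number,
): ChildTask {
  // Assumed policy: a child never receives a tool its parent does not hold.
  const allowedTools = requestedTools.filter((tool) => parentTools.includes(tool));
  return { goal, allowedTools, maxSteps };
}

function integrateChildResult(parentTrace: string[], result: ChildResult): string[] {
  // The delegated work is appended to the parent trace so the user can see it.
  const tools = result.toolsUsed.length > 0 ? result.toolsUsed.join(", ") : "none";
  return [...parentTrace, `delegated "${result.goal}": ${result.outcome} (tools: ${tools})`];
}
```

Even in this small form, the deciding logic lives in the host, not in the provider SDK primitive, which is the effort the notes above point to.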
Goal:
- make Heddle the main interface for evolving the Heddle repo
This is not a single implementation milestone. It is the point where the earlier phases are reliable enough that Heddle becomes part of its own development loop.
The likely useful abstractions are practical, not philosophical:
- session or run model
- environment adapter
- capability classes
- approval policy
- trace and event model
- memory surfaces
- delegation boundary
Heddle should avoid inventing larger cognitive ontologies unless repeated failures justify them.