Use this checklist when you want to verify the real gateway path in a network-enabled environment.
This covers four layers:
- Free smoke: startup, simple response, a basic tool call, and session cost reporting.
- Core happy-path: write/read/glob/grep/multi-tool and multi-turn session accounting.
- Weak-model polish: output cleanup on a live low-cost model.
- Paid tools: ExaSearch, ExaAnswer, ExaReadUrls, and VideoGen.
Before running live e2e, make sure all of these are true:
- Node.js is 20 or newer.
- Dependencies are installed with `npm install`.
- The project builds locally with `npm run build`.
- Deterministic local tests pass with `npm test`.
- The machine has outbound network access to the BlockRun gateway.
- For paid-tool coverage, the wallet is funded and usable.
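
The local prerequisites above can be checked in one pass. The following is a minimal sketch, not part of the project: `node_major_at_least` and `preflight` are hypothetical helper names, and only the commands already listed in this checklist are invoked.

```bash
# Succeed when a Node.js version string (e.g. "v20.11.1") has a major
# version of at least the given minimum.
node_major_at_least() {
  version=${1#v}          # strip the leading "v" if present
  major=${version%%.*}    # keep everything before the first dot
  [ "$major" -ge "$2" ]
}

# Hypothetical wrapper: stops at the first prerequisite that fails.
preflight() {
  node_major_at_least "$(node --version)" 20 || {
    echo "Node.js 20 or newer is required" >&2
    return 1
  }
  npm install && npm run build && npm test
}
```

Sourcing the snippet and running `preflight` from the project root exercises the first four prerequisites. Network reachability and wallet funding still need to be checked separately.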
Useful checks:
```bash
npm run build
npm test
node dist/index.js --help
node dist/index.js balance
```

- `E2E_MODEL=<provider/model>` overrides the default live model. If unset, e2e defaults to `zai/glm-5.1`.
- `RUN_PAID_E2E=1` enables the paid-tool tests. Without it, paid tests are skipped on purpose.
- `FRANKLIN_MODEL_REQUEST_TIMEOUT_MS` controls how long Franklin waits for the initial model response headers.
- `FRANKLIN_MODEL_STREAM_IDLE_TIMEOUT_MS` controls how long Franklin waits for the next streamed chunk after the response has started.
For normal validation, leave the timeout env vars alone. They are mainly for debugging slow or flaky networks.
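
To see which settings a run would pick up, a small helper can echo the effective values. This is a sketch under assumptions: `print_e2e_config` is a hypothetical name, it mirrors the descriptions in this checklist rather than the harness's actual parsing, and `(default)` only means the variable is unset in the current shell.

```bash
# Print the environment settings a live e2e run would see.
# The fallback model name is the default stated in this checklist.
print_e2e_config() {
  echo "E2E_MODEL=${E2E_MODEL:-zai/glm-5.1}"
  if [ "${RUN_PAID_E2E:-}" = "1" ]; then
    echo "paid tests: enabled"
  else
    echo "paid tests: skipped"
  fi
  echo "FRANKLIN_MODEL_REQUEST_TIMEOUT_MS=${FRANKLIN_MODEL_REQUEST_TIMEOUT_MS:-(default)}"
  echo "FRANKLIN_MODEL_STREAM_IDLE_TIMEOUT_MS=${FRANKLIN_MODEL_STREAM_IDLE_TIMEOUT_MS:-(default)}"
}
```

Calling `print_e2e_config` right before a run makes it obvious whether paid tests will be skipped and whether a leftover timeout override is still exported.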
Run the live suite in this order so failures are easy to localize.
The free smoke run is the fastest signal that the CLI starts, the gateway is reachable, the default live model answers, and the basic session summary still works.
```bash
node --test --test-reporter=spec \
  --test-name-pattern='startup|simple response|bash tool: executes shell command and returns output|session cost: token usage reported at session end' \
  test/e2e.mjs
```

Expected result in a healthy network-enabled environment:
- `startup` passes immediately.
- `simple response` passes.
- `bash tool` passes.
- `session cost` passes.
If these skip with `Live gateway/network unavailable in this environment`, treat that as an environment problem, not a product pass.
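
A skipped run can look green at a glance, so it helps to scan the saved output for the skip message. The sketch below assumes the run's output was redirected to a log file and that the skip reason appears verbatim in that output. `no_network_skip` is a hypothetical helper name.

```bash
# Succeed only when the saved test log contains no gateway/network skip.
# $1 is the path to a log file captured from a live run.
no_network_skip() {
  if grep -qF 'Live gateway/network unavailable in this environment' "$1"; then
    echo "skipped: environment problem, not a product pass" >&2
    return 1
  fi
  return 0
}
```

This only detects the skip signature. It does not replace reading the pass/fail lines of the run itself.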
Once smoke passes, verify the main tool and session paths.
```bash
node --test --test-reporter=spec \
  --test-name-pattern='write tool: creates a file with specified content|read tool: reads a pre-existing file|glob tool: finds files by pattern|grep tool: finds content in files|bash tool: error exit code is captured|multi-tool: write then read a file in same session|session cost: accumulates across multiple turns|session cost: /cost command shows cost info|polish: weak model respects instruction without leaking <think> or \[TOOLCALL\]' \
  test/e2e.mjs
```

Expected result:
- File tools pass on a real temp directory.
- `bash tool: error exit code is captured` still exits the CLI cleanly.
- `/cost` and multi-turn accounting both pass.
- The weak-model polish probe returns `POLISH_PROBE_OK` without leaking `<think>` or `[TOOLCALL]`.
Run the paid-tool tests only after the free smoke and core runs are clean and the wallet has funds.
```bash
RUN_PAID_E2E=1 node --test --test-reporter=spec \
  --test-name-pattern='ExaSearch tool|ExaAnswer tool|ExaReadUrls tool|VideoGen tool' \
  test/e2e.mjs
```

Expected result:
- ExaSearch shows a visible `ExaSearch` call and at least one URL.
- ExaAnswer shows a visible `ExaAnswer` call and a grounded answer mentioning `x402`, `payment`, or `HTTP 402`.
- ExaReadUrls shows a visible `ExaReadUrls` call and mentions `HTTP 402` or payment.
- VideoGen creates a non-trivial MP4 at the requested output path.
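
For a manual spot check of the VideoGen output, a size test is a rough stand-in for "non-trivial". This is only a sketch: `file_at_least` is a hypothetical helper, and the byte threshold you pass is your own choice, not the value the e2e test uses.

```bash
# Succeed when the file exists and is at least min_bytes long.
# $1 = path to the file, $2 = min_bytes (an illustrative threshold).
file_at_least() {
  [ -f "$1" ] || return 1
  size=$(wc -c < "$1" | tr -d ' ')   # strip padding some wc builds emit
  [ "$size" -ge "$2" ]
}
```

A size check cannot confirm the file is a valid MP4. It only rules out an empty or stub file.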
After the focused runs are green, run the full live suite as the final check.
```bash
RUN_PAID_E2E=1 npm run test:e2e
```

If you only want the unfunded/free live suite, omit `RUN_PAID_E2E=1`.
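
The ordering used in this checklist can be scripted so that a run stops at the first failing layer. The following is a minimal sketch. `run_in_order` is a hypothetical helper that takes each step as a single command string and runs it with `sh -c`.

```bash
# Run each command string in order and stop at the first failure, so the
# failing layer is the last step printed.
run_in_order() {
  for step in "$@"; do
    echo "==> $step"
    sh -c "$step" || {
      echo "stopped at: $step" >&2
      return 1
    }
  done
}
```

For example, `run_in_order 'npm run build' 'npm test' 'npm run test:e2e'` runs the local checks before the live suite and never reaches the live suite if the build or local tests fail.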
Use the first recognizable failure signature to decide where to look next.
- `Live gateway/network unavailable in this environment`
  - The machine could not reach the live gateway, or the request timed out before headers/stream data arrived.
  - Check outbound network access first.
- `Model unavailable due to payment/balance constraints`
  - The selected model or tool path needs funds, or payment verification failed.
  - Check wallet balance and try again with a funded wallet or a cheaper/free `E2E_MODEL`.
- `Free tier rate limited (60 req/hr)`
  - The free model path is exhausted for now.
  - Retry later or switch `E2E_MODEL` to another model you intend to validate.
- A harness-level timeout with no skip
  - This is more suspicious.
  - It can mean a regression in request timeout handling, stream idle handling, or a CLI code path that no longer exits cleanly.
- Free smoke passes but write/read/glob/grep fails
  - The gateway is likely fine.
  - Focus on local tool execution, file-path handling, or prompt/tool orchestration.
- Tool tests pass but session cost tests fail
  - Focus on stderr summaries, token accounting, or `/cost` command rendering.
- Paid Exa tests fail but free/core passes
  - Focus on x402 payment flow, wallet funding, or the paid-tool integration layer rather than the base CLI loop.
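
The three message-based signatures above lend themselves to a quick log scan. This is a sketch under assumptions: `triage_log` is a hypothetical helper, the messages are assumed to appear verbatim in the captured output, and the checks run in the order listed here rather than by position in the log.

```bash
# Print the area to investigate for the first matching signature,
# checking signatures in the order this checklist lists them.
# $1 is the path to a log file captured from a live run.
triage_log() {
  if grep -qF 'Live gateway/network unavailable in this environment' "$1"; then
    echo "network"
  elif grep -qF 'Model unavailable due to payment/balance constraints' "$1"; then
    echo "wallet"
  elif grep -qF 'Free tier rate limited' "$1"; then
    echo "rate-limit"
  else
    echo "unknown"
  fi
}
```

An `unknown` result means none of the message signatures matched, which points toward the behavioral signatures (timeouts, tool failures, session cost failures) instead.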
Treat the run as truly green only when:
- Free smoke passes without skipping.
- Core happy-path passes without skipping.
- Paid-tool tests pass when `RUN_PAID_E2E=1` is enabled on a funded wallet.
- No test spends a long time hanging before failing or skipping.
Fast skip is acceptable in a network-restricted environment. It is not evidence that the live happy-path works.