fix(browser): validate Deep Research completion#262
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b6132b03df
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1515389533
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Codex review: needs changes before merge. Reviewed June 18, 2026, 2:04 AM ET / 06:04 UTC. Summary Reproducibility: Source-reproducible: the linked issue provides concrete placeholder/status and system-reminder outputs, and current main only rejects tool-only placeholders before treating extracted text as a result. I did not run a live Deep Research session because the PR itself notes that costs credits and remains the final gate. Review metrics: 3 noteworthy metrics.
Root-cause cluster Members:
Proposal only: this assessment does not dispatch repair, suppress jobs, mutate sibling items, close, or merge anything. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Mantis proof suggestion Risk before merge
Maintainer options:
Copy recommended automerge instructionNext step before merge
Security Review findings
Review detailsBest possible solution: Preserve fresh target reads when owner lookup is unavailable, keep stale-frame protections, then verify with a real Deep Research browser run before merge. Do we have a high-confidence way to reproduce the issue? Source-reproducible: the linked issue provides concrete placeholder/status and system-reminder outputs, and current main only rejects tool-only placeholders before treating extracted text as a result. I did not run a live Deep Research session because the PR itself notes that costs credits and remains the final gate. Is this the best way to solve the issue? No: the incomplete-result classifier is a good direction, but the PR should preserve fresh target reads when owner lookup fails before merge. Full review comments:
Overall correctness: patch is incorrect AGENTS.md: found and applied where relevant. Codex review notes: model internal, reasoning high; reviewed against bfa8f1de4266. Label changesLabel changes:
Label justifications:
Evidence reviewedAcceptance criteria:
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
1515389 to
0456e27
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0456e276b7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (!observedResearchEvidence) { | ||
| throw new BrowserAutomationError( | ||
| "ChatGPT returned a completed response without starting Deep Research. The Deep Research selection may have silently fallen back to a normal response.", | ||
| { stage: "deep-research-not-started", code: "deep-research-not-started" }, | ||
| ); |
There was a problem hiding this comment.
Accept already-finished DOM reports on reattach
When oracle session reattaches to a Deep Research run that has already completed and ChatGPT exposes the final report only as a normal assistant DOM turn (no live OOPIF/planning panel is observed during this new polling invocation), observedResearchEvidence starts false and this guard throws before extractDeepResearchResult can read the report. reattach.ts calls this helper directly for researchMode === "deep", so users can lose successful completed runs simply because they reattached after the evidence-producing frames/status UI disappeared; consider trusting a scoped finished turn on reattach or carrying prior evidence into this check.
Useful? React with 👍 / 👎.
|
Maintainer verification complete on exact head
The compatibility review thread was addressed with fresh-target baseline scoping and regression coverage, then resolved. |
Summary
Fixes #261.
Verification
pnpm typecheckpnpm lintpnpm test— 1,309 passed, 43 skippedgpt-5.5, high) — clean; no accepted/actionable findingsCaveat
No live Deep Research credit was consumed. The regression uses the reporter's captured planning/status and system-reminder shapes; a real browser Deep Research run remains the final merge gate.