Skip to content

fix(browser): validate Deep Research completion#262

Merged
steipete merged 3 commits into
mainfrom
codex/issue-261-deep-research-final-report
Jun 19, 2026
Merged

fix(browser): validate Deep Research completion#262
steipete merged 3 commits into
mainfrom
codex/issue-261-deep-research-final-report

Conversation

@steipete

@steipete steipete commented Jun 18, 2026

Copy link
Copy Markdown
Owner

Summary

  • reject Deep Research planning panels, system reminders, and tool placeholders before treating them as final reports
  • require current-run research evidence before accepting a finished main-DOM response, so silent normal-response fallback fails clearly
  • apply the same incomplete-result classifier to iframe reads, extraction, and report artifact saving
  • scope target and in-page research frames to their owning post-submission conversation turn, skipping retained stale frames

Fixes #261.

Verification

  • affected Deep Research, reattach, and artifact suites — 83 passed
  • pnpm typecheck
  • pnpm lint
  • pnpm test — 1,309 passed, 43 skipped
  • Auto Review (gpt-5.5, high) — clean; no accepted/actionable findings

Caveat

No live Deep Research credit was consumed. The regression uses the reporter's captured planning/status and system-reminder shapes; a real browser Deep Research run remains the final merge gate.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b6132b03df

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/browser/actions/deepResearch.ts Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1515389533

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/browser/actions/deepResearch.ts Outdated
@clawsweeper

clawsweeper Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs changes before merge. Reviewed June 18, 2026, 2:04 AM ET / 06:04 UTC.

Summary
The PR updates browser Deep Research completion detection, incomplete-result filtering, report artifact saving, tests, and an unreleased changelog note.

Reproducibility: Source-reproducible: the linked issue provides concrete placeholder/status and system-reminder outputs, and current main only rejects tool-only placeholders before treating extracted text as a result. I did not run a live Deep Research session because the PR itself notes that costs credits and remains the final gate.

Review metrics: 3 noteworthy metrics.

  • PR surface: 6 files changed, +584/-84. The diff is concentrated in browser Deep Research logic and regression coverage, so review should focus on that runtime path.
  • Review threads: 1 unresolved, 1 resolved. The remaining unresolved thread is on the target-owner compatibility concern introduced by the PR head.
  • CI status: 4 checks succeeded. CI supports the regression and type/lint signal, but it does not prove the live Deep Research browser path.

Root-cause cluster
Relationship: fixed_by_candidate
Canonical: #261
Summary: This PR is the candidate fix for the linked Deep Research placeholder/status report.

Members:

Proposal only: this assessment does not dispatch repair, suppress jobs, mutate sibling items, close, or merge anything.

Merge readiness
Overall: 🦐 gold shrimp
Proof: 🌊 off-meta tidepool
Patch quality: 🦐 gold shrimp
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Restore a fallback for fresh target reads when owner lookup is unavailable.

Mantis proof suggestion
A real browser Deep Research run would materially prove the user-visible automation path that mocks and CI cannot cover. A maintainer can ask Mantis to capture proof by posting this exact PR comment:

@openclaw-mantis visual task: verify a ChatGPT Deep Research browser run returns the final report and does not save planning/status or normal fallback text.

Risk before merge

  • [P2] The PR changes completion gating for a long-running browser Deep Research path, and a target-only successful report can become a timeout if owner lookup cannot resolve.
  • [P1] The PR body says no live Deep Research credit was consumed; AGENTS.md treats live OpenAI/browser smoke tests as opt-in, so runtime proof remains a maintainer merge gate.

Maintainer options:

  1. Keep target-key fallback (recommended)
    Preserve fresh-run target reads when owner lookup fails, using captured ignored target keys or the prior requireScopedTargetOwner behavior for runs that do not need owner scoping.
  2. Prove the live browser path
    After the compatibility fix, run or attach redacted live evidence from a real ChatGPT Deep Research browser session showing final-report acceptance and placeholder/status rejection.
  3. Accept owner-lookup dependence
    Maintainers can intentionally require owner lookup for all scoped target reads, but that should be an explicit runtime-availability choice because unresolved ownership can turn a successful report into a timeout.
Copy recommended automerge instruction
@clawsweeper automerge

Special instructions:
Preserve fresh Deep Research OOPIF target reads when owner lookup fails by passing minTurnLiteral only when owner scoping is required or by falling back to the target-key filter; add regression coverage for a completed target-only report with unresolved DOM.getFrameOwner.

Next step before merge

  • [P2] A narrow code/test repair can preserve fresh target reads when owner lookup fails; live Deep Research proof remains a separate maintainer merge gate after that repair.

Security
Cleared: The diff changes browser DOM/CDP extraction logic, artifact filtering, tests, and changelog text, with no new dependencies, workflows, secret handling, or executable supply-chain surface.

Review findings

  • [P2] Preserve target reads when owner lookup fails — src/browser/actions/deepResearch.ts:226-228
Review details

Best possible solution:

Preserve fresh target reads when owner lookup is unavailable, keep stale-frame protections, then verify with a real Deep Research browser run before merge.

Do we have a high-confidence way to reproduce the issue?

Source-reproducible: the linked issue provides concrete placeholder/status and system-reminder outputs, and current main only rejects tool-only placeholders before treating extracted text as a result. I did not run a live Deep Research session because the PR itself notes that costs credits and remains the final gate.

Is this the best way to solve the issue?

No: the incomplete-result classifier is a good direction, but the PR should preserve fresh target reads when owner lookup fails before merge.

Full review comments:

  • [P2] Preserve target reads when owner lookup fails — src/browser/actions/deepResearch.ts:226-228
    Fresh browser submissions identify current Deep Research targets with ignoredTargetKeys, but this now passes minTurnLiteral into the target scanner for every scoped run. If DOM.getFrameOwner or the conversation-turn lookup returns null for a completed OOPIF target, the scanner drops the only readable report, the in-page frame path cannot recover target-only content, and the run can wait until timeout.
    Confidence: 0.86

Overall correctness: patch is incorrect
Overall confidence: 0.86

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against bfa8f1de4266.

Label changes

Label changes:

  • add P2: The PR addresses a real browser-mode Deep Research correctness issue with limited blast radius but meaningful user cost.
  • add merge-risk: 🚨 availability: The changed target-read gating can make a successful target-only Deep Research report wait until timeout when owner lookup fails.
  • add rating: 🦐 gold shrimp: Overall readiness is 🦐 gold shrimp; proof is 🌊 off-meta tidepool and patch quality is 🦐 gold shrimp.
  • add status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Not applicable: This is an owner-authored PR, so the external-contributor proof gate does not apply; the PR body still correctly names a live Deep Research browser run as the final maintainer merge gate.

Label justifications:

  • P2: The PR addresses a real browser-mode Deep Research correctness issue with limited blast radius but meaningful user cost.
  • merge-risk: 🚨 availability: The changed target-read gating can make a successful target-only Deep Research report wait until timeout when owner lookup fails.
  • rating: 🦐 gold shrimp: Overall readiness is 🦐 gold shrimp; proof is 🌊 off-meta tidepool and patch quality is 🦐 gold shrimp.
  • status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Not applicable: This is an owner-authored PR, so the external-contributor proof gate does not apply; the PR body still correctly names a live Deep Research browser run as the final maintainer merge gate.
Evidence reviewed

Acceptance criteria:

  • [P1] pnpm vitest run tests/browser/deepResearch.test.ts tests/browser/artifacts.test.ts.
  • [P1] pnpm typecheck.
  • [P1] pnpm lint.

What I checked:

  • Fresh browser submission passes ignored target keys only: The main browser run captures pre-submission Deep Research target keys, then calls waitForDeepResearchCompletion with { ignoredTargetKeys: deepResearchTargetKeys } and no requireScopedTargetOwner. (src/browser/index.ts:1536, bfa8f1de4266)
  • PR head always enables target owner filtering for scoped runs: At PR head, rawTargetResult calls readDeepResearchTargetResult(client, ignoredTargetKeys, minTurnLiteral), which enables owner-turn filtering for every scoped run. (src/browser/actions/deepResearch.ts:226, 1515389533e3)
  • Current main only required target owner scoping when requested: Current main passes requireScopedTargetOwner ? minTurnLiteral : -1 to readDeepResearchTargetResult, preserving fresh target reads that are already scoped by ignored target keys. (src/browser/actions/deepResearch.ts:224, bc66fad88396)
  • Current main target reader skips unresolved owners when minTurnIndex is provided: The target scanner calls readDeepResearchTargetOwnerTurnIndex only when minTurnIndex >= 0 and skips the target when ownerTurnIndex is null or older than the minimum. (src/browser/actions/deepResearch.ts:531, bc66fad88396)
  • Unresolved review thread matches PR head: GitHub reviewThreads reports the line-228 target-owner review thread as unresolved; the earlier short-iframe thread is resolved by the second commit. (src/browser/actions/deepResearch.ts:228, 1515389533e3)
  • Feature history points to the Deep Research owners: GitHub path history shows Deep Research was introduced by steipete and later OOPIF/page-scoping fixes were made by umutkeltek on the same file. (src/browser/actions/deepResearch.ts, dff95f2a99b5)

Likely related people:

  • steipete: GitHub path history shows this user introduced ChatGPT Deep Research mode and later scoped Deep Research reports to current runs on the same file. (role: feature owner and recent area contributor; confidence: high; commits: dff95f2a99b5, a3c7d39d12c3, 9097283f0f46; files: src/browser/actions/deepResearch.ts, src/browser/index.ts)
  • umutkeltek: GitHub path history shows multiple merged fixes around OOPIF target extraction, page-scoped target discovery, target preference, and read selection in the same Deep Research code. (role: adjacent Deep Research extraction contributor; confidence: high; commits: e907c167bc8d, afacccf40795, b8ae5a873995; files: src/browser/actions/deepResearch.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. P2 Normal priority bug or improvement with limited blast radius. merge-risk: 🚨 availability 🚨 Merging this PR could cause crashes, hangs, restart loops, stalls, or process outages. labels Jun 18, 2026
@steipete steipete force-pushed the codex/issue-261-deep-research-final-report branch from 1515389 to 0456e27 Compare June 19, 2026 10:16

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0456e276b7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +273 to +277
if (!observedResearchEvidence) {
throw new BrowserAutomationError(
"ChatGPT returned a completed response without starting Deep Research. The Deep Research selection may have silently fallen back to a normal response.",
{ stage: "deep-research-not-started", code: "deep-research-not-started" },
);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Accept already-finished DOM reports on reattach

When oracle session reattaches to a Deep Research run that has already completed and ChatGPT exposes the final report only as a normal assistant DOM turn (no live OOPIF/planning panel is observed during this new polling invocation), observedResearchEvidence starts false and this guard throws before extractDeepResearchResult can read the report. reattach.ts calls this helper directly for researchMode === "deep", so users can lose successful completed runs simply because they reattached after the evidence-producing frames/status UI disappeared; consider trusting a scoped finished turn on reattach or carrying prior evidence into this check.

Useful? React with 👍 / 👎.

@steipete

Copy link
Copy Markdown
Owner Author

Maintainer verification complete on exact head 0456e276b7238b9d9c3562d6de7f50ce5a315968:

  • Focused browser regression suite: 76/76 passed.
  • Full suite: 1,319 passed, 43 skipped; lint, typecheck, and build passed.
  • Authenticated ChatGPT Deep Research E2E completed in 185s and returned the final report marker ORACLE_DEEP_RESEARCH_FINAL_OK_2, including the requested release table and official source links. The saved report artifact was 1,965 bytes; planning/status text was not accepted as the result.
  • Autoreview: clean on the final uncommitted patch and the full branch diff; no accepted/actionable findings.
  • Exact-head CI run 27819767058: Ubuntu, macOS, Windows, and GitGuardian all green.

The compatibility review thread was addressed with fresh-target baseline scoping and regression coverage, then resolved.

@steipete steipete merged commit 4b7f5d0 into main Jun 19, 2026
4 checks passed
@steipete steipete deleted the codex/issue-261-deep-research-final-report branch June 19, 2026 10:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 availability 🚨 Merging this PR could cause crashes, hangs, restart loops, stalls, or process outages. P2 Normal priority bug or improvement with limited blast radius. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ChatGPT 5.5-Pro Deep Research sessions intermittently return status/placeholder content instead of final research report

1 participant