feat: add support for multi url llm extraction by RohitR311 · Pull Request #1117 · getmaxun/maxun

RohitR311 · 2026-06-20T04:45:56Z

What does this PR do?

Feat:

Add support for multi-URL AI robot extraction. Based on the prompt, the robot decides whether it needs to navigate to multiple sites to extract the data.

Fix:

For AI-powered robot creation, the LLM selects a group, which results in actual data being sampled. However, this sample was never checked against the prompt after creation, so a sample that did not align with the prompt could go unnoticed. This PR verifies the sample data against the prompt and tries again if the two do not align.
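
The verify-then-retry behaviour described above might look roughly like the sketch below. The `Candidate`, `Verdict`, and `pickVerifiedCandidate` names, the confidence threshold, and the idea of walking through alternative groups are all assumptions made for illustration; this thread does not show the actual implementation.

```typescript
// Hypothetical shapes: the real types in workflowEnricher.ts are not shown in this thread.
interface Candidate {
  selector: string;                  // group chosen by the LLM
  sample: Record<string, string>[];  // rows actually extracted with that group
}

interface Verdict {
  matches: boolean;    // does the sample answer the prompt?
  confidence: number;  // 0..1, as reported by the verifier
}

// In practice this would be an LLM call; it is injected here so the logic stays testable.
type Verifier = (prompt: string, sample: Record<string, string>[]) => Promise<Verdict>;

// Returns the first candidate whose sample passes verification, or null if none does.
async function pickVerifiedCandidate(
  prompt: string,
  candidates: Candidate[],
  verify: Verifier,
  minConfidence = 0.7, // assumed threshold, not taken from the PR
): Promise<Candidate | null> {
  for (const candidate of candidates) {
    const verdict = await verify(prompt, candidate.sample);
    if (verdict.matches && verdict.confidence >= minConfidence) {
      return candidate;
    }
    // Otherwise fall through and try the next group.
  }
  return null;
}
```

Returning `null` rather than throwing leaves the caller free to decide how to fall back when no group passes.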

Summary by CodeRabbit

New Features
- Multi-site workflow automation: Create workflows spanning multiple domains in one request, including intelligent multi-result URL selection and stitched sequential execution.
- Enhanced data extraction verification: Add an LLM-based check to confirm extracted list data matches your intent, with confidence-gated fallback when verification fails.
Bug Fixes
- Improved workflow generation reliability for multi-site prompts, ensuring the correct target URL is carried through the workflow creation flow.

coderabbitai · 2026-06-20T04:46:07Z

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

Adds multi-site research-mode workflow generation to WorkflowEnricher, plus an LLM-based verification step for single-site list extraction. The POST /recordings/llm route now detects multi-site prompts and routes them through the new search-and-stitch workflow path.

Changes

Multi-site workflow generation and LLM verification

| Layer / File(s) | Summary |
| --- | --- |
| LLM verification step for list extraction `server/src/sdk/workflowEnricher.ts` | Adds private `verifyWorkflowOutput` and invokes it from `buildWorkflowFromLLMDecision`; throws `WorkflowVerificationFailed` when verification rejects the extracted sample. |
| Multi-site search, URL selection, stitching, and public entry point `server/src/sdk/workflowEnricher.ts` | Adds `performDuckDuckGoMultiSearch`, `isMultiSitePrompt`, `selectMultipleUrlsFromResults`, `buildMultiSiteWorkflow`, and `generateMultiSiteWorkflowFromSearch` to build and combine per-site workflows. |
| Route handler branching on multi-site prompt `server/src/routes/storage.ts` | Updates the no-url `POST /recordings/llm` flow to detect multi-site prompts, call the new multi-site generator, and keep the existing single-site search path otherwise. |
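
The route-level branching summarised above could be sketched as follows. The method names come from this thread, but the `Enricher` interface, the `WorkflowResult` shape, the parameter lists, and the `generateWithoutUrl` helper are assumptions for illustration only, not the code in `storage.ts`.

```typescript
// Assumed result shape, loosely based on the {success, workflow, url} tuple in the walkthrough.
interface WorkflowResult {
  success: boolean;
  workflow?: unknown;
  url?: string;
  error?: string;
}

// Hypothetical interface; the real WorkflowEnricher signatures may differ.
interface Enricher {
  isMultiSitePrompt(prompt: string): Promise<boolean>;
  generateMultiSiteWorkflowFromSearch(prompt: string, userId: string): Promise<WorkflowResult>;
  generateWorkflowFromPromptWithSearch(prompt: string, userId: string): Promise<WorkflowResult>;
}

// Chooses the generation path for a request that carries a prompt but no url.
async function generateWithoutUrl(
  enricher: Enricher,
  prompt: string,
  userId: string,
): Promise<WorkflowResult> {
  const isMultiSite = await enricher.isMultiSitePrompt(prompt);
  if (isMultiSite) {
    // New path: search, pick several sites, stitch the per-site workflows.
    return enricher.generateMultiSiteWorkflowFromSearch(prompt, userId);
  }
  // Existing single-site search path stays unchanged.
  return enricher.generateWorkflowFromPromptWithSearch(prompt, userId);
}
```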

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant StorageRoute as POST /recordings/llm
  participant WorkflowEnricher
  participant DuckDuckGo
  participant LLMProvider

  Client->>StorageRoute: prompt (no url)
  StorageRoute->>WorkflowEnricher: isMultiSitePrompt(prompt)
  WorkflowEnricher->>LLMProvider: classify prompt
  LLMProvider-->>WorkflowEnricher: true / false
  WorkflowEnricher-->>StorageRoute: isMultiSite = true

  rect rgba(100, 150, 255, 0.5)
    note over StorageRoute,LLMProvider: Multi-site path
    StorageRoute->>WorkflowEnricher: generateMultiSiteWorkflowFromSearch(prompt, userId)
    WorkflowEnricher->>DuckDuckGo: multi-query search
    DuckDuckGo-->>WorkflowEnricher: multi-domain results
    WorkflowEnricher->>LLMProvider: select up to N domain-distinct URLs
    LLMProvider-->>WorkflowEnricher: selected URLs
    WorkflowEnricher->>WorkflowEnricher: generate per-site workflows + verify each
    WorkflowEnricher->>WorkflowEnricher: stitch into single multi-site workflow
    WorkflowEnricher-->>StorageRoute: {success, workflow, url}
  end

  StorageRoute-->>Client: combined workflow + finalUrl
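
The "select up to N domain-distinct URLs" step in the diagram is performed by the LLM in this PR. As a rough illustration of the constraint itself, a deterministic filter that could back up such a selection is sketched below. `hostOf` and `pickDomainDistinct` are hypothetical helpers, and treating `www.` as the same domain is an assumption.

```typescript
// Returns the hostname without a leading "www.", or null for strings that are not valid URLs.
function hostOf(raw: string): string | null {
  try {
    return new URL(raw).hostname.replace(/^www\./, '');
  } catch (_err) {
    return null;
  }
}

// Keeps at most `limit` URLs, taking only the first URL seen for each host.
function pickDomainDistinct(urls: string[], limit: number): string[] {
  const seenHosts = new Set<string>();
  const picked: string[] = [];
  for (const raw of urls) {
    if (picked.length >= limit) break;
    const host = hostOf(raw);
    if (host === null || seenHosts.has(host)) continue;
    seenHosts.add(host);
    picked.push(raw);
  }
  return picked;
}
```

Running a filter like this after the LLM responds would guard against a reply that repeats a domain or includes a malformed URL.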

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

getmaxun/maxun#946: Introduced the original search-based workflow generation path (generateWorkflowFromPromptWithSearch) that this PR extends with multi-site classification and a new parallel entry point.
getmaxun/maxun#957: Modified the same no-url control flow in POST /recordings/llm and workflowEnricher.ts, directly adjacent to where the multi-site branch is now inserted.
getmaxun/maxun#921: Also touches the LLM workflow-generation route and WorkflowEnricher path for prompt-driven recording setup.

Suggested reviewers

amhsirak

Poem

🐇 I hop through sites with curious nose,
The LLM whispers where the workflow grows.
One prompt, many domains, stitched just so,
And bunny-bright results begin to flow.
🎋

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly matches the main change: adding multi-URL/multi-site LLM extraction support. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

coderabbitai

🧹 Nitpick comments (1)

server/src/sdk/workflowEnricher.ts (1)

1633-1642: ⚡ Quick win

Add request timeouts to axios LLM calls to prevent indefinite hangs.

The axios POST requests to Ollama (line 1633) and OpenAI (line 1667) lack explicit timeouts. If the LLM service is slow or unresponsive, the verification step could block indefinitely, tying up browser resources and degrading the user experience.

🔧 Suggested fix: add timeout to axios calls

         const response = await axios.post(`${ollamaBaseUrl}/api/chat`, {
           model: ollamaModel,
           messages: [
             { role: 'system', content: systemPrompt },
             { role: 'user', content: userMessage }
           ],
           stream: false,
           format: jsonSchema,
           options: { temperature: 0.1 }
-        });
+        }, { timeout: 60000 });

Apply a similar change to the OpenAI call at line 1667.

Also applies to: 1667-1681
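
The suggested fix relies on the `timeout` option in the axios request config. Where a call has no built-in timeout, a client-agnostic guard is another option; the sketch below uses `Promise.race` and is not the change proposed in this review. `withTimeout` is a hypothetical helper. Unlike the axios option, it only stops waiting and does not cancel the underlying request.

```typescript
// Rejects if `work` has not settled within `ms` milliseconds.
// Note: the underlying operation keeps running; this only bounds how long the caller waits.
function withTimeout<T>(work: Promise<T>, ms: number, label = 'operation'): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([work, deadline]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

A caller could wrap the verification request as `withTimeout(callLlm(), 60000, 'LLM verification')`, where `callLlm` stands in for whatever function issues the request.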

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/sdk/workflowEnricher.ts` around lines 1633 - 1642, The axios POST
requests to Ollama and OpenAI LLM services lack explicit timeout configurations,
which can cause the verification step to hang indefinitely if the service is
unresponsive. Add a timeout property (in milliseconds) to the configuration
objects of both axios.post calls - specifically to the Ollama API call at the
location where the model, messages, stream, format, and options are configured,
and apply the same timeout configuration to the OpenAI API call. Choose a
reasonable timeout value that prevents indefinite blocking while allowing
sufficient time for typical LLM responses.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 70a42c71-cf74-4894-9ed4-0b18ff31b680

📥 Commits

Reviewing files that changed from the base of the PR and between d581502 and 5c824bc.

📒 Files selected for processing (2)

server/src/routes/storage.ts
server/src/sdk/workflowEnricher.ts

feat: add support for multi url llm extraction

5c824bc

RohitR311 added the Type: Bug and Type: Enhancement labels on Jun 20, 2026

RohitR311 requested a review from amhsirak June 20, 2026 04:46

RohitR311 added the Type: Feature label and removed the Type: Enhancement label on Jun 20, 2026

coderabbitai Bot reviewed Jun 20, 2026

amhsirak approved these changes Jun 24, 2026

Merge branch 'develop' into multi-site

3c40f8a

amhsirak merged commit 689772f into develop Jun 24, 2026
1 check was pending

coderabbitai Bot mentioned this pull request Jun 24, 2026

chore: pre-release v0.0.43 #1125

Merged

feat: add support for multi url llm extraction #1117
amhsirak merged 2 commits into develop from multi-site
