feat: add support for multi url llm extraction #1117

Merged
amhsirak merged 2 commits into develop from multi-site
Jun 24, 2026

@RohitR311 RohitR311 commented Jun 20, 2026

Collaborator

What this PR does?

Feat:

Add support for multi-URL AI robot extraction. Based on the prompt, the robot decides whether it needs to navigate to multiple sites to extract the data.

Fix:

During AI-powered robot creation, the LLM selects a group, and sample data is extracted from that group. Previously, this sample was never checked against the prompt after creation, so a sample that did not match the prompt could go unnoticed. This PR verifies the sample data against the prompt and tries again if the data does not align with it.
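
The verify-then-retry decision described above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Verdict` shape, the `nextAction` helper, and the default thresholds are all hypothetical, and the rule that a low-confidence rejection is ignored is an assumption based on the "confidence-gated fallback" wording in the summary.

```typescript
// Hypothetical shape of the verifier's answer; the real schema is not shown in this PR.
interface Verdict {
  matchesPrompt: boolean;
  confidence: number; // 0..1, as reported by the LLM
}

type Action = "accept" | "retry" | "give_up";

// Decide what to do with an extracted sample after verification.
// maxAttempts and minConfidence are illustrative defaults, not values from the PR.
function nextAction(
  verdict: Verdict,
  attempt: number,
  maxAttempts = 2,
  minConfidence = 0.7
): Action {
  // Accept when the verifier says the sample matches the prompt.
  if (verdict.matchesPrompt) return "accept";
  // Assumption: a low-confidence rejection is not trusted enough to discard the sample.
  if (verdict.confidence < minConfidence) return "accept";
  // A confident rejection triggers another attempt until the budget runs out.
  return attempt < maxAttempts ? "retry" : "give_up";
}
```

Keeping the decision in a pure function like this makes the retry policy testable without calling an LLM.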

Summary by CodeRabbit

  • New Features
    • Multi-site workflow automation: Create workflows spanning multiple domains in one request, including intelligent multi-result URL selection and stitched sequential execution.
    • Enhanced data extraction verification: Add an LLM-based check to confirm extracted list data matches your intent, with confidence-gated fallback when verification fails.
  • Bug Fixes
    • Improved workflow generation reliability for multi-site prompts, ensuring the correct target URL is carried through the workflow creation flow.

coderabbitai Bot commented Jun 20, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

Adds multi-site research-mode workflow generation to WorkflowEnricher, plus an LLM-based verification step for single-site list extraction. The POST /recordings/llm route now detects multi-site prompts and routes them through the new search-and-stitch workflow path.
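
The branching described in the walkthrough might look something like the sketch below. The `choosePath` helper and the path names are hypothetical. `isMultiSite` stands in for the result of `isMultiSitePrompt`, and the assumption that a caller-supplied URL bypasses search entirely is inferred from the "no-url" wording rather than confirmed by the diff.

```typescript
type GenerationPath = "direct-url" | "multi-site-search" | "single-site-search";

// Pick the workflow-generation path for a request.
// `isMultiSite` stands in for the result of WorkflowEnricher.isMultiSitePrompt(prompt).
function choosePath(url: string | undefined, isMultiSite: boolean): GenerationPath {
  // Assumed: a caller-supplied URL skips search (only the no-url flow changes in this PR).
  if (url !== undefined && url.trim() !== "") return "direct-url";
  // Without a URL, multi-site prompts go to the new generator; everything else
  // keeps the existing single-site search path.
  return isMultiSite ? "multi-site-search" : "single-site-search";
}
```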

Changes

Multi-site workflow generation and LLM verification

| Layer / File(s) | Summary |
| --- | --- |
| LLM verification step for list extraction (server/src/sdk/workflowEnricher.ts) | Adds private verifyWorkflowOutput and invokes it from buildWorkflowFromLLMDecision; throws WorkflowVerificationFailed when verification rejects the extracted sample. |
| Multi-site search, URL selection, stitching, and public entry point (server/src/sdk/workflowEnricher.ts) | Adds performDuckDuckGoMultiSearch, isMultiSitePrompt, selectMultipleUrlsFromResults, buildMultiSiteWorkflow, and generateMultiSiteWorkflowFromSearch to build and combine per-site workflows. |
| Route handler branching on multi-site prompt (server/src/routes/storage.ts) | Updates the no-url POST /recordings/llm flow to detect multi-site prompts, call the new multi-site generator, and keep the existing single-site search path otherwise. |
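
In the PR the URL selection is delegated to the LLM, but a deterministic guard that enforces "at most N URLs, one per domain" on whatever comes back could be sketched as below. `pickDistinctDomains` is a hypothetical helper, not a function from this change, and treating `www.` as the same domain is an assumption.

```typescript
// Keep at most `max` URLs, one per hostname, preserving the input order.
function pickDistinctDomains(urls: string[], max: number): string[] {
  const seen = new Set<string>();
  const picked: string[] = [];
  for (const raw of urls) {
    if (picked.length >= max) break;
    let host: string;
    try {
      // Assumption: "www.example.com" and "example.com" count as one domain.
      host = new URL(raw).hostname.replace(/^www\./, "");
    } catch {
      continue; // skip strings that are not absolute URLs
    }
    if (seen.has(host)) continue;
    seen.add(host);
    picked.push(raw);
  }
  return picked;
}
```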

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant StorageRoute as POST /recordings/llm
  participant WorkflowEnricher
  participant DuckDuckGo
  participant LLMProvider

  Client->>StorageRoute: prompt (no url)
  StorageRoute->>WorkflowEnricher: isMultiSitePrompt(prompt)
  WorkflowEnricher->>LLMProvider: classify prompt
  LLMProvider-->>WorkflowEnricher: true / false
  WorkflowEnricher-->>StorageRoute: isMultiSite = true

  rect rgba(100, 150, 255, 0.5)
    note over StorageRoute,LLMProvider: Multi-site path
    StorageRoute->>WorkflowEnricher: generateMultiSiteWorkflowFromSearch(prompt, userId)
    WorkflowEnricher->>DuckDuckGo: multi-query search
    DuckDuckGo-->>WorkflowEnricher: multi-domain results
    WorkflowEnricher->>LLMProvider: select up to N domain-distinct URLs
    LLMProvider-->>WorkflowEnricher: selected URLs
    WorkflowEnricher->>WorkflowEnricher: generate per-site workflows + verify each
    WorkflowEnricher->>WorkflowEnricher: stitch into single multi-site workflow
    WorkflowEnricher-->>StorageRoute: {success, workflow, url}
  end

  StorageRoute-->>Client: combined workflow + finalUrl
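
The "stitch into single multi-site workflow" step in the diagram could be pictured roughly like this. The `Step` and `SiteWorkflow` shapes, the `goto` action name, and the `stitchWorkflows` helper are all hypothetical; the actual workflow format and the logic in `buildMultiSiteWorkflow` are not shown on this page.

```typescript
// Hypothetical step and workflow shapes; the real workflow format is not shown here.
interface Step {
  action: string;
  args: string[];
}

interface SiteWorkflow {
  url: string;
  steps: Step[];
}

// Concatenate per-site workflows into one sequential list of steps.
function stitchWorkflows(sites: SiteWorkflow[]): Step[] {
  const combined: Step[] = [];
  for (const site of sites) {
    // Start each segment by navigating to that site's URL (assumed action name).
    combined.push({ action: "goto", args: [site.url] });
    combined.push(...site.steps);
  }
  return combined;
}
```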

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • getmaxun/maxun#946: Introduced the original search-based workflow generation path (generateWorkflowFromPromptWithSearch) that this PR extends with multi-site classification and a new parallel entry point.
  • getmaxun/maxun#957: Modified the same no-url control flow in POST /recordings/llm and workflowEnricher.ts, directly adjacent to where the multi-site branch is now inserted.
  • getmaxun/maxun#921: Also touches the LLM workflow-generation route and WorkflowEnricher path for prompt-driven recording setup.

Suggested reviewers

  • amhsirak

Poem

🐇 I hop through sites with curious nose,
The LLM whispers where the workflow grows.
One prompt, many domains, stitched just so,
And bunny-bright results begin to flow.
🎋

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly matches the main change: adding multi-URL/multi-site LLM extraction support. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@RohitR311 RohitR311 added Type: Bug Something isn't working Type: Enhancement Improvements to existing features labels Jun 20, 2026
@RohitR311 RohitR311 requested a review from amhsirak June 20, 2026 04:46
@RohitR311 RohitR311 added Type: Feature New features and removed Type: Enhancement Improvements to existing features labels Jun 20, 2026

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
server/src/sdk/workflowEnricher.ts (1)

1633-1642: ⚡ Quick win

Add request timeouts to axios LLM calls to prevent indefinite hangs.

The axios POST requests to Ollama (line 1633) and OpenAI (line 1667) lack explicit timeouts. If the LLM service is slow or unresponsive, the verification step could block indefinitely, tying up browser resources and degrading the user experience.

🔧 Suggested fix: add timeout to axios calls
         const response = await axios.post(`${ollamaBaseUrl}/api/chat`, {
           model: ollamaModel,
           messages: [
             { role: 'system', content: systemPrompt },
             { role: 'user', content: userMessage }
           ],
           stream: false,
           format: jsonSchema,
           options: { temperature: 0.1 }
-        });
+        }, { timeout: 60000 });

Apply similar change to the OpenAI call at line 1667.

Also applies to: 1667-1681
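
To avoid hard-coding the timeout in two places, both calls could share one small config builder, sketched below. `llmRequestConfig`, `DEFAULT_LLM_TIMEOUT_MS`, and the `LLM_TIMEOUT_MS` environment variable mentioned in the comments are hypothetical names, not part of this PR. The only axios behaviour relied on is that its request config accepts `timeout` in milliseconds.

```typescript
// Shape of the one axios option used here; axios accepts `timeout` in milliseconds.
interface RequestTimeoutConfig {
  timeout: number;
}

// Default matches the 60000 ms value in the suggested fix above.
const DEFAULT_LLM_TIMEOUT_MS = 60000;

// Build the config from an optional string, for example the value of a
// hypothetical LLM_TIMEOUT_MS environment variable.
function llmRequestConfig(raw?: string): RequestTimeoutConfig {
  const parsed = Number(raw);
  const valid =
    raw !== undefined && raw.trim() !== "" && Number.isFinite(parsed) && parsed > 0;
  // Fall back to the default for missing, non-numeric, or non-positive input.
  return { timeout: valid ? Math.floor(parsed) : DEFAULT_LLM_TIMEOUT_MS };
}
```

The returned object would be passed as the third argument to `axios.post(url, body, config)` for both the Ollama and the OpenAI request.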

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/sdk/workflowEnricher.ts` around lines 1633 - 1642, The axios POST
requests to Ollama and OpenAI LLM services lack explicit timeout configurations,
which can cause the verification step to hang indefinitely if the service is
unresponsive. Add a timeout property (in milliseconds) to the configuration
objects of both axios.post calls - specifically to the Ollama API call at the
location where the model, messages, stream, format, and options are configured,
and apply the same timeout configuration to the OpenAI API call. Choose a
reasonable timeout value that prevents indefinite blocking while allowing
sufficient time for typical LLM responses.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 70a42c71-cf74-4894-9ed4-0b18ff31b680

📥 Commits

Reviewing files that changed from the base of the PR and between d581502 and 5c824bc.

📒 Files selected for processing (2)
  • server/src/routes/storage.ts
  • server/src/sdk/workflowEnricher.ts

@amhsirak amhsirak merged commit 689772f into develop Jun 24, 2026
1 check was pending