feat: add support for multi url llm extraction #1117
Caution: Review failed. Pull request was closed or merged during review.
Walkthrough
Adds multi-site research-mode workflow generation to
Changes
Multi-site workflow generation and LLM verification
Sequence Diagram(s)
sequenceDiagram
participant Client
participant StorageRoute as POST /recordings/llm
participant WorkflowEnricher
participant DuckDuckGo
participant LLMProvider
Client->>StorageRoute: prompt (no url)
StorageRoute->>WorkflowEnricher: isMultiSitePrompt(prompt)
WorkflowEnricher->>LLMProvider: classify prompt
LLMProvider-->>WorkflowEnricher: true / false
WorkflowEnricher-->>StorageRoute: isMultiSite = true
rect rgba(100, 150, 255, 0.5)
note over StorageRoute,LLMProvider: Multi-site path
StorageRoute->>WorkflowEnricher: generateMultiSiteWorkflowFromSearch(prompt, userId)
WorkflowEnricher->>DuckDuckGo: multi-query search
DuckDuckGo-->>WorkflowEnricher: multi-domain results
WorkflowEnricher->>LLMProvider: select up to N domain-distinct URLs
LLMProvider-->>WorkflowEnricher: selected URLs
WorkflowEnricher->>WorkflowEnricher: generate per-site workflows + verify each
WorkflowEnricher->>WorkflowEnricher: stitch into single multi-site workflow
WorkflowEnricher-->>StorageRoute: {success, workflow, url}
end
StorageRoute-->>Client: combined workflow + finalUrl
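The "select up to N domain-distinct URLs" step in the diagram is performed by the LLM in this PR. As a rough illustration of the domain-distinct constraint only, the sketch below uses a deterministic stand-in: it keeps the first URL seen per hostname. The function name `pickDomainDistinctUrls` and the `maxSites` parameter are hypothetical and do not come from the PR.

```typescript
// Hypothetical helper, not taken from the PR: keep the first URL seen for each
// hostname and stop once `maxSites` distinct hostnames have been collected.
function pickDomainDistinctUrls(candidates: string[], maxSites: number): string[] {
  const seenHosts = new Set<string>();
  const picked: string[] = [];

  for (const candidate of candidates) {
    if (picked.length >= maxSites) break;

    let host: string;
    try {
      // Treat "www.example.com" and "example.com" as the same site.
      host = new URL(candidate).hostname.replace(/^www\./, "");
    } catch {
      continue; // skip strings that are not absolute URLs
    }

    if (seenHosts.has(host)) continue;
    seenHosts.add(host);
    picked.push(candidate);
  }

  return picked;
}
```

A real selection step would also weigh relevance to the prompt, which is presumably why the PR delegates it to the LLM; a filter like this could at most serve as a guard on the LLM's answer.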
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 5 passed
🧹 Nitpick comments (1)
server/src/sdk/workflowEnricher.ts (1)
1633-1642: ⚡ Quick win: Add request timeouts to axios LLM calls to prevent indefinite hangs.
The axios POST requests to Ollama (line 1633) and OpenAI (line 1667) lack explicit timeouts. If the LLM service is slow or unresponsive, the verification step could block indefinitely, tying up browser resources and degrading the user experience.
🔧 Suggested fix: add timeout to axios calls
     const response = await axios.post(`${ollamaBaseUrl}/api/chat`, {
       model: ollamaModel,
       messages: [
         { role: 'system', content: systemPrompt },
         { role: 'user', content: userMessage }
       ],
       stream: false,
       format: jsonSchema,
       options: { temperature: 0.1 }
    -});
    +}, { timeout: 60000 });

Apply a similar change to the OpenAI call at line 1667.
Also applies to: 1667-1681
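The suggestion above relies on axios's own `timeout` request option. For call sites where a client-level timeout is not available, a generic wrapper can bound the wait instead. The sketch below is a swapped-in technique based on `Promise.race`, not the PR's code; `withTimeout` and its parameters are hypothetical names.

```typescript
// Hypothetical helper, not part of the PR: rejects if `work` has not settled
// within `ms` milliseconds. Unlike axios's `timeout` option, this does not
// abort the underlying request; it only stops the caller from waiting on it.
function withTimeout<T>(work: Promise<T>, ms: number, label = "LLM call"): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });

  // Whichever promise settles first wins; always clear the timer afterwards
  // so a finished call does not leave a pending timeout behind.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

Because the wrapped request keeps running after the timeout fires, the axios `timeout` option (or an abort signal) remains the better fit where it is available; the wrapper is only a fallback.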
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/sdk/workflowEnricher.ts` around lines 1633 - 1642, The axios POST requests to Ollama and OpenAI LLM services lack explicit timeout configurations, which can cause the verification step to hang indefinitely if the service is unresponsive. Add a timeout property (in milliseconds) to the configuration objects of both axios.post calls - specifically to the Ollama API call at the location where the model, messages, stream, format, and options are configured, and apply the same timeout configuration to the OpenAI API call. Choose a reasonable timeout value that prevents indefinite blocking while allowing sufficient time for typical LLM responses.
📒 Files selected for processing (2)
server/src/routes/storage.ts
server/src/sdk/workflowEnricher.ts
What does this PR do?
Feat:
Add support for multi-URL AI robot extraction. Based on the prompt, the robot decides whether it needs to navigate to multiple sites to extract the data.
Fix:
For AI-powered robot creation, the LLM selects a group, and that selection determines the data that is actually sampled. Previously, this sample was never checked against the prompt after creation, so data that did not match the prompt could go unnoticed. This PR verifies the sample data against the prompt and, if the data does not align, tries again.
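The verify-then-retry behaviour described in the fix can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the PR's implementation: `generateUntilVerified`, `generate`, `verify`, and `maxAttempts` are hypothetical names, and the real code performs the verification through an LLM call.

```typescript
type VerifiedSample<T> = { sample: T; attempt: number };

// Hypothetical sketch: `generate` produces sample data for a given attempt and
// `verify` reports whether that sample matches the prompt. Both are supplied
// by the caller; neither name comes from the PR.
async function generateUntilVerified<T>(
  generate: (attempt: number) => Promise<T>,
  verify: (sample: T) => Promise<boolean>,
  maxAttempts = 2,
): Promise<VerifiedSample<T> | null> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const sample = await generate(attempt);
    // Accept the first sample that the verifier says aligns with the prompt.
    if (await verify(sample)) return { sample, attempt };
  }
  // No attempt produced data that matched the prompt.
  return null;
}
```

Capping the number of attempts keeps a persistently mismatched prompt from looping forever; what the caller does with a `null` result (fail, or fall back to the last sample) is left open here.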
Summary by CodeRabbit