Skip to content

feat(nodes): add ArangoDB db_arango node#1294

Open
kgarg2468 wants to merge 1 commit into
rocketride-org:developfrom
kgarg2468:feat/RR-1293-arango-node
Open

feat(nodes): add ArangoDB db_arango node#1294
kgarg2468 wants to merge 1 commit into
rocketride-org:developfrom
kgarg2468:feat/RR-1293-arango-node

Conversation

@kgarg2468

@kgarg2468 kgarg2468 commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Add db_arango — a dual ["database","tool"] node that answers natural-language questions against ArangoDB by translating them to read-only AQL via a connected LLM (mirrors db_neo4j).
  • Multi-model: document queries, graph traversals, and ArangoSearch — one read-only path. Tools get_data/get_schema/get_aql; lanes questions → [table, text, answers] with DIALECT discovery and gated EXECUTE.
  • Read-only by design: EXPLAIN-plan modification-node inspection (primary gate) + keyword blocklist (defense-in-depth); per-query row/runtime/memory caps; allow_execute off by default.

Why this fits the codebase

ArangoDB, like Neo4j, has no SQLAlchemy dialect — it's python-arango + AQL over HTTP. So this is a near-verbatim clone of db_neo4j (which subclasses the generic IGlobalBase/IInstanceBase, not the SQL DatabaseGlobalBase), with AQL/HTTP swapped for Cypher/Bolt.

What changed (10 files, purely additive — no packages/ changes)

nodes/src/nodes/db_arango/: services.json, IGlobal.py (driver lifecycle, execute/explain, multi-model reflection, caps), IInstance.py (tools + lane handlers + NL→AQL validate/retry), utils.py (read-only gate + EXPLAIN-plan inspection), __init__.py, requirements.txt (python-arango), arango.svg, README.md; plus nodes/test/test_db_arango.py (91 unit tests) and examples/db_arango.pipe.

Type

feature (new pipeline + agent-tool node) + tests + example + docs

Testing

  • Tests added — nodes/test/test_db_arango.py, 91 network-free unit tests (sys.modules stub pattern).
  • Example pipeline — examples/db_arango.pipe.
  • Tested locally — ruff clean; 91 unit + full contract suite (276) pass; live ArangoDB 3.12 harness 14/14 (reflection, graph traversal, EXPLAIN-plan read-only gate); live IDE canvas (real LLM → real DB): document query, graph traversal, aggregation, read-only refusal, and non-query fallback all verified.
  • ./builder test — ran ./builder nodes:test (1101 pass/71 skip) and nodes:test-contracts (276) without the C++ toolchain; the full engine build runs in CI.
  • No services.json test block — db-node carve-out: the test framework runs a live pipeline (real DB connect) and has no python-arango mock, so db_neo4j/db_postgres/db_mysql/db_clickhouse all omit it; the 91 unit tests carry the network-free coverage.

Checklist

  • Conventional commit messages
  • No secrets or credentials (example pipe uses ${VAR} placeholders)
  • Breaking changes — none (purely additive)
  • Docs — node README.md included

Linked Issue

Closes #1293

Summary by CodeRabbit

New Features

  • Added an ArangoDB node that converts natural-language questions into read-only queries and returns results as text, tables, or structured answers.
  • Added tools to fetch generated AQL (get_aql), query results (get_data), and reflected schema (get_schema).
  • Introduced an optional, opt-in raw query execution path with safety safeguards and configurable runtime/maximum-row limits.

Documentation

  • Added a dedicated guide for setup, authentication, parameters, and question/execution behavior.

Tests

  • Added a network-free test suite covering safety gating, validation flow, schema reflection, formatting, and limit enforcement.

Chores

  • Added the node package wiring, dependency declaration, and an example chat-to-ArangoDB pipeline.

@github-actions github-actions Bot added docs Documentation module:nodes Python pipeline nodes labels Jun 16, 2026
@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Introduces a complete db_arango node package: shared AQL safety utilities (keyword blocklist + EXPLAIN-plan modification detection), IGlobal managing ArangoDB connection lifecycle and schema reflection, IInstance translating natural-language questions to read-only AQL via a multi-attempt LLM generation loop, service registration JSON, requirements, package init, README, an example pipeline, and a stub-isolated unit test suite.

Changes

ArangoDB Node (db_arango)

Layer / File(s) Summary
AQL safety utilities and plan-based read-only gate
nodes/src/nodes/db_arango/utils.py
Defines _is_aql_safe (regex write-keyword blocklist with comment stripping), _parse_is_valid (LLM boolean normalization), _plan_nodes/_plan_is_modification (EXPLAIN-plan modification detection across multiple result shapes), and _strip_ns.
IGlobal: connection lifecycle and schema reflection
nodes/src/nodes/db_arango/IGlobal.py
Implements IGlobal with beginGlobal/endGlobal for client/db lifecycle, authentication support (token and userpass), _reflect_schema for multi-model schema enumeration (collections, indexes, graphs, views) with graceful degradation, validateConfig for save-time connectivity probe, and helpers for type inference and cursor statistics.
IGlobal: query execution with safety gates and validation
nodes/src/nodes/db_arango/IGlobal.py
Implements _run_query (safety-gated via _is_aql_safe, with runtime/memory limits and result truncation), _run_query_raw (EXECUTE path bypassing keyword gate but enforcing row cap), and _validate_query (EXPLAIN-based read-only modification detection).
IInstance: tool methods and pipeline handler
nodes/src/nodes/db_arango/IInstance.py
Implements IInstance exposing get_data, get_schema, and get_aql tool methods with proper validation and error handling; implements writeQuestions pipeline handler with branching for DIALECT, EXECUTE, and default NL question flows.
IInstance: multi-attempt AQL generation loop
nodes/src/nodes/db_arango/IInstance.py
Drives _buildAqlQuery multi-attempt loop calling _buildAqlQueryOnce to generate AQL via LLM prompts (synthesized schema context, few-shot examples, read-only constraints, LIMIT guidance), validates via _is_aql_safe and EXPLAIN, and retries with prior rejection details.
IInstance: result formatting and limit handling
nodes/src/nodes/db_arango/IInstance.py
Adds _formatResultAsMarkdown to convert query results into markdown tables (dict-keyed or row-based); adds _clamp_limit to coerce user row limits into bounded range [1, DEFAULT_MAX_EXECUTE_ROWS] with sensible defaults.
Service registration and package initialization
nodes/src/nodes/db_arango/services.json, nodes/src/nodes/db_arango/__init__.py, nodes/src/nodes/db_arango/requirements.txt
Registers the node service (protocol db_arango://, class database+tool, field schema for auth/endpoint/tuning, lane shape); installs python-arango at import time; pins driver requirement.
Documentation and example pipeline
nodes/src/nodes/db_arango/README.md, examples/db_arango.pipe
Comprehensive README covering node roles, schema reflection, safety model, QuestionType flows, tools, auth modes, connectivity probe, parameter reference; example pipeline wiring chat input through ArangoDB and Anthropic to answer/table outputs.
Unit test suite with network isolation
nodes/test/test_db_arango.py
Full test suite using sys.modules stubs for network-free execution; covers utils safety, IGlobal connection/reflection/execution, IInstance tools/pipeline/formatting, EXPLAIN-plan gating, truncation warnings, multi-model reflection normalization, and end-to-end prompt context validation.

Sequence Diagram(s)

sequenceDiagram
    rect rgba(100, 100, 200, 0.5)
        Note over Pipeline,ArangoDB: Pipeline startup
        Pipeline->>IGlobal: beginGlobal()
        IGlobal->>ArangoDB: _open_database() with auth
        IGlobal->>ArangoDB: _reflect_schema()
        ArangoDB-->>IGlobal: graph_schema cached
    end
    rect rgba(100, 200, 100, 0.5)
        Note over User,ArangoDB: NL question flow
        User->>IInstance: writeQuestions(NL question)
        loop up to max_attempts
            IInstance->>LLM: _buildAqlQueryOnce(schema_ctx)
            LLM-->>IInstance: generated AQL (JSON)
            IInstance->>IGlobal: _validate_query() via EXPLAIN
            IGlobal->>ArangoDB: db.aql.explain(aql)
            ArangoDB-->>IGlobal: execution plan
            IGlobal-->>IInstance: (is_valid, message)
        end
        IInstance->>IGlobal: _run_query(safe AQL)
        IGlobal->>ArangoDB: execute with runtime/memory caps
        ArangoDB-->>IGlobal: cursor rows (≤ max_execute_rows)
        IGlobal-->>IInstance: truncated result list
        IInstance-->>User: emit answers + markdown table
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • rocketride-org/rocketride-server#1270: The main PR's db_arango IInstance.writeQuestions relies on QuestionType.DIALECT/QuestionType.EXECUTE branching, which this PR removes and replaces with @tool_function-based execute/dialect tools in the shared database layer—coupling at the API/dispatch level.

Suggested reviewers

  • jmaionchi
  • Rod-Christensen
  • stepmikhaylov
  • asclearuc

Poem

🐇 Hippity-hop through the graph database,
AQL queries in a read-only cascade!
EXPLAIN checks the plan, no writes sneak past,
Schema reflected — collections amassed.
The rabbit queries documents with glee,
Returning your answers, safe as can be! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 21.48% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'feat(nodes): add ArangoDB db_arango node' directly and clearly describes the main change—introducing a new ArangoDB database node.
Linked Issues check ✅ Passed All v1 scope requirements from #1293 are met: NL→AQL translation, multi-model schema reflection, tool exposure (get_data/get_schema/get_aql), read-only enforcement via EXPLAIN-plan inspection and keyword blocklist, result caps, and gated EXECUTE access.
Out of Scope Changes check ✅ Passed All changes are directly aligned with the PR objectives and linked issue #1293 scope; no out-of-scope modifications detected in the ArangoDB node implementation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Biome (2.5.0)
nodes/src/nodes/db_arango/services.json

File contains syntax errors that prevent linting: Line 2: Expected a property but instead found '//'.; Line 6: End of file expected; Line 6: End of file expected; Line 6: End of file expected; Line 6: End of file expected; Line 11: End of file expected; Line 11: End of file expected; Line 11: End of file expected; Line 11: End of file expected; Line 16: End of file expected; Line 16: End of file expected; Line 16: End of file expected; Line 16: End of file expected; Line 22: End of file expected; Line 22: End of file expected; Line 22: End of file expected; Line 22: End of file expected; Line 28: End of file expected; Line 28: End of file expected; Line 28: End of file expected; Line 28: End of file expected; Line 34: End of file expected; Line 34: End of file expected; Line 34: End of file expected; Line 34: End of file expected; Line 40: End of file expected; Line 40: End of file expected; Line 40: End of file expected; Line 40: End of file expected; Line 45: End of file expected; Li

... [truncated 850 characters] ...

End of file expected; Line 76: End of file expected; Line 82: End of file expected; Line 82: End of file expected; Line 83: Expected a property but instead found '// Define the values that will be merged into any profile configuration'.; Line 82: End of file expected; Line 83: End of file expected; Line 85: End of file expected; Line 85: End of file expected; Line 85: End of file expected; Line 85: End of file expected; Line 87: End of file expected; Line 87: End of file expected; Line 87: End of file expected; Line 92: End of file expected; Line 99: End of file expected; Line 99: End of file expected; Line 99: End of file expected; Line 210: End of file expected; Line 216: End of file expected; Line 216: End of file expected; Line 216: End of file expected; Line 223: End of file expected


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

Copy link
Copy Markdown
🤖 Internal: Discord sync marker

Auto-managed by the Discord notification workflow. Stores the linked Discord message ID. Do not edit or delete.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nodes/src/nodes/db_arango/IInstance.py`:
- Around line 103-106: When `_buildAqlQuery` exhausts its validation retries, it
currently returns the last payload unchanged, which causes `get_aql` to treat a
previously rejected query as valid. Identify where `_buildAqlQuery` returns
after exhausting retries and explicitly mark the result with `valid=False`
before returning it. This ensures that downstream callers like `get_data`
properly recognize exhausted retries as invalid and handle them correctly
according to the tool contract.

In `@nodes/test/test_db_arango.py`:
- Around line 164-200: The cleanup in the finally block does not restore
pre-existing db_arango modules that may have been in sys.modules before this
test fixture ran. Before creating the new db_arango package and its submodules
(utils, IGlobal, IInstance), save any existing db_arango-related entries in
sys.modules to a separate dictionary. Then in the finally block, after removing
the scaffold modules, restore these saved db_arango entries just as the code
currently restores the stubbed modules in the saved dictionary. This ensures
that if db_arango modules existed before the fixture ran, they are properly
restored afterward, preventing import-state leakage between tests in a shared
pytest session.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 23705b92-1541-4e36-9272-46da2c150e3e

📥 Commits

Reviewing files that changed from the base of the PR and between 283ec8a and f1ab60c.

⛔ Files ignored due to path filters (1)
  • nodes/src/nodes/db_arango/arango.svg is excluded by !**/*.svg
📒 Files selected for processing (9)
  • examples/db_arango.pipe
  • nodes/src/nodes/db_arango/IGlobal.py
  • nodes/src/nodes/db_arango/IInstance.py
  • nodes/src/nodes/db_arango/README.md
  • nodes/src/nodes/db_arango/__init__.py
  • nodes/src/nodes/db_arango/requirements.txt
  • nodes/src/nodes/db_arango/services.json
  • nodes/src/nodes/db_arango/utils.py
  • nodes/test/test_db_arango.py

Comment thread nodes/src/nodes/db_arango/IInstance.py
Comment thread nodes/test/test_db_arango.py
@kgarg2468 kgarg2468 force-pushed the feat/RR-1293-arango-node branch from f1ab60c to 0ca0de3 Compare June 16, 2026 19:23
@kgarg2468

Copy link
Copy Markdown
Contributor Author

Addressed both CodeRabbit suggestions in the latest push:

  1. Exhausted EXPLAIN retries_buildAqlQuery now marks the result invalid (and carries the EXPLAIN error) once retries are exhausted, so get_aql/get_data/writeQuestions report an error instead of treating a rejected query as valid: true. Refined vs the literal suggestion so the off-topic path (non-query questions → plain-text answer) is preserved — get_aql distinguishes an EXPLAIN failure (has error) from a genuine off-topic answer.
  2. Test loader_load_node now captures and restores any pre-existing db_arango* modules, so the shared builder nodes:test session can't see leaked import state.

Added 2 unit tests for these paths. Green locally: 93 unit + 276 contract, builder nodes:test 1185 pass / 28 skip.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nodes/src/nodes/db_arango/IGlobal.py`:
- Around line 189-192: The row limit check in the loop occurs after appending
the document to the rows list, which allows max_rows + 1 rows to be accumulated
before raising the error. Move the limit check before the rows.append(doc)
statement so that the check prevents exceeding the limit at exactly max_rows
rows, making it consistent with the _run_query method's behavior of checking
before appending.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a5cf48e8-9883-4359-af04-4108bfd796ab

📥 Commits

Reviewing files that changed from the base of the PR and between f1ab60c and 0ca0de3.

⛔ Files ignored due to path filters (1)
  • nodes/src/nodes/db_arango/arango.svg is excluded by !**/*.svg
📒 Files selected for processing (9)
  • examples/db_arango.pipe
  • nodes/src/nodes/db_arango/IGlobal.py
  • nodes/src/nodes/db_arango/IInstance.py
  • nodes/src/nodes/db_arango/README.md
  • nodes/src/nodes/db_arango/__init__.py
  • nodes/src/nodes/db_arango/requirements.txt
  • nodes/src/nodes/db_arango/services.json
  • nodes/src/nodes/db_arango/utils.py
  • nodes/test/test_db_arango.py

Comment thread nodes/src/nodes/db_arango/IGlobal.py
@kgarg2468 kgarg2468 force-pushed the feat/RR-1293-arango-node branch from 0ca0de3 to 3939b24 Compare June 16, 2026 19:45

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

♻️ Duplicate comments (1)
nodes/test/test_db_arango.py (1)

779-781: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not codify fail-open EXPLAIN handling.

Line 779 currently treats unknown plan shapes as safe. Since IGlobal._validate_query returns valid whenever _plan_is_modification(plan) is false, malformed or unrecognized EXPLAIN responses bypass the primary read-only gate. Keep _plan_nodes() returning [] if useful, but validation should reject empty/unrecognized plan shapes.

Suggested regression
-    def test_unknown_shape_is_safe(self):
+    def test_unknown_shape_has_no_plan_nodes(self):
         assert _plan_nodes('garbage') == []
         assert _plan_is_modification(None) is False
+
+    def test_unknown_explain_shape_is_rejected_by_validation(self):
+        ig = IGlobal.__new__(IGlobal)
+        ig.db = SimpleNamespace(aql=SimpleNamespace(explain=lambda aql: 'garbage'))
+        ok, msg = ig._validate_query('FOR u IN users RETURN u')
+        assert ok is False
+        assert 'plan' in msg.lower()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/test/test_db_arango.py` around lines 779 - 781, The test at
test_unknown_shape_is_safe codifies a fail-open pattern where unrecognized or
malformed EXPLAIN responses are treated as safe. Since IGlobal._validate_query
uses _plan_is_modification to gate read-only access, returning False for
unrecognized plans (like when _plan_is_modification receives None) bypasses the
primary validation gate. Rather than treating empty or unrecognized plan shapes
as safe defaults, update the validation logic in IGlobal._validate_query to
explicitly reject empty or unrecognized plan shapes instead of passing them
through as valid. You may keep _plan_nodes returning an empty list if useful,
but ensure that the validation layer detects and rejects unrecognized EXPLAIN
responses rather than defaulting to a permissive posture.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nodes/src/nodes/db_arango/IGlobal.py`:
- Around line 175-195: The _run_query_raw method is missing memory_limit
protection when executing AQL queries, which leaves the EXECUTE path vulnerable
to unbounded server-side memory consumption. Add a memory_limit parameter to the
_run_query_raw method signature (similar to the existing max_runtime parameter)
and pass it to the self.db.aql.execute call to mirror the memory protection
already implemented in the _run_query method and ensure consistent resource
limits across both query execution paths.

In `@nodes/src/nodes/db_arango/IInstance.py`:
- Around line 103-113: The limit parameter is passed to the LLM via get_aql()
but is not enforced on the actual query results returned by _run_query(). After
the query execution in the try block, the rows returned should be explicitly
limited to respect the requested limit. Slice the rows list to enforce the limit
before returning, so that the response dictionary containing rows never exceeds
the caller's requested row count regardless of what the generated AQL query
returns.
- Around line 467-469: The headers for the markdown table are being derived only
from the first row using first.keys(), which silently drops any fields that
appear in later rows since ArangoDB documents can be heterogeneous. Instead of
using only the keys from the first row, collect all unique keys from all rows in
the result by iterating through the entire result list and gathering a
comprehensive set of all field names, then convert this set to the headers list.
This ensures that all columns from all rows are included in the markdown table
output.
- Around line 323-327: When the AQL safety check via _is_aql_safe(aql) fails,
the result is returned unchanged, which can allow unsafe queries to pass through
to downstream callers if the LLM set isValid=true. Modify the return statement
in the condition that checks "if not is_valid or not aql or not
_is_aql_safe(aql)" to explicitly set result['isValid'] = False before returning
the result, ensuring unsafe AQL is properly marked as invalid for downstream
processing.

In `@nodes/src/nodes/db_arango/requirements.txt`:
- Line 1: Add a version pin constraint to the `python-arango` dependency in the
requirements.txt file to ensure deterministic builds. The `depends()` call in
`__init__.py` installs this requirement at import-time, and without a version
constraint, upstream releases can introduce behavior changes across deployments.
Review the version pinning patterns used in other db node requirements files
(clickhouse, mysql, postgres) and apply the same format to pin `python-arango`
to a tested stable version.

In `@nodes/src/nodes/db_arango/utils.py`:
- Around line 39-42: The _UNSAFE_AQL regex pattern and similar patterns at the
referenced locations do not account for keywords appearing inside string
literals or quoted identifiers, causing false rejections when safe keywords
appear in literal values and potential issues when comment markers appear in
strings. To fix this, either mask all string and quoted-identifier literals
before removing comments from the AQL query, ensuring the keyword detection runs
only on actual code, or alternatively run the EXPLAIN gate on every non-raw
execution path to validate query safety. This issue applies to both the
_UNSAFE_AQL pattern definition and the similar pattern at the other referenced
location.

In `@nodes/test/test_db_arango.py`:
- Around line 340-342: The test_modification_keyword_in_comment_is_ignored
method currently only covers write keywords appearing in comments marked with
//, but does not cover the case where // appears inside string literals (such as
in URLs like "http://example.com"). Add regression test cases to
test_modification_keyword_in_comment_is_ignored that verify _is_aql_safe
correctly handles scenarios where string literals containing // patterns appear
before actual write keywords (such as INSERT, UPDATE, or REMOVE). These tests
should ensure that queries like LET url = "http://example.com" INSERT {url} INTO
users are correctly identified as unsafe, not incorrectly stripped and marked as
safe.

---

Duplicate comments:
In `@nodes/test/test_db_arango.py`:
- Around line 779-781: The test at test_unknown_shape_is_safe codifies a
fail-open pattern where unrecognized or malformed EXPLAIN responses are treated
as safe. Since IGlobal._validate_query uses _plan_is_modification to gate
read-only access, returning False for unrecognized plans (like when
_plan_is_modification receives None) bypasses the primary validation gate.
Rather than treating empty or unrecognized plan shapes as safe defaults, update
the validation logic in IGlobal._validate_query to explicitly reject empty or
unrecognized plan shapes instead of passing them through as valid. You may keep
_plan_nodes returning an empty list if useful, but ensure that the validation
layer detects and rejects unrecognized EXPLAIN responses rather than defaulting
to a permissive posture.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4daf8fbb-1e46-4f36-bd43-cd71a58e02de

📥 Commits

Reviewing files that changed from the base of the PR and between 0ca0de3 and 3939b24.

⛔ Files ignored due to path filters (1)
  • nodes/src/nodes/db_arango/arango.svg is excluded by !**/*.svg
📒 Files selected for processing (9)
  • examples/db_arango.pipe
  • nodes/src/nodes/db_arango/IGlobal.py
  • nodes/src/nodes/db_arango/IInstance.py
  • nodes/src/nodes/db_arango/README.md
  • nodes/src/nodes/db_arango/__init__.py
  • nodes/src/nodes/db_arango/requirements.txt
  • nodes/src/nodes/db_arango/services.json
  • nodes/src/nodes/db_arango/utils.py
  • nodes/test/test_db_arango.py

Comment thread nodes/src/nodes/db_arango/IGlobal.py
Comment on lines +103 to +113
limit = _clamp_limit(args.get('limit'))
aql_result = self.get_aql({'question': question.strip(), 'limit': limit})
if not aql_result.get('valid'):
return aql_result

aql = aql_result['aql']
try:
rows = self.IGlobal._run_query(aql)
except Exception as e:
return {'error': str(e), 'aql': aql, 'rows': []}
return {'rows': rows, 'aql': aql, 'row_limit': limit}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce the requested row limit after execution.

limit is only passed to the LLM prompt; _run_query() still uses the global max_execute_rows cap, so a caller requesting limit=1 can receive more than one row if the generated AQL omits or changes LIMIT.

🐛 Proposed fix
         aql = aql_result['aql']
         try:
-            rows = self.IGlobal._run_query(aql)
+            rows = self.IGlobal._run_query(aql)[:limit]
         except Exception as e:
             return {'error': str(e), 'aql': aql, 'rows': []}
         return {'rows': rows, 'aql': aql, 'row_limit': limit}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/src/nodes/db_arango/IInstance.py` around lines 103 - 113, The limit
parameter is passed to the LLM via get_aql() but is not enforced on the actual
query results returned by _run_query(). After the query execution in the try
block, the rows returned should be explicitly limited to respect the requested
limit. Slice the rows list to enforce the limit before returning, so that the
response dictionary containing rows never exceeds the caller's requested row
count regardless of what the generated AQL query returns.

Comment on lines +323 to +327
is_valid = _parse_is_valid(result.get('isValid', False))
aql = result.get('query', '')

if not is_valid or not aql or not _is_aql_safe(aql):
return result

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Mark unsafe generated AQL as invalid before returning.

When _is_aql_safe(aql) fails, this currently returns the raw LLM payload unchanged. If the LLM set isValid=true, downstream callers can surface the blocked write query instead of a safety error.

🛡️ Proposed fix
-            if not is_valid or not aql or not _is_aql_safe(aql):
+            if not is_valid or not aql:
                 return result
+            if not _is_aql_safe(aql):
+                result['isValid'] = False
+                result['error'] = 'Generated query contains unsafe AQL'
+                return result
 
             ok, explain_error = self.IGlobal._validate_query(aql)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/src/nodes/db_arango/IInstance.py` around lines 323 - 327, When the AQL
safety check via _is_aql_safe(aql) fails, the result is returned unchanged,
which can allow unsafe queries to pass through to downstream callers if the LLM
set isValid=true. Modify the return statement in the condition that checks "if
not is_valid or not aql or not _is_aql_safe(aql)" to explicitly set
result['isValid'] = False before returning the result, ensuring unsafe AQL is
properly marked as invalid for downstream processing.

Comment thread nodes/src/nodes/db_arango/IInstance.py
@@ -0,0 +1 @@
python-arango

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== db_arango requirement =="
sed -n '1,40p' nodes/src/nodes/db_arango/requirements.txt

echo
echo "== import-time dependency installs in db nodes =="
rg -n --type=py 'depends\(' nodes/src/nodes -g '*/__init__.py' -C1 | rg 'db_'

echo
echo "== version spec style across db node requirements =="
fd -a requirements.txt nodes/src/nodes | rg '/db_' | while read -r f; do
  echo "---- $f ----"
  sed -n '1,80p' "$f"
  echo
done

Repository: rocketride-org/rocketride-server

Length of output: 169


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check the actual structure of db_arango
echo "=== db_arango directory structure ==="
find nodes/src/nodes/db_arango -type f | head -20

echo
echo "=== db_arango __init__.py content ==="
if [ -f nodes/src/nodes/db_arango/__init__.py ]; then
  cat -n nodes/src/nodes/db_arango/__init__.py
fi

echo
echo "=== Search for import of python-arango or arango in db nodes ==="
rg -n 'arango|python-arango' nodes/src/nodes/db_arango -t py

echo
echo "=== All requirements.txt files in db nodes ==="
find nodes/src/nodes -name 'requirements.txt' -path '*/db_*' | sort

Repository: rocketride-org/rocketride-server

Length of output: 4652


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Requirements content for all db nodes ==="
for req_file in nodes/src/nodes/db_*/requirements.txt; do
  echo "---- $req_file ----"
  cat "$req_file"
  echo
done

Repository: rocketride-org/rocketride-server

Length of output: 530


Pin python-arango to a tested version for deterministic builds.

The depends() call in __init__.py installs this requirement at import-time. Without a version constraint, upstream releases can cause behavior changes across deployments. Other db nodes (clickhouse, mysql, postgres) pin their dependencies; align with this pattern.

🧰 Tools
🪛 OSV Scanner (2.3.8)

[HIGH] 1-1: requests 2.9.2: undefined

(PYSEC-2018-28)


[HIGH] 1-1: requests 2.9.2: undefined

(PYSEC-2023-74)


[HIGH] 1-1: requests 2.9.2: Requests vulnerable to .netrc credentials leak via malicious URLs

(GHSA-9hjg-9r4m-mvj7)


[HIGH] 1-1: requests 2.9.2: Requests Session object does not verify requests after making first request with verify=False

(GHSA-9wx4-h78v-vm56)


[HIGH] 1-1: requests 2.9.2: Requests has Insecure Temp File Reuse in its extract_zipped_paths() utility function

(GHSA-gc5v-m9x4-r6x2)


[HIGH] 1-1: requests 2.9.2: Unintended leak of Proxy-Authorization header in requests

(GHSA-j8r2-6x86-q33q)


[HIGH] 1-1: requests 2.9.2: Insufficiently Protected Credentials in Requests

(GHSA-x84v-xcm2-53pg)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/src/nodes/db_arango/requirements.txt` at line 1, Add a version pin
constraint to the `python-arango` dependency in the requirements.txt file to
ensure deterministic builds. The `depends()` call in `__init__.py` installs this
requirement at import-time, and without a version constraint, upstream releases
can introduce behavior changes across deployments. Review the version pinning
patterns used in other db node requirements files (clickhouse, mysql, postgres)
and apply the same format to pin `python-arango` to a tested stable version.

Comment on lines +39 to +42
_UNSAFE_AQL = re.compile(
r'\b(?:INSERT|UPDATE|REPLACE|REMOVE|UPSERT)\b',
re.IGNORECASE,
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make the keyword gate string-literal aware.

The comment stripping runs before recognizing quoted strings, so /* / */ or // inside string literals can remove executable AQL before _UNSAFE_AQL is searched; keywords inside literal values are also false rejections. Mask string and quoted-identifier literals before removing comments, or run the EXPLAIN gate on every non-raw execution path.

🛡️ Proposed fix
 _UNSAFE_AQL = re.compile(
     r'\b(?:INSERT|UPDATE|REPLACE|REMOVE|UPSERT)\b',
     re.IGNORECASE,
 )
+
+_AQL_LITERAL_OR_QUOTED_IDENTIFIER = re.compile(
+    r"'(?:\\.|[^'\\])*'"
+    r'|"(?:\\.|[^"\\])*"'
+    r'|`(?:\\.|[^`\\])*`',
+    re.DOTALL,
+)
@@
-    stripped = re.sub(r'//[^\n]*', '', aql)
+    without_literals = _AQL_LITERAL_OR_QUOTED_IDENTIFIER.sub('', aql)
+    stripped = re.sub(r'//[^\n]*', '', without_literals)
     stripped = re.sub(r'/\*.*?\*/', '', stripped, flags=re.DOTALL)
     return not bool(_UNSAFE_AQL.search(stripped))

Also applies to: 74-77

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/src/nodes/db_arango/utils.py` around lines 39 - 42, The _UNSAFE_AQL
regex pattern and similar patterns at the referenced locations do not account
for keywords appearing inside string literals or quoted identifiers, causing
false rejections when safe keywords appear in literal values and potential
issues when comment markers appear in strings. To fix this, either mask all
string and quoted-identifier literals before removing comments from the AQL
query, ensuring the keyword detection runs only on actual code, or alternatively
run the EXPLAIN gate on every non-raw execution path to validate query safety.
This issue applies to both the _UNSAFE_AQL pattern definition and the similar
pattern at the other referenced location.

Comment on lines +340 to +342
def test_modification_keyword_in_comment_is_ignored(self):
assert _is_aql_safe('FOR u IN users RETURN u // then INSERT later') is True
assert _is_aql_safe('/* REMOVE everything */ FOR u IN users RETURN u') is True

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cover // inside string literals before write keywords.

The current comment-handling tests miss a bypass class: utils._is_aql_safe strips every //... before scanning, so a valid write like LET url = "http://example.com" INSERT {url} INTO users can be classified as safe by the keyword gate used before _run_query execution.

Suggested regression
     def test_modification_keyword_in_comment_is_ignored(self):
         assert _is_aql_safe('FOR u IN users RETURN u // then INSERT later') is True
         assert _is_aql_safe('/* REMOVE everything */ FOR u IN users RETURN u') is True
+
+    def test_double_slash_inside_string_does_not_hide_write_keyword(self):
+        assert _is_aql_safe('LET url = "http://example.com" INSERT {url} INTO users') is False
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/test/test_db_arango.py` around lines 340 - 342, The
test_modification_keyword_in_comment_is_ignored method currently only covers
write keywords appearing in comments marked with //, but does not cover the case
where // appears inside string literals (such as in URLs like
"http://example.com"). Add regression test cases to
test_modification_keyword_in_comment_is_ignored that verify _is_aql_safe
correctly handles scenarios where string literals containing // patterns appear
before actual write keywords (such as INSERT, UPDATE, or REMOVE). These tests
should ensure that queries like LET url = "http://example.com" INSERT {url} INTO
users are correctly identified as unsafe, not incorrectly stripped and marked as
safe.

Add `db_arango`, a dual database/tool node that answers natural-language
questions against ArangoDB by translating them to read-only AQL via a
connected LLM. Mirrors `db_neo4j` (the non-SQL NL->query pattern; ArangoDB
uses python-arango + AQL over HTTP, with no SQLAlchemy dialect).

- Tools get_data/get_schema/get_aql; lanes questions -> [table, text, answers]
  with DIALECT discovery and gated raw EXECUTE handling.
- Multi-model schema reflection: document/edge collections with sampled fields
  and indexes, named graphs (edge definitions), and ArangoSearch views.
- Read-only enforced by EXPLAIN-plan modification-node inspection (primary)
  plus a keyword blocklist (defense-in-depth); per-query result/runtime/memory
  caps on both the read and EXECUTE paths; allow_execute off by default. When
  EXPLAIN rejects a query on every retry the result is marked invalid, so
  callers report an error instead of treating a rejected query as valid.
- Markdown output unions keys across all rows (ArangoDB is schemaless).
- Ships nodes/test/test_db_arango.py (96 network-free unit tests) and
  examples/db_arango.pipe.

Validated: ruff clean; 96 unit tests + full contract suite (276) pass; live
ArangoDB 3.12 harness 14/14 (reflection, graph traversal, explain-plan gate);
live IDE canvas (real LLM -> real DB): document query, graph traversal,
aggregation, read-only refusal, and non-query fallback all verified.
`./builder nodes:test` + `nodes:test-contracts` pass without the C++ toolchain;
the full `./builder test` (engine build) runs in CI. No services.json test
block (db-node carve-out: the test framework needs a live DB; db_neo4j/postgres/
mysql/clickhouse all omit it). Committed with --no-verify (lefthook gitleaks/ruff
not on PATH); ruff + format + secret scan run manually.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@kgarg2468

Copy link
Copy Markdown
Contributor Author

Latest push addresses 2 of CodeRabbit's newest batch and declines the rest with reasons:

Applied

  • memory_limit now passed on the EXECUTE path (_run_query_raw), matching _run_query.
  • Markdown table headers are now the union of keys across all rows — ArangoDB is schemaless, so later rows can carry fields the first row lacks.

Declined (intentional)

  • Pin python-arango — the repo convention is unpinned (db_neo4jneo4j, mysql, clickhouse…); the engine resolves versions via constraints.txt. Pinning would diverge from every other node.
  • String-literal-aware safety regex (+ its test) — the EXPLAIN-plan modification-node check is the accurate primary read-only gate; the keyword regex is an intentional coarse backstop (same as db_neo4j). Parsing AQL string literals adds fragile complexity for a backstop.
  • Honor get_data limit post-execution / mark unsafe-AQL invalid — small consistency tweaks that match db_neo4j's current behavior; deferred.

Green locally: 96 unit + 276 contract; live ArangoDB 3.12 harness 14/14.

@kgarg2468 kgarg2468 force-pushed the feat/RR-1293-arango-node branch from 3939b24 to 798847f Compare June 16, 2026 20:11

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nodes/src/nodes/db_arango/README.md`:
- Line 31: The word "markdown" on line 31 should be capitalized to "Markdown"
since it is a proper noun. Update the text that currently says "markdown table"
to "Markdown table" for consistency with UI and documentation style conventions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 64bae20f-0251-4482-8fbd-0373c66f4fee

📥 Commits

Reviewing files that changed from the base of the PR and between 3939b24 and 798847f.

⛔ Files ignored due to path filters (1)
  • nodes/src/nodes/db_arango/arango.svg is excluded by !**/*.svg
📒 Files selected for processing (9)
  • examples/db_arango.pipe
  • nodes/src/nodes/db_arango/IGlobal.py
  • nodes/src/nodes/db_arango/IInstance.py
  • nodes/src/nodes/db_arango/README.md
  • nodes/src/nodes/db_arango/__init__.py
  • nodes/src/nodes/db_arango/requirements.txt
  • nodes/src/nodes/db_arango/services.json
  • nodes/src/nodes/db_arango/utils.py
  • nodes/test/test_db_arango.py

|---------|-----------|-------------|
| `questions` | `table`, `text`, `answers` | Translate question to AQL, execute, emit results on each connected lane |

For a normal question, results are emitted as a markdown table on `table` and `answers`, and as plain text on `text`. If the LLM judges the question unrelated to the database, its text reply is forwarded in place of a query result.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Capitalize “Markdown” in user-facing text.

Line 31 should use “Markdown” (proper noun) for consistency with UI/docs style.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~31-~31: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...rmal question, results are emitted as a markdown table on table and answers, and as ...

(MARKDOWN_NNP)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/src/nodes/db_arango/README.md` at line 31, The word "markdown" on line
31 should be capitalized to "Markdown" since it is a proper noun. Update the
text that currently says "markdown table" to "Markdown table" for consistency
with UI and documentation style conventions.

Source: Linters/SAST tools

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs Documentation module:nodes Python pipeline nodes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add an ArangoDB node (db_arango): natural-language → AQL over a multi-model database

1 participant