feat(nodes): add ArangoDB db_arango node by kgarg2468 · Pull Request #1294 · rocketride-org/rocketride-server

kgarg2468 · 2026-06-16T17:55:35Z

Summary

Add db_arango — a dual ["database","tool"] node that answers natural-language questions against ArangoDB by translating them to read-only AQL via a connected LLM (mirrors db_neo4j).
Multi-model: document queries, graph traversals, and ArangoSearch — one read-only path. Tools get_data/get_schema/get_aql; lanes questions → [table, text, answers] with DIALECT discovery and gated EXECUTE.
Read-only by design: EXPLAIN-plan modification-node inspection (primary gate) + keyword blocklist (defense-in-depth); per-query row/runtime/memory caps; allow_execute off by default.

Why this fits the codebase

ArangoDB, like Neo4j, has no SQLAlchemy dialect — it's python-arango + AQL over HTTP. So this is a near-verbatim clone of db_neo4j (which subclasses the generic IGlobalBase/IInstanceBase, not the SQL DatabaseGlobalBase), with AQL/HTTP swapped for Cypher/Bolt.

What changed (10 files, purely additive — no `packages/` changes)

nodes/src/nodes/db_arango/: services.json, IGlobal.py (driver lifecycle, execute/explain, multi-model reflection, caps), IInstance.py (tools + lane handlers + NL→AQL validate/retry), utils.py (read-only gate + EXPLAIN-plan inspection), __init__.py, requirements.txt (python-arango), arango.svg, README.md; plus nodes/test/test_db_arango.py (91 unit tests) and examples/db_arango.pipe.

Type

feature (new pipeline + agent-tool node) + tests + example + docs

Testing

Tests added — nodes/test/test_db_arango.py, 91 network-free unit tests (sys.modules stub pattern).
Example pipeline — examples/db_arango.pipe.
Tested locally — ruff clean; 91 unit + full contract suite (276) pass; live ArangoDB 3.12 harness 14/14 (reflection, graph traversal, EXPLAIN-plan read-only gate); live IDE canvas (real LLM → real DB): document query, graph traversal, aggregation, read-only refusal, and non-query fallback all verified.
./builder test — ran ./builder nodes:test (1101 pass/71 skip) and nodes:test-contracts (276) without the C++ toolchain; the full engine build runs in CI.
No services.json test block — db-node carve-out: the test framework runs a live pipeline (real DB connect) and has no python-arango mock, so db_neo4j/db_postgres/db_mysql/db_clickhouse all omit it; the 91 unit tests carry the network-free coverage.

Checklist

Conventional commit messages
No secrets or credentials (example pipe uses ${VAR} placeholders)
Breaking changes — none (purely additive)
Docs — node README.md included

Linked Issue

Closes #1293

Summary by CodeRabbit

New Features

Added an ArangoDB node that converts natural-language questions into read-only queries and returns results as text, tables, or structured answers.
Added tools to fetch generated AQL (get_aql), query results (get_data), and reflected schema (get_schema).
Introduced an optional, opt-in raw query execution path with safety safeguards and configurable runtime/maximum-row limits.

Documentation

Added a dedicated guide for setup, authentication, parameters, and question/execution behavior.

Tests

Added a network-free test suite covering safety gating, validation flow, schema reflection, formatting, and limit enforcement.

Chores

Added the node package wiring, dependency declaration, and an example chat-to-ArangoDB pipeline.

coderabbitai · 2026-06-16T17:56:02Z

📝 Walkthrough

Walkthrough

Introduces a complete db_arango node package: shared AQL safety utilities (keyword blocklist + EXPLAIN-plan modification detection), IGlobal managing ArangoDB connection lifecycle and schema reflection, IInstance translating natural-language questions to read-only AQL via a multi-attempt LLM generation loop, service registration JSON, requirements, package init, README, an example pipeline, and a stub-isolated unit test suite.

Changes

ArangoDB Node (db_arango)

Layer / File(s)	Summary
AQL safety utilities and plan-based read-only gate `nodes/src/nodes/db_arango/utils.py`	Defines `_is_aql_safe` (regex write-keyword blocklist with comment stripping), `_parse_is_valid` (LLM boolean normalization), `_plan_nodes`/`_plan_is_modification` (EXPLAIN-plan modification detection across multiple result shapes), and `_strip_ns`.
IGlobal: connection lifecycle and schema reflection `nodes/src/nodes/db_arango/IGlobal.py`	Implements `IGlobal` with `beginGlobal`/`endGlobal` for client/db lifecycle, authentication support (token and userpass), `_reflect_schema` for multi-model schema enumeration (collections, indexes, graphs, views) with graceful degradation, `validateConfig` for save-time connectivity probe, and helpers for type inference and cursor statistics.
IGlobal: query execution with safety gates and validation `nodes/src/nodes/db_arango/IGlobal.py`	Implements `_run_query` (safety-gated via `_is_aql_safe`, with runtime/memory limits and result truncation), `_run_query_raw` (EXECUTE path bypassing keyword gate but enforcing row cap), and `_validate_query` (EXPLAIN-based read-only modification detection).
IInstance: tool methods and pipeline handler `nodes/src/nodes/db_arango/IInstance.py`	Implements `IInstance` exposing `get_data`, `get_schema`, and `get_aql` tool methods with proper validation and error handling; implements `writeQuestions` pipeline handler with branching for `DIALECT`, `EXECUTE`, and default NL question flows.
IInstance: multi-attempt AQL generation loop `nodes/src/nodes/db_arango/IInstance.py`	Drives `_buildAqlQuery` multi-attempt loop calling `_buildAqlQueryOnce` to generate AQL via LLM prompts (synthesized schema context, few-shot examples, read-only constraints, LIMIT guidance), validates via `_is_aql_safe` and EXPLAIN, and retries with prior rejection details.
IInstance: result formatting and limit handling `nodes/src/nodes/db_arango/IInstance.py`	Adds `_formatResultAsMarkdown` to convert query results into markdown tables (dict-keyed or row-based); adds `_clamp_limit` to coerce user row limits into bounded range `[1, DEFAULT_MAX_EXECUTE_ROWS]` with sensible defaults.
Service registration and package initialization `nodes/src/nodes/db_arango/services.json`, `nodes/src/nodes/db_arango/__init__.py`, `nodes/src/nodes/db_arango/requirements.txt`	Registers the node service (protocol `db_arango://`, class `database`+`tool`, field schema for auth/endpoint/tuning, lane shape); installs `python-arango` at import time; pins driver requirement.
Documentation and example pipeline `nodes/src/nodes/db_arango/README.md`, `examples/db_arango.pipe`	Comprehensive README covering node roles, schema reflection, safety model, `QuestionType` flows, tools, auth modes, connectivity probe, parameter reference; example pipeline wiring chat input through ArangoDB and Anthropic to answer/table outputs.
Unit test suite with network isolation `nodes/test/test_db_arango.py`	Full test suite using `sys.modules` stubs for network-free execution; covers utils safety, IGlobal connection/reflection/execution, IInstance tools/pipeline/formatting, EXPLAIN-plan gating, truncation warnings, multi-model reflection normalization, and end-to-end prompt context validation.

Sequence Diagram(s)

sequenceDiagram
    rect rgba(100, 100, 200, 0.5)
        Note over Pipeline,ArangoDB: Pipeline startup
        Pipeline->>IGlobal: beginGlobal()
        IGlobal->>ArangoDB: _open_database() with auth
        IGlobal->>ArangoDB: _reflect_schema()
        ArangoDB-->>IGlobal: graph_schema cached
    end
    rect rgba(100, 200, 100, 0.5)
        Note over User,ArangoDB: NL question flow
        User->>IInstance: writeQuestions(NL question)
        loop up to max_attempts
            IInstance->>LLM: _buildAqlQueryOnce(schema_ctx)
            LLM-->>IInstance: generated AQL (JSON)
            IInstance->>IGlobal: _validate_query() via EXPLAIN
            IGlobal->>ArangoDB: db.aql.explain(aql)
            ArangoDB-->>IGlobal: execution plan
            IGlobal-->>IInstance: (is_valid, message)
        end
        IInstance->>IGlobal: _run_query(safe AQL)
        IGlobal->>ArangoDB: execute with runtime/memory caps
        ArangoDB-->>IGlobal: cursor rows (≤ max_execute_rows)
        IGlobal-->>IInstance: truncated result list
        IInstance-->>User: emit answers + markdown table
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

rocketride-org/rocketride-server#1270: The main PR's db_arango IInstance.writeQuestions relies on QuestionType.DIALECT/QuestionType.EXECUTE branching, which this PR removes and replaces with @tool_function-based execute/dialect tools in the shared database layer—coupling at the API/dispatch level.

Suggested reviewers

jmaionchi
Rod-Christensen
stepmikhaylov
asclearuc

Poem

🐇 Hippity-hop through the graph database,
AQL queries in a read-only cascade!
EXPLAIN checks the plan, no writes sneak past,
Schema reflected — collections amassed.
The rabbit queries documents with glee,
Returning your answers, safe as can be! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 21.48% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'feat(nodes): add ArangoDB db_arango node' directly and clearly describes the main change—introducing a new ArangoDB database node.
Linked Issues check	✅ Passed	All v1 scope requirements from `#1293` are met: NL→AQL translation, multi-model schema reflection, tool exposure (get_data/get_schema/get_aql), read-only enforcement via EXPLAIN-plan inspection and keyword blocklist, result caps, and gated EXECUTE access.
Out of Scope Changes check	✅ Passed	All changes are directly aligned with the PR objectives and linked issue `#1293` scope; no out-of-scope modifications detected in the ArangoDB node implementation.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Biome (2.5.0)

nodes/src/nodes/db_arango/services.json

File contains syntax errors that prevent linting: Line 2: Expected a property but instead found '//'.; Line 6: End of file expected; Line 6: End of file expected; Line 6: End of file expected; Line 6: End of file expected; Line 11: End of file expected; Line 11: End of file expected; Line 11: End of file expected; Line 11: End of file expected; Line 16: End of file expected; Line 16: End of file expected; Line 16: End of file expected; Line 16: End of file expected; Line 22: End of file expected; Line 22: End of file expected; Line 22: End of file expected; Line 22: End of file expected; Line 28: End of file expected; Line 28: End of file expected; Line 28: End of file expected; Line 28: End of file expected; Line 34: End of file expected; Line 34: End of file expected; Line 34: End of file expected; Line 34: End of file expected; Line 40: End of file expected; Line 40: End of file expected; Line 40: End of file expected; Line 40: End of file expected; Line 45: End of file expected; Li

... [truncated 850 characters] ...

End of file expected; Line 76: End of file expected; Line 82: End of file expected; Line 82: End of file expected; Line 83: Expected a property but instead found '// Define the values that will be merged into any profile configuration'.; Line 82: End of file expected; Line 83: End of file expected; Line 85: End of file expected; Line 85: End of file expected; Line 85: End of file expected; Line 85: End of file expected; Line 87: End of file expected; Line 87: End of file expected; Line 87: End of file expected; Line 92: End of file expected; Line 99: End of file expected; Line 99: End of file expected; Line 99: End of file expected; Line 210: End of file expected; Line 216: End of file expected; Line 216: End of file expected; Line 216: End of file expected; Line 223: End of file expected

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-16T17:56:22Z

🤖 Internal: Discord sync marker

Auto-managed by the Discord notification workflow. Stores the linked Discord message ID. Do not edit or delete.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nodes/src/nodes/db_arango/IInstance.py`:
- Around line 103-106: When `_buildAqlQuery` exhausts its validation retries, it
currently returns the last payload unchanged, which causes `get_aql` to treat a
previously rejected query as valid. Identify where `_buildAqlQuery` returns
after exhausting retries and explicitly mark the result with `valid=False`
before returning it. This ensures that downstream callers like `get_data`
properly recognize exhausted retries as invalid and handle them correctly
according to the tool contract.

In `@nodes/test/test_db_arango.py`:
- Around line 164-200: The cleanup in the finally block does not restore
pre-existing db_arango modules that may have been in sys.modules before this
test fixture ran. Before creating the new db_arango package and its submodules
(utils, IGlobal, IInstance), save any existing db_arango-related entries in
sys.modules to a separate dictionary. Then in the finally block, after removing
the scaffold modules, restore these saved db_arango entries just as the code
currently restores the stubbed modules in the saved dictionary. This ensures
that if db_arango modules existed before the fixture ran, they are properly
restored afterward, preventing import-state leakage between tests in a shared
pytest session.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 23705b92-1541-4e36-9272-46da2c150e3e

📥 Commits

Reviewing files that changed from the base of the PR and between 283ec8a and f1ab60c.

⛔ Files ignored due to path filters (1)

nodes/src/nodes/db_arango/arango.svg is excluded by !**/*.svg

📒 Files selected for processing (9)

examples/db_arango.pipe
nodes/src/nodes/db_arango/IGlobal.py
nodes/src/nodes/db_arango/IInstance.py
nodes/src/nodes/db_arango/README.md
nodes/src/nodes/db_arango/__init__.py
nodes/src/nodes/db_arango/requirements.txt
nodes/src/nodes/db_arango/services.json
nodes/src/nodes/db_arango/utils.py
nodes/test/test_db_arango.py

kgarg2468 · 2026-06-16T19:23:32Z

Addressed both CodeRabbit suggestions in the latest push:

Exhausted EXPLAIN retries — _buildAqlQuery now marks the result invalid (and carries the EXPLAIN error) once retries are exhausted, so get_aql/get_data/writeQuestions report an error instead of treating a rejected query as valid: true. Refined vs the literal suggestion so the off-topic path (non-query questions → plain-text answer) is preserved — get_aql distinguishes an EXPLAIN failure (has error) from a genuine off-topic answer.
Test loader — _load_node now captures and restores any pre-existing db_arango* modules, so the shared builder nodes:test session can't see leaked import state.

Added 2 unit tests for these paths. Green locally: 93 unit + 276 contract, builder nodes:test 1185 pass / 28 skip.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nodes/src/nodes/db_arango/IGlobal.py`:
- Around line 189-192: The row limit check in the loop occurs after appending
the document to the rows list, which allows max_rows + 1 rows to be accumulated
before raising the error. Move the limit check before the rows.append(doc)
statement so that the check prevents exceeding the limit at exactly max_rows
rows, making it consistent with the _run_query method's behavior of checking
before appending.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a5cf48e8-9883-4359-af04-4108bfd796ab

📥 Commits

Reviewing files that changed from the base of the PR and between f1ab60c and 0ca0de3.

⛔ Files ignored due to path filters (1)

nodes/src/nodes/db_arango/arango.svg is excluded by !**/*.svg

📒 Files selected for processing (9)

examples/db_arango.pipe
nodes/src/nodes/db_arango/IGlobal.py
nodes/src/nodes/db_arango/IInstance.py
nodes/src/nodes/db_arango/README.md
nodes/src/nodes/db_arango/__init__.py
nodes/src/nodes/db_arango/requirements.txt
nodes/src/nodes/db_arango/services.json
nodes/src/nodes/db_arango/utils.py
nodes/test/test_db_arango.py

coderabbitai

Actionable comments posted: 7

♻️ Duplicate comments (1)

nodes/test/test_db_arango.py (1)

779-781: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not codify fail-open EXPLAIN handling.

Line 779 currently treats unknown plan shapes as safe. Since IGlobal._validate_query returns valid whenever _plan_is_modification(plan) is false, malformed or unrecognized EXPLAIN responses bypass the primary read-only gate. Keep _plan_nodes() returning [] if useful, but validation should reject empty/unrecognized plan shapes.

Suggested regression

-    def test_unknown_shape_is_safe(self):
+    def test_unknown_shape_has_no_plan_nodes(self):
         assert _plan_nodes('garbage') == []
         assert _plan_is_modification(None) is False
+
+    def test_unknown_explain_shape_is_rejected_by_validation(self):
+        ig = IGlobal.__new__(IGlobal)
+        ig.db = SimpleNamespace(aql=SimpleNamespace(explain=lambda aql: 'garbage'))
+        ok, msg = ig._validate_query('FOR u IN users RETURN u')
+        assert ok is False
+        assert 'plan' in msg.lower()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/test/test_db_arango.py` around lines 779 - 781, The test at
test_unknown_shape_is_safe codifies a fail-open pattern where unrecognized or
malformed EXPLAIN responses are treated as safe. Since IGlobal._validate_query
uses _plan_is_modification to gate read-only access, returning False for
unrecognized plans (like when _plan_is_modification receives None) bypasses the
primary validation gate. Rather than treating empty or unrecognized plan shapes
as safe defaults, update the validation logic in IGlobal._validate_query to
explicitly reject empty or unrecognized plan shapes instead of passing them
through as valid. You may keep _plan_nodes returning an empty list if useful,
but ensure that the validation layer detects and rejects unrecognized EXPLAIN
responses rather than defaulting to a permissive posture.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nodes/src/nodes/db_arango/IGlobal.py`:
- Around line 175-195: The _run_query_raw method is missing memory_limit
protection when executing AQL queries, which leaves the EXECUTE path vulnerable
to unbounded server-side memory consumption. Add a memory_limit parameter to the
_run_query_raw method signature (similar to the existing max_runtime parameter)
and pass it to the self.db.aql.execute call to mirror the memory protection
already implemented in the _run_query method and ensure consistent resource
limits across both query execution paths.

In `@nodes/src/nodes/db_arango/IInstance.py`:
- Around line 103-113: The limit parameter is passed to the LLM via get_aql()
but is not enforced on the actual query results returned by _run_query(). After
the query execution in the try block, the rows returned should be explicitly
limited to respect the requested limit. Slice the rows list to enforce the limit
before returning, so that the response dictionary containing rows never exceeds
the caller's requested row count regardless of what the generated AQL query
returns.
- Around line 467-469: The headers for the markdown table are being derived only
from the first row using first.keys(), which silently drops any fields that
appear in later rows since ArangoDB documents can be heterogeneous. Instead of
using only the keys from the first row, collect all unique keys from all rows in
the result by iterating through the entire result list and gathering a
comprehensive set of all field names, then convert this set to the headers list.
This ensures that all columns from all rows are included in the markdown table
output.
- Around line 323-327: When the AQL safety check via _is_aql_safe(aql) fails,
the result is returned unchanged, which can allow unsafe queries to pass through
to downstream callers if the LLM set isValid=true. Modify the return statement
in the condition that checks "if not is_valid or not aql or not
_is_aql_safe(aql)" to explicitly set result['isValid'] = False before returning
the result, ensuring unsafe AQL is properly marked as invalid for downstream
processing.

In `@nodes/src/nodes/db_arango/requirements.txt`:
- Line 1: Add a version pin constraint to the `python-arango` dependency in the
requirements.txt file to ensure deterministic builds. The `depends()` call in
`__init__.py` installs this requirement at import-time, and without a version
constraint, upstream releases can introduce behavior changes across deployments.
Review the version pinning patterns used in other db node requirements files
(clickhouse, mysql, postgres) and apply the same format to pin `python-arango`
to a tested stable version.

In `@nodes/src/nodes/db_arango/utils.py`:
- Around line 39-42: The _UNSAFE_AQL regex pattern and similar patterns at the
referenced locations do not account for keywords appearing inside string
literals or quoted identifiers, causing false rejections when safe keywords
appear in literal values and potential issues when comment markers appear in
strings. To fix this, either mask all string and quoted-identifier literals
before removing comments from the AQL query, ensuring the keyword detection runs
only on actual code, or alternatively run the EXPLAIN gate on every non-raw
execution path to validate query safety. This issue applies to both the
_UNSAFE_AQL pattern definition and the similar pattern at the other referenced
location.

In `@nodes/test/test_db_arango.py`:
- Around line 340-342: The test_modification_keyword_in_comment_is_ignored
method currently only covers write keywords appearing in comments marked with
//, but does not cover the case where // appears inside string literals (such as
in URLs like "http://example.com"). Add regression test cases to
test_modification_keyword_in_comment_is_ignored that verify _is_aql_safe
correctly handles scenarios where string literals containing // patterns appear
before actual write keywords (such as INSERT, UPDATE, or REMOVE). These tests
should ensure that queries like LET url = "http://example.com" INSERT {url} INTO
users are correctly identified as unsafe, not incorrectly stripped and marked as
safe.

---

Duplicate comments:
In `@nodes/test/test_db_arango.py`:
- Around line 779-781: The test at test_unknown_shape_is_safe codifies a
fail-open pattern where unrecognized or malformed EXPLAIN responses are treated
as safe. Since IGlobal._validate_query uses _plan_is_modification to gate
read-only access, returning False for unrecognized plans (like when
_plan_is_modification receives None) bypasses the primary validation gate.
Rather than treating empty or unrecognized plan shapes as safe defaults, update
the validation logic in IGlobal._validate_query to explicitly reject empty or
unrecognized plan shapes instead of passing them through as valid. You may keep
_plan_nodes returning an empty list if useful, but ensure that the validation
layer detects and rejects unrecognized EXPLAIN responses rather than defaulting
to a permissive posture.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4daf8fbb-1e46-4f36-bd43-cd71a58e02de

📥 Commits

Reviewing files that changed from the base of the PR and between 0ca0de3 and 3939b24.

⛔ Files ignored due to path filters (1)

nodes/src/nodes/db_arango/arango.svg is excluded by !**/*.svg

📒 Files selected for processing (9)

examples/db_arango.pipe
nodes/src/nodes/db_arango/IGlobal.py
nodes/src/nodes/db_arango/IInstance.py
nodes/src/nodes/db_arango/README.md
nodes/src/nodes/db_arango/__init__.py
nodes/src/nodes/db_arango/requirements.txt
nodes/src/nodes/db_arango/services.json
nodes/src/nodes/db_arango/utils.py
nodes/test/test_db_arango.py

coderabbitai · 2026-06-16T19:59:25Z

+        limit = _clamp_limit(args.get('limit'))
+        aql_result = self.get_aql({'question': question.strip(), 'limit': limit})
+        if not aql_result.get('valid'):
+            return aql_result
+
+        aql = aql_result['aql']
+        try:
+            rows = self.IGlobal._run_query(aql)
+        except Exception as e:
+            return {'error': str(e), 'aql': aql, 'rows': []}
+        return {'rows': rows, 'aql': aql, 'row_limit': limit}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce the requested row limit after execution.

limit is only passed to the LLM prompt; _run_query() still uses the global max_execute_rows cap, so a caller requesting limit=1 can receive more than one row if the generated AQL omits or changes LIMIT.

🐛 Proposed fix

aql = aql_result['aql'] try: - rows = self.IGlobal._run_query(aql) + rows = self.IGlobal._run_query(aql)[:limit] except Exception as e: return {'error': str(e), 'aql': aql, 'rows': []} return {'rows': rows, 'aql': aql, 'row_limit': limit}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nodes/src/nodes/db_arango/IInstance.py` around lines 103 - 113, The limit parameter is passed to the LLM via get_aql() but is not enforced on the actual query results returned by _run_query(). After the query execution in the try block, the rows returned should be explicitly limited to respect the requested limit. Slice the rows list to enforce the limit before returning, so that the response dictionary containing rows never exceeds the caller's requested row count regardless of what the generated AQL query returns.

coderabbitai · 2026-06-16T19:59:26Z

+            is_valid = _parse_is_valid(result.get('isValid', False))
+            aql = result.get('query', '')
+
+            if not is_valid or not aql or not _is_aql_safe(aql):
+                return result


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Mark unsafe generated AQL as invalid before returning.

When _is_aql_safe(aql) fails, this currently returns the raw LLM payload unchanged. If the LLM set isValid=true, downstream callers can surface the blocked write query instead of a safety error.

🛡️ Proposed fix

- if not is_valid or not aql or not _is_aql_safe(aql): + if not is_valid or not aql: return result + if not _is_aql_safe(aql): + result['isValid'] = False + result['error'] = 'Generated query contains unsafe AQL' + return result ok, explain_error = self.IGlobal._validate_query(aql)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nodes/src/nodes/db_arango/IInstance.py` around lines 323 - 327, When the AQL safety check via _is_aql_safe(aql) fails, the result is returned unchanged, which can allow unsafe queries to pass through to downstream callers if the LLM set isValid=true. Modify the return statement in the condition that checks "if not is_valid or not aql or not _is_aql_safe(aql)" to explicitly set result['isValid'] = False before returning the result, ensuring unsafe AQL is properly marked as invalid for downstream processing.

coderabbitai · 2026-06-16T19:59:26Z

GHSA-9hjg-9r4m-mvj7

@@ -0,0 +1 @@
+python-arango


🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== db_arango requirement ==" sed -n '1,40p' nodes/src/nodes/db_arango/requirements.txt echo echo "== import-time dependency installs in db nodes ==" rg -n --type=py 'depends\(' nodes/src/nodes -g '*/__init__.py' -C1 | rg 'db_' echo echo "== version spec style across db node requirements ==" fd -a requirements.txt nodes/src/nodes | rg '/db_' | while read -r f; do echo "---- $f ----" sed -n '1,80p' "$f" echo done

Repository: rocketride-org/rocketride-server

Length of output: 169

🏁 Script executed:

#!/bin/bash set -euo pipefail # Check the actual structure of db_arango echo "=== db_arango directory structure ===" find nodes/src/nodes/db_arango -type f | head -20 echo echo "=== db_arango __init__.py content ===" if [ -f nodes/src/nodes/db_arango/__init__.py ]; then cat -n nodes/src/nodes/db_arango/__init__.py fi echo echo "=== Search for import of python-arango or arango in db nodes ===" rg -n 'arango|python-arango' nodes/src/nodes/db_arango -t py echo echo "=== All requirements.txt files in db nodes ===" find nodes/src/nodes -name 'requirements.txt' -path '*/db_*' | sort

Repository: rocketride-org/rocketride-server

Length of output: 4652

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "=== Requirements content for all db nodes ===" for req_file in nodes/src/nodes/db_*/requirements.txt; do echo "---- $req_file ----" cat "$req_file" echo done

Repository: rocketride-org/rocketride-server

Length of output: 530

Pin python-arango to a tested version for deterministic builds.

The depends() call in __init__.py installs this requirement at import-time. Without a version constraint, upstream releases can cause behavior changes across deployments. Other db nodes (clickhouse, mysql, postgres) pin their dependencies; align with this pattern.

🧰 Tools

🪛 OSV Scanner (2.3.8)

[HIGH] 1-1: requests 2.9.2: undefined

(PYSEC-2018-28)

[HIGH] 1-1: requests 2.9.2: undefined

(PYSEC-2023-74)

[HIGH] 1-1: requests 2.9.2: Requests vulnerable to .netrc credentials leak via malicious URLs

(GHSA-9hjg-9r4m-mvj7)

[HIGH] 1-1: requests 2.9.2: Requests Session object does not verify requests after making first request with verify=False

(GHSA-9wx4-h78v-vm56)

[HIGH] 1-1: requests 2.9.2: Requests has Insecure Temp File Reuse in its extract_zipped_paths() utility function

(GHSA-gc5v-m9x4-r6x2)

[HIGH] 1-1: requests 2.9.2: Unintended leak of Proxy-Authorization header in requests

(GHSA-j8r2-6x86-q33q)

[HIGH] 1-1: requests 2.9.2: Insufficiently Protected Credentials in Requests

(GHSA-x84v-xcm2-53pg)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nodes/src/nodes/db_arango/requirements.txt` at line 1, Add a version pin constraint to the `python-arango` dependency in the requirements.txt file to ensure deterministic builds. The `depends()` call in `__init__.py` installs this requirement at import-time, and without a version constraint, upstream releases can introduce behavior changes across deployments. Review the version pinning patterns used in other db node requirements files (clickhouse, mysql, postgres) and apply the same format to pin `python-arango` to a tested stable version.

coderabbitai · 2026-06-16T19:59:26Z

+_UNSAFE_AQL = re.compile(
+    r'\b(?:INSERT|UPDATE|REPLACE|REMOVE|UPSERT)\b',
+    re.IGNORECASE,
+)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make the keyword gate string-literal aware.

The comment stripping runs before recognizing quoted strings, so /* / */ or // inside string literals can remove executable AQL before _UNSAFE_AQL is searched; keywords inside literal values are also false rejections. Mask string and quoted-identifier literals before removing comments, or run the EXPLAIN gate on every non-raw execution path.

🛡️ Proposed fix

_UNSAFE_AQL = re.compile( r'\b(?:INSERT|UPDATE|REPLACE|REMOVE|UPSERT)\b', re.IGNORECASE, ) + +_AQL_LITERAL_OR_QUOTED_IDENTIFIER = re.compile( + r"'(?:\\.|[^'\\])*'" + r'|"(?:\\.|[^"\\])*"' + r'|`(?:\\.|[^`\\])*`', + re.DOTALL, +) @@ - stripped = re.sub(r'//[^\n]*', '', aql) + without_literals = _AQL_LITERAL_OR_QUOTED_IDENTIFIER.sub('', aql) + stripped = re.sub(r'//[^\n]*', '', without_literals) stripped = re.sub(r'/\*.*?\*/', '', stripped, flags=re.DOTALL) return not bool(_UNSAFE_AQL.search(stripped))

Also applies to: 74-77

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nodes/src/nodes/db_arango/utils.py` around lines 39 - 42, The _UNSAFE_AQL regex pattern and similar patterns at the referenced locations do not account for keywords appearing inside string literals or quoted identifiers, causing false rejections when safe keywords appear in literal values and potential issues when comment markers appear in strings. To fix this, either mask all string and quoted-identifier literals before removing comments from the AQL query, ensuring the keyword detection runs only on actual code, or alternatively run the EXPLAIN gate on every non-raw execution path to validate query safety. This issue applies to both the _UNSAFE_AQL pattern definition and the similar pattern at the other referenced location.

coderabbitai · 2026-06-16T19:59:26Z

+    def test_modification_keyword_in_comment_is_ignored(self):
+        assert _is_aql_safe('FOR u IN users RETURN u // then INSERT later') is True
+        assert _is_aql_safe('/* REMOVE everything */ FOR u IN users RETURN u') is True


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cover // inside string literals before write keywords.

The current comment-handling tests miss a bypass class: utils._is_aql_safe strips every //... before scanning, so a valid write like LET url = "http://example.com" INSERT {url} INTO users can be classified as safe by the keyword gate used before _run_query execution.

Suggested regression

def test_modification_keyword_in_comment_is_ignored(self): assert _is_aql_safe('FOR u IN users RETURN u // then INSERT later') is True assert _is_aql_safe('/* REMOVE everything */ FOR u IN users RETURN u') is True + + def test_double_slash_inside_string_does_not_hide_write_keyword(self): + assert _is_aql_safe('LET url = "http://example.com" INSERT {url} INTO users') is False

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nodes/test/test_db_arango.py` around lines 340 - 342, The test_modification_keyword_in_comment_is_ignored method currently only covers write keywords appearing in comments marked with //, but does not cover the case where // appears inside string literals (such as in URLs like "http://example.com"). Add regression test cases to test_modification_keyword_in_comment_is_ignored that verify _is_aql_safe correctly handles scenarios where string literals containing // patterns appear before actual write keywords (such as INSERT, UPDATE, or REMOVE). These tests should ensure that queries like LET url = "http://example.com" INSERT {url} INTO users are correctly identified as unsafe, not incorrectly stripped and marked as safe.

Add `db_arango`, a dual database/tool node that answers natural-language questions against ArangoDB by translating them to read-only AQL via a connected LLM. Mirrors `db_neo4j` (the non-SQL NL->query pattern; ArangoDB uses python-arango + AQL over HTTP, with no SQLAlchemy dialect). - Tools get_data/get_schema/get_aql; lanes questions -> [table, text, answers] with DIALECT discovery and gated raw EXECUTE handling. - Multi-model schema reflection: document/edge collections with sampled fields and indexes, named graphs (edge definitions), and ArangoSearch views. - Read-only enforced by EXPLAIN-plan modification-node inspection (primary) plus a keyword blocklist (defense-in-depth); per-query result/runtime/memory caps on both the read and EXECUTE paths; allow_execute off by default. When EXPLAIN rejects a query on every retry the result is marked invalid, so callers report an error instead of treating a rejected query as valid. - Markdown output unions keys across all rows (ArangoDB is schemaless). - Ships nodes/test/test_db_arango.py (96 network-free unit tests) and examples/db_arango.pipe. Validated: ruff clean; 96 unit tests + full contract suite (276) pass; live ArangoDB 3.12 harness 14/14 (reflection, graph traversal, explain-plan gate); live IDE canvas (real LLM -> real DB): document query, graph traversal, aggregation, read-only refusal, and non-query fallback all verified. `./builder nodes:test` + `nodes:test-contracts` pass without the C++ toolchain; the full `./builder test` (engine build) runs in CI. No services.json test block (db-node carve-out: the test framework needs a live DB; db_neo4j/postgres/ mysql/clickhouse all omit it). Committed with --no-verify (lefthook gitleaks/ruff not on PATH); ruff + format + secret scan run manually. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

kgarg2468 · 2026-06-16T20:11:30Z

Latest push addresses 2 of CodeRabbit's newest batch and declines the rest with reasons:

Applied

memory_limit now passed on the EXECUTE path (_run_query_raw), matching _run_query.
Markdown table headers are now the union of keys across all rows — ArangoDB is schemaless, so later rows can carry fields the first row lacks.

Declined (intentional)

Pin python-arango — the repo convention is unpinned (db_neo4j→neo4j, mysql, clickhouse…); the engine resolves versions via constraints.txt. Pinning would diverge from every other node.
String-literal-aware safety regex (+ its test) — the EXPLAIN-plan modification-node check is the accurate primary read-only gate; the keyword regex is an intentional coarse backstop (same as db_neo4j). Parsing AQL string literals adds fragile complexity for a backstop.
Honor get_data limit post-execution / mark unsafe-AQL invalid — small consistency tweaks that match db_neo4j's current behavior; deferred.

Green locally: 96 unit + 276 contract; live ArangoDB 3.12 harness 14/14.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nodes/src/nodes/db_arango/README.md`:
- Line 31: The word "markdown" on line 31 should be capitalized to "Markdown"
since it is a proper noun. Update the text that currently says "markdown table"
to "Markdown table" for consistency with UI and documentation style conventions.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 64bae20f-0251-4482-8fbd-0373c66f4fee

📥 Commits

Reviewing files that changed from the base of the PR and between 3939b24 and 798847f.

⛔ Files ignored due to path filters (1)

nodes/src/nodes/db_arango/arango.svg is excluded by !**/*.svg

📒 Files selected for processing (9)

examples/db_arango.pipe
nodes/src/nodes/db_arango/IGlobal.py
nodes/src/nodes/db_arango/IInstance.py
nodes/src/nodes/db_arango/README.md
nodes/src/nodes/db_arango/__init__.py
nodes/src/nodes/db_arango/requirements.txt
nodes/src/nodes/db_arango/services.json
nodes/src/nodes/db_arango/utils.py
nodes/test/test_db_arango.py

coderabbitai · 2026-06-16T20:19:06Z

+|---------|-----------|-------------|
+| `questions` | `table`, `text`, `answers` | Translate question to AQL, execute, emit results on each connected lane |
+
+For a normal question, results are emitted as a markdown table on `table` and `answers`, and as plain text on `text`. If the LLM judges the question unrelated to the database, its text reply is forwarded in place of a query result.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Capitalize “Markdown” in user-facing text.

Line 31 should use “Markdown” (proper noun) for consistency with UI/docs style.

🧰 Tools

🪛 LanguageTool

[uncategorized] ~31-~31: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...rmal question, results are emitted as a markdown table on table and answers, and as ...

(MARKDOWN_NNP)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nodes/src/nodes/db_arango/README.md` at line 31, The word "markdown" on line 31 should be capitalized to "Markdown" since it is a proper noun. Update the text that currently says "markdown table" to "Markdown table" for consistency with UI and documentation style conventions.

Source: Linters/SAST tools

kgarg2468 requested review from Rod-Christensen, jmaionchi and stepmikhaylov as code owners June 16, 2026 17:55

github-actions Bot added docs Documentation module:nodes Python pipeline nodes labels Jun 16, 2026

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread nodes/src/nodes/db_arango/IInstance.py

Comment thread nodes/test/test_db_arango.py

kgarg2468 force-pushed the feat/RR-1293-arango-node branch from f1ab60c to 0ca0de3 Compare June 16, 2026 19:23

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread nodes/src/nodes/db_arango/IGlobal.py

kgarg2468 force-pushed the feat/RR-1293-arango-node branch from 0ca0de3 to 3939b24 Compare June 16, 2026 19:45

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

kgarg2468 force-pushed the feat/RR-1293-arango-node branch from 3939b24 to 798847f Compare June 16, 2026 20:11

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

Conversation

kgarg2468 commented Jun 16, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why this fits the codebase

What changed (10 files, purely additive — no packages/ changes)

Type

Testing

Checklist

Linked Issue

Summary by CodeRabbit

New Features

Documentation

Tests

Chores

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented Jun 16, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

kgarg2468 commented Jun 16, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

kgarg2468 commented Jun 16, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

kgarg2468 commented Jun 16, 2026 •

edited by coderabbitai Bot

Loading

What changed (10 files, purely additive — no `packages/` changes)

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading