feat(nodes): add ArangoDB db_arango node#1294
Conversation
📝 WalkthroughWalkthroughIntroduces a complete ChangesArangoDB Node (db_arango)
Sequence Diagram(s)sequenceDiagram
rect rgba(100, 100, 200, 0.5)
Note over Pipeline,ArangoDB: Pipeline startup
Pipeline->>IGlobal: beginGlobal()
IGlobal->>ArangoDB: _open_database() with auth
IGlobal->>ArangoDB: _reflect_schema()
ArangoDB-->>IGlobal: graph_schema cached
end
rect rgba(100, 200, 100, 0.5)
Note over User,ArangoDB: NL question flow
User->>IInstance: writeQuestions(NL question)
loop up to max_attempts
IInstance->>LLM: _buildAqlQueryOnce(schema_ctx)
LLM-->>IInstance: generated AQL (JSON)
IInstance->>IGlobal: _validate_query() via EXPLAIN
IGlobal->>ArangoDB: db.aql.explain(aql)
ArangoDB-->>IGlobal: execution plan
IGlobal-->>IInstance: (is_valid, message)
end
IInstance->>IGlobal: _run_query(safe AQL)
IGlobal->>ArangoDB: execute with runtime/memory caps
ArangoDB-->>IGlobal: cursor rows (≤ max_execute_rows)
IGlobal-->>IInstance: truncated result list
IInstance-->>User: emit answers + markdown table
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Biome (2.5.0)nodes/src/nodes/db_arango/services.jsonFile contains syntax errors that prevent linting: Line 2: Expected a property but instead found '//'.; Line 6: End of file expected; Line 6: End of file expected; Line 6: End of file expected; Line 6: End of file expected; Line 11: End of file expected; Line 11: End of file expected; Line 11: End of file expected; Line 11: End of file expected; Line 16: End of file expected; Line 16: End of file expected; Line 16: End of file expected; Line 16: End of file expected; Line 22: End of file expected; Line 22: End of file expected; Line 22: End of file expected; Line 22: End of file expected; Line 28: End of file expected; Line 28: End of file expected; Line 28: End of file expected; Line 28: End of file expected; Line 34: End of file expected; Line 34: End of file expected; Line 34: End of file expected; Line 34: End of file expected; Line 40: End of file expected; Line 40: End of file expected; Line 40: End of file expected; Line 40: End of file expected; Line 45: End of file expected; Li ... [truncated 850 characters] ... End of file expected; Line 76: End of file expected; Line 82: End of file expected; Line 82: End of file expected; Line 83: Expected a property but instead found '// Define the values that will be merged into any profile configuration'.; Line 82: End of file expected; Line 83: End of file expected; Line 85: End of file expected; Line 85: End of file expected; Line 85: End of file expected; Line 85: End of file expected; Line 87: End of file expected; Line 87: End of file expected; Line 87: End of file expected; Line 92: End of file expected; Line 99: End of file expected; Line 99: End of file expected; Line 99: End of file expected; Line 210: End of file expected; Line 216: End of file expected; Line 216: End of file expected; Line 216: End of file expected; Line 223: End of file expected Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🤖 Internal: Discord sync markerAuto-managed by the Discord notification workflow. Stores the linked Discord message ID. Do not edit or delete. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@nodes/src/nodes/db_arango/IInstance.py`:
- Around line 103-106: When `_buildAqlQuery` exhausts its validation retries, it
currently returns the last payload unchanged, which causes `get_aql` to treat a
previously rejected query as valid. Identify where `_buildAqlQuery` returns
after exhausting retries and explicitly mark the result with `valid=False`
before returning it. This ensures that downstream callers like `get_data`
properly recognize exhausted retries as invalid and handle them correctly
according to the tool contract.
In `@nodes/test/test_db_arango.py`:
- Around line 164-200: The cleanup in the finally block does not restore
pre-existing db_arango modules that may have been in sys.modules before this
test fixture ran. Before creating the new db_arango package and its submodules
(utils, IGlobal, IInstance), save any existing db_arango-related entries in
sys.modules to a separate dictionary. Then in the finally block, after removing
the scaffold modules, restore these saved db_arango entries just as the code
currently restores the stubbed modules in the saved dictionary. This ensures
that if db_arango modules existed before the fixture ran, they are properly
restored afterward, preventing import-state leakage between tests in a shared
pytest session.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 23705b92-1541-4e36-9272-46da2c150e3e
⛔ Files ignored due to path filters (1)
nodes/src/nodes/db_arango/arango.svgis excluded by!**/*.svg
📒 Files selected for processing (9)
examples/db_arango.pipenodes/src/nodes/db_arango/IGlobal.pynodes/src/nodes/db_arango/IInstance.pynodes/src/nodes/db_arango/README.mdnodes/src/nodes/db_arango/__init__.pynodes/src/nodes/db_arango/requirements.txtnodes/src/nodes/db_arango/services.jsonnodes/src/nodes/db_arango/utils.pynodes/test/test_db_arango.py
f1ab60c to
0ca0de3
Compare
|
Addressed both CodeRabbit suggestions in the latest push:
Added 2 unit tests for these paths. Green locally: 93 unit + 276 contract, |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@nodes/src/nodes/db_arango/IGlobal.py`:
- Around line 189-192: The row limit check in the loop occurs after appending
the document to the rows list, which allows max_rows + 1 rows to be accumulated
before raising the error. Move the limit check before the rows.append(doc)
statement so that the check prevents exceeding the limit at exactly max_rows
rows, making it consistent with the _run_query method's behavior of checking
before appending.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: a5cf48e8-9883-4359-af04-4108bfd796ab
⛔ Files ignored due to path filters (1)
nodes/src/nodes/db_arango/arango.svgis excluded by!**/*.svg
📒 Files selected for processing (9)
examples/db_arango.pipenodes/src/nodes/db_arango/IGlobal.pynodes/src/nodes/db_arango/IInstance.pynodes/src/nodes/db_arango/README.mdnodes/src/nodes/db_arango/__init__.pynodes/src/nodes/db_arango/requirements.txtnodes/src/nodes/db_arango/services.jsonnodes/src/nodes/db_arango/utils.pynodes/test/test_db_arango.py
0ca0de3 to
3939b24
Compare
There was a problem hiding this comment.
Actionable comments posted: 7
♻️ Duplicate comments (1)
nodes/test/test_db_arango.py (1)
779-781:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDo not codify fail-open EXPLAIN handling.
Line 779 currently treats unknown plan shapes as safe. Since
IGlobal._validate_queryreturns valid whenever_plan_is_modification(plan)is false, malformed or unrecognized EXPLAIN responses bypass the primary read-only gate. Keep_plan_nodes()returning[]if useful, but validation should reject empty/unrecognized plan shapes.Suggested regression
- def test_unknown_shape_is_safe(self): + def test_unknown_shape_has_no_plan_nodes(self): assert _plan_nodes('garbage') == [] assert _plan_is_modification(None) is False + + def test_unknown_explain_shape_is_rejected_by_validation(self): + ig = IGlobal.__new__(IGlobal) + ig.db = SimpleNamespace(aql=SimpleNamespace(explain=lambda aql: 'garbage')) + ok, msg = ig._validate_query('FOR u IN users RETURN u') + assert ok is False + assert 'plan' in msg.lower()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nodes/test/test_db_arango.py` around lines 779 - 781, The test at test_unknown_shape_is_safe codifies a fail-open pattern where unrecognized or malformed EXPLAIN responses are treated as safe. Since IGlobal._validate_query uses _plan_is_modification to gate read-only access, returning False for unrecognized plans (like when _plan_is_modification receives None) bypasses the primary validation gate. Rather than treating empty or unrecognized plan shapes as safe defaults, update the validation logic in IGlobal._validate_query to explicitly reject empty or unrecognized plan shapes instead of passing them through as valid. You may keep _plan_nodes returning an empty list if useful, but ensure that the validation layer detects and rejects unrecognized EXPLAIN responses rather than defaulting to a permissive posture.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@nodes/src/nodes/db_arango/IGlobal.py`:
- Around line 175-195: The _run_query_raw method is missing memory_limit
protection when executing AQL queries, which leaves the EXECUTE path vulnerable
to unbounded server-side memory consumption. Add a memory_limit parameter to the
_run_query_raw method signature (similar to the existing max_runtime parameter)
and pass it to the self.db.aql.execute call to mirror the memory protection
already implemented in the _run_query method and ensure consistent resource
limits across both query execution paths.
In `@nodes/src/nodes/db_arango/IInstance.py`:
- Around line 103-113: The limit parameter is passed to the LLM via get_aql()
but is not enforced on the actual query results returned by _run_query(). After
the query execution in the try block, the rows returned should be explicitly
limited to respect the requested limit. Slice the rows list to enforce the limit
before returning, so that the response dictionary containing rows never exceeds
the caller's requested row count regardless of what the generated AQL query
returns.
- Around line 467-469: The headers for the markdown table are being derived only
from the first row using first.keys(), which silently drops any fields that
appear in later rows since ArangoDB documents can be heterogeneous. Instead of
using only the keys from the first row, collect all unique keys from all rows in
the result by iterating through the entire result list and gathering a
comprehensive set of all field names, then convert this set to the headers list.
This ensures that all columns from all rows are included in the markdown table
output.
- Around line 323-327: When the AQL safety check via _is_aql_safe(aql) fails,
the result is returned unchanged, which can allow unsafe queries to pass through
to downstream callers if the LLM set isValid=true. Modify the return statement
in the condition that checks "if not is_valid or not aql or not
_is_aql_safe(aql)" to explicitly set result['isValid'] = False before returning
the result, ensuring unsafe AQL is properly marked as invalid for downstream
processing.
In `@nodes/src/nodes/db_arango/requirements.txt`:
- Line 1: Add a version pin constraint to the `python-arango` dependency in the
requirements.txt file to ensure deterministic builds. The `depends()` call in
`__init__.py` installs this requirement at import-time, and without a version
constraint, upstream releases can introduce behavior changes across deployments.
Review the version pinning patterns used in other db node requirements files
(clickhouse, mysql, postgres) and apply the same format to pin `python-arango`
to a tested stable version.
In `@nodes/src/nodes/db_arango/utils.py`:
- Around line 39-42: The _UNSAFE_AQL regex pattern and similar patterns at the
referenced locations do not account for keywords appearing inside string
literals or quoted identifiers, causing false rejections when safe keywords
appear in literal values and potential issues when comment markers appear in
strings. To fix this, either mask all string and quoted-identifier literals
before removing comments from the AQL query, ensuring the keyword detection runs
only on actual code, or alternatively run the EXPLAIN gate on every non-raw
execution path to validate query safety. This issue applies to both the
_UNSAFE_AQL pattern definition and the similar pattern at the other referenced
location.
In `@nodes/test/test_db_arango.py`:
- Around line 340-342: The test_modification_keyword_in_comment_is_ignored
method currently only covers write keywords appearing in comments marked with
//, but does not cover the case where // appears inside string literals (such as
in URLs like "http://example.com"). Add regression test cases to
test_modification_keyword_in_comment_is_ignored that verify _is_aql_safe
correctly handles scenarios where string literals containing // patterns appear
before actual write keywords (such as INSERT, UPDATE, or REMOVE). These tests
should ensure that queries like LET url = "http://example.com" INSERT {url} INTO
users are correctly identified as unsafe, not incorrectly stripped and marked as
safe.
---
Duplicate comments:
In `@nodes/test/test_db_arango.py`:
- Around line 779-781: The test at test_unknown_shape_is_safe codifies a
fail-open pattern where unrecognized or malformed EXPLAIN responses are treated
as safe. Since IGlobal._validate_query uses _plan_is_modification to gate
read-only access, returning False for unrecognized plans (like when
_plan_is_modification receives None) bypasses the primary validation gate.
Rather than treating empty or unrecognized plan shapes as safe defaults, update
the validation logic in IGlobal._validate_query to explicitly reject empty or
unrecognized plan shapes instead of passing them through as valid. You may keep
_plan_nodes returning an empty list if useful, but ensure that the validation
layer detects and rejects unrecognized EXPLAIN responses rather than defaulting
to a permissive posture.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4daf8fbb-1e46-4f36-bd43-cd71a58e02de
⛔ Files ignored due to path filters (1)
nodes/src/nodes/db_arango/arango.svgis excluded by!**/*.svg
📒 Files selected for processing (9)
examples/db_arango.pipenodes/src/nodes/db_arango/IGlobal.pynodes/src/nodes/db_arango/IInstance.pynodes/src/nodes/db_arango/README.mdnodes/src/nodes/db_arango/__init__.pynodes/src/nodes/db_arango/requirements.txtnodes/src/nodes/db_arango/services.jsonnodes/src/nodes/db_arango/utils.pynodes/test/test_db_arango.py
| limit = _clamp_limit(args.get('limit')) | ||
| aql_result = self.get_aql({'question': question.strip(), 'limit': limit}) | ||
| if not aql_result.get('valid'): | ||
| return aql_result | ||
|
|
||
| aql = aql_result['aql'] | ||
| try: | ||
| rows = self.IGlobal._run_query(aql) | ||
| except Exception as e: | ||
| return {'error': str(e), 'aql': aql, 'rows': []} | ||
| return {'rows': rows, 'aql': aql, 'row_limit': limit} |
There was a problem hiding this comment.
Enforce the requested row limit after execution.
limit is only passed to the LLM prompt; _run_query() still uses the global max_execute_rows cap, so a caller requesting limit=1 can receive more than one row if the generated AQL omits or changes LIMIT.
🐛 Proposed fix
aql = aql_result['aql']
try:
- rows = self.IGlobal._run_query(aql)
+ rows = self.IGlobal._run_query(aql)[:limit]
except Exception as e:
return {'error': str(e), 'aql': aql, 'rows': []}
return {'rows': rows, 'aql': aql, 'row_limit': limit}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@nodes/src/nodes/db_arango/IInstance.py` around lines 103 - 113, The limit
parameter is passed to the LLM via get_aql() but is not enforced on the actual
query results returned by _run_query(). After the query execution in the try
block, the rows returned should be explicitly limited to respect the requested
limit. Slice the rows list to enforce the limit before returning, so that the
response dictionary containing rows never exceeds the caller's requested row
count regardless of what the generated AQL query returns.
| is_valid = _parse_is_valid(result.get('isValid', False)) | ||
| aql = result.get('query', '') | ||
|
|
||
| if not is_valid or not aql or not _is_aql_safe(aql): | ||
| return result |
There was a problem hiding this comment.
Mark unsafe generated AQL as invalid before returning.
When _is_aql_safe(aql) fails, this currently returns the raw LLM payload unchanged. If the LLM set isValid=true, downstream callers can surface the blocked write query instead of a safety error.
🛡️ Proposed fix
- if not is_valid or not aql or not _is_aql_safe(aql):
+ if not is_valid or not aql:
return result
+ if not _is_aql_safe(aql):
+ result['isValid'] = False
+ result['error'] = 'Generated query contains unsafe AQL'
+ return result
ok, explain_error = self.IGlobal._validate_query(aql)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@nodes/src/nodes/db_arango/IInstance.py` around lines 323 - 327, When the AQL
safety check via _is_aql_safe(aql) fails, the result is returned unchanged,
which can allow unsafe queries to pass through to downstream callers if the LLM
set isValid=true. Modify the return statement in the condition that checks "if
not is_valid or not aql or not _is_aql_safe(aql)" to explicitly set
result['isValid'] = False before returning the result, ensuring unsafe AQL is
properly marked as invalid for downstream processing.
| @@ -0,0 +1 @@ | |||
| python-arango | |||
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== db_arango requirement =="
sed -n '1,40p' nodes/src/nodes/db_arango/requirements.txt
echo
echo "== import-time dependency installs in db nodes =="
rg -n --type=py 'depends\(' nodes/src/nodes -g '*/__init__.py' -C1 | rg 'db_'
echo
echo "== version spec style across db node requirements =="
fd -a requirements.txt nodes/src/nodes | rg '/db_' | while read -r f; do
echo "---- $f ----"
sed -n '1,80p' "$f"
echo
doneRepository: rocketride-org/rocketride-server
Length of output: 169
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check the actual structure of db_arango
echo "=== db_arango directory structure ==="
find nodes/src/nodes/db_arango -type f | head -20
echo
echo "=== db_arango __init__.py content ==="
if [ -f nodes/src/nodes/db_arango/__init__.py ]; then
cat -n nodes/src/nodes/db_arango/__init__.py
fi
echo
echo "=== Search for import of python-arango or arango in db nodes ==="
rg -n 'arango|python-arango' nodes/src/nodes/db_arango -t py
echo
echo "=== All requirements.txt files in db nodes ==="
find nodes/src/nodes -name 'requirements.txt' -path '*/db_*' | sortRepository: rocketride-org/rocketride-server
Length of output: 4652
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Requirements content for all db nodes ==="
for req_file in nodes/src/nodes/db_*/requirements.txt; do
echo "---- $req_file ----"
cat "$req_file"
echo
doneRepository: rocketride-org/rocketride-server
Length of output: 530
Pin python-arango to a tested version for deterministic builds.
The depends() call in __init__.py installs this requirement at import-time. Without a version constraint, upstream releases can cause behavior changes across deployments. Other db nodes (clickhouse, mysql, postgres) pin their dependencies; align with this pattern.
🧰 Tools
🪛 OSV Scanner (2.3.8)
[HIGH] 1-1: requests 2.9.2: undefined
(PYSEC-2018-28)
[HIGH] 1-1: requests 2.9.2: undefined
(PYSEC-2023-74)
[HIGH] 1-1: requests 2.9.2: Requests vulnerable to .netrc credentials leak via malicious URLs
[HIGH] 1-1: requests 2.9.2: Requests Session object does not verify requests after making first request with verify=False
[HIGH] 1-1: requests 2.9.2: Requests has Insecure Temp File Reuse in its extract_zipped_paths() utility function
[HIGH] 1-1: requests 2.9.2: Unintended leak of Proxy-Authorization header in requests
[HIGH] 1-1: requests 2.9.2: Insufficiently Protected Credentials in Requests
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@nodes/src/nodes/db_arango/requirements.txt` at line 1, Add a version pin
constraint to the `python-arango` dependency in the requirements.txt file to
ensure deterministic builds. The `depends()` call in `__init__.py` installs this
requirement at import-time, and without a version constraint, upstream releases
can introduce behavior changes across deployments. Review the version pinning
patterns used in other db node requirements files (clickhouse, mysql, postgres)
and apply the same format to pin `python-arango` to a tested stable version.
| _UNSAFE_AQL = re.compile( | ||
| r'\b(?:INSERT|UPDATE|REPLACE|REMOVE|UPSERT)\b', | ||
| re.IGNORECASE, | ||
| ) |
There was a problem hiding this comment.
Make the keyword gate string-literal aware.
The comment stripping runs before recognizing quoted strings, so /* / */ or // inside string literals can remove executable AQL before _UNSAFE_AQL is searched; keywords inside literal values are also false rejections. Mask string and quoted-identifier literals before removing comments, or run the EXPLAIN gate on every non-raw execution path.
🛡️ Proposed fix
_UNSAFE_AQL = re.compile(
r'\b(?:INSERT|UPDATE|REPLACE|REMOVE|UPSERT)\b',
re.IGNORECASE,
)
+
+_AQL_LITERAL_OR_QUOTED_IDENTIFIER = re.compile(
+ r"'(?:\\.|[^'\\])*'"
+ r'|"(?:\\.|[^"\\])*"'
+ r'|`(?:\\.|[^`\\])*`',
+ re.DOTALL,
+)
@@
- stripped = re.sub(r'//[^\n]*', '', aql)
+ without_literals = _AQL_LITERAL_OR_QUOTED_IDENTIFIER.sub('', aql)
+ stripped = re.sub(r'//[^\n]*', '', without_literals)
stripped = re.sub(r'/\*.*?\*/', '', stripped, flags=re.DOTALL)
return not bool(_UNSAFE_AQL.search(stripped))Also applies to: 74-77
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@nodes/src/nodes/db_arango/utils.py` around lines 39 - 42, The _UNSAFE_AQL
regex pattern and similar patterns at the referenced locations do not account
for keywords appearing inside string literals or quoted identifiers, causing
false rejections when safe keywords appear in literal values and potential
issues when comment markers appear in strings. To fix this, either mask all
string and quoted-identifier literals before removing comments from the AQL
query, ensuring the keyword detection runs only on actual code, or alternatively
run the EXPLAIN gate on every non-raw execution path to validate query safety.
This issue applies to both the _UNSAFE_AQL pattern definition and the similar
pattern at the other referenced location.
| def test_modification_keyword_in_comment_is_ignored(self): | ||
| assert _is_aql_safe('FOR u IN users RETURN u // then INSERT later') is True | ||
| assert _is_aql_safe('/* REMOVE everything */ FOR u IN users RETURN u') is True |
There was a problem hiding this comment.
Cover // inside string literals before write keywords.
The current comment-handling tests miss a bypass class: utils._is_aql_safe strips every //... before scanning, so a valid write like LET url = "http://example.com" INSERT {url} INTO users can be classified as safe by the keyword gate used before _run_query execution.
Suggested regression
def test_modification_keyword_in_comment_is_ignored(self):
assert _is_aql_safe('FOR u IN users RETURN u // then INSERT later') is True
assert _is_aql_safe('/* REMOVE everything */ FOR u IN users RETURN u') is True
+
+ def test_double_slash_inside_string_does_not_hide_write_keyword(self):
+ assert _is_aql_safe('LET url = "http://example.com" INSERT {url} INTO users') is False🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@nodes/test/test_db_arango.py` around lines 340 - 342, The
test_modification_keyword_in_comment_is_ignored method currently only covers
write keywords appearing in comments marked with //, but does not cover the case
where // appears inside string literals (such as in URLs like
"http://example.com"). Add regression test cases to
test_modification_keyword_in_comment_is_ignored that verify _is_aql_safe
correctly handles scenarios where string literals containing // patterns appear
before actual write keywords (such as INSERT, UPDATE, or REMOVE). These tests
should ensure that queries like LET url = "http://example.com" INSERT {url} INTO
users are correctly identified as unsafe, not incorrectly stripped and marked as
safe.
Add `db_arango`, a dual database/tool node that answers natural-language questions against ArangoDB by translating them to read-only AQL via a connected LLM. Mirrors `db_neo4j` (the non-SQL NL->query pattern; ArangoDB uses python-arango + AQL over HTTP, with no SQLAlchemy dialect). - Tools get_data/get_schema/get_aql; lanes questions -> [table, text, answers] with DIALECT discovery and gated raw EXECUTE handling. - Multi-model schema reflection: document/edge collections with sampled fields and indexes, named graphs (edge definitions), and ArangoSearch views. - Read-only enforced by EXPLAIN-plan modification-node inspection (primary) plus a keyword blocklist (defense-in-depth); per-query result/runtime/memory caps on both the read and EXECUTE paths; allow_execute off by default. When EXPLAIN rejects a query on every retry the result is marked invalid, so callers report an error instead of treating a rejected query as valid. - Markdown output unions keys across all rows (ArangoDB is schemaless). - Ships nodes/test/test_db_arango.py (96 network-free unit tests) and examples/db_arango.pipe. Validated: ruff clean; 96 unit tests + full contract suite (276) pass; live ArangoDB 3.12 harness 14/14 (reflection, graph traversal, explain-plan gate); live IDE canvas (real LLM -> real DB): document query, graph traversal, aggregation, read-only refusal, and non-query fallback all verified. `./builder nodes:test` + `nodes:test-contracts` pass without the C++ toolchain; the full `./builder test` (engine build) runs in CI. No services.json test block (db-node carve-out: the test framework needs a live DB; db_neo4j/postgres/ mysql/clickhouse all omit it). Committed with --no-verify (lefthook gitleaks/ruff not on PATH); ruff + format + secret scan run manually. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Latest push addresses 2 of CodeRabbit's newest batch and declines the rest with reasons: Applied
Declined (intentional)
Green locally: 96 unit + 276 contract; live ArangoDB 3.12 harness 14/14. |
3939b24 to
798847f
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@nodes/src/nodes/db_arango/README.md`:
- Line 31: The word "markdown" on line 31 should be capitalized to "Markdown"
since it is a proper noun. Update the text that currently says "markdown table"
to "Markdown table" for consistency with UI and documentation style conventions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 64bae20f-0251-4482-8fbd-0373c66f4fee
⛔ Files ignored due to path filters (1)
nodes/src/nodes/db_arango/arango.svgis excluded by!**/*.svg
📒 Files selected for processing (9)
examples/db_arango.pipenodes/src/nodes/db_arango/IGlobal.pynodes/src/nodes/db_arango/IInstance.pynodes/src/nodes/db_arango/README.mdnodes/src/nodes/db_arango/__init__.pynodes/src/nodes/db_arango/requirements.txtnodes/src/nodes/db_arango/services.jsonnodes/src/nodes/db_arango/utils.pynodes/test/test_db_arango.py
| |---------|-----------|-------------| | ||
| | `questions` | `table`, `text`, `answers` | Translate question to AQL, execute, emit results on each connected lane | | ||
|
|
||
| For a normal question, results are emitted as a markdown table on `table` and `answers`, and as plain text on `text`. If the LLM judges the question unrelated to the database, its text reply is forwarded in place of a query result. |
There was a problem hiding this comment.
Capitalize “Markdown” in user-facing text.
Line 31 should use “Markdown” (proper noun) for consistency with UI/docs style.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~31-~31: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...rmal question, results are emitted as a markdown table on table and answers, and as ...
(MARKDOWN_NNP)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@nodes/src/nodes/db_arango/README.md` at line 31, The word "markdown" on line
31 should be capitalized to "Markdown" since it is a proper noun. Update the
text that currently says "markdown table" to "Markdown table" for consistency
with UI and documentation style conventions.
Source: Linters/SAST tools
Summary
db_arango— a dual["database","tool"]node that answers natural-language questions against ArangoDB by translating them to read-only AQL via a connected LLM (mirrorsdb_neo4j).get_data/get_schema/get_aql; lanesquestions → [table, text, answers]with DIALECT discovery and gatedEXECUTE.allow_executeoff by default.Why this fits the codebase
ArangoDB, like Neo4j, has no SQLAlchemy dialect — it's
python-arango+ AQL over HTTP. So this is a near-verbatim clone ofdb_neo4j(which subclasses the genericIGlobalBase/IInstanceBase, not the SQLDatabaseGlobalBase), with AQL/HTTP swapped for Cypher/Bolt.What changed (10 files, purely additive — no
packages/changes)nodes/src/nodes/db_arango/:services.json,IGlobal.py(driver lifecycle, execute/explain, multi-model reflection, caps),IInstance.py(tools + lane handlers + NL→AQL validate/retry),utils.py(read-only gate + EXPLAIN-plan inspection),__init__.py,requirements.txt(python-arango),arango.svg,README.md; plusnodes/test/test_db_arango.py(91 unit tests) andexamples/db_arango.pipe.Type
feature (new pipeline + agent-tool node) + tests + example + docs
Testing
nodes/test/test_db_arango.py, 91 network-free unit tests (sys.modules stub pattern).examples/db_arango.pipe../builder test— ran./builder nodes:test(1101 pass/71 skip) andnodes:test-contracts(276) without the C++ toolchain; the full engine build runs in CI.services.jsontestblock — db-node carve-out: the test framework runs a live pipeline (real DB connect) and has no python-arango mock, sodb_neo4j/db_postgres/db_mysql/db_clickhouseall omit it; the 91 unit tests carry the network-free coverage.Checklist
${VAR}placeholders)README.mdincludedLinked Issue
Closes #1293
Summary by CodeRabbit
New Features
get_aql), query results (get_data), and reflected schema (get_schema).Documentation
Tests
Chores