Skip to content

Meta: GFQL Testing & Verification Roadmap #871

@lmeyerov

Description

@lmeyerov

GFQL Testing & Verification Roadmap

This meta-issue tracks follow-up work to achieve production-grade and publication-quality testing/verification for GFQL.

Context

PRs #846 (cudf same-path executor) and #852 (Alloy verification) add significant GFQL functionality. This roadmap extends testing to meet:

  • Production standards: Conformance suites, metamorphic testing, fuzz testing
  • Publication standards: Formal verification coverage for SIGMOD/VLDB-tier claims

Bugs Found in PR #846 (Lessons for Verification)

During development of the native vectorized executor in PR #846, several algorithmic bugs were discovered that formal verification could have caught earlier. These inform what the Alloy model should cover.

Bug 1: Backward Traversal Join Direction (commit 71cda418)

Location: _find_multihop_start_nodes in df_executor.py:754

Bug: When doing backward BFS to find which start nodes can reach end nodes, the join was on the wrong column:

# WRONG: joined on __to__ (predecessor) instead of __from__ (target)
new_frontier = edge_pairs.merge(frontier, left_on='__to__', ...)

# CORRECT: join on __from__ to find predecessors
new_frontier = edge_pairs.merge(frontier, left_on='__from__', ...)

Symptom: Multi-hop + WHERE queries returned wrong nodes or empty results.

Alloy gap: The model's AlgoBackward doesn't explicitly verify join direction semantics.

Recommendation: Add assertion that backward reachability computes { n | ∃ path n →* target } not { n | ∃ path target →* n }.

Bug 2: Empty Set Short-Circuit Missing (commit 71cda418)

Location: _materialize_filtered in df_executor.py:1520

Bug: If any step in the chain had an empty allowed node set (e.g., WHERE filtered out all nodes), the executor didn't short-circuit to empty result.

# FIX: Early return when any node step has empty set
if path_state.allowed_nodes:
    for node_set in path_state.allowed_nodes.values():
        if node_set is not None and len(node_set) == 0:
            return empty_result

Symptom: 4-step chains with impossible WHERE returned nodes instead of empty.

Alloy gap: Model implicitly handles via no quantifier but doesn't assert materialization behavior.

Recommendation: Add assertion: (some i: NodeStep | AlgoAllowed[c, i] = none) implies AlgoOutN[c] = none

Bug 3: Wrong Node Source for Non-Adjacent WHERE (commit 71cda418)

Location: _apply_non_adjacent_where_post_prune in df_executor.py:410

Bug: Used alias_frames (populated during forward pass) instead of original graph nodes for value lookups. Forward pass may not populate all aliases.

# WRONG: alias_frames can be incomplete
left_values = alias_frames.get(left_alias)

# CORRECT: Use original graph's node DataFrame
nodes_df = self.inputs.graph._nodes

Symptom: Non-adjacent WHERE comparisons failed silently or used stale values.

Alloy gap: lowerWhere doesn't model where attribute values come from.

Recommendation: Clarify that WHERE value lookups use the full graph's node attributes, filtered to allowed nodes.

Bug 4: Multi-hop Path Tracing Through Intermediates (commit cd579363)

Location: _filter_multihop_edges_by_endpoints in df_executor.py:700

Bug: Backward prune filtered edges by dst in allowed_end_nodes, but for multi-hop, START connects to INTERMEDIATE nodes, not END directly.

Symptom: Start nodes showed as empty for multi-hop chains.

Alloy gap: Hop ranges not modeled at all.

Recommendation: High priority - Add hop range modeling to Alloy (see Sub-Issue #2).

Bug 5: Reverse/Undirected Edge Direction Handling (commit cd579363)

Location: Multiple functions in backward prune phase

Bug: is_reverse and is_undirected flags weren't correctly propagating through path tracing. For reverse edges, src/dst semantics are swapped.

Symptom: Reverse edge chains returned wrong or empty results.

Alloy gap: Model assumes forward edges only (implicit e.src → e.dst).

Recommendation: Add EdgeStep.direction attribute (forward/reverse/undirected) and verify directional semantics.

Tests Added

10 new tests in TestImpossibleConstraints class (test_df_executor_inputs.py):

  • test_contradictory_lt_gt_same_column
  • test_contradictory_eq_neq_same_column
  • test_contradictory_lte_gt_same_column
  • test_no_paths_satisfy_predicate
  • test_multihop_no_valid_endpoints
  • test_contradictory_on_different_columns
  • test_chain_with_impossible_intermediate
  • test_non_adjacent_impossible_constraint
  • test_empty_graph_with_constraints
  • test_no_edges_with_constraints

Sub-Issues

1. Metamorphic Testing for GFQL

Priority: P2

Add query equivalence testing under graph/query transformations:

  • Graph isomorphism: query(G) ≡ query(G') where G' is isomorphic to G
  • Query partitioning: WHERE x > 5(WHERE x > 5 AND x > 3)
  • Subgraph preservation: Results on subgraph ⊆ results on full graph

Reference: Gamera (VLDB 2024) - Graph-aware metamorphic relations

2. Extend Alloy Model to Cover Hop Ranges

Priority: P1 ⬆️ (upgraded based on bugs found)

Current Alloy model (alloy/gfql_fbf_where.als) does NOT formally verify:

  • min_hops / max_hops traversal bounds
  • output_min_hops / output_max_hops slicing
  • Hop labeling (label_node_hops, label_edge_hops, label_seeds)
  • NEW: Backward traversal join direction for multi-hop
  • NEW: Intermediate node reachability (start → intermediate → end)

Options:

  • A) Explicit modeling: Add hop counter to path semantics, verify bounds
  • B) Unrolling approach: Generate fixed-length chain variants, verify each
  • C) Abstraction: Prove hop filtering is a sound post-filter on verified core

Rationale for P1: Bugs #1 and #4 above were in multi-hop logic that formal verification should catch.

3. Add Backward Reachability Assertions to Alloy

Priority: P1 (new)

Verify that AlgoBackward correctly computes predecessors:

assert BackwardReachabilityCorrect {
  all c: Chain, n: Node, target: Node |
    n in AlgoBackward[c, target] iff
    (some path: seq Edge | pathConnects[path, n, target])
}

This would have caught Bug #1 (join direction).

4. Add Empty Set Propagation Assertion

Priority: P2 (new)

Verify that if any step has an empty allowed set, the output is empty:

assert EmptyPropagation {
  all c: Chain |
    (some i: seq/inds[c.seqSteps] | AlgoAllowed[c, i] = none)
    implies (AlgoOutN[c] = none and AlgoOutE[c] = none)
}

This would have caught Bug #2.

5. Add Contradictory WHERE Scenarios

Priority: P2 (new)

Verify that contradictory WHERE clauses produce empty results:

pred ContradictoryWhere[c: Chain] {
  some w1, w2: c.wheres |
    w1.lhs.a = w2.lhs.a and w1.rhs.a = w2.rhs.a and
    ((w1.op = Lt and w2.op = Gt) or (w1.op = Eq and w2.op = Neq))
}

assert ContradictoryWhereEmpty {
  all c: Chain | ContradictoryWhere[c] implies AlgoOutN[c] = none
}

6. Fuzz Testing for Chain + WHERE Combinations

Priority: P3

Random generation of:

  • Graph structures (nodes, edges, types, values)
  • GFQL chains (varying length, directions, hop params)
  • WHERE clauses (random alias pairs, operators, column refs)

Compare cuDF executor vs oracle on generated inputs.

Tools: Hypothesis (Python), custom generators

7. Cypher/GQL Conformance Suite

Priority: P3

Work toward openCypher TCK compatibility:

  • Map GFQL operations to Cypher equivalents
  • Adapt TCK scenarios for GFQL surface syntax
  • Track coverage percentage

Note: GFQL currently lacks Cypher surface syntax; this is aspirational.

Reference: Neo4j GQL Conformance


Current Coverage

Category Status Location
Hop ranges (parity) ✅ 11+ tests test_enumerator_parity.py
WHERE (parity) ✅ 50+ tests test_df_executor_inputs.py
Hop + WHERE composition ✅ 20+ tests test_df_executor_inputs.py
Impossible constraints ✅ 10 tests test_df_executor_inputs.py
Alloy (WHERE lowering) ✅ Verified alloy/gfql_fbf_where.als
Alloy (hop ranges) ❌ Not modeled -
Alloy (backward reachability) ❌ Not explicit -
Alloy (empty propagation) ❌ Implicit only -

References

Labels

testing, verification, gfql, roadmap

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions