
osrm-partition produces structurally broken bisection for MLD on AWS Graviton (aarch64) #7501

@shuyiyin-insta

Description


Issue

osrm-partition produces a structurally broken bisection when run on AWS Graviton (aarch64). The failure is silent: the process exits successfully, and the output .partition file has exactly the same byte size as a known-good run on a different host. The defect only surfaces downstream: first as elevated osrm-customize warnings, then as wrong routes from osrm-routed --algorithm=MLD.
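
Byte size is therefore useless as a health check; the file contents do differ. A minimal way to confirm that (paths are illustrative, one directory per build host):

# Same size, different bytes: checksum the Graviton-built and Mac-built outputs.
ls -l     graviton/us-pacific-latest.osrm.partition mac/us-pacific-latest.osrm.partition
sha256sum graviton/us-pacific-latest.osrm.partition mac/us-pacific-latest.osrm.partition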

The bug is MLD-only (CH on the same vanilla extract on the same host produces correct routes) and isolated to the partition step: osrm-extract output from Graviton, when partitioned and customized on a different host, produces correct routing.

Diagnostic signal at customize time:

[warn] Level 2 unreachable boundary nodes per cell: 1.90 sources, 4.49 destinations   ← Graviton partition
[warn] Level 2 unreachable boundary nodes per cell: 0.14 sources, 0.22 destinations   ← Mac partition (same PBF)

~13–20× elevation in unreachable boundary nodes from the Graviton-built partition.
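
To pull this signal out of a run without eyeballing the full log (the warning text is exactly as quoted above; the tee path is illustrative):

osrm-customize us-pacific-latest.osrm 2>&1 | tee customize.log
grep 'unreachable boundary nodes per cell' customize.log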

Routing symptoms with the bad partition (San Francisco / Bay Area, us-pacific PBF):

  • SF (37.7749,-122.4194) → San Jose (37.3382,-121.8863): MLD returns 401,242 km, CH returns 77.5 km (~70 km true).
  • SF → Daly City (~10 km true): MLD routes "through Idaho."
  • Honolulu → Haleiwa (~50 km true): MLD returns 250,504 km ("pinball" Hawaii→Alaska).

Some queries return NoRoute outright instead of an absurd distance.

ARM CI only landed in this repo on 2026-04-14 (#7481, #7482), so this class of bug has had no opportunity to surface in the upstream test matrix. The v6.0.0 / v26.4.x partitioner is functionally unchanged from v5.27.1 (only cosmetic clang-format / namespace churn since), so this likely affects current master too.

Related prior issues with similar customize-time signature: #5872, #6084, #5161.

Steps to reproduce

Version: v5.27.1 from the official multi-arch image ghcr.io/project-osrm/osrm-backend:v5.27.1 (linux/arm64 layer).

OSM extract: us-pacific-latest.osm.pbf from Geofabrik (any aarch64 host should reproduce on this PBF; we also reproduce on internal buffered variants of us-west and us-pacific).

Pipeline (run inside the OSRM container; default car.lua profile):

osrm-extract  -p /opt/car.lua  us-pacific-latest.osm.pbf
osrm-partition          us-pacific-latest.osrm
osrm-customize          us-pacific-latest.osrm
osrm-routed --algorithm=MLD --port 5000 us-pacific-latest.osrm
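
For anyone reproducing from scratch, the equivalent invocations through the published image look like this (standard docker usage for this image; the /data mount and port mapping are assumptions, adjust as needed):

docker run --rm -v "$PWD:/data" ghcr.io/project-osrm/osrm-backend:v5.27.1 \
    osrm-extract -p /opt/car.lua /data/us-pacific-latest.osm.pbf
docker run --rm -v "$PWD:/data" ghcr.io/project-osrm/osrm-backend:v5.27.1 \
    osrm-partition /data/us-pacific-latest.osrm
docker run --rm -v "$PWD:/data" ghcr.io/project-osrm/osrm-backend:v5.27.1 \
    osrm-customize /data/us-pacific-latest.osrm
docker run --rm -p 5000:5000 -v "$PWD:/data" ghcr.io/project-osrm/osrm-backend:v5.27.1 \
    osrm-routed --algorithm=MLD /data/us-pacific-latest.osrm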

Sample queries (the queries themselves are normal; the result is wrong):

# SF -> San Jose : true ~70 km, returns ~401,242 km on a Graviton-built partition
curl 'http://localhost:5000/route/v1/driving/-122.4194,37.7749;-121.8863,37.3382?overview=false'

# SF -> Daly City : true ~10 km
curl 'http://localhost:5000/route/v1/driving/-122.4194,37.7749;-122.4702,37.6879?overview=false'

# Honolulu -> Haleiwa : true ~50 km, returns ~250,504 km
curl 'http://localhost:5000/route/v1/driving/-157.8581,21.3099;-158.1031,21.5928?overview=false'
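
For scripted checks, the distance of the first route (metres, per the OSRM route service response) makes the failure unambiguous (jq assumed available):

# Prints "Ok" plus ~401,242,000 m (instead of ~70,000 m) with the bad partition
curl -s 'http://localhost:5000/route/v1/driving/-122.4194,37.7749;-121.8863,37.3382?overview=false' \
    | jq '.code, .routes[0].distance'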

For comparison, the same pipeline with osrm-contract instead of osrm-partition/osrm-customize, served with --algorithm=CH, returns correct distances on the same host.
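
Concretely, that control pipeline is:

osrm-extract  -p /opt/car.lua  us-pacific-latest.osm.pbf
osrm-contract us-pacific-latest.osrm
osrm-routed --algorithm=CH --port 5000 us-pacific-latest.osrm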

Cross-host reproduction matrix (same v5.27.1 linux/arm64 binary, same PBF):

#   extract host   partition host   customize host   Honolulu→Haleiwa
1   local Mac      local Mac        local Mac        ✓ 50 km
2   Graviton       local Mac        local Mac        ✓ 50 km
3   Graviton       Graviton         local Mac        ✗ NoRoute
4   Graviton       Graviton         Graviton         ✗ 250,504 km

→ The defect is in osrm-partition running on Graviton. osrm-extract is not implicated, and osrm-customize merely surfaces the bad partition as bad metrics.
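
Each row was produced by running the stages on the listed hosts and copying the full set of intermediate .osrm.* artifacts between them in between stages, e.g. for row 3 (hosts and paths illustrative):

# Partition on Graviton, then ship every us-pacific-latest.osrm* artifact
# to the Mac and run the remaining stages there.
scp 'graviton:/data/us-pacific-latest.osrm*' .
osrm-customize us-pacific-latest.osrm
osrm-routed --algorithm=MLD --port 5000 us-pacific-latest.osrm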

I can provide the failing PBF, both partition outputs (good vs bad), and full osrm-customize logs on request.

Specifications

Library/dependency versions

  • osrm-backend v5.27.1 (multi-arch image ghcr.io/project-osrm/osrm-backend:v5.27.1, linux/arm64 layer)
  • TBB / Boost / Lua: as bundled in the upstream image (not modified)

Operating system

  • Container base: as in upstream Dockerfile (Debian)
  • Host on AWS: Amazon Linux 2023 (kernel 6.x), running the container under ECS

Hardware

  • Affected: AWS Graviton 4 (ARM Neoverse V2) ECS task, 4–8 vCPU under a cgroup CPU quota, 12 GB memory limit; the host exposes 64 visible cores (a quick check for this mismatch is sketched after this list). Also reproduced on Graviton 3 (Neoverse V1).
  • Not affected: Apple M3 Pro (12 cores, no SVE), running the same linux/arm64 image via Docker Desktop.
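
One environment detail worth recording when reproducing: the gap between the cores a process can see and the cgroup quota it actually gets. A quick check from inside the container (cgroup v2 paths, as on Amazon Linux 2023's 6.x kernel):

# Visible cores vs. actual CPU quota inside the ECS task
nproc                       # 64 here: all host cores are visible
cat /sys/fs/cgroup/cpu.max  # e.g. "400000 100000" = 4 vCPUs of quota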

Attachment: osrm_partition_repro.zip
