Skip to content

Speed up OSM processing some more#7583

Merged
leonardehrenfried merged 19 commits into
opentripplanner:dev-2.xfrom
ibi-group:osm-speed-up
May 13, 2026
Merged

Speed up OSM processing some more#7583
leonardehrenfried merged 19 commits into
opentripplanner:dev-2.xfrom
ibi-group:osm-speed-up

Conversation

@leonardehrenfried
Copy link
Copy Markdown
Member

Summary

Just as in my previous PR, I am rewriting some hot code paths in the OSM processing code.

The optimizations boil down to:

  • replacing Streams with some manual processing, thereby lowering allocations
  • checking for tagless entities early and returning the methods: this is particularly important for instances of OsmNode, where the vast majority doesn't have any tags at all.
  • packing the coordinates of a way's line string into a tight array rather than instances of Coordinate
  • making some methods final

All of this yields about a 20-25% speed up.

@leonardehrenfried leonardehrenfried requested a review from a team as a code owner May 4, 2026 09:31
@leonardehrenfried leonardehrenfried added the !Optimization The feature is to improve performance. label May 4, 2026
@leonardehrenfried leonardehrenfried added this to the 2.10 (next release) milestone May 4, 2026
@leonardehrenfried
Copy link
Copy Markdown
Member Author

@VillePihlava @abyrd Could I interest you in reviewing this one as well?

@VillePihlava
Copy link
Copy Markdown
Contributor

I can take a look

@VillePihlava VillePihlava self-assigned this May 4, 2026
@VillePihlava VillePihlava self-requested a review May 4, 2026 09:35
@codecov
Copy link
Copy Markdown

codecov Bot commented May 4, 2026

Codecov Report

❌ Patch coverage is 96.66667% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.54%. Comparing base (481d2ef) to head (bb2dc02).

Files with missing lines Patch % Lines
.../java/org/opentripplanner/osm/model/OsmEntity.java 93.33% 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             dev-2.x    #7583   +/-   ##
==========================================
  Coverage      72.54%   72.54%           
- Complexity     21266    21271    +5     
==========================================
  Files           2380     2380           
  Lines          86400    86427   +27     
  Branches        8486     8489    +3     
==========================================
+ Hits           62675    62701   +26     
+ Misses         20728    20727    -1     
- Partials        2997     2999    +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@VillePihlava VillePihlava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvements. Should private final Map<String, String> tags; in OsmEntity be defined as non-nullable? This way the use of isTagless would always indicate a performance concern.

Comment thread application/src/main/java/org/opentripplanner/osm/model/OsmEntity.java Outdated
Comment thread application/src/main/java/org/opentripplanner/osm/model/OsmEntity.java Outdated
Comment on lines +176 to 177
@Nullable
private final Map<String, String> tags;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to make this non-nullable? Constantly calling the isTagless function feels redundant. Something like a static default empty map that this gets initialized to when no tags are present. Then the isTagless function can be used in places where it is necessary for performance reasons, not for null checking.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a micro(?)-optimization. If you use a Map you still have to call toLowerCase and compute the hash of the string.

Copy link
Copy Markdown
Member Author

@leonardehrenfried leonardehrenfried May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that using null is not super-idiomatic in our current code base (I'm its biggest hater) but in this case it's worth it for performance.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it makes sense for performance reasons, I think its fine. Maybe a comment should be added every time it is used as a performance optimization?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way would be to create a custom map implementation that null checks behind the scenes. Do whatever you think makes sense

Comment on lines +176 to 177
@Nullable
private final Map<String, String> tags;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it makes sense for performance reasons, I think its fine. Maybe a comment should be added every time it is used as a performance optimization?

Comment on lines +176 to 177
@Nullable
private final Map<String, String> tags;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way would be to create a custom map implementation that null checks behind the scenes. Do whatever you think makes sense

VillePihlava
VillePihlava previously approved these changes May 8, 2026
Copy link
Copy Markdown
Contributor

@VillePihlava VillePihlava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Lets consider at some point whether the tag.toLowerCase() calls are really necessary.

geometry = GeometryUtils.getGeometryFactory().createLineString(
segmentCoordinates.toArray(new Coordinate[0])
);
geometry = GeometryUtils.makeLineString(segmentCoordinates.toArray());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If only Java had compound value types... any day now.

/**
* Does the entity contain any tags at all?
*/
final boolean isTagless() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding from other review comments is that the nullable tags field and this function exist largely to avoid String.toLowercase calls. As far as I know, OSM tags are case-sensitive and the vast majority of tag keys are lower case. Any non-lower-case version of a typically lower-case key would be considered a new nonstandard key and show up as such in the tag stats. The case insensitivity in OTP appears to be an adaptation to poor data quality in the early era of OTP, and may date all the way back to #500. I suspect OSM data quality has increased significantly in the intervening 15 years.

Would it be possible to performance-test a version using a canonical empty map instead of null and eliminating all toLowerCase calls and all null checks? I do wonder if it would be just as fast and significantly simpler.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, this appears to bother both you and @VillePihlava quite a bit, so let me give a long form explanation.

For well over a decade null was used in the OSM code to mean "no tags". I started using the empty map in my previous OSM PR #7537 but later noticed that there was a performance problem with it. For this reason, I re-established the previous state of using null. I understand that it's not super-idiomatic to how we use collections nowadays. I don't have exact numbers but I could see a small speed up when switching back to null.

Besides, it's a well-encapsulated variable and the null never escapes this class.

Your point about lowercasing is correct, but when I tried switching it off I noticed that a test failed so I never pursued that avenue any further. There are a handful of tags we use like ref:IFOPT (https://www.openstreetmap.org/way/27558650) that are frequently written in the incorrect case and the lowercasing is still needed somewhere.

It could probably be removed in the hottest inner loop, but it requires some risk management and careful investigation.

In any case, that would be a functional change that I would like to keep separate from a performance optimization that aims to keep everything as it was before.

Given all of the above, I politely request that you let me keep the method/null value as-is.

If you feel that it ought to be done, I can do that in a follow-up.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one more aspect to isTagless: it is used in some complex methods that check a variety of tags to short-circuit the computation. This is a significant speed-up.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, use of null references doesn't particularly bother me. My review was a comment, not a request for changes to your PR. I had the impression complexity was being introduced just to continue supporting case-insensitive behavior in a place that is by specification supposed to be case-sensitive, and if the case-insensitive behavior was unnecessary maybe the problem could be eliminated. Since you've already considered this and decided case-insensitive behavior still needed, no change is necessary.

@leonardehrenfried leonardehrenfried added this pull request to the merge queue May 13, 2026
Merged via the queue into opentripplanner:dev-2.x with commit 030eabb May 13, 2026
8 checks passed
@leonardehrenfried leonardehrenfried deleted the osm-speed-up branch May 13, 2026 17:30
t2gran pushed a commit that referenced this pull request May 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

!Optimization The feature is to improve performance.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants