Forgotten Gems in Geospatial Indexing

How an algorithm from the 80s sets the new standard for modern spatial indices

openstreetmap.org/user/daniel-

#Mapstodon #OpenStreetMap #GIS

@djh

i don't understand what the bigmin thing is, Are you talking about an in-memory search or a database (on disk)?

I made my own version of software that does this. It uses a different space filling curve but the idea is the same. How I thought abt the query running really far outside the requested area: essentially you have to decide whether you want to prioritize fewer individual IO operations or less wasted IO bandwidth.

Since data on disks is laid out as one sequence, typically it's faster for a disk to read a little bit more data if it only has to do one single sequential scan, as opposed to reading a bunch of different little bits and pieces.

My solution simply downsampled the curve until It was no sweat for the computer to make a list of every single point on the curve within the queried area. From there, I settled on an algorithm that looked at the points on the curve and decided how many segments it would split them into. If the distance between two points was larger than the queried area, then I wouldnt join those two points into a contiguous segment, I would split the segments at that point.

The cool thing: this 1 rule perfectly expresses the trade-off between how many IO operations and how much wasted bandwidth. And it's tweakable at query time, not at indexing time. So you can set a coefficient for that threshold depending on the performance characteristics of your disk when handling your given data set.

sequentialread.com/building-a-

@forestjohnson The paper explains where to look on a z-curve to find all items within a bbox.

Imagine points in a street network.

To index you sort the points based on the z value of their location.

To query the index with a bbox you
1. Find the first z value (bbox minimums)
2. Find the last z value (bbox maximums)
3. All points are in this range but due to jumps in the z curve it might contain irrelevant data
4. You use the trick in the paper to get rid of the irrelevant data ranges

@forestjohnson It's a way to efficiently compute all the ranges in the z order sorted memory you need to look at to find points within a bbox.

In the easy cases (first image in the post) you might need to only look at:

points[i : i+16]

In the cases where the bbox crosses a z curve discontinuity (second image in the post) you might need to look at multiple ranges:

points[i : i+8]
points[j : j+8]

The paper explains how to figure this out effectively and efficiently.

Follow

@djh I'm starting to remember this now.. The z curve has slightly more discontinuities iirc but its also easier to find the sub sections. My method is kinda "brute force" based on sampling a reduced resolution version of the curve in order to decide where to split up the sections before running the actual DB queries.

But in my opinion, the most important thing to call out about all of this is that you don't need different database software / you don't need to modify the database at all. The app that talks to the database just needs to Use this "one weird trick" to generate keys and query ranges.

· Edited · · 1 · 0 · 0

@forestjohnson Exactly on point!

The data storage could be in memory like in my case. You can do it with a relational database. In distributed systems running range scans in parallel.

For me the key realizations are

1) The z values are just an implicit representation of a space partitioning tree: in 2d a quad-tree

2) This z index is simple and easy to build and is highly efficient. And less-known to geo folks.

@forestjohnson What's pretty wild is I really needed this for my routing engine project where I sort graph nodes based on their location z value already for better memory locality while traversing the graph (think: dijkstra).

Now this spatial index is pretty much completely free since nodes are sorted already by their location z value. Wild.

Sign in to participate in the conversation
Pixietown

Small server part of the pixie.town infrastructure. Registration is closed.