@djh@chaos.social

I don't understand what the BIGMIN thing is. Are you talking about an in-memory search or a database (on disk)?

I made my own version of software that does this. It uses a different space-filling curve, but the idea is the same. How I thought about the query running really far outside the requested area: essentially you have to decide whether to prioritize fewer individual IO operations or less wasted IO bandwidth.

Since data on disk is laid out as one sequence, it's typically faster for a disk to read a little extra data in one single sequential scan than to read a bunch of separate little bits and pieces.
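To make that concrete, here's a toy cost model in Python. The seek cost and throughput numbers are made-up assumptions for illustration, not measurements of any real disk:

```python
# Toy cost model, purely illustrative: the constants below are
# assumptions, not measurements of any real disk.
SEEK_MS = 5.0        # assumed cost of one random seek, in ms
MB_PER_MS = 0.1      # assumed sequential throughput (100 MB/s)

def read_cost_ms(ranges_mb):
    """Total time to read a list of ranges, each given as a size in MB."""
    return sum(SEEK_MS + mb / MB_PER_MS for mb in ranges_mb)

# Ten scattered 0.1 MB reads vs. one 2 MB read that covers them all:
print(read_cost_ms([0.1] * 10))  # 60.0 ms -- the ten seeks dominate
print(read_cost_ms([2.0]))       # 25.0 ms -- more bytes, still faster
```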

My solution simply downsampled the curve until it was no sweat for the computer to list every single point on the curve within the queried area. From there, I settled on an algorithm that looked at those points and decided how many segments to split them into: if the distance along the curve between two consecutive points was larger than the queried area, I wouldn't join those two points into a contiguous segment; I would split the segments at that point.
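A minimal Python sketch of that splitting rule, not the code from my actual project. It assumes `cell_keys` is already the sorted list of space-filling-curve indices for every downsampled cell inside the query rectangle, and `area` is the size of the queried area in cells; those names are illustrative:

```python
def split_into_segments(cell_keys, area, coefficient=1.0):
    """Group sorted curve indices into contiguous read ranges.

    Whenever the gap between two consecutive indices exceeds
    coefficient * area, start a new segment rather than reading
    across the gap.
    """
    threshold = coefficient * area
    segments = []
    start = prev = cell_keys[0]
    for key in cell_keys[1:]:
        if key - prev > threshold:
            segments.append((start, prev))  # close the current range
            start = key                     # begin a new one
        prev = key
    segments.append((start, prev))
    return segments
```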

The cool thing: this one rule perfectly expresses the trade-off between the number of IO operations and the amount of wasted bandwidth. And it's tweakable at query time, not at indexing time. So you can set a coefficient for that threshold depending on the performance characteristics of your disk when handling your given data set.
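For example, running the sketch above on a made-up set of curve indices shows how the coefficient moves the trade-off:

```python
# Made-up curve indices for illustration; the query covers 9 cells.
keys = [5, 6, 7, 30, 31, 90, 91, 92]

# Low coefficient: tighter segments -> more IO operations, less waste.
print(split_into_segments(keys, area=9, coefficient=0.5))
# -> [(5, 7), (30, 31), (90, 92)]

# High coefficient: wider segments -> fewer IO operations, more waste.
print(split_into_segments(keys, area=9, coefficient=10))
# -> [(5, 92)]
```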

sequentialread.com/building-a-
