:boost_requested:

Question: say I have a graph (DAG, specifically) of arbitrary complexity. I want to turn this into a sequence through a deterministic(!) topological sort.

However, new parts of the graph may be discovered at any point in time, in which case the sequence needs to be updated to reflect this new part. But the full sequence is not present in memory; it is stored in a flat database, such as an RDBMS or key/value store.

What would be a reasonable sorting approach and storage format that would allow for doing this sort of 'incremental resorting', without needing to load potentially unbounded-in-size parts of the graph into memory to do so?

@joepie91 Often, a good starting point to this class of problems is to remember that even if the whole graph is at memory at once, the processor is not paying attention to it all at once anyway.

Then, the fact that some of the out-of-attention parts can be physically inside an external datamase becomes a mere caching problem.

My hunch is, you'll need to maintain the result of the topo-sort as an ascending sparse series of numbers, and after new data comes in, while incremental reordering is in progress, dirty flags on nodes affected by either the new constraints or the reshuffle, so as to avoid having to rescan the whole dataset for the termination condition. For the deterministic property to hold, you'll need to give nodes an identifier in a sortable space to secondarily sort them; depending on your context, it might be that the discovery sequence number would serve in this rôle.

Follow

@riley (Thinking about this for a while)

· · Web · 0 · 0 · 1
Sign in to participate in the conversation
Pixietown

Small server part of the pixie.town infrastructure. Registration is closed.