Recursive Ordering in SQLite

I previously talked about how you could order by relationships in [[blog/recursive-ordering]]. I didn't explain how you can actually implement this in a relational database and make it performant.

Lets say we've implemented a linked list in our table and want to retrieve items in the order that they are linked together.

The schema would look like:

CREATE TABLE node (
  id primary key NOT NULL,
  parent_id,
  content,
);
CREATE INDEX node_parent_id ON node (parent_id);

But how do we fetch nodes in order? We can't order by any of the fields nor will joins or subselects help us out here.

SELECT content FROM node ORDER BY ?????

We can, however, create a recursive common table expression.

WITH RECURSIVE
after_node(content,level,id) AS (
  VALUES('Root',0,?)
  UNION ALL
  SELECT node.content, after_node.level+1, node.id
    FROM node JOIN after_node ON node.parent_id=after_node.id
   ORDER BY 2 DESC
)
SELECT content FROM after_node LIMIT ?;

What is happening above is that we're creating the ordered representation of the table by recursively following parent_id down to the provided root.

To get items after a given root, provide a different root id.

Using this query would look something like:

const stmt = sqlite.prepare(sql`
WITH RECURSIVE
after_node(content,level,id) AS (
  VALUES('Root',0,?)
  UNION ALL
  SELECT node.content, after_node.level+1, node.id
    FROM node JOIN after_node ON node.parent_id IS after_node.id
   ORDER BY 2 DESC
)
SELECT content FROM after_node LIMIT ?;
`);
 
// get the first 1k items in desc order
stmt.all(
  null, // the "root" id.
  1000 // first 1k items
);
 
// get the first 1k items after `some_id` in desc order
stmt.all(
  some_id,
  1000,
);

But how well does it perform?

Benchmarking

Given we can run SQLite in the browser these days (opens in a new tab), lets fire it up and try it out.

Now lets populate 100k rows as our test set of data where each new row points to the previous row.

Then prepare different statements to benchmarking recursive ordering vs normal ordering.

And finally, run the benchmarks.

On a macbook m1, recursive sort is about 2x slower than regular sort in all cases and between 8-12x faster than the control.

The "2x slower in all cases" is also a good sign -- it means that the perf of a recursive sort tracks that of a run of the mill sort on an indexed column by a constant factor.

Each recursive case also took the same amount of time meaning it doesn't matter from where we start fetching a range from within a large table.

While it is a 2x hit, it is still incredibly quick. On my m1 -- 10ms to return 10k rows in order from a set of 100k.

Pros

While slower than a fractional index, it is still very fast / is only a constant factor slower than a regular index.
Does not lose precision as a fractional index does. See articles/fractional-indexing
Does now blow up in storage space as a fractional index can. See articles/fractional-indexing
Inserting or moving a row only requires updating at most 3 rows. See articles/recursive-ordering

Drawbacks

The implementation shown here cannot be used in a distributed setting where nodes write without coordination. To solve this, see articles/distributed-recursive-ordering.

https://www.sqlite.org/lang_with.html#outlandish_recursive_query_examples (opens in a new tab)