Whole CRR Sync

Whole CRR (conflict free replicated relation) sync means that we want all of the CRRs on different databases to be identical after they merge.

What follows is an example of how to implement whole CRR sync to merge two databases together.

The base primitives around which everything is built are tables and crrs.

Tables are just regular sqlite tables.
crrs are sqlite tables which can be merged with other tables on other devices

You can have local-only data (regular tables) as well as synced data (crrs) in the same database. Anything that is not a crr will not be synced.

For the rest of this guide we'll assume that you're making a collaborative presentation editor like Google Slides, Keynote or Powerpoint. This application will have completely local data (like selection state) and shared data (like decks, slides and slide contents).

Note: the use and provide keywords are just part of the documentation framework to allow code cells to communicate by exporting and importing variables from one another.

Write the Schema

We can start by defining the base data model as normal sqlite tables.

And then decide which of those should be synced and thus crrs. We'll want to replicate the deck, slide and component tables given those states should be shared across all devices and users. We won't replicate the tables which represent local-only state like selected_slide and selected_component.

Apply the Schema to the DB

Now that we have a data model sketched out we can apply it to a database. Let's import the crsqlite-wasm module and open a database in memory. Too learn how to persist data to disk, see /docs/cr-sqlite/js/persistence.

Then apply the schema we defined earlier to the database.

Great! Let's run a few commands to make sure that worked.

As you can see, all the tables were created as expected. There are a few internal tables as well which are used by crsqlite to track merge state.

You can modify the code above and press shift + enter to re-run the cell.

Writing Data

Now that we have a database with a data model we can start writing data to it. Let's start by creating a deck along with a few slides and components.

And to check on things --

But what if we need to collaborate between multiple devices? We need to be able to replicate the data between devices. Let's see how that works.

Syncing, Merging, Replicating

The main primitive for this is crsql_changes which allows you to:

pull changes since a given time
apply changes from another db

While you can select all changes in the database, you can also narrow down to specific logical clock ranges, tables, primary keys and more.

Let us see what some of the changes in our DB looks like.

A bunch of stuff. The details don't really matter for now, just know that:

The output of a select from crsql_changes can be directly fed back into an insert into crsql_changes on another database
That a list of changes compresses well -- so don't worry about the repeated values

To understand everything in a changeset, see crsql_changes.

Let's see what happens when we apply a changeset to a new database.

Syncing to a new DB

Spin up a new DB that has no data in it.

One constraint is that databases must have the same set of CRRs in order to sync between them. So let's apply the schema to the new database.

Confirm that db2 is in fact empty.

And finally apply all changes from the first database to the second.

Now checking the contents of db2 post sync --

We can also perform concurrent edits on each DB and then merge them together.

Delta State

In order to ensure we only sync what has changed since the last time we synced, we'll need to track the latest db_version we've seen from our peers.

crsqlite provides a crsql_tracked_peers table that networking layers can use to track the db_version of each peer. Networking layers are of course allowed to create their own tables to track this information as well.

Using Tracked Peers

The crsql_tracked_peers table is essentially acting as a mechanism to store cursors that represent the last point in time that we synced with another peer or site. We use these values to only pull or push changes since the last time we pulled or pushed when talking with a peer.

If db2 wants all changes from db1 that it has not yet seen, db2 can ask db1 for changes where the db_version is greater than the value it has stored for db1.

example --

You can experiment with the db_version to see how it affects the changes that are returned.

Wrap Up

You've seen how you can compose:

to build a sync layer for your application.

The basic ideas:

Convert tables to conflict free replicated relations with crsql_as_crr
Discover the unique id of each database with crsql_site_id
Pull and apply changes to/from databases with crsql_changes
Keep track of what you've synced with crsql_tracked_peers

Another example implementation of the above: https://observablehq.com/@tantaman/cr-sqlite-basic-setup (opens in a new tab)

Future

Reference implementations:

Background Partial CRR Sync