Whole CRR Sync
Whole CRR (conflict free replicated relation) sync means that we want all of the CRRs on different databases to be identical after they merge.
What follows is an example of how to implement whole CRR sync to merge two databases together.
The base primitives around which everything is built are tables and crrs.
- Tables are just regular
sqlite
tables. crrs
aresqlite
tables which can be merged with other tables on other devices
You can have local-only data (regular tables) as well as synced data (crrs) in the same database. Anything that is not a crr
will not be synced.
For the rest of this guide we'll assume that you're making a collaborative presentation editor like Google Slides, Keynote or Powerpoint. This application will have completely local data (like selection state) and shared data (like decks, slides and slide contents).
Note: the use
and provide
keywords are just part of the documentation
framework to allow code cells to communicate by exporting and importing
variables from one another.
Write the Schema
We can start by defining the base data model as normal sqlite
tables.
And then decide which of those should be synced and thus crrs
. We'll want to replicate the deck
, slide
and component
tables given those states should be shared across all devices and users. We won't replicate the tables which represent local-only state like selected_slide
and selected_component
.
Apply the Schema to the DB
Now that we have a data model sketched out we can apply it to a database. Let's import the crsqlite-wasm
module and open a database in memory. Too learn how to persist data to disk, see /docs/cr-sqlite/js/persistence.
Then apply the schema we defined earlier to the database.
Great! Let's run a few commands to make sure that worked.
As you can see, all the tables were created as expected. There are a few internal tables as well which are used by crsqlite to track merge state.
You can modify the code above and press shift + enter
to re-run the cell.
Writing Data
Now that we have a database with a data model we can start writing data to it. Let's start by creating a deck along with a few slides and components.
And to check on things --
But what if we need to collaborate between multiple devices? We need to be able to replicate the data between devices. Let's see how that works.
Syncing, Merging, Replicating
The main primitive for this is crsql_changes which allows you to:
- pull changes since a given time
- apply changes from another db
While you can select all changes in the database, you can also narrow down to specific logical clock ranges, tables, primary keys and more.
Let us see what some of the changes in our DB looks like.
A bunch of stuff. The details don't really matter for now, just know that:
- The output of a select from
crsql_changes
can be directly fed back into an insert intocrsql_changes
on another database - That a list of changes compresses well -- so don't worry about the repeated values
To understand everything in a changeset, see crsql_changes.
Let's see what happens when we apply a changeset to a new database.
Syncing to a new DB
Spin up a new DB that has no data in it.
One constraint is that databases must have the same set of CRRs in order to sync between them. So let's apply the schema to the new database.
Confirm that db2 is in fact empty.
And finally apply all changes from the first database to the second.
Now checking the contents of db2 post sync --
We can also perform concurrent edits on each DB and then merge them together.
Delta State
In order to ensure we only sync what has changed since the last time we synced, we'll need to track the latest db_version
we've seen from our peers.
crsqlite
provides a crsql_tracked_peers table that networking layers can use to track the db_version
of each peer. Networking layers are of course allowed to create their own tables to track this information as well.
Using Tracked Peers
The crsql_tracked_peers
table is essentially acting as a mechanism to store cursors that represent the last point in time that we synced with another peer or site. We use these values to only pull or push changes since the last time we pulled or pushed when talking with a peer.
If db2
wants all changes from db1
that it has not yet seen, db2
can ask db1
for changes where the db_version
is greater than the value it has stored for db1
.
example --
You can experiment with the db_version
to see how it affects the changes that are returned.
Wrap Up
You've seen how you can compose:
to build a sync layer for your application.
The basic ideas:
- Convert tables to conflict free replicated relations with
crsql_as_crr
- Discover the unique id of each database with
crsql_site_id
- Pull and apply changes to/from databases with
crsql_changes
- Keep track of what you've synced with
crsql_tracked_peers
Another example implementation of the above: https://observablehq.com/@tantaman/cr-sqlite-basic-setup (opens in a new tab)
Future
Reference implementations: