Syncing Data
The base primitives around which everything is built are tables and crrs
(conflict free replicated relations).
- Tables are just regular
sqlite
tables. crrs
aresqlite
tables which can be merged with other tables on other devices
You can have local-only data (regular tables) as well as synced data (crrs) in the same database. Anything that is not a crr
will not be synced.
For the rest of this guide we'll assume that you're making a collaborative presentation editor like Google Slides, Keynote or Powerpoint. This application will have completely local data (like selection state) and shared data (like decks, slides and slide contents).
Write the Schema
We can start by defining the base data model as normal sqlite
tables.
And then decide which of those should be synced and thus crrs
. We'll want to replicate the deck
, slide
and component
tables given those states should be shared across all devices and users. We won't replicate the tables which represent local-only state like selected_slide
and selected_component
.
Apply the Schema to the DB
Now that we have a data model sketched out we can apply it to a database. Let's import the crsqlite-wasm
module and open a database in memory. Too learn how to persist data to disk, see docs/guide-persistence.
Then apply the schema we defined earlier to the database.
Great! Let's run a few commands to make sure that worked.
As you can see, all the tables were created as expected. There are a few internal tables as well which are used by vlcn/crsqlite to track merge state. You can read more about those at docs/bits-internal-tables.
You can modify the code above and press
shift + enter
to see the results of other queries.
Writing Data
Now that we have a database with a data model we can start writing data to it. Let's start by creating a deck along with a few slides and components.
And to check on things --
But what if we need to collaborate between multiple devices? We need to be able to replicate the data between devices. Let's see how that works.
Syncing, Merging, Replicating
The main primitive for this is docs/crsql_changes which allows you to:
- pull changes since a given time
- apply changes from another db
While you can select all changes in the database, you can also narrow down to specific logical clock ranges, tables, primary keys and more.
Let us see what some of the changes in our DB looks like.
A bunch of stuff. The details don't really matter for now, just know that:
- The output of a select from
crsql_changes
can be directly fed back into an insert intocrsql_changes
- That a list of changes compresses well -- so don't worry about the repeated values
To understand everything in a chageset, see docs/bits-crsql-changes-internals.
Let's see what happens when we apply a changeset to a new database.
Syncing to a new DB
Spin up a new DB that has no data in it.
One constraint is that databases must have the same set of CRRs in order to sync between them. So let's apply the schema to the new database.
Confirm that db2 is in fact empty.
And finally apply all changes from the first database to the second.
Now checking the contents of db2 post sync --
return { decks, slides, components, };
We can also perform concurrent edits on each DB and then merge them together. In order to ensure we only sync what has changed between the last time we synced, we'll need to track db_version
for our peers.
crsqlite
maintains a crsql_tracked_peers table that tracks the db_version
of each peer. Let's see what that looks like for each database.
return { db1TrackedPeers, db2TrackedPeers };
You can see that tracked_peers
only has entries in db2
. This is because tracked_peers
gets updated on write, not read, and we've only synced from the direction of db1
to db2
.
Updating tracked_peers
on read is a responsibility of network layers that need to stream changes. See docs/crsql_tracked_peers.
Using Tracked Peers
The crsql_tracked_peers
table is essentially acting as a mechanism to store cursors that represent the last point in time that we synced with another peer or site. We use these values to pull changes from other sites.
For example, if db2
wants all changes from db1
that it has not yet seen, db2
can ask db1
for changes where the db_version
is greater than the value it has stored for db1
.
example --
self.db1SiteId = db1SiteId; self.lastSeenDb1Version = lastSeenDb1Version; return { db1SiteId, lastSeenDb1Version }
Now we can ask db1 for all changes since that version.
As you can see, no changes were returned. This is because we've already synced all changes from db1 to db2. If we perform a write on db1, however, we'll see that the changes are returned.
Wrap Up
You've seen how you can compose:
to build a sync layer for your application.
The basic ideas:
- Convert tables to conflict free replicated relations with
crsql_as_crr
- Discover the unique id of each database with
crsql_siteid
- Pull and apply changes to/from databases with
crsql_changes
- Keep track of what you've synced with
crsql_tracked_peers
Next Steps
This guide showed you how to do whole database replication. In future guides we'll:
- Show you how you can track and replicate only a certain set of rows
- Sync across a network
- Handle migrations
- Build a complete app
- Go over other CRDT types that you can use
- Discuss using event sourcing to power many different CDRT "views" atop the same data