Comparison to Kappa-db? #228
Replies: 4 comments
Sure! I'll update README eventually, but here's a start. Please let me know if I got anything wrong about kappa-db!

The short answer is "Earthstar is like CouchDB; Kappa-db is like SSB"

Data structure & mutability

Kappa-db is a bundle of append-only logs (hypercores), one per author per device. It builds indexes by processing messages from the logs, in order, to build up a reduced state. The logs grow forever. Messages can't be edited, instead you put more messages on the end that can modify the reduced state. Messages can probably be missing, using hypercore sparse mode? So maybe old data can be deleted that way.

Earthstar is a key-value database (but we use the words "path" and "document" instead). It has fewer guarantees than Kappa-db and more flexibility. You can hold any subset of the documents, sync them in any order, do partial sync, drop ones you don't want. Documents can be overwritten with newer versions. There's built-in access control so multiple authors can be allowed or forbidden to mutate the same documents.

Keys, Identities, Devices, Multi-writer

Both use ed25519 keys for identity and signatures.

Kappa-db has one identity per device. Earthstar has one identity per author (across multiple devices).

Kappa-db and Earthstar both allow multi-writer (multiple authors putting data into the same space).

Universes of data, and how people join them

Each kappa-db (bundle of logs) is a separate unrelated universe of data, and it's named by ... the swarm key? Anyone with the swarm key can host or write to it?

Earthstar is split up into "workspaces" which are separate universes of data. Anyone with the workspace address can host or write to it. Soon there will also be invite-only workspaces -- anyone can host, but you need the secret key to write.

Indexing

Kappa-db apps are responsible for writing their own indexer / reducer. It's more work to write that code, but you can customize the index for the complex queries your app needs.

If the messages are patch-style and build on each other, you need the whole history for it to make sense. If the messages are more standalone in their design you could safely drop some using sparse mode, if you wanted to.

In Earthstar, the core library does basic indexing for you and provides a way to query your documents based on their properties. It's NoSQL style (think MongoDB). It assumes documents are standalone and independent, not patches that build on each other and need to be reduced.

Syncing

Kappa-db uses hyperswarm to find peers and the hypercore protocol to sync data, with some custom stuff to handle multiple hypercores.

Earthstar is not well developed here. It can connect to cloud peers over HTTP for syncing, in the style of SSB pubs (see earthstar-pub). I'm planning to add hyperswarm also for direct p2p connections.

Kappa-db relies on "logs that sync" as the underlying abstraction. Earthstar relies on a specification for "signed versioned documents".

Conflicts

Kappa-db can't have low-level conflicts because each person (and device) has their own feed. Maybe the feeds disagree about something; that's up to the app's indexing code to figure out.

Earthstar resolves conflicts with a very simple last-write-wins rule, but it keeps the conflicting document versions so apps can do something fancier if they want, or ask the user what to do.

Clocks and timestamps

Low-level kappa-db will work with clock skew but the app might rely on timestamps in other ways (e.g. cabal sorts messages by timestamp?)

Earthstar won't work well if peers have very inaccurate clocks. It refuses to sync documents from more than 10 minutes in the future. This could be loosened; it could be fixed if we require each path to be restricted to one author, and forbid paths that anyone can write to. (Details)

Maturity & adoption

Kappa-db is medium maturity; hypercore is high maturity and widely used.

Earthstar is new and nobody is using it yet :)
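To make the Conflicts and Clocks points a bit more concrete, here is a minimal sketch of a last-write-wins store that keeps the competing versions and refuses documents from more than 10 minutes in the future. Everything in it is hypothetical and for illustration only: `Doc`, `TinyStore`, `ingest`, `versions`, `latest`, millisecond timestamps, the tie-break by author name, and the choice to keep one version per author per path are all assumptions, not Earthstar's actual API or storage layout.

```typescript
// Hypothetical document shape for illustration; not Earthstar's real format.
interface Doc {
  path: string;
  author: string;
  timestamp: number; // assumed: milliseconds since the Unix epoch
  content: string;
}

// The 10-minute limit comes from the discussion above.
const MAX_FUTURE_MS = 10 * 60 * 1000;

class TinyStore {
  // path -> author -> newest document written by that author at that path
  private docs = new Map<string, Map<string, Doc>>();

  // Returns true if the document was accepted and stored.
  ingest(doc: Doc, now: number): boolean {
    // Refuse documents that claim to come from too far in the future.
    if (doc.timestamp > now + MAX_FUTURE_MS) return false;

    let byAuthor = this.docs.get(doc.path);
    if (!byAuthor) {
      byAuthor = new Map<string, Doc>();
      this.docs.set(doc.path, byAuthor);
    }

    // Within one author, only a strictly newer document replaces the old one.
    const existing = byAuthor.get(doc.author);
    if (existing && existing.timestamp >= doc.timestamp) return false;

    byAuthor.set(doc.author, doc);
    return true;
  }

  // All kept versions for a path, newest first (ties broken by author name).
  versions(path: string): Doc[] {
    const byAuthor = this.docs.get(path);
    if (!byAuthor) return [];
    return Array.from(byAuthor.values()).sort(
      (a, b) => b.timestamp - a.timestamp || a.author.localeCompare(b.author)
    );
  }

  // Last-write-wins: the newest version across all authors.
  latest(path: string): Doc | undefined {
    return this.versions(path)[0];
  }
}
```

An app that wants something fancier than last-write-wins could call `versions(path)` and merge the results itself, or show them to the user.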
Awesome thanks @cinnamon-bun! Thorough review!
Multiple authors can be made per Kappa-db using
See above.
Yes that's right, sparse mode gives one the power of deletion! Editing a k/v is also possible by using a materialized view. See http://npmjs.com/unordered-materialized-kv for an example, but there are other approaches as well!
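For anyone following along, here is a rough sketch of how a materialized key/value view over unordered log messages can work, using back-links. It is only loosely inspired by that idea and is not the API of unordered-materialized-kv: `KvMsg`, `HeadsView`, `add`, and `get` are made-up names, and it assumes each link points at an earlier message for the same key.

```typescript
// Hypothetical message shape: each write names the message ids it supersedes.
interface KvMsg {
  id: string;
  key: string;
  value: string;
  links: string[]; // ids of earlier messages (same key) that this one replaces
}

class HeadsView {
  // key -> id -> message that no other known message has superseded
  private heads = new Map<string, Map<string, KvMsg>>();
  // ids that some message has linked to, even if we have not seen them yet
  private superseded = new Set<string>();

  // Messages may be added in any order; the resulting heads are the same.
  add(msg: KvMsg): void {
    for (const link of msg.links) {
      this.superseded.add(link);
      this.heads.get(msg.key)?.delete(link);
    }
    // A newer message arrived first and already replaced this one.
    if (this.superseded.has(msg.id)) return;

    let forKey = this.heads.get(msg.key);
    if (!forKey) {
      forKey = new Map<string, KvMsg>();
      this.heads.set(msg.key, forKey);
    }
    forKey.set(msg.id, msg);
  }

  // Current values for a key. More than one result means concurrent edits
  // that the app still has to resolve.
  get(key: string): KvMsg[] {
    const forKey = this.heads.get(key);
    if (!forKey) return [];
    return Array.from(forKey.values()).sort((a, b) => a.id.localeCompare(b.id));
  }
}
```

An "edit" is then just a new message whose `links` name the message it replaces; the log itself stays append-only.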
Applications like Cabal, Mapeo, Cobox, Sonar which are built on kappa-db aren't using patches. See Automerge/Hypermerge for an approach in which patches are used. Kappa-db apps usually use leveldb in production which also has a query interface -- but I really like that Earthstar has a sqlite adapter! Would be very cool to pull that into kappa world :)
With this PR (kappa-db/kappa-core#14), kappa-db will no longer require the use of hypercore. Hyperswarm isn't required in kappa-db's dependencies at all, since hypercore can work over any Node.js stream (e.g., we aren't using Hyperswarm with kappa-db in Mapeo :))
Some kappa apps have relied on timestamps, others rely on back-links (a DAG-like approach). Because hypercore has a sequence defined already, wall-clock timestamps ought to only be used if there's a conflict, but even then, you can sort of get a lamport clock for free in that regard!
I know this is unrelated to the topic, but does Earthstar guard against very old messages? In practice with our work in the field, devices that go offline will sometimes revert to some date far in the past, like some day in 2017 (when the phone was born, perhaps? :p) or 3 November 1971 ;p
Thanks @okdistribute ! Just thinking through a few more details here:
Aha, right! The unique thing about Earthstar is one author can use the same identity on multiple devices. You don't have to worry about forking your feed because it's not a feed. Yes, a Kappa-db is the equivalent of a Workspace, they're both a "collection of people's feeds" / a "unit of community".
Nice! Is sparse mode deletion under the control of the reader, not the author? (Maybe unless the app has a special message type which is a deletion request.) And in Earthstar an author can overwrite their own data with an empty document, and separately a reader can choose to locally delete documents (called "forgetting"). Earthstar really wants to physically delete data whenever possible, for privacy.
Huh! Maybe Earthstar could work as an alternative backend for kappa-next? It might break some assumptions though:
I think both of those cases could already happen in kappa-db sparse mode, so maybe it would work!
Thanks for asking, I'm starting to realize how often device clocks can be unreliable. How do you handle this in your work? Can the device at least store previous timestamps so when it reboots it can continue where it left off instead of resetting to 1970? Earthstar has a minimum allowed timestamp which is in 1970. It doesn't inherently limit documents by relative time like N days ago, but you can filter them that way when syncing. The big problem is that the devices in 1970 won't accept documents "from the future" e.g. the devices with accurate clocks. I'll probably add options to help with inaccurate clocks, which will come with some caveats.
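Here is a rough sketch of the "store previous timestamps" idea, in case it helps the conversation. It is not something Earthstar does today: `MonotonicClock`, `now`, and `snapshot` are hypothetical names, and saving the snapshot to disk and reloading it on boot is left to the app.

```typescript
// A clock that never goes backwards, even if the device's wall clock does.
// Hypothetical helper for illustration; not part of Earthstar or kappa-db.
class MonotonicClock {
  constructor(
    private last: number, // last timestamp issued, e.g. reloaded from disk on boot
    private readonly wallClock: () => number = () => Date.now()
  ) {}

  // Returns the wall-clock time if it moved forward, else one past the last value.
  now(): number {
    const t = this.wallClock();
    this.last = t > this.last ? t : this.last + 1;
    return this.last;
  }

  // Value the app should persist so the next boot can continue from here.
  snapshot(): number {
    return this.last;
  }
}
```

This only keeps one device's own timestamps in order. It does not fix the other half of the problem, a device stuck in 1970 rejecting everyone else's documents as being from the future.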
Yeah, the app would have to create a special 'tombstone' request. Cobox is interested in implementing this for hyperdrive-backed kappa-db, called kappa-drive; Mapeo has this in production, although we don't give users the ability to clear histories... yet. One day we plan on implementing sparse mode in Mapeo, but the datasets people are working with aren't quite big enough yet to make that a necessity (minus media files, just talking about the dataset).
This is a great idea!
Yeah, messages could arrive out of order or be missing from the same author already with the sparse implementation of hypercore -- whether that matters really just depends on how you write your index. See https://github.com/frando/kappa-sparse-indexer for an implementation, although this is still a relatively new approach! It seems to work for cobox and sonar though. cc @Frando
Good idea. Many researchers in this space say to never use wall clocks. I agree with them! Although it can be useful and/or important for the user to know the wall clock time, for usability purposes. In those cases, you could recommend using a timeserver if users are online? A DAG approach can cause performance issues due to having to walk the tree, and a vector clock has disk space/network tradeoffs. We had a conversation about this in cabal-core which has some insights but still remains unresolved. So I think it really depends on your use case: if users will have lots of space and bandwidth that isn't super low, a vector clock seems like a good bet. I really like this talk by @jlongster! https://www.dotconferences.com/2019/12/james-long-crdts-for-mortals. (EDIT: Mapeo uses a DAG https://www.npmjs.com/package/unordered-materialized-kv)
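For reference, a vector clock can be sketched in a few lines. This is a generic textbook version, not code from cabal-core, Mapeo, or Earthstar; `VectorClock`, `tick`, `merge`, and `compare` are names chosen for illustration. The space/network tradeoff shows up in the type itself: every clock carries one counter per writer.

```typescript
// One counter per writer id. Missing entries count as 0.
type VectorClock = Record<string, number>;
type Ordering = "before" | "after" | "equal" | "concurrent";

// A writer increments its own counter for each new message.
function tick(vc: VectorClock, writer: string): VectorClock {
  return { ...vc, [writer]: (vc[writer] ?? 0) + 1 };
}

// When receiving a message, take the per-writer maximum of both clocks.
function merge(a: VectorClock, b: VectorClock): VectorClock {
  const out: VectorClock = { ...a };
  for (const id of Object.keys(b)) {
    out[id] = Math.max(out[id] ?? 0, b[id]);
  }
  return out;
}

// Decide whether a happened before b, after b, or concurrently with it.
function compare(a: VectorClock, b: VectorClock): Ordering {
  let aAhead = false;
  let bAhead = false;
  for (const id of Object.keys(a).concat(Object.keys(b))) {
    const x = a[id] ?? 0;
    const y = b[id] ?? 0;
    if (x > y) aAhead = true;
    if (y > x) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent";
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}
```

A result of `"concurrent"` is the case where the app (or the user) has to pick a winner, which is the point where a wall-clock timestamp might still be used as a tie-breaker.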
What's the problem you want solved?
Hi, I see in the readme you compare to DAT, but it might be best to split out the comparison a bit to be more accurate, since the ecosystem is quite large.
Is there a solution you'd like to recommend?
Hyperspace will be the new RPC module for creating applications that are compatible with the dat ecosystem: https://github.com/hyperspace-org. This has the same concerns you note with Hypercore: multi-writer is not possible out of the box, and it is a bit more complex to do that.
Kappa-db (github.com/kappa-db/) is quite close to Earthstar, but it is less 'batteries-included' and more for customizing database behaviors. I really like the approach Earthstar has taken to make these patterns more accessible to the common dev!
Thanks ~K