Branch | Tests | Coverage |
---|---|---|
master |
Hierarchical Authenticated Data Structure with Efficient Secondary Index Queries
GroveDB is a database system designed specifically for efficient secondary index queries, proofs, speed, and reliability. It was built for use within Dash Platform, but can be easily integrated into other applications for similar use.
Secondary indices are crucial to any database management system. All previous solutions had certain tradeoffs depending on the problem they were trying to solve.
Consider an authenticated data structure, like a Merkle tree built on a database of restaurants for example. Each restaurant has certain attributes, such as price and type:
struct Restaurant{
ID uint32;
name: String;
type: String;
isVegan: bool;
};
If we have say four restaurants, we might normally commit them to a Merkle tree as follows:
graph TD;
root-->A[" "];
root-->B[" "];
A-->AA["id:0"];
A-->AB["id:1"];
B-->AC["id:2"];
B-->AD["id:3"];
Querying by primary key is easy and efficient. If we have a query such as SELECT * WHERE ID <= 2;
, we can return the appropriate elements as well as construct an efficient range proof. However, querying by a secondary index is not efficient at all; it's likely that you will have to iterate over the entire structure. Consider the query SELECT * WHERE isVegan=true;
. When sorted by primary key, the vegan restaurant won't be contiguous. Not only will the proof be nontrivial, but so will the time required to find these elements.
GroveDB is a classic time-space tradeoff. It enables efficient querying on secondary indices by precomputing and committing them. A subtree of each possible queryable secondary index (up to a cap) is built and committed to our authenticated data structure. A tree of subtrees; a grove. For the same data, part of the analogous GroveDB structure might look like this:
graph TD;
root-->A["\'Restaurant\'"];
root-->B["..."];
A-->Q["ID"];
A-->W["name"];
A-->E["kind"];
A-->R["isVegan"];
Q-->Z["..."];
W-->X["..."];
E-->C["..."];
R-->Y["id:2"];
R-->U["id:1"];
R-->I["id:0"];
R-->O["id:3"];
From here, a query on the secondary index isVegan
would traverse to the subtree built for this secondary index. The items are not necessarily replicated, but referenced to.
- Efficient secondary index queries - Built specifically for and tailored to secondary index queries.
- Proofs - Supports proofs of membership, proofs of non-membership, and range proofs.
- Run anywhere - Being written in Rust, it supports all compile targets. x86, Raspberry Pis (AArch64), and Wasm. There are Node.js bindings as well.
Insertion and deletion work as you might expect, updating the respective subtrees and returning appropriate proofs of membership/nonmembership.
Instead of disjoint authenticated data structures, we opt for a unified one; a hierarchical, authenticated data structure based off of Database Outsourcing with Hierarchical Authenticated Data Structures. Elements are the most atomic pieces and can be represented in a few ways. They can be items, item references, trees, trees with items, or even trees with item references. An element contains an item, a reference to an object, or a subtree.
The trees are based off of our fork of Merk, with custom patches applied for better use with GroveDB. Merk is unique in the fact that it's an AVL tree, so the intermediary nodes also contain a key/value pair. Each node contains a third hash, the kv_hash
, in addition to the hashes of its left and right children. The kv_hash
is simply computed as kv_hash=H(key,value)
. The node hash is then computed as H(kv_hash,left_child_hash,right_child_hash)
. Merk uses Blake2B, and rs-merkle uses SHA256.
RocksDB is a key-value store, forked from LevelDB and built out by Facebook. We chose it because of its high performance, maturity, and its compatibility with our stack. Merk itself is built on top of RocksDB.
We have three types of storage: auxiliary, metadata, and tree root storage. Auxiliary storage is used to store plain key-value data which is not used in consensus. Metadata is used to store things outside of the GroveDB usage scope. Is has no prefixes, and therefore has no relation to subtrees. It lives at a higher level. Tree root storage is used to store subtrees.
A database transaction in GroveDB is a wrapper around the OptimisticTransactionDB
primitive from RocksDB. An optimistic transaction hopes on average there will be only a few conflicts, which are detected at the commit stage. This is as compared to the pessimistic model, which uses a lock.
To query GroveDB, a path and a query item have to be supplied. The path specifies the subtree, and the query item determines which nodes are selected from the subtree.
GroveDB currently supports 10 query item types:
- Key(key_name)
- Range(start..end)
- RangeInclusive(start..=end)
- RangeFull(..)
- RangeFrom(start..)
- RangeTo(..end)
- RangeToInclusive(..=end)
- RangeAfter(prev..)
- RangeAfterTo(prev..end)
- RangeAfterToInclusive(prev..=end)
This describes a basic query system: select a subtree then select nodes from that subtree. The need to create more complex queries or add restrictions to the result set may arise, which leads us to the PathQuery.
The PathQuery
allows for more complex queries with optional restrictions on the result set, i.e. limits and offsets.
PathQuery
path: [k1, k2, ..]
sized_query: SizedQuery
limit: Optional<number>
offset: Optional<number>
query: Query
items: [query_item_1, query_item_2, ...],
default_subquery_branch: SubqueryBranch
subquery_path: Optional<key>
subquery_value: Optional<Query>
conditional_subquery_branches: Map<QueryItem, SubqueryBranch>
A path is needed to define the starting context for the query.
The sized_query
determines how the result set would be restricted. It holds optional limits and offset values.
The limit
determines the maximum size of the result set and the offset
specifies the number of elements to skip before adding to the result set.
The query
object is a recursive structure - it specifies how to select nodes from the current subtree and has the option to recursively apply another query to the result set obtained from the previous query.
The items
are a collection of query items that decide which nodes to select from the current context (this builds a result set).
Before describing default_subquery_branch
and conditional_subquery_branches
, we need to define their building blocks, subquery branches:
subquery_path: Optional<Key>
subquery_value: Optional<Query>
Cases
-
subquery_path: true
,subquery_value: false
The node with the subquery path is selected and returned as the result set. -
subquery_path: false
,subquery_value: true
The query held in subquery_value is applied directly to the subtree, and the result is returned as the result set. -
subquery_path: true
,subquery_value: true
First the node with the subquery path is selected and set as new context.
Then, the subquery value is applied to this new context, and the result is returned as the result set.
The subquery branch is used on a single node but can be applied to the result set of a previous query with the use of default_subquery_branch and conditional_subquery_branches:
If this exists, the specified subquery branch is applied to every node in the result set of the previous query.
Rather than applying a subquery branch to every node in the result set, you might want to apply it to a subset of the result set. In such cases, we make use of a conditional subquery.
The conditional subquery holds a map QueryItem to SubqueryBranch.
Map<QueryItem, SubqueryBranch>
For every node in the result set, we check if there is a query item that matches it. If there is, then the associated subquery branch is applied to that node. Note that once a conditional subquery has been applied to a node, the default subquery does run on that node.
This section describes how GroveDB deals with the merging of path queries.
Mergeable path queries allow for the combination of separate path queries that do different things into a single equivalent path query.
A path query can be represented as a set of keys (path to a subtree), and a query to apply to that subtree (query can have unknown depth):
pi = [k1, k2, .., kn, Query]
Something very important to show is that a path query chain can be compressed at any point, i.e. you can turn a sequence of keys into a single query.
Consider p1 = [k1, k2, k3]. This reads as:
- From the root tree, select node with key k1
- Change the context to k1, then select the node with key k2
- Change the context to k2 and finally select the node with key k3
We can create an equivalent query to represent this, which can look like this:
Query
query k1
cond on k1
query k2
cond on k2
query k3
cond on k3
[k1, k2, k3] => [Q1], where Q1 is equivalent to the path array.
This can also be done at any point in the path array, so we can have:
[k1, k2, k3] => [k1, Q2]
[k1, k2, k3] => [K1, K2 Q3]
The path merge algorithm becomes:
- Find the common path across the path queries
- Compress each path array to a query after the common path index
- Merge the compressed query into a single query
- Return new path query with common path as path and combined query as query
Example:
p1 = [k1, k2, k3, Qa]
p2 = [k1, k2, k4, Qb]
Common path = [k1, k2]
Compress each path array after common path:
p1 = [k1, k2, Qc]
p2 = [k1, k2, Qd]
Merge compressed queries:
Qp = Qc + Qd
Return final PathQuery:
pf = [k1, k2, Qp]
GroveDB is built for use with Dash Platform, but can be easily integrated into other applications for similar use. See its use in rs-drive (example).
We currently also have bindings for Node.js. See node-grove.
First, install rustup using your preferred method.
Rust nightly is required to build, so ensure you are using the correct version.
rustup install nightly
Clone the repo and navigate to the main directory:
git clone https://github.com/dashevo/grovedb.git && cd grovedb
From here we can build:
cargo build
There is a work in progress implementation of a debugger layer for GroveDB. To use this library with
these capabilities enabled one needs to set a dependency with grovedbg
feature.
Then, to launch visualizer tool to observe the database structure inside of your browser on a port, let's say 10000, the following snippet should do:
let db = Arc::new(GroveDb::open("db").unwrap());
db.start_visualizer(10000);
Just remember to use Arc because the HTTP server might outlast the GroveDB instance.
run with cargo test
CPU | Time |
---|---|
Raspberry Pi 4 | 2m58.491s |
R5 1600AF | 33.958s |
R5 3600 | 25.658s |