A usage of the WiredTiger library that splits BSON documents and stores them fully indexed.

wiredtiger/flitiger

FLITiger

An application built using the WiredTiger library that splits BSON documents and stores them in two different index-like structures:

  • A column-like index, which stores fieldName|document ID : value
  • A row-like index, which stores document ID|fieldName : value

It does not store a full copy of the original document.
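The two key layouts can be illustrated with a minimal in-memory sketch. It uses std::map in place of WiredTiger tables, and the names Field, RowIndex, ColumnIndex, insert_document, fetch_document and fetch_column are hypothetical, not taken from the FLITiger source. It assumes each document has already been flattened into (field name, encoded value) pairs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flattened field: a name plus its already-encoded value.
struct Field {
    std::string name;
    std::string value;
};

// Row-like layout: (document ID, field name) -> value.
using RowIndex = std::map<std::pair<std::uint64_t, std::string>, std::string>;
// Column-like layout: (field name, document ID) -> value.
using ColumnIndex = std::map<std::pair<std::string, std::uint64_t>, std::string>;

// Write every field of one document into both layouts.
inline void insert_document(std::uint64_t doc_id, const std::vector<Field> &fields,
                            RowIndex &rows, ColumnIndex &cols)
{
    for (const Field &f : fields) {
        rows[{doc_id, f.name}] = f.value;
        cols[{f.name, doc_id}] = f.value;
    }
    // Both layouts hold one entry per (document, field) pair,
    // provided they are only ever populated through this function.
    assert(rows.size() == cols.size());
}

// All fields of one document: a contiguous range scan over the row-like layout.
inline std::vector<Field> fetch_document(const RowIndex &rows, std::uint64_t doc_id)
{
    std::vector<Field> out;
    for (auto it = rows.lower_bound({doc_id, std::string()});
         it != rows.end() && it->first.first == doc_id; ++it)
        out.push_back({it->first.second, it->second});
    return out;
}

// All values of one field across documents: a contiguous range scan
// over the column-like layout, ordered by document ID.
inline std::vector<std::string> fetch_column(const ColumnIndex &cols, const std::string &name)
{
    std::vector<std::string> out;
    for (auto it = cols.lower_bound({name, std::uint64_t{0}});
         it != cols.end() && it->first.first == name; ++it)
        out.push_back(it->second);
    return out;
}
```

Because no full copy of the document is kept, reading a whole document back in this sketch means scanning its range in the row-like layout, while an analytical query over a single field scans one range in the column-like layout.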

A picture describing the data representation:

Data Representation

Architecture

Architecture diagram

References

A link to the (internal to MongoDB) document accompanying this repository.

A link to the slideshow from Skunkworks:

A link to the Skunkworks presentation recording:

Installation

You need a copy of WiredTiger built locally from source (the develop branch): https://github.com/wiredtiger/wiredtiger/tree/develop

You need to set an environment variable telling this project where to find WiredTiger: export WT_HOME=~/work/wiredtiger

There is a dependency on cpprestsdk (https://github.com/microsoft/cpprestsdk), which can be installed via:

sudo apt-get install libcpprest-dev

If you aren't using Ubuntu, then the above link has instructions for other platforms.

Once the dependencies are met: cd src && make

Usage

To use the application, run the generated binary (src/flitiger). Pass -S to run it in server mode.

If running in server mode, you can send requests using curl in the following form:

~/work/flitiger$ curl http://127.0.0.1:8099/test --data-binary @raw_data/rockbench_1row.json -H 'Content-Type: application/json'

Using Rockbench to populate FLITiger

Clone the Rockbench repository. Apply the diff at /raw_data/rbench.diff to the repository. Build the generator. Run against a flitiger running in server mode.

Starting FLITiger in server mode:

$ cd ~/work/flitiger/src
$ make
$ ./flitiger -S

Building and running the generator:

$ cd ~/work
$ git clone git@github.com:rockset/rockbench.git
$ cd rockbench
$ patch -p1 < ../flitiger/raw_data/rbench.diff
$ cd generator
$ go get ./...
$ go build
$ FLITIGER_URL="http://127.0.0.1:8099/" BATCH_SIZE=2 WPS=10 DESTINATION=flitiger ./generator

Writing data to S3

Change the table configuration in the application to write level 2 and above chunks to an s3 directory:

diff --git a/src/main.cpp b/src/main.cpp
index 5a41481..73e4b41 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -147,12 +147,12 @@ int main(int argc, char **argv)
     assert(conn);
     assert(session);

-    std::string config = "type=lsm,key_format=QS,value_format=Hu";
+    std::string config = "key_format=QS,value_format=Hu,type=lsm,lsm=(merge_custom=(prefix=\"file:s3/\",suffix=\".lsm\",start_generation=2))";
     if ((ret = wt::create_table(session, rtbl, config)) != 0) {
         std::cout << wt::get_error_message(ret) << '\n';
         return ret;
     }
-    config = "type=lsm,key_format=SQ,value_format=Hu";
+    config = "key_format=SQ,value_format=Hu,type=lsm,lsm=(merge_custom=(prefix=\"file:s3/\",suffix=\".lsm\",start_generation=2))";
     if ((ret = wt::create_table(session, ctbl, config)) != 0) {
         std::cout << wt::get_error_message(ret) << '\n';
         return ret;

Rebuild: cd src && make.
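The two configuration strings in the diff differ only in key_format, so they could be produced by one helper. The following is a sketch only: make_lsm_config is a hypothetical function that is not part of the FLITiger source, and it simply reproduces the strings shown in the diff. Passing an empty prefix yields the same options as the original configuration (in a different order, which should not matter to WiredTiger).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the LSM table configuration string from the diff.
// An empty s3_prefix omits the merge_custom settings entirely.
inline std::string make_lsm_config(const std::string &key_format,
                                   const std::string &s3_prefix = "",
                                   int start_generation = 2)
{
    std::string config = "key_format=" + key_format + ",value_format=Hu,type=lsm";
    if (!s3_prefix.empty()) {
        assert(start_generation >= 0);
        // Chunks at start_generation and above are written under s3_prefix.
        config += ",lsm=(merge_custom=(prefix=\"" + s3_prefix +
                  "\",suffix=\".lsm\",start_generation=" +
                  std::to_string(start_generation) + "))";
    }
    return config;
}
```

With such a helper, the row table would use make_lsm_config("QS", "file:s3/") and the column table make_lsm_config("SQ", "file:s3/").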

Install s3fs:

$ sudo amazon-linux-extras install epel
$ sudo yum install s3fs-fuse

Create an S3 bucket (the mount command below assumes it is named flitiger) at https://s3.console.aws.amazon.com/s3/home?region=ap-southeast-2

Store your AWS credentials:

$ echo ACCESS_KEY_ID:SECRET_ACCESS_KEY > ${HOME}/.passwd-s3fs
$ chmod 600 ${HOME}/.passwd-s3fs

Mount the bucket as a directory called s3 inside the data directory (wt_test in this example):

$ S3_REGION=ap-southeast-2
$ s3fs flitiger wt_test/s3 -o dbglevel=info -o endpoint=${S3_REGION} -o passwd_file=${HOME}/.passwd-s3fs -o url=https://s3-${S3_REGION}.amazonaws.com/

Run the workload as before. Once enough data has been inserted (around 60 chunks in each LSM tree), merges are triggered that create level 2 chunks.
