The Dynamic Cuckoo Filter (DCF) is an efficient approximate membership test data structure. Different from the classic Bloom filter and its variants, DCF is especially designed for highly dynamic dataset and support extending and reducing its capacity. The advantages of DCF are as follows
- The DCF design is the first to achieve both reliable item deletion and flexibly extending/reducing for approximate set representation and membership testing
- DCF outperforms the state-of-the-art DBF design in terms of the capability of reliable item deletion
- A DCF reduces the required memory space of the DBF by 75% as well as improving the speeds of insert/query/delete operation by 30% to 80%.
A DCF leverages CF as building block and consists of a number of s linked homogeneous CFs. The DCF leverages fingerprints to represent items. Each fingerprints owns two candidate bucket addresses generated by partial-key cuckoo hashing. A DCF achieves dynamic capacity by appending new building blocks and removing empty building blocks generated by compact operation.
Generate DCF according to the expected maximum item number, expected false positive rate.
DynamicCuckooFilter* dcf = new DynamicCuckooFilter(config.item_num, config.exp_FPR);
The expected building block number of DCF is set to 6 by default and can also be modified as follow
DynamicCuckooFilter* dcf = new DynamicCuckooFilter(config.item_num, config.exp_FPR, config.exp_BBN);
Four operations of DCF: Insert, Query, Delete and Compact
dcf->insertItem(item) // insert item ,item is a char* format
dcf->queryItem(item) //query item
dcf->deleteItem(item) //delete item
dcf->compact() //compact DCF
We implement DCF in a Linux (Ubuntu 14.04.3 LTS) with an Intel(R) Core(TM) i5-2430M CPU @2.4GHz and OpenSSL library environment.
Install OpenSSL (Please refer to https://www.openssl.org to learn more).
sudo apt-get install openssl
sudo apt-get install libssl-dev
Build and run the example
cd src/
make test
./test
Configurations including false pisitive, item number and dataset path can be costomized in "configuration/config.txt".
false positive = 0.02
item number = 1000000
input file path = input/input2.txt
Results are shown in "output/results.txt", including false positive, fingerprint size, building block number, operation time consumed and etc. In the following is the comparison of DCF and DBF when dealing with 1,000,000 items (including insert/query/delete/compact operation).
Metrics:
item_num: total number inserted/queried/deleted
exp_FPR: the expected false positive rate
actual_FPR: the false positive rate that we measured
actual_BBN: the building block number that we observed
F_size: fingerprint size
space_cost: space overhead of data structure
I_time: insert time
Q_time: query time
D_time: delete time
C_rate: compact rate
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/// item_num exp_FPR actual_FPR actual_BBN F_size(bits) space_cost(MB) I_time(s) Q_time(s) D_time(s) C_rate ///
/// 1000000 0.0005 0.000502 5 16 2.5 1.08 1.05 1.36 1 ///
/// 1000000 0.0005 0.000505 7 0 10.8754 1.69 1.92 3.35 0 ///
/// 4X 1.5X 1.8X 2.4X ///
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
After transform operation time to speed, the DCF improving the speeds of insert/query/delete by 30% to 80% and 4X space efficiency.
DCF is developed in Big Data Technology and System Lab, Services Computing Technology and System Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China by Liangyi Liao ([email protected]), Hanhua Chen ([email protected]), Hai Jin ([email protected])
Copyright (C) 2017, STCS & CGCL and Huazhong University of Science and Technology.
If you want to know more detailed information, please refer to this paper:
Hanhua Chen, Liangyi Liao, Hai Jin, Jie Wu, "The Dynamic Cuckoo Filter," Proceedings of the 25th IEEE International Conference on Network Protocols (ICNP 2017), Toronto, Canada, Oct. 10-13, 2017.