-
Notifications
You must be signed in to change notification settings - Fork 467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the support of SCALING for bloom filter #1721
Conversation
src/storage/redis_metadata.h
Outdated
@@ -245,4 +245,9 @@ class BloomChainMetadata : public Metadata { | |||
/// | |||
/// @return the total capacity value | |||
uint32_t GetCapacity() const; | |||
|
|||
/// Check the bloom chain is scaling or not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's needless to comment on simple methods such as IsScaling
or GetCapacity
- they have self-explanatory names. BTW, the implementation of IsScaling
is so short/simple that you can write it in the header file.
src/types/redis_bloom_chain.cc
Outdated
return rocksdb::Status::Aborted("filter is full"); | ||
if (metadata.size + 1 > metadata.GetCapacity()) { | ||
if (metadata.IsScaling()) { | ||
s = createBloomFilter(ns_key, &metadata); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the status seems not checked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it can be done like this. Now we have holding a "lock" here. And createBloomFilter
will directly call Write
finally
I propose a createBloomFilterInBatch
or other, the interface would just put the new bloomFilter in batch. The previous create can make use of this, and here we can just put it in batch first?
|
||
rocksdb::Status BloomChain::createBloomFilter(const Slice &ns_key, BloomChainMetadata *metadata) { | ||
uint32_t bloom_filter_bytes = BlockSplitBloomFilter::OptimalNumOfBytes( | ||
static_cast<uint32_t>(metadata->base_capacity * pow(metadata->expansion, metadata->n_filters)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might not strong related to this patch, should we enforce a "hardlimit" to avoid too large BloomFilter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does Redis have some limits here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RedisBloom has a maximum Cuckoo Filter n_filters
. And don't have limit
I think our bitmap has some limits. And BlockSplitBloomFilter
limit 128MB, perhaps if filter is too large, we need to reject and return to user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
128MiB is also too large for one key, maybe we can improve this by separating them into multi subkeys in future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously I talked with @zncleon . Split into subkeys doesn't work like bitmap, since bloom filter would tent to random access all bits in it's space. Maybe a maximum size would help?
a4d7274
to
4210137
Compare
If error and return after cc @mapleFU |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest LGTM, thanks!
src/types/redis_bloom_chain.cc
Outdated
auto batch = storage_->GetWriteBatchBase(); | ||
batch->Put(bf_key, block_split_bloom_filter.GetData()); | ||
return storage_->Write(storage_->DefaultWriteOptions(), batch->GetWriteBatch()); | ||
*bf_data = block_split_bloom_filter.GetData(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This interface is ok, but what about adding a MoveData
to avoid an around of copying?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a glance, I think we just add a string&& GetData() &&
.
Oh seems Twice has refactor the storage, so you mind need to solve the conflicts. |
src/types/redis_bloom_chain.cc
Outdated
auto batch = storage_->GetWriteBatchBase(); | ||
batch->Put(bf_key, block_split_bloom_filter.GetData()); | ||
return storage_->Write(storage_->DefaultWriteOptions(), batch->GetWriteBatch()); | ||
*bf_data = block_split_bloom_filter.GetData(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*bf_data = block_split_bloom_filter.GetData(); | |
*bf_data = std::move(block_split_bloom_filter).GetData(); |
And add a overload to GetData
:
std::string&& GetData() && { return data_; }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @mapleFU that seems better than a MoveData
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this also LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
rocksdb::Status bloomAdd(const Slice &bf_key, const std::string &item); | ||
void createBloomFilterInBatch(const Slice &ns_key, BloomChainMetadata *metadata, | ||
ObserverOrUniquePtr<rocksdb::WriteBatchBase> &batch, std::string *bf_data); | ||
static void bloomAdd(const std::string &item, std::string *bf_data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: when using pointer as an argument, it looks like an output. Maybe we can add a comment to mark this like:
/// bf_data: [in/out] The content string of bloomfilter.
src/types/bloom_filter.h
Outdated
const std::string& GetData() { return data_; } | ||
const std::string& GetData() const& { return data_; } | ||
|
||
std::string GetData() const&& { return data_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Emmm would this need a const? You can take a look at how StatusOr
does this:
Line 276 in 2ecaef2
ValueType&& GetValue() && { |
src/types/bloom_filter.h
Outdated
const std::string& GetData() { return data_; } | ||
const std::string& GetData() const& { return data_; } | ||
|
||
std::string GetData() const&& { return data_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::string GetData() const&& { return data_; } | |
std::string&& GetData() && { return std::move(data_); } |
Thanks for your contribution! |
This PR supports the "SCALING" of bloom filter. When the bloom filter is full, it will add a new bloom filter into bloom chain if is "SCALING",otherwise it will return an error.