Add the support of SCALING for bloom filter #1721

Merged
14 commits merged on Sep 12, 2023

Contributor

@zncleon zncleon commented Aug 31, 2023

This PR supports the "SCALING" option of the bloom filter. When the bloom filter is full, a new bloom filter is added to the bloom chain if the chain is "SCALING"; otherwise an error is returned.

@@ -245,4 +245,9 @@ class BloomChainMetadata : public Metadata {
///
/// @return the total capacity value
uint32_t GetCapacity() const;

/// Check the bloom chain is scaling or not
Contributor

I think it's needless to comment on simple methods such as IsScaling or GetCapacity - they have self-explanatory names. BTW, the implementation of IsScaling is so short/simple that you can write it in the header file.
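
As a rough sketch of the suggestion above, a one-line accessor can be defined directly in the header. The struct below is a hypothetical, trimmed-down stand-in for BloomChainMetadata, and treating `expansion == 0` as "non-scaling" is an assumption made for illustration; the diff shown here does not confirm it.

```cpp
#include <cstdint>

// Hypothetical stand-in for BloomChainMetadata (not the real class).
struct BloomChainMetadataSketch {
  uint16_t n_filters = 0;
  uint16_t expansion = 0;  // assumption: 0 means the chain must not grow
  uint32_t base_capacity = 0;

  // Short enough to live inline in the header; the name documents itself.
  constexpr bool IsScaling() const { return expansion != 0; }
};
```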

@PragmaTwice PragmaTwice changed the title from "Add the support of scalable bloom filter" to "Add the support of SCALING for bloom filter" on Sep 1, 2023
return rocksdb::Status::Aborted("filter is full");
if (metadata.size + 1 > metadata.GetCapacity()) {
if (metadata.IsScaling()) {
s = createBloomFilter(ns_key, &metadata);
Member

@PragmaTwice PragmaTwice Sep 1, 2023

The status does not seem to be checked.

Member

I'm not sure it can be done like this. We are holding a "lock" here, and createBloomFilter will eventually call Write directly.

I propose a createBloomFilterInBatch (or similar) whose interface just puts the new bloom filter into the batch. The existing create path can make use of it, and here we can just put it into the batch first?
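
A minimal sketch of the idea proposed above: stage the new filter in a batch while the lock is held, and commit everything with a single write. `WriteBatchSketch`, `StoreSketch`, `ChainMetaSketch`, the key layout, and the 32-byte placeholder size are all hypothetical stand-ins for illustration, not the RocksDB or Kvrocks types.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a write batch: it only records puts.
struct WriteBatchSketch {
  std::vector<std::pair<std::string, std::string>> puts;
  void Put(const std::string &key, const std::string &value) { puts.emplace_back(key, value); }
};

// Hypothetical stand-in for the storage: a batch is applied in one call.
struct StoreSketch {
  std::map<std::string, std::string> data;
  int write_calls = 0;
  void Write(const WriteBatchSketch &batch) {
    for (const auto &kv : batch.puts) data[kv.first] = kv.second;
    ++write_calls;
  }
};

// Hypothetical metadata; only the filter count is modelled.
struct ChainMetaSketch {
  uint16_t n_filters = 0;
};

// Stages the metadata update in the batch and hands back the bytes of a
// fresh, zeroed filter. Nothing is written to the store here, so an error
// later in the caller cannot leave a half-created filter behind.
void CreateBloomFilterInBatch(const std::string &ns_key, ChainMetaSketch *metadata,
                              WriteBatchSketch *batch, std::string *bf_data) {
  assert(metadata != nullptr && batch != nullptr && bf_data != nullptr);
  metadata->n_filters += 1;
  bf_data->assign(32, '\0');  // placeholder size, not the real sizing logic
  batch->Put(ns_key, "n_filters=" + std::to_string(metadata->n_filters));
}
```

The caller would then update `bf_data` in memory, `Put` it into the same batch, and call `Write` once.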


rocksdb::Status BloomChain::createBloomFilter(const Slice &ns_key, BloomChainMetadata *metadata) {
uint32_t bloom_filter_bytes = BlockSplitBloomFilter::OptimalNumOfBytes(
static_cast<uint32_t>(metadata->base_capacity * pow(metadata->expansion, metadata->n_filters)),
Member

This might not be strongly related to this patch, but should we enforce a "hard limit" to avoid an overly large BloomFilter?

cc @PragmaTwice @git-hulk

Member

Does Redis have some limits here?

Member

RedisBloom has a maximum n_filters for Cuckoo Filter, and otherwise doesn't have a limit.

I think our bitmap has some limits, and BlockSplitBloomFilter is limited to 128MB. Perhaps if the filter is too large, we need to reject the request and return an error to the user?

Member

@git-hulk git-hulk Sep 4, 2023

128MiB is also too large for one key; maybe we can improve this by splitting it into multiple subkeys in the future.

Member

I talked with @zncleon about this previously. Splitting into subkeys doesn't work the way it does for bitmap, since a bloom filter tends to randomly access all the bits in its space. Maybe a maximum size would help?
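
The sizing expression in the diff above grows each layer geometrically as base_capacity * expansion^n_filters. A minimal sketch of a maximum-size guard follows; `kMaxLayerCapacity` and `NextLayerCapacity` are hypothetical names, and the limit value is chosen only for illustration, not taken from this PR or from RedisBloom.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical cap on the number of items a single layer may hold.
constexpr uint64_t kMaxLayerCapacity = 1ULL << 26;

// Capacity of the next layer: base_capacity * expansion^n_filters.
// Returns std::nullopt when the result would exceed kMaxLayerCapacity,
// so the caller can reject the request instead of allocating a huge filter.
// Integer arithmetic is used to avoid the rounding of pow() on doubles.
constexpr std::optional<uint32_t> NextLayerCapacity(uint32_t base_capacity, uint16_t expansion,
                                                    uint16_t n_filters) {
  uint64_t capacity = base_capacity;
  for (uint16_t i = 0; i < n_filters; ++i) {
    capacity *= expansion;
    // Checking on every step keeps the product far below the uint64_t range.
    if (capacity > kMaxLayerCapacity) return std::nullopt;
  }
  if (capacity > kMaxLayerCapacity) return std::nullopt;
  return static_cast<uint32_t>(capacity);
}
```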

Contributor Author

zncleon commented Sep 9, 2023

If an error occurs and we return after CreateBloomfilter, it leaves a zombie bloom filter behind. So I updated the function interfaces of CreateBloomfilterInBatch and BloomAdd. Using only one write also gives better performance.

cc @mapleFU

Member

@mapleFU mapleFU left a comment

Rest LGTM, thanks!

auto batch = storage_->GetWriteBatchBase();
batch->Put(bf_key, block_split_bloom_filter.GetData());
return storage_->Write(storage_->DefaultWriteOptions(), batch->GetWriteBatch());
*bf_data = block_split_bloom_filter.GetData();
Member

This interface is ok, but what about adding a MoveData to avoid a round of copying?

Member

@PragmaTwice PragmaTwice Sep 11, 2023

At a glance, I think we can just add a string&& GetData() &&.

src/types/redis_bloom_chain.cc (outdated, resolved)
Member

mapleFU commented Sep 11, 2023

Oh, it seems Twice has refactored the storage, so you might need to resolve the conflicts.

auto batch = storage_->GetWriteBatchBase();
batch->Put(bf_key, block_split_bloom_filter.GetData());
return storage_->Write(storage_->DefaultWriteOptions(), batch->GetWriteBatch());
*bf_data = block_split_bloom_filter.GetData();
Member

@PragmaTwice PragmaTwice Sep 11, 2023

Suggested change
- *bf_data = block_split_bloom_filter.GetData();
+ *bf_data = std::move(block_split_bloom_filter).GetData();

And add an overload to GetData:

std::string&& GetData() && { return data_; }

Member

@PragmaTwice PragmaTwice Sep 11, 2023

cc @mapleFU that seems better than a MoveData method.

Member

Ok, this also LGTM

mapleFU previously approved these changes Sep 11, 2023
Member

@mapleFU mapleFU left a comment

LGTM

rocksdb::Status bloomAdd(const Slice &bf_key, const std::string &item);
void createBloomFilterInBatch(const Slice &ns_key, BloomChainMetadata *metadata,
ObserverOrUniquePtr<rocksdb::WriteBatchBase> &batch, std::string *bf_data);
static void bloomAdd(const std::string &item, std::string *bf_data);
Member

Nit: when a pointer is used as an argument, it looks like an output. Maybe we can add a comment to mark this, like:

/// bf_data: [in/out] The content string of bloomfilter.

const std::string& GetData() { return data_; }
const std::string& GetData() const& { return data_; }

std::string GetData() const&& { return data_; }
Member

@mapleFU mapleFU Sep 11, 2023

Emmm would this need a const? You can take a look at how StatusOr does this:

ValueType&& GetValue() && {

const std::string& GetData() { return data_; }
const std::string& GetData() const& { return data_; }

std::string GetData() const&& { return data_; }
Member

Suggested change
- std::string GetData() const&& { return data_; }
+ std::string&& GetData() && { return std::move(data_); }
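
To illustrate the ref-qualified overloads discussed in this thread, here is a small self-contained sketch. `FilterSketch` and `TakeData` are hypothetical names; only the shape of the two `GetData` overloads follows the suggestion above.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the filter class; only the accessors matter here.
class FilterSketch {
 public:
  explicit FilterSketch(std::string data) : data_(std::move(data)) {}

  // Selected for lvalues: returns a read-only reference, no copy.
  const std::string &GetData() const & { return data_; }

  // Selected for non-const rvalues: lets the caller take over the buffer.
  // std::move is needed because data_ is an lvalue inside the method body.
  std::string &&GetData() && { return std::move(data_); }

 private:
  std::string data_;
};

// Moves the buffer out of a filter object that is no longer needed.
std::string TakeData(FilterSketch filter) { return std::move(filter).GetData(); }
```

A `const&&` qualifier, as in the earlier revision, would make `data_` const inside the method, so the buffer could only be copied, never moved.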

@PragmaTwice
Member

Thanks for your contribution!

@PragmaTwice PragmaTwice merged commit 11a0140 into apache:unstable Sep 12, 2023
26 checks passed
5 participants