Add bloom_filter_agg and might_contain SparkSql function #3694

Closed
jinchengchenghh opened this issue Jan 12, 2023 · 0 comments
Labels
enhancement New feature or request


@jinchengchenghh

Description

Here is the performance test in the Gluten project: oap-project#79

@jinchengchenghh jinchengchenghh added the enhancement New feature or request label Jan 12, 2023
facebook-github-bot pushed a commit that referenced this issue Jun 7, 2023
Summary:
These functions are used in Spark runtime filters: apache/spark#35789

https://docs.google.com/document/d/16IEuyLeQlubQkH8YuVuXWKo2-grVIoDJqQpHZrE7q04/edit#heading=h.4v65wq7vzy4q

The BloomFilter implementation in Velox differs from Spark's; hence, the serialized BloomFilter format also differs.
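To illustrate how the two functions fit together in a runtime filter, here is a minimal Python sketch. It assumes a simple bit-array Bloom filter with double hashing over `hashlib.blake2b`; the class, the helper names, the default `num_bits`, and the NULL handling are hypothetical and do not reproduce the bit layout, hash function, or serialization format of either Velox or Spark. `bloom_filter_agg` builds one filter from the build-side column, partial filters combine by bitwise OR, and `might_contain` probes it from the other side.

```python
import hashlib
from typing import Iterable, Iterator, Optional


class BloomFilter:
    """Illustrative bit-array Bloom filter (not Velox's or Spark's layout)."""

    def __init__(self, num_bits: int, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: int) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit hash halves.
        digest = hashlib.blake2b(
            value.to_bytes(8, "little", signed=True), digest_size=16
        ).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # keep the stride odd
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def insert(self, value: int) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def contains(self, value: int) -> bool:
        # True for every inserted value; may also be True for others.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(value))

    def merge(self, other: "BloomFilter") -> None:
        # Combining partial aggregates is a bitwise OR of same-shaped filters.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("cannot merge Bloom filters of different shapes")
        for i, byte in enumerate(other.bits):
            self.bits[i] |= byte


def bloom_filter_agg(values: Iterable[Optional[int]],
                     num_bits: int = 1 << 16) -> BloomFilter:
    """Hypothetical model of the aggregate: one filter per group, NULLs skipped."""
    bf = BloomFilter(num_bits)
    for v in values:
        if v is not None:
            bf.insert(v)
    return bf


def might_contain(bf: Optional[BloomFilter],
                  value: Optional[int]) -> Optional[bool]:
    """Hypothetical model of the scalar probe: NULL input gives NULL output."""
    if bf is None or value is None:
        return None
    return bf.contains(value)
```

A probe-side row whose key gets `False` can be dropped before the join, because a Bloom filter never returns a false negative; `True` only means "possibly present". Since the byte layout above is arbitrary, a filter serialized by one engine cannot be read by another with a different layout, which is the compatibility gap described above.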

Velox limits the size of a contiguous memory buffer; hence, BloomFilter capacity is lower than in Spark when numBits is large. See #4713 (comment)
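The capacity trade-off can be made concrete with the textbook Bloom filter sizing formulas: for `n` expected items and target false-positive probability `p`, the optimal bit count is `m = -n ln(p) / (ln 2)^2`, and the false-positive rate with `k` hashes is approximately `(1 - e^(-k n / m))^k`. The Python sketch below uses these formulas to show that capping `m` either lowers the number of items the filter can hold at a given `p`, or raises `p` for a fixed `n`. `HYPOTHETICAL_MAX_BITS` is an illustrative cap, not Velox's actual buffer limit, and neither engine is claimed to size filters exactly this way.

```python
import math

# Illustrative cap on the bit array size; NOT Velox's real limit.
HYPOTHETICAL_MAX_BITS = 1 << 22


def optimal_num_bits(expected_items: int, fpp: float) -> int:
    # m = -n * ln(p) / (ln 2)^2, rounded up to a whole bit.
    return math.ceil(-expected_items * math.log(fpp) / (math.log(2) ** 2))


def optimal_num_hashes(expected_items: int, num_bits: int) -> int:
    # k = (m / n) * ln 2, with at least one hash function.
    return max(1, round(num_bits / expected_items * math.log(2)))


def expected_fpp(expected_items: int, num_bits: int, num_hashes: int) -> float:
    # Approximate false-positive probability: (1 - e^(-k n / m))^k.
    return (1 - math.exp(-num_hashes * expected_items / num_bits)) ** num_hashes


def capacity_at_fpp(num_bits: int, fpp: float) -> int:
    # Inverse of optimal_num_bits: how many items fit in num_bits at this fpp.
    return math.floor(num_bits * (math.log(2) ** 2) / -math.log(fpp))


if __name__ == "__main__":
    n, p = 10_000_000, 0.03
    requested = optimal_num_bits(n, p)
    granted = min(requested, HYPOTHETICAL_MAX_BITS)
    k = optimal_num_hashes(n, granted)
    print("requested bits:", requested, "granted bits:", granted)
    print("items that fit at target fpp:", capacity_at_fpp(granted, p))
    print("fpp if all items are inserted anyway:", expected_fpp(n, granted, k))
```

Running the script prints how far the capped filter falls short of the requested capacity; the exact numbers depend on the assumed cap.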

Spark allows changing the defaults, while Velox does not.

See also #3342

Fixes #3694

Pull Request resolved: #4028

Reviewed By: Yuhta

Differential Revision: D46352733

Pulled By: mbasmanova

fbshipit-source-id: 1c8a0b489a736e627ba2c0869688fc0cf46279bb