Add bloom_filter_agg and might_contain SparkSql function #3694

jinchengchenghh · 2023-01-12T01:58:57Z

Description

The performance test is in the Gluten project:
oap-project#79

Summary: This function is used in Spark Runtime Filters: apache/spark#35789

https://docs.google.com/document/d/16IEuyLeQlubQkH8YuVuXWKo2-grVIoDJqQpHZrE7q04/edit#heading=h.4v65wq7vzy4q

The BloomFilter implementation in Velox differs from the one in Spark, so the serialized BloomFilter differs as well. Velox limits the size of a contiguous memory buffer, so the BloomFilter capacity is smaller than in Spark when numBits is large. See #4713 (comment). Spark allows changing the defaults, while Velox does not.

See also #3342

Fixes #3694

Pull Request resolved: #4028

Reviewed By: Yuhta

Differential Revision: D46352733

Pulled By: mbasmanova

fbshipit-source-id: 1c8a0b489a736e627ba2c0869688fc0cf46279bb
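For readers unfamiliar with how the two functions cooperate in a runtime filter, here is a minimal sketch in Python. It is not the Velox or Spark implementation: the `ToyBloomFilter` class, the SHA-256 based hashing, the default sizes, and the Python signatures of `bloom_filter_agg` and `might_contain` are all assumptions made for illustration. It only shows the idea: aggregate the join keys of the small side into a filter, then probe the large side and drop rows that cannot match.

```python
import hashlib


class ToyBloomFilter:
    """Illustrative Bloom filter; NOT the Velox or Spark bit layout."""

    def __init__(self, num_bits, num_hashes=3):
        if num_bits <= 0 or num_hashes <= 0:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value):
        # Hypothetical hashing scheme: derive two 64-bit hashes from SHA-256
        # and combine them (double hashing) to get num_hashes bit positions.
        digest = hashlib.sha256(str(value).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def insert(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, value):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )


def bloom_filter_agg(values, num_bits=1 << 16):
    """Aggregate step: fold every build-side key into one filter."""
    bf = ToyBloomFilter(num_bits)
    for v in values:
        bf.insert(v)
    return bf


def might_contain(bf, value):
    """Probe step: False means 'definitely absent', True means 'maybe present'."""
    return bf.may_contain(value)


# Runtime-filter usage: build from the small join side, prune the large side.
build_keys = [3, 17, 42]
probe_rows = list(range(100))

bf = bloom_filter_agg(build_keys)
kept = [row for row in probe_rows if might_contain(bf, row)]
```

A Bloom filter never produces false negatives, so every probe row whose key is in `build_keys` survives in `kept`; a few extra rows may survive as false positives and are discarded later by the join itself. Because the bit layout and hashing are implementation details, a filter serialized by one engine cannot be assumed readable by the other, which is the incompatibility the summary above describes.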

jinchengchenghh added the enhancement New feature or request label Jan 12, 2023

This was referenced Feb 15, 2023

Add might_contain SparkSql function #4029

Closed

Add bloom_filter_agg Spark aggregate function #4028

Closed

facebook-github-bot closed this as completed in 12d5c87 Apr 14, 2023
