[OPPRO-279] Add bloom_filter_agg and might_contain SparkSql function #79

Merged: 6 commits, Jan 11, 2023

jinchengchenghh
Collaborator

No description provided.

@jinchengchenghh
Collaborator Author

{
    "extensions": [{
            "extensionFunction": {
                "name": "might_contain:vbin_i64"
            }
        }, {
            "extensionFunction": {
                "functionAnchor": 1,
                "name": "min:opt_bool"
            }
        }
    ],
    "relations": [{
            "root": {
                "input": {
                    "aggregate": {
                        "common": {
                            "direct": {}
                        },
                        "input": {
                            "project": {
                                "common": {
                                    "direct": {}
                                },
                                "input": {
                                    "project": {
                                        "common": {
                                            "direct": {}
                                        },
                                        "input": {
                                            "read": {
                                                "common": {
                                                    "direct": {}
                                                },
                                                "baseSchema": {
                                                    "names": ["value#337"],
                                                    "struct": {
                                                        "types": [{
                                                                "i64": {
                                                                    "nullability": "NULLABILITY_REQUIRED"
                                                                }
                                                            }
                                                        ]
                                                    }
                                                },
                                                "localFiles": {
                                                    "items": [{
                                                            "uriFile": "iterator:0"
                                                        }
                                                    ]
                                                }
                                            }
                                        },
                                        "expressions": [{
                                                "selection": {
                                                    "directReference": {
                                                        "structField": {}
                                                    }
                                                }
                                            }
                                        ]
                                    }
                                },
                                "expressions": [{
                                        "scalarFunction": {
                                            "outputType": {
                                                "bool": {
                                                    "nullability": "NULLABILITY_NULLABLE"
                                                }
                                            },
                                            "arguments": [{
                                                    "value": {
                                                        "literal": {
                                                            "binary
                                                        }
                                                    }
                                                }, {
                                                    "value": {
                                                        "selection": {
                                                            "directReference": {
                                                                "structField": {}
                                                            }
                                                        }
                                                    }
                                                }
                                            ]
                                        }
                                    }, {
                                        "scalarFunction": {
                                            "outputType": {
                                                "bool": {
                                                    "nullability": "NULLABILITY_NULLABLE"
                                                }
                                            },
                                            "arguments": [{
                                                    "value": {
                                                        "literal": {
                                                            "binary
                                                        }
                                                    }
                                                }, {
                                                    "value": {
                                                        "selection": {
                                                            "directReference": {
                                                                "structField": {}
                                                            }
                                                        }
                                                    }
                                                }
                                            ]
                                        }
                                    }
                                ]
                            }
                        },
                        "groupings": [{}
                        ],
                        "measures": [{
                                "measure": {
                                    "functionReference": 1,
                                    "phase": "AGGREGATION_PHASE_INITIAL_TO_INTERMEDIATE",
                                    "outputType": {
                                        "bool": {
                                            "nullability": "NULLABILITY_NULLABLE"
                                        }
                                    },
                                    "arguments": [{
                                            "value": {
                                                "selection": {
                                                    "directReference": {
                                                        "structField": {}
                                                    }
                                                }
                                            }
                                        }
                                    ]
                                }
                            }, {
                                "measure": {
                                    "functionReference": 1,
                                    "phase": "AGGREGATION_PHASE_INITIAL_TO_INTERMEDIATE",
                                    "outputType": {
                                        "bool": {
                                            "nullability": "NULLABILITY_NULLABLE"
                                        }
                                    },
                                    "arguments": [{
                                            "value": {
                                                "selection": {
                                                    "directReference": {
                                                        "structField": {
                                                            "field": 1
                                                        }
                                                    }
                                                }
                                            }
                                        }
                                    ]
                                }
                            }
                        ]
                    }
                },
                "names": ["min#375", "min#376"]
            }
        }
    ]
}
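
The plan above calls `might_contain` with a serialized filter (the binary literal) and an i64 column. As a rough illustration of what `bloom_filter_agg` and `might_contain` compute, here is a minimal Bloom filter sketch. It is not the Velox or Spark implementation: the class name `SimpleBloomFilter`, the hash mixing, and the sizing are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative Bloom filter over 64-bit keys. Hypothetical class, not the
// Velox or Spark code; hashing and sizing are assumptions.
class SimpleBloomFilter {
 public:
  SimpleBloomFilter(std::size_t numBits, int numHashes)
      : bits_((numBits + 63) / 64, 0), numHashes_(numHashes) {
    assert(numBits > 0 && numHashes > 0);
  }

  // What a bloom_filter_agg-style aggregate does for each input value.
  void insert(std::int64_t value) {
    for (int i = 0; i < numHashes_; ++i) {
      setBit(position(value, i));
    }
  }

  // What a might_contain-style function does: false means "definitely
  // absent", true means "possibly present" (false positives can occur).
  bool mightContain(std::int64_t value) const {
    for (int i = 0; i < numHashes_; ++i) {
      if (!testBit(position(value, i))) {
        return false;
      }
    }
    return true;
  }

  // Combining two partial aggregates: bitwise OR of equally sized filters.
  void merge(const SimpleBloomFilter& other) {
    assert(bits_.size() == other.bits_.size());
    assert(numHashes_ == other.numHashes_);
    for (std::size_t w = 0; w < bits_.size(); ++w) {
      bits_[w] |= other.bits_[w];
    }
  }

 private:
  // splitmix64-style finalizer, used here only as a simple 64-bit mixer.
  static std::uint64_t mix(std::uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
  }

  // Double hashing: the i-th probe position is h1 + i * h2, modulo the
  // number of bits in the filter.
  std::size_t position(std::int64_t value, int i) const {
    const std::uint64_t h1 = mix(static_cast<std::uint64_t>(value));
    const std::uint64_t h2 = mix(h1) | 1ULL;
    const std::uint64_t totalBits = static_cast<std::uint64_t>(bits_.size()) * 64;
    return static_cast<std::size_t>((h1 + static_cast<std::uint64_t>(i) * h2) % totalBits);
  }

  void setBit(std::size_t pos) {
    bits_[pos / 64] |= (std::uint64_t{1} << (pos % 64));
  }

  bool testBit(std::size_t pos) const {
    return ((bits_[pos / 64] >> (pos % 64)) & 1ULL) != 0;
  }

  std::vector<std::uint64_t> bits_;
  int numHashes_;
};
```

A value that was inserted is always reported as possibly present, and merging two filters preserves that property for both inputs, which is why partial aggregates can be combined safely.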

@jinchengchenghh changed the title from "Add bloom_filter_agg and might_contain SparkSql function" to "[OPPRO-279] Add bloom_filter_agg and might_contain SparkSql function" on Nov 23, 2022
@jinchengchenghh force-pushed the bloomfilter branch 2 times, most recently from a02646f to d94f8d5, on January 6, 2023 10:49
Change bit_ size to fix TPCDS performance
@jinchengchenghh
Collaborator Author

TPCDS 2T BloomFilter jenkins test performance

--conf spark.sql.optimizer.runtime.bloomFilter.enabled=true
--conf spark.sql.optimizer.runtime.bloomFilter.applicationSideScanSizeThreshold=0

| query | log/native_604_time.csv | log/native_master_01_08_2023_time.csv | difference | percentage |
|---|---:|---:|---:|---:|
| q1 | 11.6931 | 11.9805 | 0.287 | 102.46% |
| q2 | 12.79585 | 17.30473 | 4.509 | 135.24% |
| q3 | 3.07347 | 4.227857 | 1.154 | 137.56% |
| q4 | 91.29562 | 92.81389 | 1.518 | 101.66% |
| q5 | 9.283287 | 8.809498 | -0.474 | 94.90% |
| q6 | 3.682897 | 3.60386 | -0.079 | 97.85% |
| q7 | 5.811735 | 5.967496 | 0.156 | 102.68% |
| q8 | 4.22666 | 4.656122 | 0.429 | 110.16% |
| q9 | 17.46926 | 43.03846 | 25.569 | 246.37% |
| q10 | 8.665179 | 7.654439 | -1.011 | 88.34% |
| q11 | 56.02333 | 56.34804 | 0.325 | 100.58% |
| q12 | 1.443401 | 1.582153 | 0.139 | 109.61% |
| q13 | 7.145704 | 6.239176 | -0.907 | 87.31% |
| q14a | 52.2155 | 51.49362 | -0.722 | 98.62% |
| q14b | 49.51391 | 48.94648 | -0.567 | 98.85% |
| q15 | 3.766696 | 2.79966 | -0.967 | 74.33% |
| q16 | 22.50229 | 38.65976 | 16.157 | 171.80% |
| q17 | 4.552879 | 4.686279 | 0.133 | 102.93% |
| q18 | 5.980131 | 5.651807 | -0.328 | 94.51% |
| q19 | 2.067475 | 2.049248 | -0.018 | 99.12% |
| q20 | 2.479098 | 1.502049 | -0.977 | 60.59% |
| q21 | 0.776707 | 0.770877 | -0.006 | 99.25% |
| q22 | 11.53407 | 12.62338 | 1.089 | 109.44% |
| q23a | 103.0481 | 103.3485 | 0.3 | 100.29% |
| q23b | 122.2958 | 122.0835 | -0.212 | 99.83% |
| q24a | 104.7215 | 125.3081 | 20.587 | 119.66% |
| q24b | 101.8974 | 123.8536 | 21.956 | 121.55% |
| q25 | 3.484254 | 3.571725 | 0.087 | 102.51% |
| q26 | 2.660438 | 2.957778 | 0.297 | 111.18% |
| q27 | 3.618242 | 3.566666 | -0.052 | 98.57% |
| q28 | 16.97635 | 17.13041 | 0.154 | 100.91% |
| q29 | 7.466647 | 7.359792 | -0.107 | 98.57% |
| q30 | 5.833605 | 6.377568 | 0.544 | 109.32% |
| q31 | 7.051928 | 7.237343 | 0.185 | 102.63% |
| q32 | 0.85187 | 0.848969 | -0.003 | 99.66% |
| q33 | 2.07435 | 1.955878 | -0.118 | 94.29% |
| q34 | 4.2341 | 5.128118 | 0.894 | 121.11% |
| q35 | 6.334798 | 6.144379 | -0.19 | 96.99% |
| q36 | 8.090941 | 8.031527 | -0.059 | 99.27% |
| q37 | 4.678717 | 4.546067 | -0.133 | 97.16% |
| q38 | 20.44103 | 21.78072 | 1.34 | 106.55% |
| q39a | 4.732259 | 4.855974 | 0.124 | 102.61% |
| q39b | 4.245408 | 4.701053 | 0.456 | 110.73% |
| q40 | 4.622225 | 4.649959 | 0.028 | 100.60% |
| q41 | 0.411227 | 0.281694 | -0.13 | 68.50% |
| q42 | 0.738463 | 0.712621 | -0.026 | 96.50% |
| q43 | 3.767483 | 3.975577 | 0.208 | 105.52% |
| q44 | 7.38844 | 7.416458 | 0.028 | 100.38% |
| q45 | 2.76569 | 2.834724 | 0.069 | 102.50% |
| q46 | 4.275654 | 4.238839 | -0.037 | 99.14% |
| q47 | 17.26599 | 17.21385 | -0.052 | 99.70% |
| q48 | 4.233157 | 4.451542 | 0.218 | 105.16% |
| q49 | 18.83224 | 17.25101 | -1.581 | 91.60% |
| q50 | 22.49405 | 22.38422 | -0.11 | 99.51% |
| q51 | 12.9162 | 12.74751 | -0.169 | 98.69% |
| q52 | 0.993159 | 1.092821 | 0.1 | 110.03% |
| q53 | 2.384135 | 2.446006 | 0.062 | 102.60% |
| q54 | 3.077926 | 3.197915 | 0.12 | 103.90% |
| q55 | 1.139224 | 0.968573 | -0.171 | 85.02% |
| q56 | 1.70136 | 1.675955 | -0.025 | 98.51% |
| q57 | 10.29662 | 10.32415 | 0.028 | 100.27% |
| q58 | 2.307751 | 2.225969 | -0.082 | 96.46% |
| q59 | 9.633319 | 18.75885 | 9.126 | 194.73% |
| q60 | 2.489622 | 2.392135 | -0.097 | 96.08% |
| q61 | 2.567612 | 2.450708 | -0.117 | 95.45% |
| q62 | 8.4993 | 8.498089 | -0.001 | 99.99% |
| q63 | 2.332137 | 2.211502 | -0.121 | 94.83% |
| q64 | 31.59506 | 60.2293 | 28.634 | 190.63% |
| q65 | 21.19079 | 21.31042 | 0.12 | 100.56% |
| q66 | 6.351751 | 7.544742 | 1.193 | 118.78% |
| q67 | 506.8164 | 497.8867 | -8.93 | 98.24% |
| q68 | 3.452104 | 3.278611 | -0.173 | 94.97% |
| q69 | 6.397993 | 4.811454 | -1.587 | 75.20% |
| q70 | 7.942763 | 8.349268 | 0.407 | 105.12% |
| q71 | 2.290849 | 2.258859 | -0.032 | 98.60% |
| q72 | 31.00426 | 193.7056 | 162.701 | 624.77% |
| q73 | 2.336907 | 2.341827 | 0.005 | 100.21% |
| q74 | 31.73559 | 31.69619 | -0.039 | 99.88% |
| q75 | 36.56467 | 31.43319 | -5.131 | 85.97% |
| q76 | 10.64081 | 10.3965 | -0.244 | 97.70% |
| q77 | 2.132694 | 2.086039 | -0.047 | 97.81% |
| q78 | 48.98188 | 49.21256 | 0.231 | 100.47% |
| q79 | 4.462708 | 4.520709 | 0.058 | 101.30% |
| q80 | 13.99385 | 13.05643 | -0.937 | 93.30% |
| q81 | 5.388963 | 5.378397 | -0.011 | 99.80% |
| q82 | 8.010436 | 7.777885 | -0.233 | 97.10% |
| q83 | 1.215775 | 1.196639 | -0.019 | 98.43% |
| q84 | 3.013927 | 3.058335 | 0.044 | 101.47% |
| q85 | 7.258891 | 8.930836 | 1.672 | 123.03% |
| q86 | 3.102096 | 3.171238 | 0.069 | 102.23% |
| q87 | 20.61162 | 24.2414 | 3.63 | 117.61% |
| q88 | 54.45413 | 54.27913 | -0.175 | 99.68% |
| q89 | 3.179722 | 3.198706 | 0.019 | 100.60% |
| q90 | 4.111231 | 4.063108 | -0.048 | 98.83% |
| q91 | 2.393251 | 2.447663 | 0.054 | 102.27% |
| q92 | 1.271136 | 1.38742 | 0.116 | 109.15% |
| q93 | 28.54033 | 34.1736 | 5.633 | 119.74% |
| q94 | 16.36166 | 19.93356 | 3.572 | 121.83% |
| q95 | 69.61989 | 77.24926 | 7.629 | 110.96% |
| q96 | 7.96766 | 8.588116 | 0.62 | 107.79% |
| q97 | 23.20584 | 22.92041 | -0.285 | 98.77% |
| q98 | 2.275588 | 2.281033 | 0.005 | 100.24% |
| q99 | 16.02148 | 16.05011 | 0.029 | 100.18% |
| total | 2115.764 | 2413.043 | 297.279 | 114.05% |

@jinchengchenghh
Collaborator Author

The bloom filter in this PR differs from vanilla Spark's. If bloom_filter_agg is offloaded to Velox but might_contain falls back to a ProjectExec node (because of FallbackOneRowRelation or some other case), execution will fail with "Unexpected Bloom filter version number".
As a next step, I will implement a native bloom filter that matches vanilla Spark's, so that fallback to vanilla Spark succeeds. If you hit this exception in the meantime, set the Gluten config spark.gluten.sql.native.bloomFilter=false.
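
To illustrate why a filter built by one implementation can be rejected by another, here is a minimal sketch of a reader that validates a version header before deserializing. The layout is an assumption for the example (a 4-byte big-endian version field equal to 1 at the start of the buffer), and `readBigEndianInt32` and `checkBloomFilterHeader` are hypothetical helpers, not functions from Spark, Gluten, or Velox.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Assumed for illustration: the reader expects version 1 in the first 4 bytes.
constexpr std::int32_t kExpectedVersion = 1;

// Reads a 32-bit big-endian integer starting at `offset`.
inline std::int32_t readBigEndianInt32(
    const std::vector<std::uint8_t>& buf, std::size_t offset) {
  if (buf.size() < offset + 4) {
    throw std::runtime_error("Buffer too short for a 32-bit field");
  }
  const std::uint32_t v = (static_cast<std::uint32_t>(buf[offset]) << 24) |
                          (static_cast<std::uint32_t>(buf[offset + 1]) << 16) |
                          (static_cast<std::uint32_t>(buf[offset + 2]) << 8) |
                          static_cast<std::uint32_t>(buf[offset + 3]);
  return static_cast<std::int32_t>(v);
}

// Rejects any buffer whose leading field is not the expected version. A filter
// serialized in a different layout will usually fail this check, because its
// first bytes are not a version field at all.
inline void checkBloomFilterHeader(const std::vector<std::uint8_t>& serialized) {
  const std::int32_t version = readBigEndianInt32(serialized, 0);
  if (version != kExpectedVersion) {
    throw std::runtime_error(
        "Unexpected Bloom filter version number (" + std::to_string(version) + ")");
  }
}
```

Under these assumptions, a buffer written by a producer with a different layout fails the check as soon as the consumer reads the header, which is the kind of mismatch the comment above describes.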

@zhouyuan zhouyuan merged commit 524f857 into oap-project:main Jan 11, 2023
zhejiangxiaomai pushed a commit that referenced this pull request Jan 11, 2023
…79)

* add sparksql function bloom_filter_agg and might_contain

Change bit_ size to fix TPCDS performance

* change to statefil function

* optimize MightContain

* change back to spark value

* fix merge bloomfilter

* remove comment
zhejiangxiaomai pushed a commit that referenced this pull request Jan 11, 2023
jinchengchenghh added a commit to jinchengchenghh/velox that referenced this pull request Jan 20, 2023
zhejiangxiaomai pushed a commit that referenced this pull request Jan 31, 2023
zhejiangxiaomai pushed a commit that referenced this pull request Feb 22, 2023
zhejiangxiaomai pushed a commit to zhejiangxiaomai/velox that referenced this pull request Feb 27, 2023
zhejiangxiaomai pushed a commit to zhejiangxiaomai/velox that referenced this pull request Mar 6, 2023
zhejiangxiaomai pushed a commit to zhejiangxiaomai/velox that referenced this pull request Mar 27, 2023
zhejiangxiaomai pushed a commit to zhejiangxiaomai/velox that referenced this pull request Mar 29, 2023
zhejiangxiaomai pushed a commit to zhejiangxiaomai/velox that referenced this pull request Mar 29, 2023
zhejiangxiaomai pushed a commit to zhejiangxiaomai/velox that referenced this pull request Apr 14, 2023
zhejiangxiaomai pushed a commit to zhejiangxiaomai/velox that referenced this pull request Apr 17, 2023
zhejiangxiaomai pushed a commit to zhejiangxiaomai/velox that referenced this pull request Apr 19, 2023
zhejiangxiaomai pushed a commit to zhejiangxiaomai/velox that referenced this pull request Apr 20, 2023