Support complex filter before merge procedure #256

Closed
ShiKaiWi opened this issue Sep 15, 2022 · 5 comments · Fixed by #326

Comments

@ShiKaiWi
Member

ShiKaiWi commented Sep 15, 2022

Describe This Problem

A filter procedure derived from the query predicates is applied to the record batch stream from the SST before the batches are fed to the merge iterator. However, the filter only supports a very simple form: a conjunction (AND) of binary expressions. It therefore does not work when the query predicate is complex, e.g. where (hostname = '127.0.0.1' or hostname = '192.168.0.2') and timestamp between 'xxxx' and 'xxxx'.
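To make the limitation concrete, here is a minimal stdlib-only sketch. All types and function names (`Row`, `Cmp`, `Predicate`, `and_list_matches`, `example_predicate`) are hypothetical and are not taken from CeresDB; the sketch only assumes that the simple filter behaves like a flat list of implicitly ANDed comparisons, and contrasts it with a recursive predicate that can nest OR.

```rust
// Illustrative only: these types are hypothetical and are not CeresDB's.

/// One row of the example table.
struct Row {
    hostname: &'static str,
    timestamp: i64,
}

/// A single binary comparison, the only building block of the simple filter.
enum Cmp {
    HostnameEq(&'static str),
    TimestampGe(i64),
    TimestampLe(i64),
}

impl Cmp {
    fn matches(&self, row: &Row) -> bool {
        match self {
            Cmp::HostnameEq(h) => row.hostname == *h,
            Cmp::TimestampGe(lo) => row.timestamp >= *lo,
            Cmp::TimestampLe(hi) => row.timestamp <= *hi,
        }
    }
}

/// The simple form: a flat list whose entries are implicitly ANDed.
/// It cannot say "either of these two comparisons".
fn and_list_matches(cmps: &[Cmp], row: &Row) -> bool {
    cmps.iter().all(|c| c.matches(row))
}

/// A recursive predicate that can nest AND and OR arbitrarily.
enum Predicate {
    Leaf(Cmp),
    And(Vec<Predicate>),
    Or(Vec<Predicate>),
}

impl Predicate {
    fn matches(&self, row: &Row) -> bool {
        match self {
            Predicate::Leaf(c) => c.matches(row),
            Predicate::And(ps) => ps.iter().all(|p| p.matches(row)),
            Predicate::Or(ps) => ps.iter().any(|p| p.matches(row)),
        }
    }
}

/// Builds `(hostname = a or hostname = b) and timestamp between lo and hi`.
fn example_predicate(a: &'static str, b: &'static str, lo: i64, hi: i64) -> Predicate {
    Predicate::And(vec![
        Predicate::Or(vec![
            Predicate::Leaf(Cmp::HostnameEq(a)),
            Predicate::Leaf(Cmp::HostnameEq(b)),
        ]),
        Predicate::Leaf(Cmp::TimestampGe(lo)),
        Predicate::Leaf(Cmp::TimestampLe(hi)),
    ])
}
```

Putting both hostname comparisons into the flat list would require a row to equal both addresses at once, so it matches nothing; only the nested form expresses the query above.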

Proposal

The crucial point is how to make the filter procedure support complex predicate expressions. There are basically two approaches:

  • Utilize datafusion;
  • Implement the filter logic manually;

I vote for the first approach, but we have to figure out how to use datafusion to implement the filter logic.

Additional Context

The filter procedure is applied here:
https://github.com/CeresDB/ceresdb/blob/43a84ba3c2ddcee69906e70322060b6dc4e91ddc/analytic_engine/src/row_iter/record_batch_stream.rs#L137

@ShiKaiWi ShiKaiWi added feature New feature or request A-analytic-engine Area: Analytic Engine labels Sep 15, 2022
@jiacai2050
Contributor

jiacai2050 commented Sep 22, 2022

TSBS has been added to CI, so we can use it to compare performance before and after this issue is fixed.

@ygf11
Contributor

ygf11 commented Oct 5, 2022

To utilize datafusion, we can:

  • Create a PhysicalExpr from the logical expression via create_physical_expr.
  • Implement the filter logic the way FilterExecStream does in datafusion.

create_physical_expr: https://github.com/apache/arrow-datafusion/blob/45fc415daa7028559ef3477e53a184a114149f9e/datafusion/physical-expr/src/planner.rs#L42

FilterExecStream: https://github.com/apache/arrow-datafusion/blob/45fc415daa7028559ef3477e53a184a114149f9e/datafusion/core/src/physical_plan/filter.rs#L180
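The evaluate-then-filter pattern behind these two steps can be sketched without any dependencies. This is a stdlib-only illustration, not datafusion's API: `Batch`, `Expr`, `evaluate`, `zip_with`, and `filter_batch` are hypothetical stand-ins. Roughly, `Expr::evaluate` plays the role of evaluating a physical expression against a record batch to get one boolean per row, and `filter_batch` plays the role of applying that boolean mask to every column of the batch.

```rust
// Illustrative stand-ins only; none of these names come from datafusion or arrow.

/// A tiny stand-in for a record batch: two columns of equal length.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    hostname: Vec<String>,
    timestamp: Vec<i64>,
}

/// Stand-in for a physical expression that yields one boolean per row.
enum Expr {
    HostnameEq(String),
    TimestampBetween(i64, i64),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

/// Combines two masks element by element.
fn zip_with(l: Vec<bool>, r: Vec<bool>, f: fn(bool, bool) -> bool) -> Vec<bool> {
    l.into_iter().zip(r).map(|(a, b)| f(a, b)).collect()
}

impl Expr {
    /// Evaluates the expression column-wise and returns a boolean mask.
    fn evaluate(&self, batch: &Batch) -> Vec<bool> {
        match self {
            Expr::HostnameEq(h) => batch.hostname.iter().map(|v| v == h).collect(),
            Expr::TimestampBetween(lo, hi) => batch
                .timestamp
                .iter()
                .map(|t| lo <= t && t <= hi)
                .collect(),
            Expr::And(l, r) => zip_with(l.evaluate(batch), r.evaluate(batch), |a, b| a && b),
            Expr::Or(l, r) => zip_with(l.evaluate(batch), r.evaluate(batch), |a, b| a || b),
        }
    }
}

/// Keeps only the rows whose mask entry is true, for every column.
fn filter_batch(batch: &Batch, mask: &[bool]) -> Batch {
    Batch {
        hostname: batch
            .hostname
            .iter()
            .zip(mask)
            .filter(|(_, keep)| **keep)
            .map(|(v, _)| v.clone())
            .collect(),
        timestamp: batch
            .timestamp
            .iter()
            .zip(mask)
            .filter(|(_, keep)| **keep)
            .map(|(v, _)| *v)
            .collect(),
    }
}
```

Because the mask is computed once per batch and then applied to all columns, arbitrarily nested AND/OR predicates cost no extra plumbing in the stream: only the expression tree changes.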

Maybe I can help with this task :D.

@ShiKaiWi
Member Author

ShiKaiWi commented Oct 6, 2022

It would be much appreciated if you could help.

@ShiKaiWi
Member Author

ShiKaiWi commented Oct 8, 2022

@ygf11 I have updated the code location of the filtering procedure, and I hope it helps:
https://github.com/CeresDB/ceresdb/blob/43a84ba3c2ddcee69906e70322060b6dc4e91ddc/analytic_engine/src/row_iter/record_batch_stream.rs#L137

@ygf11
Contributor

ygf11 commented Oct 8, 2022

I have updated the code location about the filtering procedure, and I hope it will help.

Thanks for the reminder, it helps a lot.
