Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: switch to using datafusion to parse functions #1793

Merged
merged 2 commits into from
Jan 8, 2024

Conversation

westonpace
Copy link
Contributor

@westonpace westonpace commented Jan 8, 2024

My original plan was to use DF for parsing expressions (not just functions). However, I ran into some issues:

  • Support timestamp literals with precision specifier apache/datafusion#7249
  • Datafusion does not support nested nested references (e.g. it supports x.y but not x.y.z)
  • Array literals always appear as List and not FixedSizeList
    • Datafusion's array literals initially show up as "MakeArray" calls and so they are missed by our type coercion. They turn into literals later during expression simplification.
  • Functions from duckdb (e.g. is_valid) are not recognized by datafusion
    • Long term this kind of engine-to-engine conversion is best handled by substrait
    • In the short term we could maybe register these functions as DF UDFs

So this PR only uses DF to parse the functions. In addition, we keep the legacy path to handle the is_valid function for DuckDb.

Closes #1115
Closes #1784

Copy link

github-actions bot commented Jan 8, 2024

ACTION NEEDED

Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@westonpace westonpace changed the title feat: Switch to using datafusion to parse functions feat: switch to using datafusion to parse functions Jan 8, 2024
@westonpace
Copy link
Contributor Author

I've updated #1129 with the incompatibilities I observed this time.

@wjones127
Copy link
Contributor

Datafusion does not support nested nested references (e.g. it supports x.y but not x.y.z)

What do you mean? can't you just nest GetIndexedField calls?

@westonpace
Copy link
Contributor Author

What do you mean? can't you just nest GetIndexedField calls?

Expr supports it. SqlToExpr does not.

@wjones127
Copy link
Contributor

Does this fix #1115?

Copy link
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me :)

@westonpace
Copy link
Contributor Author

Does this fix #1115?

Looks like it. I've added a regression test for array_contains

@wjones127
Copy link
Contributor

(Sorry just merged a PR that added some conflicts. Should be easy to resolve though.)

@westonpace westonpace merged commit 9b699c0 into lancedb:main Jan 8, 2024
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow datafusion to handle function parsing in expressions Support array_contains in filter
2 participants