[Epic] Complete Initial StringView in DataFusion #11752

Open
15 of 21 tasks
alamb opened this issue Jul 31, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Jul 31, 2024

Is your feature request related to a problem or challenge?

This ticket is a follow-on to #10918, where we implemented enough initial support for StringView / BinaryView that we can show some pretty sweet ClickBench results.

Describe the solution you'd like

This epic tracks the remaining work to complete the "initial" work, which I would like to define as "enable using StringView when reading Strings from Parquet by default".

I am sure there will be additional work / support needed to add StringView to various other features of DataFusion, which we can maybe track with another follow-on ticket.

Required for enabling StringView by default:

Could work around but really should be fixed upstream

Additional "Nice to have" Features

alamb added the enhancement label Jul 31, 2024
@alamb
Contributor Author

alamb commented Aug 22, 2024

An update here is that @XiangpengHao has a PR with various changes in #11862

We still need to review that PR and figure out what else in it needs to be enabled "for real" (with tests, etc.).

@alamb
Contributor Author

alamb commented Aug 22, 2024

My ideal resolution here is that we end up in a state where the only change needed to enable string view by default is to switch the config setting. I will do some more ticket triage later today to outline other items I know of.

@2010YOUY01
Contributor

Do we have tickets for regexp binary operators? (like ~, !~...)
https://datafusion.apache.org/user-guide/sql/operators.html#op-re-match

I noticed StringView is not supported for them yet, and they have a separate implementation from the regexp functions.

Details

/*DML*/CREATE TABLE t0(v0 DOUBLE, v1 DOUBLE, v2 BOOLEAN, v3 BOOLEAN, v4 BOOLEAN, v5 STRING);
/*DML*/INSERT INTO t0(v1, v5, v2) VALUES (0.7183242196192607, 'Tn', true);
/*DML*/CREATE TABLE t0_stringview AS SELECT v0, v1, v2, v3, v4, arrow_cast(v5, 'Utf8View') as v5 FROM t0;

> select v5 ~ 'foo' from t0_stringview;
Internal error: Data type Utf8View not supported for binary_string_array_flag_op_scalar operation 'regexp_is_match' on string array.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

> select regexp_match(v5, 'foo') from t0_stringview;
+--------------------------------------------+
| regexp_match(t0_stringview.v5,Utf8("foo")) |
+--------------------------------------------+
|                                            |
+--------------------------------------------+
1 row(s) fetched.
Elapsed 0.034 seconds.
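
Until the regexp operators handle Utf8View directly, one possible workaround is to cast the column back to Utf8 before applying `~`. This is an untested sketch against the `t0_stringview` table above; it assumes `arrow_cast` from Utf8View to Utf8 is supported in the DataFusion version in use, and it gives up the StringView representation for that expression.

```sql
-- Hypothetical workaround: cast the Utf8View column to Utf8 so the
-- regexp operator sees a string type it already supports.
select arrow_cast(v5, 'Utf8') ~ 'foo' from t0_stringview;
```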

@alamb
Contributor Author

alamb commented Aug 24, 2024

Do we have tickets for regexp binary operators? (like ~, !~...)

Not that I know of -- it would be great to add them

@alamb
Contributor Author

alamb commented Aug 26, 2024

Do we have tickets for regexp binary operators? (like ~, !~...)

Filed #12180

@alamb
Contributor Author

alamb commented Oct 5, 2024

I am going to try to polish up the PR to enable string view by default (with the arrow upgrade and various recent improvements) and see how close we are: #12092
