Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to DataFusion 15.0.0 #949

Merged
merged 10 commits into from
Jan 5, 2023
Merged

Conversation

andygrove
Copy link
Contributor

@andygrove andygrove commented Dec 1, 2022

DataFusion 15.0.0 will be released in the next few days. This PR is in preparation for upgrading.

@@ -591,7 +594,7 @@ impl PyExpr {
}

#[pyo3(name = "getDecimal128Value")]
pub fn decimal_128_value(&mut self) -> PyResult<(Option<i128>, u8, u8)> {
pub fn decimal_128_value(&mut self) -> PyResult<(Option<i128>, u8, i8)> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decimal scale is now a signed value

@codecov-commenter
Copy link

codecov-commenter commented Dec 1, 2022

Codecov Report

Merging #949 (eb4037c) into main (747d8ab) will increase coverage by 0.13%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #949      +/-   ##
==========================================
+ Coverage   77.67%   77.80%   +0.13%     
==========================================
  Files          75       75              
  Lines        4215     4218       +3     
  Branches      765      766       +1     
==========================================
+ Hits         3274     3282       +8     
+ Misses        775      766       -9     
- Partials      166      170       +4     
Impacted Files Coverage Δ
dask_sql/physical/rel/logical/subquery_alias.py 76.92% <100.00%> (+6.92%) ⬆️
dask_sql/_version.py 35.31% <0.00%> (+1.41%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Comment on lines +129 to +130
SubqueryAlias: __sq_1
Projection: AVG(test.col_int32) AS __value
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Projections no longer have aliases, but use the generic SubqueryAlias operator instead

@andygrove andygrove marked this pull request as ready for review December 5, 2022 18:42
@ayushdg
Copy link
Collaborator

ayushdg commented Dec 6, 2022

rerun tests

Comment on lines -19 to +36
return dc

cc = dc.column_container

alias = rel.subquery_alias().getAlias()

return DataContainer(
dc.df,
cc.rename(
{
col: renamed_col
for col, renamed_col in zip(
cc.columns,
(f"{alias}.{col.split('.')[-1]}" for col in cc.columns),
)
}
),
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this DataFusion bump, we now optimize otherwise identical projections with different aliases into a single projection followed by the equivalent SubqueryAlias, e.g. turning

Projection: customer_total_return.ctr_store_sk, customer_total_return.ctr_total_return, alias=ctr2
  Projection: store_returns.sr_store_sk AS ctr_store_sk, SUM(store_returns.sr_fee) AS ctr_total_return, alias=customer_total_return

into

SubqueryAlias: ctr2
  SubqueryAlias: customer_total_return
    Projection: store_returns.sr_store_sk AS ctr_store_sk, SUM(store_returns.sr_fee) AS ctr_total_return

This exposed the fact that our SubqueryAlias plugin was a no-op that didn't actually do the required aliasing. These modifications should unblock failures stemming from this optimization.

Copy link
Collaborator

@jdye64 jdye64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking @charlesbluca comments about SubqueryAlias into account these changes LGTM

Copy link
Collaborator

@ayushdg ayushdg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a while since this pr was opened, but the changes lgtm!
I'm holding off on merging to run some benchmarks internally but should be good to merge in a day or so

@ayushdg ayushdg merged commit 767acf2 into dask-contrib:main Jan 5, 2023
@andygrove andygrove deleted the datafusion-15.0.0 branch January 5, 2023 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants