-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade to DataFusion 15.0.0 #949
Conversation
@@ -591,7 +594,7 @@ impl PyExpr { | |||
} | |||
|
|||
#[pyo3(name = "getDecimal128Value")] | |||
pub fn decimal_128_value(&mut self) -> PyResult<(Option<i128>, u8, u8)> { | |||
pub fn decimal_128_value(&mut self) -> PyResult<(Option<i128>, u8, i8)> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decimal scale is now a signed value
Codecov Report
@@ Coverage Diff @@
## main #949 +/- ##
==========================================
+ Coverage 77.67% 77.80% +0.13%
==========================================
Files 75 75
Lines 4215 4218 +3
Branches 765 766 +1
==========================================
+ Hits 3274 3282 +8
+ Misses 775 766 -9
- Partials 166 170 +4
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
SubqueryAlias: __sq_1 | ||
Projection: AVG(test.col_int32) AS __value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Projections no longer have aliases, but use the generic SubqueryAlias
operator instead
rerun tests |
return dc | ||
|
||
cc = dc.column_container | ||
|
||
alias = rel.subquery_alias().getAlias() | ||
|
||
return DataContainer( | ||
dc.df, | ||
cc.rename( | ||
{ | ||
col: renamed_col | ||
for col, renamed_col in zip( | ||
cc.columns, | ||
(f"{alias}.{col.split('.')[-1]}" for col in cc.columns), | ||
) | ||
} | ||
), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this DataFusion bump, we now optimize otherwise identical projections with different aliases into a single projection followed by the equivalent SubqueryAlias
, e.g. turning
Projection: customer_total_return.ctr_store_sk, customer_total_return.ctr_total_return, alias=ctr2
Projection: store_returns.sr_store_sk AS ctr_store_sk, SUM(store_returns.sr_fee) AS ctr_total_return, alias=customer_total_return
into
SubqueryAlias: ctr2
SubqueryAlias: customer_total_return
Projection: store_returns.sr_store_sk AS ctr_store_sk, SUM(store_returns.sr_fee) AS ctr_total_return
This exposed the fact that our SubqueryAlias
plugin was a no-op that didn't actually do the required aliasing. These modifications should unblock failures stemming from this optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking @charlesbluca comments about SubqueryAlias into account these changes LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been a while since this pr was opened, but the changes lgtm!
I'm holding off on merging to run some benchmarks internally but should be good to merge in a day or so
DataFusion 15.0.0 will be released in the next few days. This PR is in preparation for upgrading.