feat: use Substrait's PrecisionTimestamp and PrecisionTimestampTz instead of deprecated Timestamp #11597

Merged: 5 commits, Aug 22, 2024

@Blizzara Blizzara commented Jul 22, 2024

Which issue does this PR close?

Closes #12074

Rationale for this change

DF was using Substrait type variations on the Substrait Timestamp type to indicate whether a timestamp is in seconds, milliseconds, microseconds, or nanoseconds. That works within DF, but it is not interoperable with other systems. Substrait has since introduced a PrecisionTimestamp type that includes the precision as a first-class part of the type, so we should use that instead.

What changes are included in this PR?

  • Bump substrait to latest and fix breaks
  • Support consuming PrecisionTimestamp and PrecisionTimestampTz types in addition to the deprecated Timestamp type
  • Produce PrecisionTimestamp and PrecisionTimestampTz types
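
For readers following along, here is a minimal sketch of the precision mapping that the consume and produce bullets imply. It is not the code from this PR: `TimeUnit` is a local stand-in for Arrow's enum, `precision_to_unit` and `unit_to_precision` are hypothetical helper names, and the sketch assumes the precision counts fractional-second digits (0, 3, 6, 9).

```rust
/// Local stand-in for Arrow's `TimeUnit`, so the sketch has no dependencies.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Consumer side (hypothetical helper): map a precision, assumed to be the
/// number of fractional-second digits, to a time unit.
/// Precisions without a matching unit are reported as errors.
pub fn precision_to_unit(precision: i32) -> Result<TimeUnit, String> {
    match precision {
        0 => Ok(TimeUnit::Second),
        3 => Ok(TimeUnit::Millisecond),
        6 => Ok(TimeUnit::Microsecond),
        9 => Ok(TimeUnit::Nanosecond),
        p => Err(format!("unsupported timestamp precision: {p}")),
    }
}

/// Producer side (hypothetical helper): the inverse mapping.
pub fn unit_to_precision(unit: TimeUnit) -> i32 {
    match unit {
        TimeUnit::Second => 0,
        TimeUnit::Millisecond => 3,
        TimeUnit::Microsecond => 6,
        TimeUnit::Nanosecond => 9,
    }
}
```

Because the precision travels with the type itself, a consumer in another system does not need to know about DF-specific type variation references to recover the unit.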

Are these changes tested?

Yes, with unit tests

Are there any user-facing changes?

r#type::Kind::IntervalYear(_) => {
Ok(DataType::Interval(IntervalUnit::YearMonth))
}
r#type::Kind::IntervalDay(_) => Ok(DataType::Interval(IntervalUnit::DayTime)),
Contributor Author

this was just cleanup - we don't check type variations for types where we don't use them

})) => {
// DF only supports millisecond precision, so we lose the micros here
ScalarValue::new_interval_dt(*days, (seconds * 1000) + (microseconds / 1000))
// DF only supports millisecond precision, so for any more granular type we lose precision
Contributor Author

these changes were needed as part of bumping substrait (substrait-io/substrait#665)
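
To illustrate the precision loss mentioned in the code comment, here is a rough sketch of converting the sub-day part of an interval to whole milliseconds. The function name `sub_day_millis` is hypothetical, and the sketch assumes `subseconds` counts units of 10^-precision seconds; it is not the conversion used in the PR.

```rust
/// Convert the sub-day part of an interval to whole milliseconds.
///
/// Assumption for this sketch: `subseconds` counts units of 10^-precision
/// seconds, e.g. precision 6 means microseconds.
/// Returns `None` on arithmetic overflow.
pub fn sub_day_millis(seconds: i32, subseconds: i64, precision: u32) -> Option<i64> {
    let sub_ms = if precision <= 3 {
        // Milliseconds or coarser: scale up, nothing is lost.
        subseconds.checked_mul(10_i64.pow(3 - precision))?
    } else {
        // Finer than milliseconds: integer division truncates toward zero,
        // which is where the precision is lost.
        subseconds / 10_i64.checked_pow(precision - 3)?
    };
    i64::from(seconds).checked_mul(1000)?.checked_add(sub_ms)
}
```

For example, 1 second plus 1,999 microseconds becomes 1,001 milliseconds; the remaining 999 microseconds are dropped.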

@@ -1415,6 +1425,7 @@ fn to_substrait_type(
kind: Some(r#type::Kind::IntervalDay(r#type::IntervalDay {
type_variation_reference: DEFAULT_TYPE_VARIATION_REF,
nullability,
precision: Some(3), // DayTime precision is always milliseconds
Contributor Author

@Blizzara Blizzara Aug 20, 2024

required due to bumping substrait substrait-io/substrait#665

substrait = { version = "0.36.0", features = ["serde"] }
pbjson-types = "0.7"
prost = "0.13"
substrait = { version = "0.41", features = ["serde"] }
Contributor Author

@Blizzara Blizzara Aug 20, 2024

need a bump for precision timestamp types to have precision, and their value as i64 instead of u64

pbjson and prost need to be bumped to match substrait's deps

Contributor

Thanks @Blizzara

I also have a PR queued up to update a bunch of these dependencies for the arrow release next week: #12032

join_rel::JoinType::Anti => Ok(JoinType::LeftAnti),
join_rel::JoinType::Semi => Ok(JoinType::LeftSemi),
join_rel::JoinType::LeftAnti => Ok(JoinType::LeftAnti),
join_rel::JoinType::LeftSemi => Ok(JoinType::LeftSemi),
Contributor Author

needed due to bumping substrait - I think this is just a compile-time break tho, the actual protobuf values stay the same
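
A small sketch of what a "compile-time break only" rename looks like. The two enums below are illustrative stand-ins for the generated code before and after the bump, and the numeric values are assumptions made for this example, not copied from the Substrait proto definition.

```rust
/// Stand-in for the generated enum before the bump (values are assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i32)]
pub enum OldJoinType {
    Semi = 5,
    Anti = 6,
}

/// Stand-in for the generated enum after the bump: variants are renamed,
/// but the numeric tags written to the wire are unchanged.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i32)]
pub enum NewJoinType {
    LeftSemi = 5,
    LeftAnti = 6,
}

/// Decode a wire tag using the new names (hypothetical helper).
pub fn decode_new(tag: i32) -> Option<NewJoinType> {
    match tag {
        t if t == NewJoinType::LeftSemi as i32 => Some(NewJoinType::LeftSemi),
        t if t == NewJoinType::LeftAnti as i32 => Some(NewJoinType::LeftAnti),
        _ => None,
    }
}
```

If the tags really are unchanged, a plan serialized with the old names decodes to the renamed variants, and only Rust code that spells out the variant names has to be updated.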

@Blizzara Blizzara marked this pull request as ready for review August 20, 2024 14:55
Contributor

@alamb alamb left a comment

Thanks @Blizzara -- this change makes sense to me.

@@ -38,10 +38,16 @@
/// The "system-preferred" variation (i.e., no variation).
pub const DEFAULT_TYPE_VARIATION_REF: u32 = 0;
pub const UNSIGNED_INTEGER_TYPE_VARIATION_REF: u32 = 1;

#[deprecated(since = "41.0.0", note = "Use `PrecisionTimestamp(Tz)` type instead")]
Contributor

Since we have already released 41 (https://crates.io/crates/datafusion/41.0.0), I think we should probably label this with the next version. For example:

Suggested change
- #[deprecated(since = "41.0.0", note = "Use `PrecisionTimestamp(Tz)` type instead")]
+ #[deprecated(since = "42.0.0", note = "Use `PrecisionTimestamp(Tz)` type instead")]

Contributor Author

Hah yeah, this PR was a long time in the making 😅 fixed in a05637b
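
For context, a minimal sketch of how a `#[deprecated]` attribute like the one discussed here behaves. The constant `OLD_TIMESTAMP_VARIATION_REF`, its value, and the helper `is_legacy_variation` are hypothetical and only stand in for the deprecated type variation references.

```rust
/// Hypothetical constant standing in for a deprecated type variation reference.
/// Any use outside an `#[allow(deprecated)]` scope triggers a compiler warning
/// that includes the `since` version and the `note` text.
#[deprecated(since = "42.0.0", note = "Use `PrecisionTimestamp(Tz)` type instead")]
pub const OLD_TIMESTAMP_VARIATION_REF: u32 = 1;

/// Code that still has to read plans from older producers can opt out of
/// the warning locally instead of silencing it for the whole crate.
#[allow(deprecated)]
pub fn is_legacy_variation(reference: u32) -> bool {
    reference == OLD_TIMESTAMP_VARIATION_REF
}
```

The `since` value is what users see in the warning, which is why it should name the first release that actually ships the deprecation.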

@alamb alamb merged commit 89cb6a2 into apache:main Aug 22, 2024
26 checks passed
alamb commented Aug 22, 2024

Thanks again @Blizzara

@Blizzara Blizzara deleted the avo/precision-timestamp branch August 22, 2024 11:55