Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Temporal Extract/Date Part Kernel #5266

Closed
tustvold opened this issue Jan 1, 2024 · 2 comments · Fixed by #5319
Closed

Temporal Extract/Date Part Kernel #5266

tustvold opened this issue Jan 1, 2024 · 2 comments · Fixed by #5319
Assignees
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog

Comments

@tustvold
Copy link
Contributor

tustvold commented Jan 1, 2024

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently we provide a mix of kernels for extracting portions of temporal types, https://docs.rs/arrow-arith/latest/arrow_arith/temporal/index.html

There are a couple of limitations with this implementation:

  • The use of generics makes it non-trivial to support types such as Time32/Time64 (Support for extracting hours/minutes/seconds/etc. from Time32/Time64 type in temporal kernels #5261) or intervals (#TBD)
  • The way the kernels are implemented results in redundant code generation for impossible combinations, e.g. a Date32Array with a DataType of Timestamp
  • The use of as_datetime and similar to always proxy via chrono is not only very inefficient, but also hard to follow
  • They make use of builders instead of the faster (and more concise) PrimitiveArray::try_unary methods
  • The code in general is quite hard to follow

Describe the solution you'd like

I would like to take a similar approach to used in #4465, where we have a single method that accepts an operation and array, and then dispatches the operation to the specialized implementation for that particular array type. Aside from simplifying the implementation, we could also expose this publicly to facilitate implementation of the SQL extract/date_part function - https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT.

In particular I would like something akin to

pub enum DatePart {
    Century,
    Year,
    ...
    Minute,
    ...
    DayOfYear,
    DayOfWeek,
}

pub fn date_part(array: &dyn Array, part: DatePart) -> Result<Int32Array> {
    match array.data_type() {
        ...
    }
}

The existing kernels would be deprecated and updated to act as simple shims to this dynamic dispatch logic, as has been done with the previous datum migrations.

Describe alternatives you've considered

Additional context

@tustvold tustvold added the enhancement Any new improvement worthy of a entry in the changelog label Jan 1, 2024
@tustvold tustvold changed the title Temporal Extract Kernel Temporal Extract/Date Part Kernel Jan 1, 2024
@Jefffrey
Copy link
Contributor

Jefffrey commented Jan 2, 2024

This looks quite good, and would've made #5262 a bit easier to implement for sure (and for intervals in the future) 👍

I wouldn't mind trying to take a shot at implementing this if it's not already on your todo list

@tustvold
Copy link
Contributor Author

tustvold commented Mar 1, 2024

label_issue.py automatically added labels {'arrow'} from #5319

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants