Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Date transformers casting dtypes if passed in as an input #142

Closed
Decima2014 opened this issue Nov 24, 2023 · 5 comments · Fixed by #148
Closed

Date transformers casting dtypes if passed in as an input #142

Decima2014 opened this issue Nov 24, 2023 · 5 comments · Fixed by #148
Labels
feature New feature or request

Comments

@Decima2014
Copy link

We have had some issues in the past with some tubular date transformers requiring conflicting specific dtypes to be accepted by the transformer, ie: datetime.object, datetime.date, datetime[n64].

The ExtractTimeInfo and DateDifferenceTransformer for example requiring different dtypes

I believe having an option to automatically cast the column to the required dtype as opposed to directly failing would be preferable

@Decima2014 Decima2014 added the feature New feature or request label Nov 24, 2023
@Decima2014 Decima2014 changed the title Tubular dates casting dtypes if passed in as an input Date transformers casting dtypes if passed in as an input Nov 24, 2023
@Decima2014
Copy link
Author

You sometimes dont want your datetime columns to be casted to a new type even if that type is what the transformer requires, so your dates transformers could use a temporary copy of the column for the transformation

@TommyMatthews
Copy link
Contributor

Will look at this this sprint

@TommyMatthews
Copy link
Contributor

DateDiffLeapYearTransformer also raises an error incorrectly when passed a datetime object

@TommyMatthews
Copy link
Contributor

I've had a look at this and it looks like the conflicting dtypes can be quickly fixed by making a BaseDateTimeTransformer and adding a standardised datetime input check that all of the transformers use. This should make it much easier to put columns in the right format beforehand with a ToDateTimeTransformer.

With regards to casting, I think this would be a lot more faff to do, create scope for a lot more hidden errors and make a lot of testing overhead, especially given that the copy datetime columns can be created using a to datetime transformer one step before in the pipeline.

@davidhopkinson26 @Decima2014

@TommyMatthews
Copy link
Contributor

I've characterised the problem slightly more. Considering the two main datetime datatypes, datetime.date and datetime.datetime:

  • DateTimeInfoExtractor: only works on datetime.datetime due to how it's written - needs the .dt accessor only available to datetime.datetime objects. I think this unavoidably needs temporary casting inside transform for full consistency across dates.py

  • DateDifferenceTransformer: works with either datetime.datetime or datetime.time, but can't handle the combination of the two types across the two columns. Think this should be a quick fix.

  • DateDifferenceLeapYearTransformer: no issues once I fix the input checks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants