-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Combine UnsignedIntegerCoder
and CFMaskCoder
#9274
Merged
kmuehlbauer
merged 13 commits into
pydata:main
from
djhoese:refactor-unsigned-masked-cf
Aug 20, 2024
Merged
Changes from all commits
Commits
Show all changes
13 commits
Select commit
Hold shift + click to select a range
e1baa93
Fix small typo in docstring
djhoese cae77aa
Combine CF Unsigned and Mask handling
djhoese a37c5d0
Replace UnsignedIntegerCode tests with CFMaskCoder usage
djhoese 39b65c5
Fix dtype type annotation
djhoese b72ded9
Fix when unsigned serialization warning is expected in tests
djhoese e6e71e2
Small refactor of CFMaskCoder decoding
djhoese c996918
Add CF encoder tests for _Unsigned=false cases
djhoese a8eb418
Merge branch 'main' into refactor-unsigned-masked-cf
djhoese bdc122e
Remove UnsignedIntegerCoder from api docs
djhoese dea730f
Merge branch 'main' into refactor-unsigned-masked-cf
kmuehlbauer aecc9fa
Merge branch 'main' into refactor-unsigned-masked-cf
dcherian 01a4742
Merge branch 'main' into refactor-unsigned-masked-cf
dcherian 5aae952
Merge branch 'main' into refactor-unsigned-masked-cf
kmuehlbauer File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block didn't use @kmuehlbauer's trick for handling overflow by using
.view
. I'm wondering if I need to use that here, but no tests hit it. I've never actually seen_Unsigned == "false"
in the wild.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I change that here then I think I might be able to shrink this function and do things as "old dtype" and "new dtype" rather than signed versus unsigned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was requested at some point in time for a specific use case. I'll try to dig it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK looking at the tests, it looks like this case is not tested (but I'm triple checking). This set of
if
s says that the data on-disk is unsigned, but_Unsigned
is false which means the user wants signed data in-memory. The tests for_Unsigned="false"
have signed data on-disk and in-memory so no casting/conversion happens.Edit: Scratch that.
test_backends
doesn't test it, buttest_coding
does but never uses_FillValue
. Let's see what I can do.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xref #4966
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I thought I was being smart and added tests for the
_Unsigned: "false"
case and now things are just all weird. It made me realize I wasn't handling this case in the encoding step, but it also kind of seems like it never was handled or that I have the wrong impression of how that configuration is supposed to be handled. The tests added in that PR @kmuehlbauer don't present the_FillValue
so I've tried adding that to the backend tests but now non-NC4 backends are complaining about converting uint8 to int8. This coercion was added to solve #4014 it seems.My assumption is that if
_Unsigned: "false"
then the data is saved as uint8 and_FillValue
should be uint8. But again, I'm not sure why my new tests are even trying to get to int8. I'll do some more testing later tonight hopefully. I'm not sure if it is better to spend a ton of time getting this small functionality working "as expected" or leave it undefined/untested as it is right now.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So my main concern and all of your handling in the decode pipeline for casting fill values turns out to not be an issue anymore as the raw fill values turn out to be numpy scalars (
np.uint8
) when they get loaded from the file. Or at least they are for the NetCDF4 cases. So numpy is perfectly happy casting uint8 to int8 and back if they are already numpy scalars. If my new tests cases make sense then I think this is fine.