-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add option to control base array saving #1753
Conversation
a512cd1
to
abe1346
Compare
5d81502
to
ce24d78
Compare
@@ -732,6 +732,41 @@ def get_array_compression_kwargs(self, arr): | |||
""" """ | |||
return self._blocks._get_array_compression_kwargs(arr) | |||
|
|||
def set_array_save_base(self, arr, save_base): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to configure this per file object? To me this implies that the files each have some "base-array-ness" that writes with the file and is recovered when the file is read, but that's not the case, right?
If it is more of an environment-level config item, then the AsdfConfig property might be enough. We could rename that from default_array_save_base
to array_save_base
and ask users to write code like this if they don't want to change the value for their whole environment:
with asdf.config_context() as config:
config.array_save_base = False
af.write_to("foo.asdf")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this option does not get explicitly written to the file (similar to the array_compression_kwargs
). However the setting does "roundtrip" because disables array_save_base
on a view will cause it to get a new block (and then read in as a non-view array).
For a tree with let's say 3 arrays:
big_array
view_of_big_array
other_array
Writing this tree (with the defaults) will create:
- 1 block for
big
andview
- 1 block for
other
When read back in view
will become a view and rewriting the file will produce the same result.
If however the user chooses to call set_array_save_base(view, False)
the file will write out with 3 blocks
- 1 block for
big
- 1 block for
view
- 1 block for
other When this file is read back in the user will get a tree where
viewis no longer a view (due to the
set_array_save_base` call) and writing it back out will again produce a file with 3 blocks.
This only works because our current default is True
. If we change the default to False
than "viewness" will not be preserved and this setting would have to be re-set every time a file is written.
The thought behind making this per-object was to allow a user to have more flexibility. For example, let's say the tree contains:
wcs
: wcs with possibly many arrays, some might be views of non-tree arrays and we don't want these to be viewssci
: science data, a big non-view arraythumbnail
: cutout of science data that we want to keep as a view of "science"ref
: reference pixels from the science data, where for record-keeping reasons I want this to be a non-view on write
If we were to provide only an AsdfConfig
option the user would have to copy ref
and wcs
to keep thumbnail
as a view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, sounds good, thanks
asdf/_core/_converters/ndarray.py
Outdated
# The view computations that follow assume that the base array | ||
# is contiguous. If not, we need to make a copy to avoid | ||
# writing a nonsense view. | ||
if options.save_base or (options.save_base is None and cfg.default_array_save_base): | ||
base = util.get_array_base(data) | ||
else: | ||
base = data | ||
if not base.flags.forc: | ||
data = np.ascontiguousarray(data) | ||
base = util.get_array_base(data) | ||
|
||
shape = data.shape |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did moving this fix a problem? The external block handling code used to operate on the data array after it was forced into contiguousness, but now it runs before. That could be enough of a niche of a niche that it doesn't matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think this fixes a niche problem (and another one could easily be fixed, move on that below). If with the current main branch we try to save a non-"forc" array it will ignore storage options
import asdf, numpy as np
arr = np.zeros((4, 4))
arr.stries = (4, 4) # make this non-"forc" without making it a view
af = asdf.AsdfFile({'arr': arr})
af.set_array_storage(arr, 'inline')
af.write_to('foo.asdf') # foo.asdf will put arr in an internal (not inline) block
With this PR the options will be respected.
If we similarly move the check for 0 strides we could fix the issue for those arrays as well. I'll update this PR.
9be9d87
to
0f2553a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Description
This PR adds:
AsdfConfig.default_array_save_base
AsdfFile.set_array_save_base
(andAsdfFile.get_array_save_base
)SerializationContext.set_array_base
(andSerializationContext.get_array_save_base
)that control if asdf will save the "base" array when an array view is saved.
default_array_save_base
was used here instead ofall_array_save_base
as a step towards #1503Fixes #1669 #1005
Checklist: