Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add option to control base array saving #1753

Merged
merged 4 commits into from
May 10, 2024

Conversation

braingram
Copy link
Contributor

@braingram braingram commented Feb 12, 2024

Description

This PR adds:

  • AsdfConfig.default_array_save_base
  • AsdfFile.set_array_save_base (and AsdfFile.get_array_save_base)
  • SerializationContext.set_array_base (and SerializationContext.get_array_save_base)

that control if asdf will save the "base" array when an array view is saved.

default_array_save_base was used here instead of all_array_save_base as a step towards #1503

Fixes #1669 #1005

Checklist:

  • pre-commit checks ran successfully
  • tests ran successfully
  • for a public change, a changelog entry was added
  • for a public change, documentation was updated
  • for any new features, unit tests were added

@@ -732,6 +732,41 @@ def get_array_compression_kwargs(self, arr):
""" """
return self._blocks._get_array_compression_kwargs(arr)

def set_array_save_base(self, arr, save_base):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to configure this per file object? To me this implies that the files each have some "base-array-ness" that writes with the file and is recovered when the file is read, but that's not the case, right?

If it is more of an environment-level config item, then the AsdfConfig property might be enough. We could rename that from default_array_save_base to array_save_base and ask users to write code like this if they don't want to change the value for their whole environment:

with asdf.config_context() as config:
  config.array_save_base = False
  af.write_to("foo.asdf")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this option does not get explicitly written to the file (similar to the array_compression_kwargs). However the setting does "roundtrip" because disables array_save_base on a view will cause it to get a new block (and then read in as a non-view array).

For a tree with let's say 3 arrays:

  • big_array
  • view_of_big_array
  • other_array

Writing this tree (with the defaults) will create:

  • 1 block for big and view
  • 1 block for other

When read back in view will become a view and rewriting the file will produce the same result.

If however the user chooses to call set_array_save_base(view, False) the file will write out with 3 blocks

  • 1 block for big
  • 1 block for view
  • 1 block for other When this file is read back in the user will get a tree where viewis no longer a view (due to theset_array_save_base` call) and writing it back out will again produce a file with 3 blocks.

This only works because our current default is True. If we change the default to False than "viewness" will not be preserved and this setting would have to be re-set every time a file is written.

The thought behind making this per-object was to allow a user to have more flexibility. For example, let's say the tree contains:

  • wcs: wcs with possibly many arrays, some might be views of non-tree arrays and we don't want these to be views
  • sci: science data, a big non-view array
  • thumbnail: cutout of science data that we want to keep as a view of "science"
  • ref: reference pixels from the science data, where for record-keeping reasons I want this to be a non-view on write

If we were to provide only an AsdfConfig option the user would have to copy ref and wcs to keep thumbnail as a view.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sounds good, thanks

Comment on lines 59 to 73
# The view computations that follow assume that the base array
# is contiguous. If not, we need to make a copy to avoid
# writing a nonsense view.
if options.save_base or (options.save_base is None and cfg.default_array_save_base):
base = util.get_array_base(data)
else:
base = data
if not base.flags.forc:
data = np.ascontiguousarray(data)
base = util.get_array_base(data)

shape = data.shape
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did moving this fix a problem? The external block handling code used to operate on the data array after it was forced into contiguousness, but now it runs before. That could be enough of a niche of a niche that it doesn't matter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this fixes a niche problem (and another one could easily be fixed, move on that below). If with the current main branch we try to save a non-"forc" array it will ignore storage options

import asdf, numpy as np
arr = np.zeros((4, 4))
arr.stries = (4, 4)  # make this non-"forc" without making it a view
af = asdf.AsdfFile({'arr': arr})
af.set_array_storage(arr, 'inline')
af.write_to('foo.asdf')  # foo.asdf will put arr in an internal (not inline) block

With this PR the options will be respected.

If we similarly move the check for 0 strides we could fix the issue for those arrays as well. I'll update this PR.

Copy link
Contributor

@perrygreenfield perrygreenfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@braingram braingram merged commit 3693386 into asdf-format:main May 10, 2024
33 checks passed
@braingram braingram deleted the all_your_base branch May 10, 2024 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Saving views results in saving the base array
3 participants