Introduce lazy_tree (super dictionaries) #1733
converted to draft until #1733 (comment) is addressed
The following branch of |
JWST regtests: https://plwishmaster.stsci.edu:8081/job/RT/job/JWST-Developers-Pull-Requests/1571/
romancal regtests: https://github.com/spacetelescope/RegressionTests/actions/runs/9746576095
@nden @perrygreenfield the regtests all pass with this PR (except for the 2 jwst tests that randomly and frequently fail).
If I understand how this works from the description, once I request a quantity array, all quantity arrays are loaded into memory. Is this correct?
No, there's something else, not sure what. The above comment is true for quantity arrays. For numpy arrays, it works as expected. Loading one array does not load any other arrays.
Thanks for giving it a try. What file did you use for testing? If it's a roman file, things will behave differently depending on whether you're using roman_datamodels main or the "lazy" branch linked above. I think this points to this feature (and PR) needing more documentation.

Here's a non-roman example (please let me know if you give it a try and find anything different from the example). It doesn't require any special versions of anything (except for using asdf from the source branch for this PR).

import asdf
import numpy as np
import astropy.units as u

# make 5 quantity arrays
qs = [u.Quantity(np.zeros(3+i) + i, u.m) for i in range(5)]

# save them to an ASDF file
af = asdf.AsdfFile()
af["qs"] = qs
af.write_to("test.asdf")

# open the file with a "lazy_tree"
with asdf.open("test.asdf", lazy_tree=True) as af:
    # When opened asdf always reads the first and last block
    # (this is true for lazy and non-lazy trees). Since we
    # are using a 'lazy_tree' only these blocks will be loaded
    # and since these are lazy blocks just the headers will be read.
    print("before accessing quantities")
    print(f"Loaded blocks: {[b.loaded for b in af._blocks._blocks]}")

    # Since we're using a 'lazy_tree' the 'qs' 'list' will be
    # a special AsdfListNode object
    print(f"'qs' type = {type(af['qs'])}")

    # Accessing the first quantity will convert the tagged
    # representation to a quantity
    print(f"qs[0] = {af['qs'][0]=}")

    # but no other blocks will be loaded
    print(f"Loaded blocks: {[b.loaded for b in af._blocks._blocks]}")

    # Accessing the second quantity will cause a block to load
    print(f"qs[1] = {af['qs'][1]=}")
    print(f"Loaded blocks: {[b.loaded for b in af._blocks._blocks]}")

When I run the example I get the following output:
For the above example the "containers" (list-like
Roman files are a bit different because they use

import asdf
import roman_datamodels.maker_utils

im = roman_datamodels.maker_utils.mk_level2_image()
af = asdf.AsdfFile()
af['roman'] = im
af.write_to("roman.asdf")

If we load it with

>> af = asdf.open("roman.asdf", lazy_tree=True)
>> "".join(["1" if b.loaded else "0" for b in af._blocks._blocks])
'100000000000001'

This is only because we haven't accessed the "roman" key from the lazy

>> af["roman"]
>> "".join(["1" if b.loaded else "0" for b in af._blocks._blocks])
'111100001101111'

This is because the converter that deserializes
If instead, we use the modified version of roman_datamodels (which sets

>> af["roman"]
>> "".join(["1" if b.loaded else "0" for b in af._blocks._blocks])
'100000000000001'

Here accessing just the top level "roman" doesn't trigger asdf to load everything within that sub-tree (since the converter has

>> af["roman"]["data"]
>> "".join(["1" if b.loaded else "0" for b in af._blocks._blocks])
'100000000100001'
LGTM, but the documentation should note that, for now anyway, using the info
method on a lazy tree forces loading of everything.
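One way to picture why a full traversal defeats laziness is the toy below. It is plain Python, not asdf code: `CountingLazyDict` and `summarize` are hypothetical names made up for this sketch, and whether the info method walks the tree this way internally is an assumption. The sketch only shows that visiting every value of a lazy container converts every value.

```python
class CountingLazyDict:
    """Toy lazy mapping (hypothetical): values are built on first access."""

    def __init__(self, raw):
        self._raw = raw          # key -> zero-argument callable producing the value
        self._cache = {}
        self.conversions = 0     # how many values have been converted so far

    def __getitem__(self, key):
        if key not in self._cache:
            self.conversions += 1
            self._cache[key] = self._raw[key]()
        return self._cache[key]

    def keys(self):
        return self._raw.keys()


def summarize(node):
    """Describe every entry; reading each value forces its conversion."""
    return {key: type(node[key]).__name__ for key in node.keys()}


lazy = CountingLazyDict({"a": lambda: [1, 2], "b": lambda: 3.0, "c": lambda: "x"})

lazy["a"]                                    # a single access converts one value
touched_after_one_access = lazy.conversions

summary = summarize(lazy)                    # a full walk converts the rest
touched_after_summary = lazy.conversions
```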
remove edge case handling for generator producing converter not caught by inspect. Any generator producing converter should be marked as not lazy.
Thanks! I updated the Does the updated description sound good? (emphasis added to the new part below, see the commit for the full text and context).
The documentation update LGTM
Description
This PR adds:
- lazy_tree option to AsdfConfig
- lazy_tree argument to asdf.open (defaults to AsdfConfig.lazy_tree)
- Converter.lazy attribute used to indicate if a converter supports "lazy" objects
- asdf.lazy_nodes "lazy" container classes for list, dict, ordered dict

By default the "lazy" option is False.

When lazy_tree is True and an ASDF file is opened, the tagged nodes in the tree are not immediately converted to custom objects. Instead, the containers in the tree (dicts, lists, OrderedDicts) are replaced with AsdfNode subclasses that act like these containers and convert tagged values to custom objects when they are accessed (see #1705 for discussion of this feature). During conversion, if asdf encounters a Converter that either defines lazy=False or does not define lazy, the remainder of the branch will be converted to non-"lazy" objects and passed to the Converter. If instead the Converter defines lazy=True, the "lazy" object (i.e. an AsdfDictNode for a dict) will be passed to the Converter.
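As a rough illustration of the dispatch described above, the sketch below models it in plain Python without importing asdf. `Tagged`, `LazyDict`, `convert`, `EagerConverter`, `LazyConverter` and `from_node` are hypothetical names invented for this toy; only the `lazy` attribute and the rule for what gets passed to a converter are taken from the description. It is not how asdf implements the feature.

```python
from collections.abc import Mapping


class Tagged:
    """Stand-in for a tagged node: a tag name plus its raw, unconverted contents."""

    def __init__(self, tag, raw):
        self.tag = tag
        self.raw = raw


class LazyDict(Mapping):
    """Toy dict-like node that converts tagged values only when they are accessed."""

    def __init__(self, raw, converters):
        self._raw = raw
        self._converters = converters
        self._cache = {}

    def __getitem__(self, key):
        if key not in self._cache:
            self._cache[key] = convert(self._raw[key], self._converters, lazy=True)
        return self._cache[key]

    def __iter__(self):
        return iter(self._raw)

    def __len__(self):
        return len(self._raw)


def convert(value, converters, lazy):
    """Convert one raw value, staying lazy only while the converters allow it."""
    if isinstance(value, Tagged):
        converter = converters[value.tag]
        # A converter that does not define `lazy` is treated like lazy=False.
        child_lazy = lazy and getattr(converter, "lazy", False)
        return converter.from_node(convert(value.raw, converters, child_lazy))
    if isinstance(value, dict):
        if lazy:
            return LazyDict(value, converters)
        # Non-lazy branch: everything below is converted right away.
        return {k: convert(v, converters, False) for k, v in value.items()}
    return value


class EagerConverter:
    """Defines no `lazy` attribute, so it receives a plain dict."""

    def from_node(self, node):
        return ("eager", type(node).__name__)


class LazyConverter:
    """Opts in with lazy=True, so it receives the lazy container itself."""

    lazy = True

    def from_node(self, node):
        return ("lazy", type(node).__name__)


converters = {"eager": EagerConverter(), "lazy": LazyConverter()}
tree = LazyDict(
    {"a": Tagged("eager", {"x": 1}), "b": Tagged("lazy", {"x": 1})},
    converters,
)

print(tree["a"])  # ('eager', 'dict')
print(tree["b"])  # ('lazy', 'LazyDict')
```

In this toy, nothing under "a" or "b" is converted until the key is read, and the type each converter sees depends only on its `lazy` attribute.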
Checklist: