Introduce `lazy_tree` (super dictionaries) #1733

braingram · 2024-01-12T17:06:21Z

Description

This PR adds:

lazy_tree option to AsdfConfig
lazy_tree argument to asdf.open (defaults to AsdfConfig.lazy_tree)
Converter.lazy attribute used to indicate if a converter supports "lazy" objects
asdf.lazy_nodes "lazy" container classes for list, dict, ordered dict

By default lazy_tree is False.

When lazy_tree is True and an ASDF file is opened, the tagged nodes in the tree are not immediately converted to custom objects. Instead, the containers in the tree (dicts, lists, OrderedDicts) are replaced with AsdfNode subclasses that act like these containers and convert tagged values to custom objects when they are accessed (see #1705 for discussion of this feature). During conversion, if asdf encounters a Converter that either defines lazy=False or does not define lazy, the remainder of the branch will be converted to non-"lazy" objects and passed to the Converter. If instead the Converter defines lazy=True, the "lazy" object (i.e. an AsdfDictNode for a dict) will be passed to the Converter.
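The convert-on-access behaviour described above can be illustrated with a toy model. This is only a sketch and not asdf's implementation: Tagged, ToyLazyDict and the converters mapping are hypothetical names made up for the illustration, and the caching shown here is a simplification.

```python
from collections import UserDict


class Tagged:
    """Hypothetical stand-in for a tagged node that has not been converted."""

    def __init__(self, tag, data):
        self.tag = tag
        self.data = data


class ToyLazyDict(UserDict):
    """Dict-like container that converts Tagged values on first access."""

    def __init__(self, data, converters):
        super().__init__(data)
        self._converters = converters  # maps tag -> callable (assumption)
        self.conversions = 0  # number of values converted so far

    def __getitem__(self, key):
        value = self.data[key]
        if isinstance(value, Tagged):
            # Convert the tagged value and cache the result so that the
            # conversion only happens once per key.
            value = self._converters[value.tag](value.data)
            self.data[key] = value
            self.conversions += 1
        return value


tree = ToyLazyDict(
    {"a": Tagged("complex", "1+2j"), "b": Tagged("complex", "3+4j")},
    {"complex": complex},
)
first = tree["a"]  # converts "a" only; "b" stays tagged
print(first, tree.conversions, type(tree.data["b"]).__name__)
```

In this toy, opening the "file" costs nothing and each value is converted at most once, which is the general idea behind the lazy containers; the real classes in asdf.lazy_nodes handle more cases.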

Checklist:

pre-commit checks ran successfully
tests ran successfully
for a public change, a changelog entry was added
for a public change, documentation was updated
for any new features, unit tests were added

asdf/_tests/conftest.py

asdf/_tests/core/_converters/test_complex.py

asdf/_tests/test_lazy_nodes.py

asdf/exceptions.py

asdf/lazy_nodes.py

braingram · 2024-05-07T15:12:02Z

converted to draft until #1733 (comment) is addressed

braingram · 2024-05-13T17:16:31Z

The following branch of roman_datamodels adds lazy to the node converters (and makes some minor node changes to account for AsdfDictNode not passing isinstance(..., dict) etc) to allow lazy loading of roman trees:
https://github.com/spacetelescope/roman_datamodels/compare/main...braingram:roman_datamodels:lazy?expand=1
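
Since the lazy containers may not pass isinstance(..., dict), downstream code that checks container types can test against the abstract Mapping interface instead. The sketch below uses collections.UserDict as a stand-in for a lazy dict node; whether AsdfDictNode itself satisfies Mapping is an assumption that should be checked against the asdf.lazy_nodes source, and is_dict_like is a hypothetical helper.

```python
from collections import UserDict
from collections.abc import Mapping


def is_dict_like(node):
    """Return True for dicts and for dict-like wrappers that are not dicts."""
    return isinstance(node, Mapping)


# UserDict is only a stand-in for a lazy dict node (assumption).
lazy_like = UserDict({"data": 1})

print(isinstance(lazy_like, dict))  # the strict check fails for the wrapper
print(is_dict_like(lazy_like))      # the interface check accepts it
print(is_dict_like({"data": 1}))    # plain dicts still pass
```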

braingram · 2024-07-01T15:02:31Z

JWST regtests: https://plwishmaster.stsci.edu:8081/job/RT/job/JWST-Developers-Pull-Requests/1571/
passed with 2 unrelated (and common random) failures

romancal regtests: https://github.com/spacetelescope/RegressionTests/actions/runs/9746576095
(I ran the romancal tests with photutils==1.12.0 since 1.13.0 is currently breaking main: spacetelescope/romancal#1291)
ran with no failures

braingram · 2024-07-01T18:34:00Z

@nden @perrygreenfield the regtests all pass with this PR (except for the 2 jwst tests that randomly and frequently fail).

nden · 2024-07-03T22:02:37Z

If I understand how this works from the description, once I request a quantity array, all quantity arrays are loaded into memory. Is this correct?

nden · 2024-07-03T22:05:43Z

No, there's something else, not sure what. The above comment is true for quantity arrays. For numpy arrays, it works as expected. Loading one array does not load any other arrays.

braingram · 2024-07-04T01:22:38Z

Thanks for giving it a try. What file did you use for testing? If it's a roman file things will behave differently if you're using roman_datamodels main vs the "lazy" branch linked above. I think this points to this feature (and PR) needing more documentation.

Here's a non-roman example (please let me know if you give it a try and find anything different from the example). It doesn't require any special versions of anything (except for using asdf from the source branch for this PR).

import asdf
import numpy as np
import astropy.units as u

# make 5 quantity arrays
qs = [u.Quantity(np.zeros(3+i) + i, u.m) for i in range(5)]

# save them to an ASDF file
af = asdf.AsdfFile()
af["qs"] = qs
af.write_to("test.asdf")

# open the file with a "lazy_tree"
with asdf.open("test.asdf", lazy_tree=True) as af:
    # When opened asdf always reads the first and last block
    # (this is true for lazy and non-lazy trees). Since we
    # are using a 'lazy_tree' only these blocks will be loaded
    # and since these are lazy blocks just the headers will be read.

    print("before accessing quantities")
    print(f"Loaded blocks: {[b.loaded for b in af._blocks._blocks]}")

    # Since we're using a 'lazy_tree' the 'qs' 'list' will be
    # a special AsdfListNode object
    print(f"'qs' type = {type(af['qs'])}")

    # Accessing the first quantity will convert the tagged
    # representation to a quantity
    print(f"qs[0] = {af['qs'][0]=}")
    # but no other blocks will be loaded
    print(f"Loaded blocks: {[b.loaded for b in af._blocks._blocks]}")

    # Accessing the second quantity will cause a block to load
    print(f"qs[1] = {af['qs'][1]=}")
    print(f"Loaded blocks: {[b.loaded for b in af._blocks._blocks]}")

When I run the example I get the following output:

before accessing quantities
Loaded blocks: [True, False, False, False, True]
'qs' type = <class 'asdf.lazy_nodes.AsdfListNode'>
qs[0] = af['qs'][0]=<Quantity [0., 0., 0.] m>
Loaded blocks: [True, False, False, False, True]
qs[1] = af['qs'][1]=<Quantity [1., 1., 1., 1.] m>
Loaded blocks: [True, True, False, False, True]

For the above example the "containers" (list-like AsdfListNode and dict-like AsdfDictNode objects) in the tree are made "lazy" (since lazy_tree=True) and the contained objects are only deserialized when they are accessed. The index 1 item in the "qs" "list" isn't converted to a quantity until it's accessed with qs[1]. At that time asdf turns the tagged representation for index 1 into a quantity (which triggers loading the index 1 block). The non-accessed items in the "list" (like qs[2]) are never converted to a quantity in the above example (so for qs[2] the index 2 block is never loaded).

Roman files are a bit different because they use STNode subclasses for containers. If we create a fake "roman" file:

import asdf
import roman_datamodels.maker_utils

im = roman_datamodels.maker_utils.mk_level2_image()
af = asdf.AsdfFile()
af['roman'] = im
af.write_to("roman.asdf")

If we load it with asdf.open (using lazy_tree=True) we'll see the following loaded blocks:

>> af = asdf.open("roman.asdf", lazy_tree=True)
>> "".join(["1" if b.loaded else "0" for b in af._blocks._blocks])
'100000000000001'

This is only because we haven't accessed the "roman" key from the lazy AsdfDictNode. Accessing "roman" (with the main branch of roman_datamodels) results in many blocks being loaded (every block that maps to a quantity):

>> af["roman"]
>> "".join(["1" if b.loaded else "0" for b in af._blocks._blocks])
'111100001101111'

This is because the converter that deserializes roman_datamodels.stnode.WfiImage doesn't set lazy=True (by default asdf assumes nothing in extensions is lazy). Since the converter for this object isn't lazy asdf will convert everything within the sub-tree that it hands to the converter before calling the converter (so the converter never sees a lazy node, matching the current asdf behavior). Accessing af["roman"] triggers asdf to convert everything within the WfiImage sub-tree.
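
The decision described above (hand the lazy node to a converter only if it opts in, otherwise convert the branch first) can be sketched as follows. This is a toy model under stated assumptions, not asdf's code: ToyLazyNode, EagerConverter, LazyConverter, supports_lazy and node_for_converter are all hypothetical names.

```python
class ToyLazyNode:
    """Hypothetical lazy container wrapping a not-yet-converted branch."""

    def __init__(self, raw):
        self.raw = raw

    def to_plain(self):
        # Recursively unwrap nested lazy nodes into plain dicts.
        return {
            key: value.to_plain() if isinstance(value, ToyLazyNode) else value
            for key, value in self.raw.items()
        }


class EagerConverter:
    """Hypothetical converter that does not define ``lazy``."""


class LazyConverter:
    """Hypothetical converter that opts in with ``lazy = True``."""

    lazy = True


def supports_lazy(converter):
    # A missing attribute is treated the same as lazy=False.
    return getattr(converter, "lazy", False)


def node_for_converter(converter, lazy_node):
    """Return the object the converter should receive for this branch."""
    if supports_lazy(converter):
        return lazy_node  # hand over the lazy container untouched
    return lazy_node.to_plain()  # convert the rest of the branch first


branch = ToyLazyNode({"meta": ToyLazyNode({"n": 1}), "data": [1, 2]})
print(type(node_for_converter(EagerConverter(), branch)).__name__)
print(type(node_for_converter(LazyConverter(), branch)).__name__)
```

Defaulting a missing attribute to non-lazy mirrors the statement that asdf assumes nothing in extensions is lazy unless the converter says so.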

If instead we use the modified version of roman_datamodels (which sets lazy=True for the converter handling WfiImage), things are much "lazier":

>> af["roman"]
>> "".join(["1" if b.loaded else "0" for b in af._blocks._blocks])
'100000000000001'

Here accessing just the top level "roman" doesn't trigger asdf to load everything within that sub-tree (since the converter has lazy=True), but if we access the data, one block will be loaded (the exact one may differ depending on the order of the blocks):

>> af["roman"]["data"]
>> "".join(["1" if b.loaded else "0" for b in af._blocks._blocks])
'100000000100001'
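
The block-status string used in these snippets can be wrapped in a small helper. loaded_mask is a hypothetical name, and af._blocks._blocks is private asdf API copied from the examples above, so it may change between releases. The SimpleNamespace object is only a stand-in so the helper can be tried without an ASDF file.

```python
from types import SimpleNamespace


def loaded_mask(af):
    """Return a string with '1' for each loaded block and '0' otherwise."""
    # ``_blocks._blocks`` is private API taken from the examples above.
    return "".join("1" if block.loaded else "0" for block in af._blocks._blocks)


# Stand-in mimicking only the attributes the helper touches.
fake_af = SimpleNamespace(
    _blocks=SimpleNamespace(
        _blocks=[SimpleNamespace(loaded=flag) for flag in (True, False, False, True)]
    )
)
print(loaded_mask(fake_af))
```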

perrygreenfield

LGTM, but the documentation should note that using the info method on a lazy tree currently forces loading of everything, for now anyway.


braingram · 2024-07-11T19:27:09Z

Thanks!

I updated the lazy_tree documentation in:
5843ada

Does the updated description sound good? (emphasis added to the new part below, see the commit for the full text and context).

lazy_tree : bool, optional
When True the ASDF tree will not be converted to custom objects
when the file is loaded. Instead, objects will be "lazily" converted
only when they are accessed. Note that the tree will not contain dict
and list instances for containers and instead return instances of classes
defined in asdf.lazy_nodes. Since objects are converted when they
are accessed, traversing the tree (like is done during AsdfFile.info
and AsdfFile.search) will result in nodes being converted.
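
Why traversal converts everything can be seen in a toy model: any code that visits each item goes through the same access path that triggers conversion. This is a sketch, not asdf's implementation; CountingLazyList and walk are hypothetical names, and float() stands in for converting a tagged node.

```python
class CountingLazyList:
    """Hypothetical list-like container that converts items on access."""

    def __init__(self, raw):
        self._raw = list(raw)
        self.converted = set()  # indices that have been converted

    def __len__(self):
        return len(self._raw)

    def __getitem__(self, index):
        value = float(self._raw[index])  # stand-in for tag conversion
        self.converted.add(index)
        return value

    def __iter__(self):
        # Iteration uses the same access path as indexing.
        return (self[i] for i in range(len(self)))


def walk(node):
    """Visit every item, the way a tree traversal would."""
    return [item for item in node]


items = CountingLazyList(["1.5", "2.5", "3.5"])
items[0]
print(sorted(items.converted))  # only index 0 so far
walk(items)
print(sorted(items.converted))  # every index after the traversal
```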

perrygreenfield

The documentation update LGTM

braingram added the Downstream CI label Jan 12, 2024

braingram force-pushed the feature/lazy_tree branch from 020add3 to 570c758 Compare January 16, 2024 21:36

braingram marked this pull request as ready for review January 17, 2024 16:31

braingram requested a review from a team as a code owner January 17, 2024 16:31

braingram requested review from perrygreenfield, nden and eslavich January 17, 2024 16:31

braingram added this to the 3.1.0 milestone Jan 17, 2024

eslavich reviewed Feb 11, 2024

braingram force-pushed the feature/lazy_tree branch from a5a6e86 to 6ed904d Compare February 14, 2024 19:08

braingram force-pushed the feature/lazy_tree branch from 6ed904d to d5a8ad6 Compare February 27, 2024 15:06

braingram modified the milestones: 3.1.0, 3.2.0 Feb 27, 2024

braingram marked this pull request as draft May 7, 2024 15:11

braingram force-pushed the feature/lazy_tree branch 2 times, most recently from b14e530 to e359dde Compare May 13, 2024 15:58

braingram force-pushed the feature/lazy_tree branch from 6fffde9 to 686dd94 Compare May 14, 2024 16:10

braingram marked this pull request as ready for review May 14, 2024 16:39

braingram requested a review from eslavich May 14, 2024 16:39

braingram force-pushed the feature/lazy_tree branch from 686dd94 to c10f3af Compare June 25, 2024 16:32

perrygreenfield reviewed Jul 11, 2024

braingram added 2 commits July 11, 2024 15:21

add cache and tests

88bc39b

make lazy_tree public

cb06ebf

braingram added 22 commits July 11, 2024 15:21

turn of lazy_tree as default

2c6154f

add changelog

1f701c0

update converter docs

64ad0fe

add helper function to resolve AsdfFile weakref

4770ab0

add AsdfNode subclass empty init test

5ff7374

add with_lazy_tree fixture, use for several tests

5f88103

add equality test

e4737ba

add _to_lazy_node helper function

4fff6a9

use tagged in tests

68ffce0

refactor __getitem__

bfb8908

refactor _convert

9c1aba5

another small refactor and some docstrings

d94d28f

add test for treeutil support

00573c2

make asdf.lazy_nodes.AsdfNode private

b8d1d07

fixes after rebase

006e6df

drop __class__ override, fix import

b24356c

drop builtin inheritance, add lazy to converter

b582868

add note about AsdfObject inheritance

19c6bcb

use weakref for tagged object cache

2896f87

allow cache to contain non-weakref-able objects

819761b

add test for generator producing converter

374c293

remove edge case handling for generator producing converter not caught by inspect. Any generator producing converter should be marked as not lazy.

update docs for Converter.lazy attribute

01cf1a6

braingram force-pushed the feature/lazy_tree branch from c10f3af to 01cf1a6 Compare July 11, 2024 19:21

add note about node coversion for info and search for lazy_tree

5843ada

perrygreenfield approved these changes Jul 12, 2024

perrygreenfield mentioned this pull request Jul 12, 2024

Consider a new design for the info and search methods that avoids conversion of nodes when the lazy_tree option is used #1795

Open

braingram merged commit 9faf968 into asdf-format:main Jul 12, 2024
49 checks passed

braingram deleted the feature/lazy_tree branch July 12, 2024 13:23

stscijgbot-jp mentioned this pull request Sep 18, 2024

Clean up unnecessary copies spacetelescope/jwst#8673

Closed
