Simplify ModelContainer #8738

stscijgbot-jp · 2024-08-29T18:08:05Z

Issue JP-3721 was created on JIRA by Ned Molter:

With the addition of ModelLibrary, the role of ModelContainer needs to be re-evaluated. ModelLibrary certainly renders the return_open and save_open options obsolete, as well as the get_sections method, since if memory usage is a concern then ModelLibrary should be used instead.

However, during discussions related to JP-3715, it has become increasingly clear that the strictness of ModelLibrary and the additional borrow/shelve code required to access models is not necessary/desired for many use-cases. For example, the calwebb_spec3 pipeline currently makes extensive use of ModelContainer but does not have the same memory issues as calwebb_image3 does. Learning to use ModelLibrary may also be a nuisance for users manipulating relatively small datasets.

Therefore, the need may remain for a lightweight, easy-to-use class for loading a list of models and association metadata. This container should satisfy the following constraints:

Loads association metadata dictionary
Is a Sequence, i.e., can be manipulated as if it were a list
Loads all datamodels into memory (i.e., use ModelLibrary instead if we care about memory usage)
Is the default data structure loaded by datamodels.open() for asn-type data
Can be used as a context manager, i.e., with datamodels.open(asn.json) as container has the expected behavior

The plan for now is to draft this container by substantially slimming down ModelContainer, and see if it satisfies the needs of the JWST pipeline.

More context about ModelContainer issues can be found on this innerspace page: https://innerspace.stsci.edu/display/SCSB/ModelContainer+uses+in+jwst

The text was updated successfully, but these errors were encountered:

stscijgbot-jp · 2024-09-05T16:13:06Z

Comment by Ned Molter on JIRA:

A draft of the slimmed-down new-look ModelContainer can be found here. -I was able to make this work with the rest of the jwst repository and run all the pipelines successfully with only minor modifications.- I'm listing the major changes, along with some questions about what is the desired behavior. I'm curious to hear anyone's feedback.

Removed inheritance from JwstDataModel. The main reason I did so is because the container isn't really itself a model, and this inheritance was mainly used in the pipeline to allow datamodels.open() to operate on containers. A small change to stdatamodels makes this unnecessary. If model metadata is needed, it's usually better/safer to pull this from one of the models in the container rather than the container itself, and indeed that's usually what is done in the pipeline anyway. This change also allowed removal of everything schema-related in the container.
Removed save_open and return_open options in light of ModelLibrary handling memory usage more carefully
Added context management. This was necessary after removing the JWSTDataModel inheritance to make with datamodels.open(asn.json) as container have the expected behavior. The only thing the context management does is close all the models in the container when exiting.
Removed the save method. This is the change I'm least certain about. The old save method would just save all the models in the container to a filename specified according to the way JWSTDataModel figures out default filenames. Saving this way is no longer used in the pipeline after ModelLibrary was added, and philosophically it feels like if the container has a save method, it should make an association, since the main reason for its existence is easy manipulation of asn-type data. If the desired behavior is to save all the models, I see no reason a user can't just loop over the container and do so.

stscijgbot-jp · 2024-09-16T19:13:04Z

Comment by Melanie Clarke on JIRA:

A quick note for discussion later, while we are refactoring ModelContainer...

There are some inconsistencies between how ModelContainer handles relative paths in association file expname values and how stpipe does it. I think it can be traced back to ModelContainer not respecting the input_directory when it reads in files from an association: instead of using something like stpipe.step.make_input_path, it directly prepends the ASN directory (here, in your proposed refactor branch).

This was reported in an old ticket (JP-2038) and closed as won't-fix, but it would be nice to clean it up while we're working on making ModelContainer more usable and stpipe compatible.

stscijgbot-jp · 2024-09-26T17:33:04Z

Comment by Ned Molter on JIRA:

Melanie Clarke does the new test association file in this commit cover the problem that was being discussed? And does it look like the new unit test covers that case adequately? I hope I understood the problem correctly from the original ticket and long back-and-forth on it.

stscijgbot-jp · 2024-09-27T16:13:04Z

Comment by Melanie Clarke on JIRA:

I think not quite. I am testing like this, with input imaging rate files set up in an 'input' directory, and two empty directories, 'img2' and 'img3', at the same level under the current directory:

asn_from_list input/*fits -o img2/img2.json -r DMSLevel2bBase
strun calwebb_image2 img2/img2.json --output_dir=img2

asn_from_list img2/*cal.fits -o img3/img3.json --product-name img3
strun calwebb_image3 img3/img3.json --output_dir=img3

The image2 run succeeds: it assumes the starting directory is the current working directory and finds the input at input/*fits, even though the association is in img2. The image3 run fails, on both main and your branches, because it assumes the starting directory is img3, where the association file is, and looks for the files in img3/img2/*cal.fits.

I was hoping we could just quickly sort out this inconsistency, but it may be too complicated a side issue to be dealt with in this ticket. On further examination, it doesn't look like either pipeline properly supports the input_directory parameter.

stscijgbot-jp added Software Affected: datamodels team-chartreuse labels Aug 29, 2024

emolter linked a pull request Sep 26, 2024 that will close this issue

JP-3721: Simplify ModelContainer #8831

Open

10 tasks

This was referenced Oct 8, 2024

JP-3721: Simplify ModelContainer spacetelescope/stpipe#190

Closed

JP-3721: Simplify ModelContainer spacetelescope/stdatamodels#330

Merged

braingram closed this as completed in spacetelescope/stdatamodels#330 Oct 8, 2024

stscijgbot-jp reopened this Oct 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simplify ModelContainer #8738

Simplify ModelContainer #8738

stscijgbot-jp commented Aug 29, 2024 •

edited

Loading

stscijgbot-jp commented Sep 5, 2024 •

edited

Loading

stscijgbot-jp commented Sep 16, 2024

stscijgbot-jp commented Sep 26, 2024

stscijgbot-jp commented Sep 27, 2024

Simplify ModelContainer #8738

Simplify ModelContainer #8738

Comments

stscijgbot-jp commented Aug 29, 2024 • edited Loading

stscijgbot-jp commented Sep 5, 2024 • edited Loading

stscijgbot-jp commented Sep 16, 2024

stscijgbot-jp commented Sep 26, 2024

stscijgbot-jp commented Sep 27, 2024

stscijgbot-jp commented Aug 29, 2024 •

edited

Loading

stscijgbot-jp commented Sep 5, 2024 •

edited

Loading