-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite code which tags complex objects #223
Rewrite code which tags complex objects #223
Conversation
Part of the validation of an asdf tree against a schema is checking tags in the schema against tags embedded in the data. Function tagged.tag_objects has code supplies these tags, but orignally was only coded to handle dctionaries, lists, and strings. This proved a problem when checking the tags of times in the asdf tree, as times are none of these types. I previously wrote a function specifically to handle times, but his code had two problems. First, it created an extra dependency on the astropy time code and second, the fix was limited to time fields. After becoming more familiar with the code I saw that the functionality for handling more complex data objects was already in asdf, in yamlutil.custom_tree_to_tagged_tree. So I removed the code I had written to handle Time objects and replaced it with that function. That code takes a new argument, ctx, which is an asdf object, preferable the one associated with the asdf tree being validated. It is needed to access any extensions that might be used with the asdf tree. Since this is a new argument, the argument list to tagged.tag_object was modified to include it as an optional argument and code in asdf which invokes tagged.tag_object has been modified to pass it where it is available. If ctx is not passed, the code creates a new asdf object, which means that extensions cannot be used.
Just so I understand, is the current change applicable to Time objects? It would help if there's a failing example (also as a test). |
After metadata is changed through datamodels, the change is validated using the asdf validation code. Part of this validation is making sure that the type of the data matches the type in the schema. These schema types are not (necessarily) the same types as in Python. The code handles this by wrapping the data in a new object with a _tag field containing a string which matches the schema type. The code in asdf/tagged.py is an interface between the tagging code in asdf and datamodels so that data is properly tagged before being validated. Previously time fields failed the validation. This was worked around by changing the schema so that time fields were treated as strings. My previous change to asdf resolved the problem, but because I did not understand the code well enough, I wrote code that only handled time fields rather than the more general code that called the asdf tagging code and handles any tagged field in the schema. The change to the calling sequence is regrettable, but the new argument is needed by yamlutil.custom_tree_to_tagged_tree. The new argument is used to access any non-standard extensions associated with a file, not a standard situation. Because of this and because it is an optional argument, it should not cause problems with the existing code. I have verfied this by running the tests for jwst against the change. |
@bernie-simon Looks like you were editing an old version. |
@@ -460,7 +460,7 @@ def to_tree_tagged(cls, node, ctx): | |||
`to_tree`, allows types to customize how the result is tagged. | |||
""" | |||
obj = cls.to_tree(node, ctx) | |||
return tagged.tag_object(cls.yaml_tag, obj) | |||
return tagged.tag_object(cls.yaml_tag, obj, ctx=ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't obj
already containing ctx
? Does it need to be passed separately and wouldn't this increase memory usage?
obj is the value being tagged. It may already be in the asdf tree, if the asdf tree is being validated, or it may not be in the tree, if a modification to the tree is being validated. In either case, it can't be accessed from ctx without knowing the path from the top of the tree. |
@@ -87,7 +87,7 @@ def _to_tree_from_model_tree(cls, tree, ctx): | |||
except KeyError: | |||
raise ValueError("Unknown operator '{0}'".format(tree.value)) | |||
|
|||
node = tagged.tag_object(cls.make_yaml_tag(tag_name), node) | |||
node = tagged.tag_object(cls.make_yaml_tag(tag_name), node, ctx=ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. The CompoundType
represents a compound model of type astropy.modeling.core.CompoundModel
. In my experience it never fails serialization. Why does it need to be wrapped? I may be wrong but my understanding is that the intention was to tag only python primitive types.
It would really help if there's a failing example which illustrates the problem.
I'm speaking of the validation that happens after the asdf tree is modified in datamodels.properties. That code checks the schema for a tag and if it finds one, it calls the asdf validation code, which looks for a matching tag on the data being modified. Because data coming through datamodels.properties comes from the user and is not wrapped the way data is when walking the asdf tree, the tag is not there and validation fails. Probably you have not seen CompoundModel fail validation is because users do not try to add it through the datamodels.properties interface. |
I added the test case I wrote for time tagged data to the asdf test suite. The file is asdf/tests/test_tagged.py.
I've added a test case for time tagged data -- the test case is in asdf/tests/test_tagged.py |
Part of the validation of an asdf tree against a schema is checking
tags in the schema against tags embedded in the data. Function
tagged.tag_objects has code supplies these tags, but orignally was
only coded to handle dctionaries, lists, and strings. This proved a
problem when checking the tags of times in the asdf tree, as times are
none of these types. I previously wrote a function specifically to
handle times, but his code had two problems. First, it created an
extra dependency on the astropy time code and second, the fix was
limited to time fields.
After becoming more familiar with the code I saw that the
functionality for handling more complex data objects was already in
asdf, in yamlutil.custom_tree_to_tagged_tree. So I removed the code I
had written to handle Time objects and replaced it with that
function. That code takes a new argument, ctx, which is an asdf
object, preferable the one associated with the asdf tree being
validated. It is needed to access any extensions that might be used
with the asdf tree. Since this is a new argument, the argument list to
tagged.tag_object was modified to include it as an optional argument
and code in asdf which invokes tagged.tag_object has been modified to
pass it where it is available. If ctx is not passed, the code creates
a new asdf object, which means that extensions cannot be used.