Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Schema backwards compatibility #272

Merged
merged 30 commits into from
Jul 25, 2017

Conversation

drdavella
Copy link
Contributor

@drdavella drdavella commented Jul 11, 2017

As it stands, this is a fairly naive implementation of backwards compatibility for CelestialFrames. More than anything it's meant to demonstrate how backwards compatibility might be implemented for future tag classes where it is actually required. It's also meant to expose functionality that needs to be added/modified in order to support backwards compatibility. I will continue to update this PR with improvements until it seems mature enough to merge.

This is an attempt to address issue #249, and also may be related to #252.

This is not meant to be the last word for backwards compatibility. In
fact, it's really only the first word. There's a lot more work to be
done and more thought to be given to the mechanisms that are required to
support this kind of behavior. Also, this particular case isn't even all
that critical--it just serves as a potentially useful example.
@drdavella
Copy link
Contributor Author

drdavella commented Jul 11, 2017

Here's a fairly unstructured list of thoughts, questions, considerations, and the like (mostly for my own benefit):

  • Before these changes, no tag class that implemented v1.0.0 of CelestialFrame was available. If a file was read that had an older schema, a warning was issued but then v1.1.0 of CelestialFrame was used as the tag. This didn't work since the v1.1.0 class doesn't know anything about the older data type or how to convert data stored in v1.0.0 to the newest CelestialFrame implementation. It was necessary to make my_version < supported_version raise an exception instead of a warning. It was then necessary to implement a class associated with CelestialFrame that used 1.0.0 as its version.
  • Up until now, differences between minor versions were tolerated. The only indication that there was a mismatch was a warning from fix_yaml_tag in asdftypes.py. However, even a difference in minor versions for tags can mean a non-trivial difference in implementation. Minor version changes indicate that backwards compatibility is still supported, but it doesn't mean the underlying implementation hasn't changed at all. So how do we account for this? Do we change the blanket policy and say that all minor version differences are errors unless explicitly accounted for? Or do we provide a mechanism for marking specific tag classes to indicate that differences in minor version are not tolerated? In the first set of commits here, the policy is to say that my_version < supported_version causes an error unless an implementation for my_version is explicitly provided. But in practice there seems to be no reason for this asymmetry if my_version > supported_version this could also imply differences in implementation.
  • Do we really need a separate tag class for each version of a particular implementation? We could possibly add a field like supported_versions that is a list, and then update the version field to be latest_version or some such. However, this will require some potentially substantial refactoring of the way that tag classes are associated with versions (largely in asdftypes.py).

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.1%) to 94.206% when pulling 8246d69 on drdavella:compatible-schema into 2e62213 on spacetelescope:master.

@perrygreenfield
Copy link
Contributor

perrygreenfield commented Jul 12, 2017

I think I will make comments in reverse order.

  • I agree that anything that prevents the proliferation of classes is probably a very good thing. So it sounds like the proposed solution of supported versions is better. But this may also be addressed by the comment to the middle item, next.

  • Would it be a problem to make backward compatibility a necessary requirement for any particular tag? New versions may implement new features, attributes, etc, but a library implementation should always be able to handle all older versions. If that is required, are the points made a problem still? For example, a version of the library that supports version Y runs into an older version X of the schema. It can handle that. Suppose an older version of the library that could handle version X but reads a file with version Y of the schema; what happens? Good question. For it to be able to handle minor differences that don't require changes in implementation (are there examples of this? If is a new version, one thinks some change in implementation is needed). In this case I would propose a warning ("you really should read this file with a newer version of the library"), and reading it as a basic Python data structure. But I may be missing a key element in this so give me an example where these approaches don't work.

[Added comment] I guess one case that may happen frequently, a new version of the schema supports some special case. One encounters a file made under the new schema, but it actually doesn't use the new features of that schema. In that case, one can try to read that element with the schema the library supports, and if it reads without error, then that's ok (or is it? I wonder if there are schemas that permit arbitrary attributes, and then a new attribute is added and it ignores it when that actually changes the meaning; is that possible?)]

  • I'm not clear on the first item. OK, it previously didn't work. I'm unclear of what the final situation is though. So the latest version of the library has a CelestialFrame schema/version incompatible with the older schema. Is it possible to trap the exception and check for the special case ("I wonder if it uses the older form for the Frame, and I will try to construct it that way"). Yes, this violates the schema formalism on some level (but practicality beats purity sometimes). We would strive to avoid situations like this in the future if possible. Again, I may be missing something.

This was necessary since the obsgeovel and obsgeoloc fields were
affected by the schema version change but are not used in Galactocentric
frames, so were not previously being tested.
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.03%) to 94.276% when pulling 604ed9a on drdavella:compatible-schema into 2e62213 on spacetelescope:master.

@drdavella
Copy link
Contributor Author

I will respond in pseudorandom order:

  • The point I failed to make in the first bullet is that ASDF did not recognize that it (at that time) did not support an older version of the CelestialFrame schema. It simply assumed that it could map the older, unsupported version to the newer tag class, which was not true in this case. The first step I took was to modify the logic that ASDF uses to recognize compatible versions. This caused a hard failure when reading the older schema, which is what I wanted. I was then able to implement a new tag class specifically for the older schema, so now everything "works".
  • I think I agree that backwards compatibility, at least within the same major version series, should be a requirement for all tag classes. The question at issue, however, is how exactly to implement that backwards compatibility. So far in this branch, backwards compatibility is supported by providing multiple tag classes, with one corresponding to each supported version of the schema. However, I've argued, and it sounds like you agree, that this might not be the best solution in the long term. Again, the point I'm trying to make is that ASDF doesn't have a general mechanism or coherent strategy for handling backwards compatibility. There are some pieces in place, and I'm trying to sort out the rest in this PR. The CelestialFrame case is serving as a useful example.
  • I might be saying the same thing over and over, but we need to figure out how ASDF knows that it supports a particular schema version. It's one thing to say that it should support backwards compatibility, but there still needs to be a mechanism in place for the library to recognize whether or not it supports a particular version.
  • What happens if my library supports v1.1.0 of a particular schema, but I read a file with v1.2.0 of that schema? Should that cause a failure? A warning? I'm not really sure.

@perrygreenfield
Copy link
Contributor

I think I agree that backwards compatibility, at least within the same major version series, should be a requirement for all tag classes. The question at issue, however, is how exactly to implement that backwards compatibility. So far in this branch, backwards compatibility is supported by providing multiple tag classes, with one corresponding to each supported version of the schema. However, I've argued, and it sounds like you agree, that this might not be the best solution in the long term. Again, the point I'm trying to make is that ASDF doesn't have a general mechanism or coherent strategy for handling backwards compatibility. There are some pieces in place, and I'm trying to sort out the rest in this PR. The CelestialFrame case is serving as a useful example.

Yes, I agree that this isn't the best long-term approach and that we will need to do work to avoid multiple tag classes.

I might be saying the same thing over and over, but we need to figure out how ASDF knows that it supports a particular schema version. It's one thing to say that it should support backwards compatibility, but there still needs to be a mechanism in place for the library to recognize whether or not it supports a particular version.
What happens if my library supports v1.1.0 of a particular schema, but I read a file with v1.2.0 of that schema? Should that cause a failure? A warning? I'm not really sure.

I think we should never have a hard failure (i.e., untrapped exception) reading an ASDF file so long as the yaml is well formed. In such a case we issue a warning (at most) and read the unhandled schema tag as a basic python data structure. If backward compatibility is always adhered to, then the main issue is how to handle versions ahead of what the library supports (right?). What I'm suggesting (I don't know if it is practical) is to try reading it with the latest schema that it supports, and if it works, fine (warning about the version), and if not, read it in as a python structure (though if nested items have supported tags, e.g., numpy arrays, they do get handled).

I'm not even sure I would agree that a major version change breaks backward compatibility. The basic principle here is that a tag means something that has an invariant aspect to it. Future versions may change how it may be represented, or enhanced. But if we can't represent the old version in the new (by conversion), it's a different entity and should be named differently (and not just by version numbers). I'm open to see good counterexamples (these may have to come from a wider circle)

@drdavella
Copy link
Contributor Author

I think we should never have a hard failure (i.e., untrapped exception) reading an ASDF file so long as the yaml is well formed. In such a case we issue a warning (at most) and read the unhandled schema tag as a basic python data structure.

After some more investigation, it actually looks like this much is already supported (assuming it does not try and fail to map it to another version of the same type). I will add some unit tests to confirm.

What I'm suggesting (I don't know if it is practical) is to try reading it with the latest schema that it supports, and if it works, fine (warning about the version), and if not, read it in as a python structure (though if nested items have supported tags, e.g., numpy arrays, they do get handled).

This seems like a good strategy in all cases. I'll do some more digging to see whether this works yet or not (I suspect it doesn't yet but I might be wrong).

Still tbd is the best way to accommodate explicit implementations of multiple versions of the same schema. I'll play around more with the ideas proposed above.

ASDF should not fail to read a file, even if the file makes reference to
schema tags that are not defined. In these cases, ASDF should simply
provide a raw Python object that represents the deserialized object.
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.03%) to 94.276% when pulling 3d6c272 on drdavella:compatible-schema into 2e62213 on spacetelescope:master.

This is an important case that is not currently handled properly.
Currently if an exact version match is not found for a schema version,
the "best match" is used. However, the tag implementation for this "best
match" may not be compatible with the given schema version, which will
result in an unhandled error. This needs to be handled properly.
This will allow us to isolate the error that is caused when an
incompatible tag class is used to implement a mismatched schema
version and reflects the correct behavior in the long run.
All classes that inherit AsdfType are automatically added to the
internal list of built-in types. In this case, adding a type that was
defined within a unit test to the list of built-ins caused problems for
the schema validation tests. This commit is really just a work-around
for now. In the long term, there should be another class that can be
inherited by both built-in and user-defined types.
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.01%) to 94.297% when pulling 2d2632e on drdavella:compatible-schema into 2e62213 on spacetelescope:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 94.29% when pulling 5d85aef on drdavella:compatible-schema into 2e62213 on spacetelescope:master.

This seems like the most logical place to make a decision about whether
to write a custom data structure or possible or just return a raw Python
object. Added a new method to AsdfTypes `version_is_supported` to
determine if a particular tag version can be handled by the class. It's
mostly just a stub for now but logic will be fleshed out soon.
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.01%) to 94.296% when pulling cadb129 on drdavella:compatible-schema into 2e62213 on spacetelescope:master.

This update allows extension types to be differentiated between those
that ASDF provides as part of the library, and those that are
user-defined. The ASDF types are automatically added to the list of
built-in extensions used by the library, whereas user-defined types are
not.
@drdavella
Copy link
Contributor Author

The latest commit addresses issue #276 by updating the extension type hierarchy. This will be a prerequisite for a clean and general way of determining schema versions that are supported by a particular tag class.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.03%) to 94.279% when pulling e269f41 on drdavella:compatible-schema into 2e62213 on spacetelescope:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.03%) to 94.279% when pulling bb39b80 on drdavella:compatible-schema into 2e62213 on spacetelescope:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.2%) to 94.084% when pulling 2d49117 on drdavella:compatible-schema into 2e62213 on spacetelescope:master.

This update allows a single extension class to support multiple versions
of the same schema tag. There might be a better way to do it in the long
run, but this way appears to work, at least for built-in classes. It
would require a different mechanism to be used for user-defined classes,
which is problematic.
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.009%) to 94.298% when pulling d0b49ee on drdavella:compatible-schema into 2e62213 on spacetelescope:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.009%) to 94.298% when pulling 60cd2f5 on drdavella:compatible-schema into 2e62213 on spacetelescope:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.009%) to 94.298% when pulling 37b1037 on drdavella:compatible-schema into 2e62213 on spacetelescope:master.

In order to correctly implement multiple supported versions of tag
classes for all extensions and not just ASDF built-ins, the logic for
adding versioned extension classes was moved from the AsdfType metaclass
constructor to the constructor for AsdfExtensionList. This allows both
AsdfType and UserType sibling version classes to be properly handled.
The sibling version classes themselves are now created within the
ExtensionType constructor and are stored as property of the main version
class.
@coveralls
Copy link

Coverage Status

Coverage increased (+0.001%) to 94.309% when pulling 2b90d9e on drdavella:compatible-schema into 2e62213 on spacetelescope:master.

Copy link
Contributor

@perrygreenfield perrygreenfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good to me. Only specific comment is on how version support is specified and timing issues.

@@ -417,6 +454,11 @@ class AsdfType(object):
version : 3-tuple of int
The version of the standard the type is defined in.

supported_versions : set
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to indicate a range of versions supported? I suspect this may be a common case and it would be more painful to have to explicitly list each version that is supported. Also, it may be good to outline version synchronization issues within the library. I.e., is it possible that schemas get updated versions before the library is updated to deal with them? If so, what happens? Is there a way to indicate that we presume that the library supports future versions without out listing them explicitly (e.g., perhaps indicating a range of version where the end range is indefinite). By default the behavior may be that it tries to handle these newer versions and warns that it failed. I wonder if we have considered all the timing issues, but I may be all wet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of supporting version ranges occurred to me as well, and I think it's probably the right thing to do. I also think it makes sense to address it in a separate issue after getting this PR merged.

Likewise, I think it makes sense to allow for indefinite ranges. None of this should be too difficult to support; it will just require some modifications to the way these things are represented internally. Again, I'd like to make a separate PR for these.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it should be in a different PR

@coveralls
Copy link

Coverage Status

Coverage increased (+0.001%) to 94.309% when pulling a813f6c on drdavella:compatible-schema into 2e62213 on spacetelescope:master.

@drdavella drdavella merged commit fd5ea50 into asdf-format:master Jul 25, 2017
@drdavella drdavella deleted the compatible-schema branch July 25, 2017 19:05
drdavella added a commit that referenced this pull request Jul 31, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants