Schema Downgrading System #1387

ssteinbach · 2022-08-25T23:27:47Z

Overview

As OpenTimelineIO makes its way into vendor tools, it is likely that multiple versions of OTIO will exist in different applications that would like to interoperate bidirectionally within a workflow. Therefore, we are adding a system that enables a newer version of the OpenTimelineIO library to write files with schemas that are compatible with older versions of the library.

C++

Introduces the concept of a schema_version_map, a mapping of schema names to desired schema versions
Adds the CORE_VERSION_MAP, a compiled-in constant mapping of "label" to schema_version_map. Labels are currently associated with releases of the OpenTimelineIO library (e.g. "0.15.0"). Each entry is intended to describe the set of schema versions compatible with a specific release of OpenTimelineIO.
Adds register_downgrade_function to pair with register_upgrade_function. Like upgrade functions, downgrade functions take an AnyDictionary and operate on it in place.
Adds schema_version_targets arguments to the to_json_string and to_json_file family of functions/methods. These are optional<schema_version_map> and allow the user to provide a set of schema version targets for downgrading during serialization.
During serialization, if a downgrade needs to happen, the CloningEncoder is used to build an AnyDictionary of the object, which is then passed through the downgrade functions and into the serializer.
Adds type_version_map method to TypeRegistry for querying the schema names and versions of all currently registered types
Adds the io_perf_test and upgrade_downgrade_example example C++ programs
Adds some python-like convenience methods to AnyDictionary: get_default, set_default and has_key
Uses std::unordered_map instead of std::map inside the writer for a number of internal data structures where order is not relevant.
When converting built-in but schema'd types (like RationalTime) to AnyDictionary, stores them as AnyDictionary rather than as concrete types. Otherwise they can't be downgraded.
Adds override tagging to Encoder/Writer virtual methods.
Serialization performance improvements when not downgrading (avoided copies, const&, etc.). A 20mb OTIO file serializes with no downgrading in 0.03s (was 0.04s).
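
The downgrade flow described in the list above can be sketched in a language-neutral way. The following is a minimal pure-Python model, not the OTIO API: the registry, the ExampleClip schema, its notes/note fields, and the helper names are all hypothetical. It assumes only what the list states (a schema_version_map from schema name to target version, downgrade functions that mutate a dictionary in place, and a clone step before downgrading) plus the "Name.version" form of the OTIO_SCHEMA key used in .otio JSON files.

```python
import copy

# Hypothetical registry: (schema name, version to downgrade FROM) -> function.
DOWNGRADE_FUNCTIONS = {}


def register_downgrade_function(schema_name, version_from):
    """Register fn as the step version_from -> version_from - 1 (illustrative only)."""
    def decorator(fn):
        DOWNGRADE_FUNCTIONS[(schema_name, version_from)] = fn
        return fn
    return decorator


@register_downgrade_function("ExampleClip", 2)
def example_clip_2_to_1(data):
    # Operates on the dictionary in place, mirroring how the PR describes
    # downgrade functions. The notes -> note change is invented for this sketch.
    notes = data.pop("notes", [])
    data["note"] = notes[0] if notes else ""


def downgrade_in_place(node, targets):
    """Walk nested dicts/lists and downgrade each schema'd dict to its target."""
    if isinstance(node, list):
        for item in node:
            downgrade_in_place(item, targets)
        return
    if not isinstance(node, dict):
        return
    for value in node.values():
        downgrade_in_place(value, targets)
    schema = node.get("OTIO_SCHEMA")
    if schema is None:
        return
    name, _, version_text = schema.rpartition(".")
    version = int(version_text)
    target = targets.get(name, version)  # schemas without a target are untouched
    while version > target:
        # Raises KeyError if no downgrade step was registered for this version.
        DOWNGRADE_FUNCTIONS[(name, version)](node)
        version -= 1
    node["OTIO_SCHEMA"] = f"{name}.{version}"


def serialize_with_targets(obj_dict, targets=None):
    """Return the dict to hand to the writer; clone only when downgrading."""
    if not targets:
        return obj_dict  # fast path: no clone, no downgrade
    clone = copy.deepcopy(obj_dict)
    downgrade_in_place(clone, targets)
    return clone


doc = {
    "OTIO_SCHEMA": "ExampleTrack.1",
    "children": [{"OTIO_SCHEMA": "ExampleClip.2", "notes": ["a", "b"]}],
}
downgraded = serialize_with_targets(doc, {"ExampleClip": 1})
print(downgraded["children"][0])
```

The fast path in serialize_with_targets reflects the design point in the list: the cloning and downgrade work is only paid for when targets are supplied, so the original object is never modified and non-downgrading writes stay cheap.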

Python

Adds Python bindings for registering downgrade functions, exposed via the downgrade_function_from decorator, to echo the upgrade_function_for function.
Adds Python bindings for querying the full schema/version map and CORE_VERSION_MAP.
Adds a version_manifest field to the plugin manifest system, allowing you to define custom family/label/schema version sets to target.
Adds the OTIO_DEFAULT_TARGET_VERSION_FAMILY_LABEL environment variable for telling the Python API which family/label to use as the default downgrade target.
Adds the autogen_version_map Python script, which generates the CORE_VERSION_MAP.cpp file for the C++ API.
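
As a rough illustration of how a default target could be resolved from the environment variable, here is a self-contained sketch. It does not use the OTIO Python API. The "FAMILY:LABEL" value format, the TargetVersionError class, the VERSION_FAMILIES table, and the function name are assumptions made for this example; only the environment variable name and the family/label/schema-version nesting come from the description above.

```python
import os

ENV_VAR = "OTIO_DEFAULT_TARGET_VERSION_FAMILY_LABEL"


class TargetVersionError(Exception):
    """Hypothetical error for a malformed or unknown family/label request."""


# Hypothetical family -> label -> schema_version_map table, standing in for
# what a version manifest might provide.
VERSION_FAMILIES = {
    "EXAMPLE_FAMILY": {
        "v1": {"ExampleClip": 1, "ExampleTrack": 1},
        "v2": {"ExampleClip": 2, "ExampleTrack": 1},
    },
}


def default_target_versions(environ=os.environ, families=VERSION_FAMILIES):
    """Return the schema_version_map named by the env var, or None if unset."""
    raw = environ.get(ENV_VAR)
    if not raw:
        return None  # no default target: serialize without downgrading
    family, sep, label = raw.partition(":")  # assumed "FAMILY:LABEL" format
    if not sep or not family or not label:
        raise TargetVersionError(
            f"{ENV_VAR} must look like FAMILY:LABEL, got {raw!r}"
        )
    try:
        return families[family][label]
    except KeyError:
        raise TargetVersionError(
            f"unknown family/label in {ENV_VAR}: {raw!r}"
        ) from None


print(default_target_versions({ENV_VAR: "EXAMPLE_FAMILY:v1"}))
```

Validating both the format and the presence of the requested family and label up front means a typo in the environment fails loudly instead of silently writing files at the newest schema versions.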

Performance

There is a performance cost to using the downgrading system. On a 20mb OTIO file where all clips are downgraded, the difference in serialization time was a slowdown of about 4x (0.03s -> 0.12s with a release build on an M1 Mac). Deserialization is not impacted at all.

As noted above, the baseline serialization performance was also improved, even when not downgrading.

Performance Scaling With File Size

all times in seconds
dg = with downgrading
→str = to string
→file = to file

OTIO size	read	→str	dg→str	→file	dg→file
20mb	0.18	0.04	0.14	0.14	0.25
86mb	0.77	0.15	0.59	0.64	1.02
257mb	2.35	0.47	1.75	1.83	3.07
1.0gb	9.72	1.85	7.06	7.71	12.3
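
To read the table as relative cost, the slowdown from downgrading can be computed per row. The short script below only divides the figures from the table above; the variable and function names are illustrative.

```python
# (size, read, to_str, dg_to_str, to_file, dg_to_file), all times in seconds,
# copied from the table above.
ROWS = [
    ("20mb", 0.18, 0.04, 0.14, 0.14, 0.25),
    ("86mb", 0.77, 0.15, 0.59, 0.64, 1.02),
    ("257mb", 2.35, 0.47, 1.75, 1.83, 3.07),
    ("1.0gb", 9.72, 1.85, 7.06, 7.71, 12.3),
]


def slowdowns(rows):
    """Return (size, string slowdown, file slowdown) for each row."""
    result = []
    for size, _read, to_str, dg_to_str, to_file, dg_to_file in rows:
        result.append((size, dg_to_str / to_str, dg_to_file / to_file))
    return result


for size, str_ratio, file_ratio in slowdowns(ROWS):
    print(f"{size:>6}  to string: {str_ratio:.2f}x  to file: {file_ratio:.2f}x")
```

On these numbers the slowdown stays roughly constant as files grow: about 3.5x to 3.9x when writing to a string and about 1.6x to 1.8x when writing to a file, where the fixed cost of file output dilutes the downgrade overhead.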

Follow up issues

Have a discussion about making the registration functions block double registration: Python API: should double type/[up|down]grade function registration raise an exception #1391
Adding bread crumbing support to the upgrade/downgrade functions: Upgrade/Downgrade breadcrumbs #1392

Further Design Discussions

In the current system, if a new attribute is added to a schema and its default value is correct whenever the attribute is absent, then we don't add an upgrade function and don't increment the schema version (for example: the enabled flag). For downgrading, it's true that the parser ignores extra keys, so we could leave enabled in objects and still read them in old versions. However, if someone edits the value of enabled, the old APIs won't know what to do with it, and it is possible to create data that the two systems interpret differently: the old system neither models nor reads the .enabled flag and therefore assumes everything is enabled, while the newer system would read the same object as enabled false.
The existing system (in particular the unit test system, but the plugin system should be tested) is built around the fact that if you double register something, it is a no-op instead of an error. Long term we should probably address this, but it might be a bit of a retrofit for the TypeRegistry architecture.
Right now the core schemas are everything that is present just in the C++ core, and the version is tied to the release of the software library. We probably want to separate the two and establish an "OTIO interoperability version", which is a set of schema versions independent from the versions of the library, and also discuss how schemas are included or not in that list.
Should examples be compiled by default? Right now they are not. It could help to ensure that they stay current. Maybe even better would be to move forward with the previously discussed idea of splitting the C++ and Python projects up and setting the C++ library up with its own unit test suite.
Should we offer a mode where the schema/up/downgrade functions are internal only and not exposed as a public interface? Would vendors want something like that?
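
The enabled-flag concern in the first discussion point can be made concrete with a toy model. The two reader functions and the ExampleClip data below are hypothetical stand-ins, not OTIO code; they only mimic an older reader that ignores unknown keys and a newer reader that defaults enabled to true.

```python
def old_reader_is_enabled(clip_dict):
    # The older schema has no notion of "enabled": the extra key is ignored,
    # so every clip is treated as enabled.
    return True


def new_reader_is_enabled(clip_dict):
    # The newer schema reads the flag and falls back to True when it is absent,
    # which is why no upgrade function or version bump was needed.
    return clip_dict.get("enabled", True)


# Same schema version on disk, written by a newer library with the flag edited.
clip = {"OTIO_SCHEMA": "ExampleClip.1", "name": "shot_010", "enabled": False}

print("old reader:", old_reader_is_enabled(clip))
print("new reader:", new_reader_is_enabled(clip))
```

Both readers accept the file without error, yet they disagree on whether the clip is enabled. Nothing in the schema version signals the difference, which is the interoperability gap the discussion point raises.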

codecov-commenter · 2022-09-04T05:44:05Z

Codecov Report

Merging #1387 (d97241b) into main (5b418db) will decrease coverage by 0.19%.
The diff coverage is 78.04%.

@@            Coverage Diff             @@
##             main    #1387      +/-   ##
==========================================
- Coverage   86.27%   86.08%   -0.20%     
==========================================
  Files         196      199       +3     
  Lines       19871    20252     +381     
  Branches     2309     2333      +24     
==========================================
+ Hits        17144    17434     +290     
- Misses       2161     2246      +85     
- Partials      566      572       +6

Flag	Coverage Δ
py-unittests	`86.08% <78.04%> (-0.20%)`	⬇️

Impacted Files	Coverage Δ
src/opentimelineio/serializableObject.cpp	`62.06% <0.00%> (-1.47%)`	⬇️
src/opentimelineio/typeRegistry.h	`100.00% <ø> (ø)`
...entimelineio-bindings/otio_serializableObjects.cpp	`91.57% <0.00%> (ø)`
src/py-opentimelineio/opentimelineio/__init__.py	`100.00% <ø> (ø)`
...neio/opentimelineio/console/autogen_version_map.py	`25.64% <25.64%> (ø)`
tests/test_adapter_plugin.py	`86.84% <50.00%> (-0.58%)`	⬇️
src/opentimelineio/typeRegistry.cpp	`76.59% <52.00%> (-5.76%)`	⬇️
src/opentimelineio/serialization.cpp	`80.17% <74.67%> (-2.41%)`	⬇️
src/py-opentimelineio/opentimelineio/versioning.py	`83.33% <83.33%> (ø)`
tests/test_serializable_object.py	`91.87% <83.82%> (-6.44%)`	⬇️
... and 14 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5b418db...d97241b.

src/py-opentimelineio/opentimelineio/adapters/otio_json.py

src/opentimelineio/serializableObject.cpp

src/opentimelineio/anyDictionary.h

src/opentimelineio/typeRegistry.cpp

src/opentimelineio/anyDictionary.h

docs/tutorials/versioning-schemas.md

src/opentimelineio/typeRegistry.h

tests/test_serializable_object.py

examples/io_perf_test.cpp

examples/upgrade_downgrade_example.cpp

src/opentimelineio/typeRegistry.h

darbyjohnston · 2022-09-07T18:52:48Z

I'm not qualified to review the schema logic, just left a couple notes on the C++ code.

From your list of questions:

Performance

Is it bad? Have you tried profiling it?

discussed-previously idea of splitting the C++ and python projects

I like that idea.

ssteinbach · 2022-09-07T22:47:19Z

Yes, currently performance is about a 10x slowdown when downgrading. I have some leads on improving it though, so stay tuned for later in the week!

docs/tutorials/versioning-schemas.md

src/py-opentimelineio/opentimelineio/adapters/otio_json.py

src/py-opentimelineio/opentimelineio/core/__init__.py

src/py-opentimelineio/opentimelineio/versioning.py

src/opentimelineio/serialization.cpp

src/py-opentimelineio/opentimelineio/core/__init__.py

tests/test_builtin_adapters.py

Signed-off-by: ssteinbach <[email protected]>

- get_default, set_default, has_key Signed-off-by: ssteinbach <[email protected]>

Signed-off-by: ssteinbach <[email protected]>

rogernelson

I like the approach! A few C++ comments here and there, mostly minor.

examples/io_perf_test.cpp

src/opentimelineio/CORE_VERSION_MAP.cpp

src/opentimelineio/anyDictionary.h

src/opentimelineio/typeRegistry.cpp

src/py-opentimelineio/opentimelineio-bindings/otio_bindings.cpp

src/py-opentimelineio/opentimelineio/core/__init__.py

- cache the child cloning encoder across the entire serialization
- const & sprinkled in
- remove dead code

ssteinbach · 2022-09-10T15:45:03Z

I added a performance scaling section to the PR text, just noting it here in case folks following along are interested.

* refactor versioning tests into their own class
* DRY cleanup in the serializer before other stuff
* DRY reduction in the json FILE serializer
* Add io_perf_test to repo
* Add a call w/ downgrade manifest to io_perf_test
* add anydictionary convenience functions
* add .cache to gitignore
* add override tags
* add perf tests for no-downgrade scenarios
* autogen version info struct
* add exceptions for overwriting up/downgrade fn
* add exception when double registering a type
* move schema version types into typeRegistry
* add version manifest plugin
* lint pass
* comment formatting for RTD
* add upgrade_downgrade_example in C++
* Add notes to environment variables markdown.
* Improve error handling and text for env var errors
  - the OTIO_DEFAULT_TARGET_VERSION_FAMILY_LABEL has checking to make sure the format is correct and that the version/label requested are present.
  - Adds a custom exception that gets raised if there is a problem
  - Adds a unit test for testing this behavior
* Performance tuning (cache the child cloning encoder across the entire serialization, use const & as much as possible)

Signed-off-by: ssteinbach <[email protected]>
Co-authored-by: meshula <[email protected]>
Signed-off-by: Michele Spina <[email protected]>

ssteinbach added this to the Public Beta 15 milestone Aug 25, 2022

ssteinbach mentioned this pull request Aug 29, 2022

Backwards and forwards compatibility #1295

Closed

meshula reviewed Sep 2, 2022

examples/io_perf_test.cpp Outdated

rogernelson reviewed Sep 2, 2022

src/opentimelineio/typeRegistry.cpp Outdated

JeanChristopheMorinPerso reviewed Sep 3, 2022

src/py-opentimelineio/opentimelineio-bindings/otio_bindings.cpp

src/py-opentimelineio/opentimelineio/core/__init__.py Outdated

src/py-opentimelineio/opentimelineio/adapters/otio_json.py Outdated

JeanChristopheMorinPerso reviewed Sep 3, 2022

tests/test_serializable_object.py Outdated

JeanChristopheMorinPerso reviewed Sep 4, 2022

src/py-opentimelineio/opentimelineio/adapters/otio_json.py Outdated