Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automate compression method detection #1616

Conversation

pleprince
Copy link
Contributor

This change lets cmake generate ImfCompression.h and ImfCompressor.cpp based on metadata found in each compressor's header (i.e. ImfRleCompression.h). One header can declare multiple compressors.

  • Additional metadata has been added to all compression headers and new compression methods should be able to leverage this mecanism easily.
  • The binaries have been updated to leverage the new system. For example, new compression methods are automatically recognized by exrmaketiled and displayed in the command-line help.

This is a pure cmake implementation (no additional dependency).

@pleprince
Copy link
Contributor Author

I left this PR as draft, as people tend to have strong views about building stuff.

@peterhillman
Copy link
Contributor

That's really neat! Those functions that detail the compression types are going to be really useful. The automatic generation certainly simplifies adding a compression type, but that doesn't happen very frequently.
When Cmake parses the compression IDs, it detects duplicate IDs but should it also detect gaps? E.g. if there's no compression type 12 but there is a 13? I think user code assumes that the compression codes are contiguous (for example when populating menus of compression types) and some of the tests in the library do the same. Also NUM_COMPRESSION_METHODS would be wrong/misleading if there was a break. If they are contiguous, IdToDesc can be a vector<CompressionDesc> instead of a map.

I wonder whether it might be worth moving CompressionDesc, IdToDesc and CompressionNameToId to ImfCompressor.cpp, i.e. make them non-exported private symbols rather than part of the public API? The implementation of the new methods would also become private in ImfCompressor.cpp, rather than inline in ImfCompression.h. That would allow for changes to the CompressionDesc structure etc without breaking the ABI. (For example if GPU-compatible codecs are added in the future it would be good to identify which ones are supported)

@pleprince
Copy link
Contributor Author

Thanks for all the suggestions @peterhillman !
I added sanity checking for gaps in the id list: it will make the build fail.
Next, I will look into your second suggestion.

I also noticed two CI problems:

  • The SNYK step is failing. No idea how to resolve that.
  • The bazel build is failing, which suggest I will have to implement the same mecanism in bazel ? Maybe @Vertexwahn can advise ?

See CMakeLists.txt for details.

Signed-off-by: Philippe Leprince <[email protected]>
Signed-off-by: Philippe Leprince <[email protected]>
Signed-off-by: Philippe Leprince <[email protected]>
suggested by @peterhillman.

Signed-off-by: Philippe Leprince <[email protected]>
@pleprince pleprince force-pushed the Automate-compression-method-detection branch from 5e93f23 to 175efde Compare January 19, 2024 11:38
@pleprince
Copy link
Contributor Author

@peterhillman Shouldn't I also generate src/lib/OpenEXR/ImfCRgbaFile.h ? I am not familiar with it but it looks like a subset of the c++ implementation...

Signed-off-by: Philippe Leprince <[email protected]>
@pleprince
Copy link
Contributor Author

If they are contiguous, IdToDesc can be a vector<CompressionDesc> instead of a map.

Not really: we are using the map to quickly query a compression method's specs in getCompressionNameFromId(), getCompressionDescriptionFromId(), isValidCompressionId(), getCompressionNamesString(), getCompressionNumScanlines(), isLossyCompressionId() and isDeepCompressionId().

@peterhillman
Copy link
Contributor

@pleprince yes, you did need to generate ImfCRgbaFile too, since that's a pure C wrapper so can't include the C++ files. Speaking of C, the OpenEXRCore implementation would still have hard-coded compression types. If/when the C++ library is reimplemented on top of OpenEXRCore, then all this automatic compression method stuff would disappear, unless it was reimplemented in OpenEXRCore. Doing a similar thing with CMake parsing files also seems possible for OpenEXRCore, but would be easier if there was some refactoring to centralize all compression logic into a single file, to limit the amount of files that CMake has to mess with at configure time. Maybe @kdt3rd has thoughts about that?

With IdToDesc, I understood that it's a map indexed by an enum. I was thinking you could make a vector of CompressionDescs, and index it by casting Compression to an unsigned int n, checking n is less than NUM_COMPRESSION_METHODS, then reading IdToDesc[n]. In fact, IdToDesc can be a plain const array, rather than a stl vector, which should be even less overhead.

And regarding bazel and other build systems, it isn't necessary that the files are generated every time CMake is run, since they should be the same for everyone who is building the library. Is there a way to arrange it so library developers can manually run the file generation step using CMake whenever they modify the compression code, and commit the resultant files into the repository? Then only a CMake implementation would be required, no matter what build system is used elsewhere.

@Vertexwahn
Copy link
Contributor

@pleprince

I wonder if the same thing could not be achieved with pure C++ template or macro magic? For instance Mitsuba has a nice MI_EXPORT_PLUGIN macro. We could have some kind of EXR_REGISTER_COMPRESSION macro. I would really prefer a solution that is build system independent. Look for instance at OpenEXRCore/openexr_version.h. In the old days CMake defined the version and generated a version.h file. Now it`s the other way around - we define the version in the C++ files and CMake has to read it.
That makes it easier to deal with different build systems (think about make, Autotools, SCons, Premake, Ninja, Meson, FASTbuild, Buck2, ...)
Reading the configuration from C++ comments via CMake to generate a C++ header feels awkward. I advice you to rethink this and not accept the change in the current form.

If you see code generation as the only option I would go with Python (or C++) (there are IDEs with nice debugging & refactoring support for Python - this is not the case for CMake). We could then run the Python script to update the compressors. The Python script could also be called from a CMake build or a Bazel build, e.g.:

py_binary(
    name = "my_code_generator",
    srcs = ["my_code_gen.py"],
)

Regarding the Bazel integration - would go with Python here:

py_binary(
    name = "generate_ImfCompression_h",
    srcs = ["generate_ImfCompression_h.py"],
)

genrule(
    name = "generate_ImfCompression_h_gen",
    outs = ["ImfCompression.h"],
    cmd = "./$(location //:generate_ImfCompression_h) > \"$@\"",
    tools = ["//:generate_ImfCompression_h"],
)

cc_binary(
    name = "OpenEXRDemo",
    srcs = [
    	...
    	"ImfCompression.h",
    ],
)

generate_ImfCompression_h.py:

#!/usr/bin/env python3

print("print text that should go into ImfCompression.h")

Another option would be to use "pure" Bazel (Starlark) or to include a header that was pre-genreated with CMake.

@cary-ilm
Copy link
Member

Looks like this uses cmake_path which was added in cmake 3.20, but OpenEXR currently requires only v3.12, hence the build failure.

Could you possibly provide an alternate implementation?

@pleprince
Copy link
Contributor Author

pleprince commented Jan 22, 2024

@Vertexwahn
I read a bit about bazel over the weekend and it seems, by design, incapable of doing the same thing.
I will certainly NOT use cpp to solve this: having had to maintain large macro-based setups, I know how painfully confusing and hard to maintain it can get.
@peterhillman' suggestion to create a run-once setup seems more promising.

Note that I had originally written a python script to do this, but realized it was introducing a new build-time dependency.

Signed-off-by: Philippe Leprince <[email protected]>
All files are generated during configuration, making them available for bazel. See comments at the top of src/lib/OpenEXR/CMakeLists.txt.

Signed-off-by: Philippe Leprince <[email protected]>
@pleprince
Copy link
Contributor Author

I should have addressed all comments now. I will now write some tests for sanity checking.

@pleprince
Copy link
Contributor Author

I am a bit confused... How do you make a test fail ?
If I throw an exception, I get:
3/16 Test #3: OpenEXR.testCompressionApi .......Subprocess aborted***Exception: 0.11 sec
... but all the tests have some sort of exception handling or don't pass a non-0 return code. That basically means that tests ALWAYS PASS if they don't crash.
Thanks for your help.

@cary-ilm
Copy link
Member

Typically tests use assert.

@pleprince
Copy link
Contributor Author

It can not work in release mode: assert() is a macro that evaluates to nothing when NDEBUG is defined.
Are you running tests in debug mode ?

@cary-ilm
Copy link
Member

Test files all do this:

#ifdef NDEBUG
#    undef NDEBUG
#endif

spotted by @peterhillman

Signed-off-by: Philippe Leprince <[email protected]>
suggested by @peterhillman.

Signed-off-by: Philippe Leprince <[email protected]>
Signed-off-by: Philippe Leprince <[email protected]>
@pleprince
Copy link
Contributor Author

There are a number of other tests that could use the compression API function to enumerate appropriate compression methods:

  • src/test/OpenEXRCoreTest/deep.cpp
  • src/test/OpenEXRTest/testCopyDeepScanLine.cpp
  • src/test/OpenEXRTest/testCopyDeepTiled.cpp
  • src/test/OpenEXRTest/testDeepScanLineBasic.cpp
  • src/test/OpenEXRTest/testDeepScanLineHuge.cpp
  • src/test/OpenEXRTest/testDeepTiledBasic.cpp

I also left aside src/wrappers/python/Imath.py, although it enumerates the compression methods and could be generated too.

What do you reckon ?

@pleprince
Copy link
Contributor Author

If the CI passes, I will remove the draft status of this PR.
@Vertexwahn I'll let you improve the bazel build, if you want to. I tried to add a genrule locally to run the cmake step, but no luck with that. If anything bazel makes me appreciate cmake... 😆

@peterhillman
Copy link
Contributor

An alternative to generating Imath.py could be adding a test that makes sure that the python Compression object is consistent with the C++ types. That way, Imath.py stays freely editable, but a new compression type cannot be added without manually adding it to Imath.py too.

The OpenEXRCoreTest is tricky because the Core library itself doesn't have access to the compression API. Again, a test could be generated to confirm the C++ and Core APIs are consistent.

Personally I'd be a little wary of changing existing tests: although it's easy to confirm that test still passes after the change, it's harder to confirm it still correctly detects all the same issues the original test did. Perhaps it would be better to update those when new compression types are added.

@cary-ilm
Copy link
Member

This implementation of Imath.py needs to change significantly, so I wouldn't bother with it now. It's a stub substituting for the proper Imath module, but it also contains stuff it shouldn't. Compression types shouldn't appear in Imath. This is part of the ongoing adoption of the openexr python bindings, still in progress.

@pleprince
Copy link
Contributor Author

Thanks for the feedback @peterhillman and @cary-ilm .
I will leave things as they are and hit "Ready for review".

@pleprince pleprince marked this pull request as ready for review January 24, 2024 14:53
@pleprince
Copy link
Contributor Author

I updated the PR with the latest in main, to make sure all tests and workflows pass. Cheers.

@pleprince
Copy link
Contributor Author

Hi @peterhillman & @cary-ilm
Is this still under consideration for merging ? Is there anything more I should do ?
Cheers

@cary-ilm
Copy link
Member

@pleprince, we discussed this briefly in the last steering committee meeting. We're concerned that is a fair bit of overhead and additional infrastructure with potential future maintenance burden, all to facilitate something that happens very rarely, since we don't expect to add compression types to the format frequently. It's true that we're contemplating a couple instances of that now, your zstd contribution in particular, but our sense is that we'd actually prefer the more straightforward brute force approach. I appreciate all the careful thought you put into the design. @peterhillman, can you make any further recommendations?

@peterhillman
Copy link
Contributor

I think API extensions are definitely worthwhile, providing methods for accessing names for each compression type and whether they are lossy or lossless etc, and we should keep those. The test you added makes it easy to spot a mistake in manually updating files whenever there's a new compression type.
So, maybe remove the automatic generation from the cmake file, and the .in files it reads, but basically leave the other files as you have them?

@pleprince
Copy link
Contributor Author

Fair enough: I will do that. Thanks :-)

Thanks to @cary-ilm and @peterhillman.

Signed-off-by: Philippe Leprince <[email protected]>

Fixed: I missed a couple of lines in the CI setup.

Signed-off-by: Philippe Leprince <[email protected]>

fix whitespace to minimize diff

Signed-off-by: Philippe Leprince <[email protected]>

fix whitespace to minimize diff

Signed-off-by: Philippe Leprince <[email protected]>

fix whitespace to minimize diff

Signed-off-by: Philippe Leprince <[email protected]>
@pleprince pleprince force-pushed the Automate-compression-method-detection branch from d961e74 to bd763ef Compare February 20, 2024 13:01
@pleprince
Copy link
Contributor Author

@cary-ilm , @peterhillman, please review. Cheers

Copy link
Contributor

@peterhillman peterhillman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a couple of queries

src/lib/OpenEXR/ImfCompressor.cpp Outdated Show resolved Hide resolved
src/lib/OpenEXR/ImfCompression.h Show resolved Hide resolved
…rom ImfCompressor to ImfCompression.

- simplify logic in isLossyCompression and isValidDeepCompression.
- additional clarifications in comments / doc strings.

Signed-off-by: Philippe Leprince <[email protected]>
@pleprince
Copy link
Contributor Author

pleprince commented Feb 28, 2024

If / when this gets merged, I will update the Zstd PR. Cheers

@cary-ilm
Copy link
Member

This looks good, I'll look it over in more detail in a bit but we'll merge it shortly, for the next minor release. Thanks!

It would be good to include a comment, maybe at the top of ImfCompression.h near the declaration of the enum, describing the various parts that need editing when adding a compression type.

@cary-ilm
Copy link
Member

cary-ilm commented Mar 5, 2024

I'm going to merge this as is, we can follow up later with further changes as needed.

Stating the obvious, but this covers only the C++ API, not OpenEXRCore, which should be kept in sync.

@cary-ilm cary-ilm merged commit 60898ad into AcademySoftwareFoundation:main Mar 5, 2024
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants