Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor custom ops classes to use python_op_factory as base class #5338

Merged
merged 8 commits into from
Mar 5, 2024

Conversation

klecki
Copy link
Contributor

@klecki klecki commented Feb 27, 2024

Category: Refactoring Breaking change

Description:

python_op_factory was extended and documented. There is a new, optional internal_schema_name parameter, that will be used to retrieve schema and spec for argument handling on Python side, while allowing the original schema to be used for the purpose of exposing the documentation.

As both TFRecordReader and PythonFunction has several variants with different base implementation schemas, the base classes are converted into a class generator function. The new classes are marked as not generated, as the type hints are done manually for them.

This reduces the custom code to minimum, allowing the base class to handle all properties and some common arguments.
The argument handling is moved to base class by updating the kwargs before invoking the base class.

When a tfrec.Feature is encountered in the Python argument serialization layer, it is again processed by the tfrec.Feature constructor (as we do for other types, normalizing numbers with int() or float()). For this purpose a copy constructor is added and exposed in Pybind. An alternative would be to change the conversion to an identity function - that would keep the error in the spec.AddArg rather than moving it to constructor parameter matching.
Validation code is added to the operator, to check for matching type with better error message.

NumbaFunction was also reworked, inheriting directly from the python_op_factory.

Calls to _raise_no_current_pipeline were eliminated - this function doesn't exist!

Breaking change: There is no longer a PythonFunctionBase class in nvidia.dali.ops.

Additional information:

Affected modules and functionalities:

TFRecord, Python Function and Numba Function operators in Python

Key points relevant for the review:

Does it has a potential to break something?
Does the documentation render correctly?
What would be a cleaner way to have schema and internal schema? - otherwise we would need to rewrite the operators to not use two schemas - one for presentation and one for implementation.
Should we keep implementation detail of PythonFunctionBase alive?

As a followup a code that prohibits MIS would be a nice generalization for those custom implementations.

Tests:

  • Existing tests apply
    All TFRecord, Python function and Numba Function tests must pass
  • New tests added
    • Python tests - new type error checking for TFRecord is tested.
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

@klecki
Copy link
Contributor Author

klecki commented Feb 27, 2024

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [13125785]: BUILD STARTED

@klecki klecki marked this pull request as draft February 27, 2024 20:43
@klecki klecki changed the title Refactor TFRecordReader to use python_op_factory as base class Refactor custom ops classes to use python_op_factory as base class Feb 27, 2024
@klecki klecki marked this pull request as ready for review February 27, 2024 21:35
@klecki
Copy link
Contributor Author

klecki commented Feb 27, 2024

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [13127551]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [13125785]: BUILD FAILED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [13127551]: BUILD FAILED

@klecki
Copy link
Contributor Author

klecki commented Feb 28, 2024

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [13145398]: BUILD STARTED

@klecki
Copy link
Contributor Author

klecki commented Feb 28, 2024

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [13151269]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [13151269]: BUILD PASSED

@NVIDIA NVIDIA deleted a comment from dali-automaton Feb 29, 2024
@NVIDIA NVIDIA deleted a comment from JanuszL Feb 29, 2024
@NVIDIA NVIDIA deleted a comment from dali-automaton Feb 29, 2024

def __init__(self, function, num_outputs=1, **kwargs):

# The layout need to be handled manually due to implementation detail
Copy link
Contributor

@mzient mzient Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# The layout need to be handled manually due to implementation detail
# The layouts need to be handled manually due to implementation details

or

Suggested change
# The layout need to be handled manually due to implementation detail
# The layouts need to be handled manually due to an implementation detail

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 509 to 510
If it is not provided, the class is assumed to be `_generated=True`, otherwise, we mark
it as False - the user will provide a custom wrapper.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making a (formerly) well-defined property a side effect of some other condition is making the code hard to read and understand. I'd rather have an explicit argument or override this attribute manually for those few operators that need it to be False.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Operator._internal_schema_name = internal_schema_name
# The class was generated using python_op_factory, and we don't expect custom wrapper.
# If needed, allow this tag to be overridden by an argument to this function
Operator._generated = internal_schema_name is None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. What I've written at the declaration.
  2. Now that this attribute is always there (?) we should probably get rid of those getattr(op, "_generated", None)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still leaves the External Source without the use of python_op_factory, maybe I can look at it as a follow-up, but the ExternalSourceGroup captures the OperatorInstances directly, so this whole abstraction is not usable there at this point in time.

@klecki
Copy link
Contributor Author

klecki commented Mar 4, 2024

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [13264883]: BUILD STARTED

As _TFRecordReaderImpl is used twice with different schema
parametrization, it is now generated in a function, inheriting from the
same base class as all other ops.

This reduces the custom code to minimum, allowing the base class to
handle all properties and some common arguments.

The argument handling is moved to base class by updating the kwargs
before invoking the base class.
When a tfrec.Feature is encountered in the Python argument serialization
layer, it is again processed by the tfrec.Feature constructor (as we do
for other types, normalizing numbers with int() or float()).
For this purpose a copy constructor is added and exposed in Pybind.
An alternative would be to change the conversion to an identity
function - that would keep the error in the spec.AddArg rather than
moving it to constructor parameter matching.

Validation code is added to the operator, to check for matching type
with better error message.

Signed-off-by: Krzysztof Lecki <[email protected]>
Refactor PythonFunctionBase class into a base class generator.
Adjust TFRecord to use internal schema correctly.

Signed-off-by: Krzysztof Lecki <[email protected]>
Signed-off-by: Krzysztof Lecki <[email protected]>
Signed-off-by: Krzysztof Lecki <[email protected]>
Signed-off-by: Krzysztof Lecki <[email protected]>
Signed-off-by: Krzysztof Lecki <[email protected]>
Signed-off-by: Krzysztof Lecki <[email protected]>
@klecki
Copy link
Contributor Author

klecki commented Mar 4, 2024

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [13268279]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [13264883]: BUILD PASSED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [13268279]: BUILD PASSED

@klecki klecki merged commit ea16aff into NVIDIA:main Mar 5, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants