Refactor custom ops classes to use python_op_factory as base class #5338
!build
CI MESSAGE: [13125785]: BUILD STARTED
bb65224 to 3b93984
!build
CI MESSAGE: [13127551]: BUILD STARTED
CI MESSAGE: [13125785]: BUILD FAILED
CI MESSAGE: [13127551]: BUILD FAILED
!build
CI MESSAGE: [13145398]: BUILD STARTED
!build
CI MESSAGE: [13151269]: BUILD STARTED
CI MESSAGE: [13151269]: BUILD PASSED
def __init__(self, function, num_outputs=1, **kwargs):
# The layout need to be handled manually due to implementation detail
Suggested change:
- # The layout need to be handled manually due to implementation detail
+ # The layouts need to be handled manually due to implementation details
or
- # The layout need to be handled manually due to implementation detail
+ # The layouts need to be handled manually due to an implementation detail
done
If it is not provided, the class is assumed to be `_generated=True`, otherwise, we mark it as False - the user will provide a custom wrapper.
Making a (formerly) well-defined property a side effect of some other condition makes the code hard to read and understand. I'd rather have an explicit argument, or override this attribute manually for the few operators that need it to be False.
Done
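The reviewer's suggestion (an explicit argument instead of deriving `_generated` from whether `internal_schema_name` was passed) can be sketched in plain Python. This is a standalone illustration, not DALI's actual `python_op_factory`: the factory name `make_op_class`, its `generated` parameter, the `_schema_name` attribute, and all schema names are hypothetical.

```python
def make_op_class(name, schema_name, internal_schema_name=None, generated=True):
    """Hypothetical stand-in for an operator-class factory.

    `generated` is an explicit, independent flag instead of a side effect
    of whether `internal_schema_name` was provided.
    """

    def __init__(self, **kwargs):
        self._kwargs = kwargs

    cls = type(name, (object,), {"__init__": __init__})
    cls._schema_name = schema_name
    # Assumption for this sketch: without a separate internal schema,
    # argument handling uses the public schema.
    cls._internal_schema_name = internal_schema_name or schema_name
    cls._generated = generated
    return cls


# A class with a hand-written wrapper opts out explicitly,
# independently of which schemas it uses.
Custom = make_op_class("MyCustomOp", "MyOp",
                       internal_schema_name="_MyOpImpl", generated=False)
Plain = make_op_class("MyPlainOp", "MyPlainOp")
```

Keeping the two concerns as separate parameters means a future operator could use an internal schema and still be `_generated=True`, or vice versa, without touching the factory.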
Operator._internal_schema_name = internal_schema_name
# The class was generated using python_op_factory, and we don't expect custom wrapper.
# If needed, allow this tag to be overridden by an argument to this function
Operator._generated = internal_schema_name is None
- What I've written at the declaration.
- Now that this attribute is always there (?), we should probably get rid of those `getattr(op, "_generated", None)` calls.
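Once every operator class is guaranteed to carry `_generated`, the defensive `getattr(op, "_generated", None)` lookups can become plain attribute reads. A minimal sketch, assuming the default is declared once on a shared base class; `OpBase`, `GeneratedOp`, `HandWrittenOp` and `has_custom_wrapper` are hypothetical names, not DALI code.

```python
class OpBase:
    # Declared once on the shared base, so every subclass has the attribute.
    _generated = True


class GeneratedOp(OpBase):
    pass


class HandWrittenOp(OpBase):
    # Overridden explicitly by the few operators with a custom wrapper.
    _generated = False


def has_custom_wrapper(op_cls):
    # Before: getattr(op_cls, "_generated", None) tolerated a missing attribute.
    # After: a direct read; a class that somehow lacks the attribute now fails
    # loudly with AttributeError instead of silently yielding None.
    return not op_cls._generated
```

The direct read also makes the "always present" invariant visible at the call site, which is the readability point raised in the review.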
This still leaves External Source without the use of python_op_factory. Maybe I can look at it as a follow-up, but ExternalSourceGroup captures the OperatorInstances directly, so this whole abstraction is not usable there at this point in time.
!build
CI MESSAGE: [13264883]: BUILD STARTED
As _TFRecordReaderImpl is used twice with different schema parametrization, it is now generated in a function, inheriting from the same base class as all other ops. This reduces the custom code to a minimum, allowing the base class to handle all properties and some common arguments. The argument handling is moved to the base class by updating the kwargs before invoking the base class. When a tfrec.Feature is encountered in the Python argument serialization layer, it is processed again by the tfrec.Feature constructor (as we do for other types, normalizing numbers with int() or float()). For this purpose, a copy constructor is added and exposed in Pybind. An alternative would be to change the conversion to an identity function; that would keep the error in spec.AddArg rather than moving it to constructor parameter matching. Validation code is added to the operator to check for a matching type with a better error message. Signed-off-by: Krzysztof Lecki <[email protected]>
Refactor PythonFunctionBase class into a base class generator. Adjust TFRecord to use internal schema correctly. Signed-off-by: Krzysztof Lecki <[email protected]>
Signed-off-by: Krzysztof Lecki <[email protected]>
cf67060 to bad8044
Signed-off-by: Krzysztof Lecki <[email protected]>
!build
CI MESSAGE: [13268279]: BUILD STARTED
CI MESSAGE: [13264883]: BUILD PASSED
CI MESSAGE: [13268279]: BUILD PASSED
Category: Refactoring, Breaking change
Description:
`python_op_factory` was extended and documented. There is a new, optional `internal_schema_name` parameter that is used to retrieve the schema and spec for argument handling on the Python side, while the original `schema` is still used to expose the documentation.

As both `TFRecordReader` and `PythonFunction` have several variants with different base implementation schemas, their base classes are converted into class generator functions. The new classes are marked as not generated, as their type hints are written manually. This reduces the custom code to a minimum, allowing the base class to handle all properties and some common arguments.
The argument handling is moved to the base class by updating the kwargs before invoking the base class.
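The combination of a public schema (for documentation), an internal schema (for argument handling), and a class generator whose subclass only updates `kwargs` can be sketched in plain Python. Everything below is hypothetical and simplified: `SCHEMAS` stands in for DALI's schema registry (reduced to sets of accepted argument names), `op_factory` for `python_op_factory`, `make_reader_class` for the generated reader base classes, and the schema and argument names are illustrative only.

```python
# Hypothetical registry: schema name -> accepted argument names.
SCHEMAS = {
    "Reader": {"path", "features"},                      # public, for docs
    "_ReaderImpl": {"path", "features", "index_path"},
    "_ReaderImplV2": {"path", "features", "index_path", "extra_option"},
}


def op_factory(name, schema_name, internal_schema_name=None):
    internal = internal_schema_name or schema_name

    class Operator:
        _schema_name = schema_name        # what the documentation shows
        _internal_schema_name = internal  # what argument handling checks

        def __init__(self, **kwargs):
            # Common argument handling lives in the factory-made base class.
            allowed = SCHEMAS[self._internal_schema_name]
            unknown = set(kwargs) - allowed
            if unknown:
                raise TypeError(f"{name} got unexpected arguments: {sorted(unknown)}")
            self.kwargs = kwargs

    Operator.__name__ = name
    return Operator


def make_reader_class(internal_schema_name):
    """Class generator: identical custom code, different implementation schema."""
    base = op_factory("Reader", "Reader", internal_schema_name)

    class Reader(base):
        def __init__(self, path, index_path, features, **kwargs):
            # The subclass only folds its explicit parameters back into kwargs
            # and lets the base class do the rest.
            kwargs.update(path=path, index_path=index_path, features=features)
            super().__init__(**kwargs)

    return Reader


ReaderA = make_reader_class("_ReaderImpl")
ReaderB = make_reader_class("_ReaderImplV2")
```

Both generated classes report the same public schema while validating against different internal ones, so `ReaderB(..., extra_option=True)` is accepted and the same call on `ReaderA` raises `TypeError`.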
When a `tfrec.Feature` is encountered in the Python argument serialization layer, it is processed again by the `tfrec.Feature` constructor (as we do for other types, normalizing numbers with `int()` or `float()`). For this purpose, a copy constructor is added and exposed in Pybind. An alternative would be to change the conversion to an identity function; that would keep the error in `spec.AddArg` rather than moving it to constructor parameter matching.
Validation code is added to the operator to check for a matching type with a better error message.
`NumbaFunction` was also reworked to inherit directly from the class produced by `python_op_factory`.
Calls to `_raise_no_current_pipeline` were eliminated, as this function doesn't exist!
Breaking change: There is no longer a `PythonFunctionBase` class in `nvidia.dali.ops`.
Additional information:
Affected modules and functionalities:
TFRecord, Python Function and Numba Function operators in Python
Key points relevant for the review:
Does it have the potential to break something?
Does the documentation render correctly?
What would be a cleaner way to have both a schema and an internal schema? Otherwise, we would need to rewrite the operators to not use two schemas (one for presentation and one for implementation).
Should we keep the implementation detail of `PythonFunctionBase` alive?
As a follow-up, code that prohibits MIS would be a nice generalization for those custom implementations.
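The suggested follow-up could take the shape of a reusable guard shared by the custom implementations. A minimal sketch, assuming MIS refers to multiple input sets, i.e. passing a list or tuple of inputs in place of a single input; `NoMultipleInputSetsMixin`, `BaseOp` and `CustomOp` are hypothetical names, and `BaseOp.__call__` just echoes its arguments in place of a real operator call.

```python
class NoMultipleInputSetsMixin:
    """Hypothetical mixin that rejects multiple input sets before dispatching."""

    def __call__(self, *inputs, **kwargs):
        for idx, inp in enumerate(inputs):
            # Assumption: a list/tuple in an input slot signals multiple input sets.
            if isinstance(inp, (list, tuple)):
                raise ValueError(
                    f"{type(self).__name__} does not support multiple input sets; "
                    f"input {idx} is a {type(inp).__name__}.")
        return super().__call__(*inputs, **kwargs)


class BaseOp:
    # Placeholder for the real operator call; returns what it received.
    def __call__(self, *inputs, **kwargs):
        return ("called", inputs, kwargs)


class CustomOp(NoMultipleInputSetsMixin, BaseOp):
    pass
```

Placing the mixin first in the bases means its check runs before the base class's `__call__`, so each custom operator opts in with one line instead of repeating the check.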
Tests:
All TFRecord, Python function and Numba Function tests must pass
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A