Fix Categorify inference and testing #1874

Conversation
- py::object supports = py::module_::import("nvtabular").attr("graph").attr("base_operator").attr("Supports");
+ py::object supports = py::module_::import("nvtabular").attr("graph").attr("operator").attr("Supports");
The `base_operator.py` file was renamed to `operator.py` in NVIDIA-Merlin/core#359, so this fix should be valid for releases >=23.08.
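Since the rename only landed in core >=23.08, code that has to run against both layouts could resolve the attribute with a fallback instead of hard-coding one module path. A minimal sketch; the helper name and candidate paths are illustrative, not part of this PR:

```python
import importlib


def resolve_attr(attr, candidates):
    """Return `attr` from the first importable module in `candidates`.

    Hypothetical helper mirroring the lookup in the C++ backend: the
    Supports enum moved from nvtabular.graph.base_operator to
    nvtabular.graph.operator in NVIDIA-Merlin/core#359 (>=23.08).
    """
    for path in candidates:
        try:
            return getattr(importlib.import_module(path), attr)
        except (ImportError, AttributeError):
            continue
    raise ImportError(f"{attr!r} not found in any of {candidates}")


# Usage (illustrative): try the new module path first, fall back to the old one.
# Supports = resolve_attr(
#     "Supports",
#     ("nvtabular.graph.operator", "nvtabular.graph.base_operator"),
# )
```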
def supported_formats(self):
    return DataFormats.PANDAS_DATAFRAME | DataFormats.CUDF_DATAFRAME
With NVIDIA-Merlin/systems#389 applied to systems, I was seeing inference errors in `Groupby`. This is because the `Groupby` operator calls `sort_values`, and sorting is not supported by Merlin's `TensorTable` abstraction.

Since I am not entirely sure which NVTabular operations are supported with `TensorTable`, I figured the safest fix was to assume none of the operations are supported for now. (Any thoughts on this, @jperez999?)

In a follow-up to this PR, it probably makes sense to add `GeneralOperator = BaseOperator` above this. That way, any operator with known support for `TensorTable` could inherit from `GeneralOperator` instead.
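The conservative default described above can be sketched as follows. The `DataFormats` flag here is a stand-in for the real enum in Merlin systems (member names are illustrative), and `GeneralOperator` is the hypothetical follow-up alias, not code from this PR:

```python
from enum import Flag, auto


class DataFormats(Flag):
    """Illustrative stand-in for the DataFormats flag in merlin systems."""
    PANDAS_DATAFRAME = auto()
    CUDF_DATAFRAME = auto()
    NUMPY_TENSOR_TABLE = auto()
    CUPY_TENSOR_TABLE = auto()


class BaseOperator:
    @property
    def supported_formats(self):
        # Conservative default: dataframe formats only. Ops like Groupby
        # call sort_values, which TensorTable does not implement.
        return DataFormats.PANDAS_DATAFRAME | DataFormats.CUDF_DATAFRAME


# Hypothetical alias from the follow-up suggestion: operators with known
# TensorTable support could inherit from GeneralOperator and widen the set.
GeneralOperator = BaseOperator
```

This keeps every operator on the safe dataframe-only path until TensorTable support is verified per operator.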
@oliverholworthy and @jperez999, do you have a chance to look at Rick's comments above? Thanks.
This should be alright. All of the operators in NVTabular were created with data frames in mind. If we ever decide to add in TensorTable support, we can make the change then. If this speeds up the runs and solves breaking issues, I say we should do it.
# Check results are consistent with python code path
expect = workflow.transform(df)
got = pd.DataFrame(output_tensors)
assert_eq(expect, got)
This test was only checking the data type of the result. Now it also checks that the result is correct.
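The strengthened check can be sketched in isolation. The stand-in data below replaces the real `workflow.transform(df)` result and the C++ output tensors, which are not reproducible outside the test harness:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Stand-in for the tensors produced by the C++ code path.
output_tensors = {"col": [0, 1, 1, 2]}

# Stand-in for the Python reference result, workflow.transform(df).
expect = pd.DataFrame({"col": [0, 1, 1, 2]})

# Value-by-value comparison rather than a dtype-only check.
got = pd.DataFrame(output_tensors)
assert_frame_equal(expect, got, check_dtype=False)
```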
Fixes `Categorify` inference behavior with NVIDIA-Merlin/systems#389 in place. Also includes a minor improvement in test coverage (checks that the C++ code path is actually correct).