Fix Categorify inference and testing #1874

Conversation
- py::object supports = py::module_::import("nvtabular").attr("graph").attr("base_operator").attr("Supports");
+ py::object supports = py::module_::import("nvtabular").attr("graph").attr("operator").attr("Supports");
The `base_operator.py` file was renamed to `operator.py` in NVIDIA-Merlin/core#359, so this fix should be valid for releases >=23.08.
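Since the rename only landed in core >=23.08, code that has to run against both layouts could resolve the attribute with a fallback instead of hard-coding one module path. A minimal sketch; the helper name and candidate paths are illustrative, not part of this PR:

```python
import importlib


def resolve_attr(attr, candidates):
    """Return `attr` from the first importable module in `candidates`.

    Hypothetical helper mirroring the lookup in the C++ backend: the
    Supports enum moved from nvtabular.graph.base_operator to
    nvtabular.graph.operator in NVIDIA-Merlin/core#359 (>=23.08).
    """
    for path in candidates:
        try:
            return getattr(importlib.import_module(path), attr)
        except (ImportError, AttributeError):
            continue
    raise ImportError(f"{attr!r} not found in any of {candidates}")


# Usage (illustrative): try the new module path first, fall back to the old one.
# Supports = resolve_attr(
#     "Supports",
#     ("nvtabular.graph.operator", "nvtabular.graph.base_operator"),
# )
```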
def supported_formats(self):
    return DataFormats.PANDAS_DATAFRAME | DataFormats.CUDF_DATAFRAME
With NVIDIA-Merlin/systems#389 applied to systems, I was seeing inference errors in `Groupby`. This is because the `Groupby` operator calls `sort_values`, and sorting is not supported by Merlin's `TensorTable` abstraction.

Since I am not entirely sure which NVTabular operations are supported with `TensorTable`, I figured the safest fix was to assume none of the operations are supported for now. (Any thoughts on this, @jperez999?)

In a follow-up to this PR, it probably makes sense to add `GeneralOperator = BaseOperator` above this. That way, any operator with known support for `TensorTable` could inherit from `GeneralOperator` instead.
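The conservative default described above can be sketched as follows. The `DataFormats` flag here is a stand-in for the real enum in Merlin systems (member names are illustrative), and `GeneralOperator` is the hypothetical follow-up alias, not code from this PR:

```python
from enum import Flag, auto


class DataFormats(Flag):
    """Illustrative stand-in for the DataFormats flag in merlin systems."""
    PANDAS_DATAFRAME = auto()
    CUDF_DATAFRAME = auto()
    NUMPY_TENSOR_TABLE = auto()
    CUPY_TENSOR_TABLE = auto()


class BaseOperator:
    @property
    def supported_formats(self):
        # Conservative default: dataframe formats only. Ops like Groupby
        # call sort_values, which TensorTable does not implement.
        return DataFormats.PANDAS_DATAFRAME | DataFormats.CUDF_DATAFRAME


# Hypothetical alias from the follow-up suggestion: operators with known
# TensorTable support could inherit from GeneralOperator and widen the set.
GeneralOperator = BaseOperator
```

This keeps every operator on the safe dataframe-only path until TensorTable support is verified per operator.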
@oliverholworthy and @jperez999, do you have a chance to look at Rick's comments above? Thanks.
This should be alright. All of the operators in NVTabular were created with data frames in mind. If we ever decide to add in TensorTable support, we can make the change then. If this speeds up the runs and solves breaking issues, I say we should do it.
# Check results are consistent with python code path
expect = workflow.transform(df)
got = pd.DataFrame(output_tensors)
assert_eq(expect, got)
This test was only checking the data type of the result. Now it also checks that the result is correct.
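The strengthened check can be sketched in isolation. The stand-in data below replaces the real `workflow.transform(df)` result and the C++ output tensors, which are not reproducible outside the test harness:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Stand-in for the tensors produced by the C++ code path.
output_tensors = {"col": [0, 1, 1, 2]}

# Stand-in for the Python reference result, workflow.transform(df).
expect = pd.DataFrame({"col": [0, 1, 1, 2]})

# Value-by-value comparison rather than a dtype-only check.
got = pd.DataFrame(output_tensors)
assert_frame_equal(expect, got, check_dtype=False)
```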
Fixes `Categorify` inference behavior with NVIDIA-Merlin/systems#389 in place. Also includes a minor improvement in test coverage (checks that the C++ code path is actually correct).