Fixing TF to < 2.16 #2071

Merged
merged 7 commits into simonz-dep-upgrade-20230606 from miguel/fix_tf on Mar 19, 2024

Conversation

@miguelgfierro (Collaborator) commented Mar 18, 2024

Description

Related Issues

#2073

References

Checklist:

  • I have followed the contribution guidelines and code style for this project.
  • I have added tests covering my contributions.
  • I have updated the documentation accordingly.
  • I have signed the commits, e.g. git commit -s -m "your commit message".
  • This PR is being made to staging branch AND NOT TO main branch.

Signed-off-by: miguelgfierro <[email protected]>
@miguelgfierro (Collaborator, Author) commented Mar 18, 2024

Getting an error in FastAI:

2024-03-18T20:52:18.2319582Z =================================== FAILURES ===================================
2024-03-18T20:52:18.2320778Z _________________________________ test_fastai __________________________________
2024-03-18T20:52:18.2321604Z 
2024-03-18T20:52:18.2323932Z notebooks = ***'als_deep_dive': '/mnt/azureml/cr/j/cda5fa5f89704ed0a7056494d3d4bfae/exe/wd/examples/02_model_collaborative_filtering...rk_movielens': '/mnt/azureml/cr/j/cda5fa5f89704ed0a7056494d3d4bfae/exe/wd/examples/06_benchmarks/movielens.ipynb', ...***
2024-03-18T20:52:18.2325994Z output_notebook = 'output.ipynb', kernel_name = 'python3'
2024-03-18T20:52:18.2326452Z 
2024-03-18T20:52:18.2326631Z     @pytest.mark.notebooks
2024-03-18T20:52:18.2327071Z     @pytest.mark.gpu
2024-03-18T20:52:18.2327593Z     def test_fastai(notebooks, output_notebook, kernel_name):
2024-03-18T20:52:18.2328247Z         notebook_path = notebooks["fastai"]
2024-03-18T20:52:18.2328768Z >       execute_notebook(
2024-03-18T20:52:18.2329189Z             notebook_path,
2024-03-18T20:52:18.2329613Z             output_notebook,
2024-03-18T20:52:18.2330063Z             kernel_name=kernel_name,
2024-03-18T20:52:18.2330736Z             parameters=dict(TOP_K=10, MOVIELENS_DATA_SIZE="mock100", EPOCHS=1),
2024-03-18T20:52:18.2331392Z         )
2024-03-18T20:52:18.2331587Z 
2024-03-18T20:52:18.2331815Z tests/unit/examples/test_notebooks_gpu.py:22: 
2024-03-18T20:52:18.2332930Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2024-03-18T20:52:18.2333695Z recommenders/utils/notebook_utils.py:102: in execute_notebook
2024-03-18T20:52:18.2334321Z     executed_notebook, _ = execute_preprocessor.preprocess(
2024-03-18T20:52:18.2335407Z /azureml-envs/azureml_a0d14432a61fd07846aaa46d0fe66974/lib/python3.11/site-packages/nbconvert/preprocessors/execute.py:102: in preprocess
2024-03-18T20:52:18.2336325Z     self.preprocess_cell(cell, resources, index)
2024-03-18T20:52:18.2337370Z /azureml-envs/azureml_a0d14432a61fd07846aaa46d0fe66974/lib/python3.11/site-packages/nbconvert/preprocessors/execute.py:123: in preprocess_cell
2024-03-18T20:52:18.2338343Z     cell = self.execute_cell(cell, index, store_history=True)
2024-03-18T20:52:18.2339342Z /azureml-envs/azureml_a0d14432a61fd07846aaa46d0fe66974/lib/python3.11/site-packages/jupyter_core/utils/__init__.py:165: in wrapped
2024-03-18T20:52:18.2340163Z     return loop.run_until_complete(inner)
2024-03-18T20:52:18.2341044Z /azureml-envs/azureml_a0d14432a61fd07846aaa46d0fe66974/lib/python3.11/asyncio/base_events.py:653: in run_until_complete
2024-03-18T20:52:18.2342013Z     return future.result()
2024-03-18T20:52:18.2343235Z /azureml-envs/azureml_a0d14432a61fd07846aaa46d0fe66974/lib/python3.11/site-packages/nbclient/client.py:1062: in async_execute_cell
2024-03-18T20:52:18.2344210Z     await self._check_raise_for_error(cell, cell_index, exec_reply)
2024-03-18T20:52:18.2344850Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2024-03-18T20:52:18.2345239Z 
2024-03-18T20:52:18.2345637Z self = <nbconvert.preprocessors.execute.ExecutePreprocessor object at 0x14a5c65978d0>
2024-03-18T20:52:18.2347304Z cell = ***'cell_type': 'code', 'execution_count': 18, 'metadata': ***'execution': ***'iopub.status.busy': '2024-03-18T20:51:33.1791...  prediction_col=PREDICTION)\n\nprint("Took *** seconds for *** predictions.".format(test_time, len(training_removed)))'***
2024-03-18T20:52:18.2348511Z cell_index = 30
2024-03-18T20:52:18.2349807Z exec_reply = ***'buffers': [], 'content': ***'ename': 'RuntimeError', 'engine_info': ***'engine_id': -1, 'engine_uuid': '217e3d70-6a11-48...e, 'engine': '217e3d70-6a11-4870-9977-5df29757f686', 'started': '2024-03-18T20:51:33.179501Z', 'status': 'error'***, ...***
2024-03-18T20:52:18.2350858Z 
2024-03-18T20:52:18.2351017Z     async def _check_raise_for_error(
2024-03-18T20:52:18.2351627Z         self, cell: NotebookNode, cell_index: int, exec_reply: dict[str, t.Any] | None
2024-03-18T20:52:18.2352794Z     ) -> None:
2024-03-18T20:52:18.2353331Z         if exec_reply is None:
2024-03-18T20:52:18.2353820Z             return None
2024-03-18T20:52:18.2354211Z     
2024-03-18T20:52:18.2354614Z         exec_reply_content = exec_reply["content"]
2024-03-18T20:52:18.2355224Z         if exec_reply_content["status"] != "error":
2024-03-18T20:52:18.2355771Z             return None
2024-03-18T20:52:18.2356154Z     
2024-03-18T20:52:18.2356613Z         cell_allows_errors = (not self.force_raise_errors) and (
2024-03-18T20:52:18.2357244Z             self.allow_errors
2024-03-18T20:52:18.2357832Z             or exec_reply_content.get("ename") in self.allow_error_names
2024-03-18T20:52:18.2358658Z             or "raises-exception" in cell.metadata.get("tags", [])
2024-03-18T20:52:18.2359253Z         )
2024-03-18T20:52:18.2359602Z         await run_hook(
2024-03-18T20:52:18.2360250Z             self.on_cell_error, cell=cell, cell_index=cell_index, execute_reply=exec_reply
2024-03-18T20:52:18.2360953Z         )
2024-03-18T20:52:18.2361312Z         if not cell_allows_errors:
2024-03-18T20:52:18.2361984Z >           raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)
2024-03-18T20:52:18.2363053Z E           nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
2024-03-18T20:52:18.2363919Z E           ------------------
2024-03-18T20:52:18.2364333Z E           with Timer() as test_time:
2024-03-18T20:52:18.2364823Z E               top_k_scores = score(learner, 
2024-03-18T20:52:18.2365590Z E                                    test_df=training_removed,
2024-03-18T20:52:18.2366160Z E                                    user_col=USER, 
2024-03-18T20:52:18.2366693Z E                                    item_col=ITEM, 
2024-03-18T20:52:18.2367251Z E                                    prediction_col=PREDICTION)
2024-03-18T20:52:18.2367760Z E           
2024-03-18T20:52:18.2368388Z E           print("Took *** seconds for *** predictions.".format(test_time, len(training_removed)))
2024-03-18T20:52:18.2369140Z E           ------------------
2024-03-18T20:52:18.2369526Z E           
2024-03-18T20:52:18.2369842Z E           
2024-03-18T20:52:18.2370410Z E           ---------------------------------------------------------------------------
2024-03-18T20:52:18.2371273Z E           RuntimeError                              Traceback (most recent call last)
2024-03-18T20:52:18.2371979Z E           Cell In[18], line 2
2024-03-18T20:52:18.2372720Z E                 1 with Timer() as test_time:
2024-03-18T20:52:18.2373914Z E           ----> 2     top_k_scores = score(learner,
2024-03-18T20:52:18.2374945Z E                 3                          test_df=training_removed,
2024-03-18T20:52:18.2375914Z E                 4                          user_col=USER,
2024-03-18T20:52:18.2376844Z E                 5                          item_col=ITEM,
2024-03-18T20:52:18.2377920Z E                 6                          prediction_col=PREDICTION)
2024-03-18T20:52:18.2379553Z E                 8 print("Took *** seconds for *** predictions.".format(test_time, len(training_removed)))
2024-03-18T20:52:18.2380709Z E           
2024-03-18T20:52:18.2382034Z E           File /mnt/azureml/cr/j/cda5fa5f89704ed0a7056494d3d4bfae/exe/wd/recommenders/models/fastai/fastai_utils.py:67, in score(learner, test_df, user_col, item_col, prediction_col, top_k)
2024-03-18T20:52:18.2383787Z E                65 if torch.cuda.is_available():
2024-03-18T20:52:18.2384744Z E                66     x = x.to("cuda")
2024-03-18T20:52:18.2386215Z E           ---> 67 pred = learner.model.forward(x).detach().cpu().numpy()
2024-03-18T20:52:18.2387481Z E                68 scores = pd.DataFrame(
2024-03-18T20:52:18.2388360Z E                69     ***user_col: test_df[user_col], item_col: test_df[item_col], prediction_col: pred***
2024-03-18T20:52:18.2389044Z E                70 )
2024-03-18T20:52:18.2390087Z E                71 scores = scores.sort_values([user_col, prediction_col], ascending=[True, False])
2024-03-18T20:52:18.2391015Z E           
2024-03-18T20:52:18.2392094Z E           File /azureml-envs/azureml_a0d14432a61fd07846aaa46d0fe66974/lib/python3.11/site-packages/fastai/collab.py:48, in EmbeddingDotBias.forward(self, x)
2024-03-18T20:52:18.2393377Z E                46 def forward(self, x):
2024-03-18T20:52:18.2394415Z E                47     users,items = x[:,0],x[:,1]
2024-03-18T20:52:18.2395890Z E           ---> 48     dot = self.u_weight(users)* self.i_weight(items)
2024-03-18T20:52:18.2397789Z E                49     res = dot.sum(1) + self.u_bias(users).squeeze() + self.i_bias(items).squeeze()
2024-03-18T20:52:18.2399480Z E                50     if self.y_range is None: return res
2024-03-18T20:52:18.2400301Z E           
2024-03-18T20:52:18.2401492Z E           File /azureml-envs/azureml_a0d14432a61fd07846aaa46d0fe66974/lib/python3.11/site-packages/torch/nn/modules/module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
2024-03-18T20:52:18.2403511Z E              1509     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
2024-03-18T20:52:18.2404612Z E              1510 else:
2024-03-18T20:52:18.2405952Z E           -> 1511     return self._call_impl(*args, **kwargs)
2024-03-18T20:52:18.2407031Z E           
2024-03-18T20:52:18.2408183Z E           File /azureml-envs/azureml_a0d14432a61fd07846aaa46d0fe66974/lib/python3.11/site-packages/torch/nn/modules/module.py:1520, in Module._call_impl(self, *args, **kwargs)
2024-03-18T20:52:18.2409661Z E              1515 # If we don't have any hooks, we want to skip the rest of the logic in
2024-03-18T20:52:18.2410598Z E              1516 # this function, and just call forward.
2024-03-18T20:52:18.2412450Z E              1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
2024-03-18T20:52:18.2414287Z E              1518         or _global_backward_pre_hooks or _global_backward_hooks
2024-03-18T20:52:18.2415370Z E              1519         or _global_forward_hooks or _global_forward_pre_hooks):
2024-03-18T20:52:18.2416779Z E           -> 1520     return forward_call(*args, **kwargs)
2024-03-18T20:52:18.2417830Z E              1522 try:
2024-03-18T20:52:18.2418487Z E              1523     result = None
2024-03-18T20:52:18.2419012Z E           
2024-03-18T20:52:18.2420115Z E           File /azureml-envs/azureml_a0d14432a61fd07846aaa46d0fe66974/lib/python3.11/site-packages/torch/nn/modules/sparse.py:163, in Embedding.forward(self, input)
2024-03-18T20:52:18.2421694Z E               162 def forward(self, input: Tensor) -> Tensor:
2024-03-18T20:52:18.2423378Z E           --> 163     return F.embedding(
2024-03-18T20:52:18.2425225Z E               164         input, self.weight, self.padding_idx, self.max_norm,
2024-03-18T20:52:18.2427535Z E               165         self.norm_type, self.scale_grad_by_freq, self.sparse)
2024-03-18T20:52:18.2428715Z E           
2024-03-18T20:52:18.2430061Z E           File /azureml-envs/azureml_a0d14432a61fd07846aaa46d0fe66974/lib/python3.11/site-packages/torch/nn/functional.py:2237, in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
2024-03-18T20:52:18.2431745Z E              2231     # Note [embedding_renorm set_grad_enabled]
2024-03-18T20:52:18.2432505Z E              2232     # XXX: equivalent to
2024-03-18T20:52:18.2433195Z E              2233     # with torch.no_grad():
2024-03-18T20:52:18.2433897Z E              2234     #   torch.embedding_renorm_
2024-03-18T20:52:18.2434697Z E              2235     # remove once script supports set_grad_enabled
2024-03-18T20:52:18.2435611Z E              2236     _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
2024-03-18T20:52:18.2437402Z E           -> 2237 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
2024-03-18T20:52:18.2438700Z E           
2024-03-18T20:52:18.2439831Z E           RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
2024-03-18T20:52:18.2440786Z 
2024-03-18T20:52:18.2441424Z /azureml-envs/azureml_a0d14432a61fd07846aaa46d0fe66974/lib/python3.11/site-packages/nbclient/client.py:918: CellExecutionError
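For reference, the failure reduces to indexing an embedding whose weights are on the CPU with an index tensor that has been moved to the GPU. A minimal sketch (not taken from the notebook) that reproduces the same class of RuntimeError on a CUDA machine:

```python
import torch
import torch.nn as nn

# Embedding weights stay on the CPU, mirroring a learner that was never moved to the GPU.
embedding = nn.Embedding(num_embeddings=100, embedding_dim=8)

# The index tensor is moved to the GPU, as fastai_utils.score does with `x`.
idx = torch.tensor([1, 2, 3])
if torch.cuda.is_available():
    idx = idx.to("cuda")

# Raises a RuntimeError complaining that tensors were found on two devices (cpu and cuda:0).
pred = embedding(idx)
```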

Trying to move the model to CUDA as well. Tests: https://github.com/recommenders-team/recommenders/actions/runs/8333772519
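A minimal sketch of that idea, assuming the `score` lines shown in the traceback above; the actual change in the PR commits may differ:

```python
# recommenders/models/fastai/fastai_utils.py -- sketch around the failing lines
if torch.cuda.is_available():
    x = x.to("cuda")
    learner.model = learner.model.to("cuda")  # keep the model on the same device as the input
pred = learner.model.forward(x).detach().cpu().numpy()
```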

@miguelgfierro (Collaborator, Author)

@SimonYansenZhao This PR fixes the issue with TF and FastAI. See unit tests: https://github.com/recommenders-team/recommenders/actions/runs/8333772519

I think we should pin to TF<2.16 instead of 2.15.0, because the code works with 2.15.1, and they may release a 2.15.2 or a similar patch version.

Please take a look
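
For context, a version ceiling of that form is expressed in the package metadata roughly as shown below; the lower bound and the comment here are illustrative, not the exact line from this PR:

```python
# setup.py -- illustrative install_requires entry, not the exact one in this PR
install_requires = [
    "tensorflow>=2.8.4,<2.16",  # TF 2.16 breaks the current code; 2.15.x still works
    # ...
]
```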

Signed-off-by: miguelgfierro <[email protected]>
@miguelgfierro miguelgfierro merged commit 14c5c93 into simonz-dep-upgrade-20230606 Mar 19, 2024
1 check passed
@miguelgfierro miguelgfierro deleted the miguel/fix_tf branch March 19, 2024 09:40