Import fails on cpu #109

Closed
yyu22 opened this issue Jun 12, 2024 · 0 comments · Fixed by #123
Labels
bug Something isn't working

Comments

Contributor

yyu22 commented Jun 12, 2024

Describe the bug
The GPU version of NeMo Curator fails at import time when run on CPU-only nodes. Importing cudf pulls in CuPy, which cannot load libcuda.so.1:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/cupy/__init__.py", line 17, in <module>
    from cupy import _core  # NOQA
  File "/usr/local/lib/python3.10/dist-packages/cupy/_core/__init__.py", line 3, in <module>
    from cupy._core import core  # NOQA
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/NeMo-Curator/nemo_curator/__init__.py", line 29, in <module>
    from .modules import *
  File "/opt/NeMo-Curator/nemo_curator/modules/__init__.py", line 24, in <module>
    from .add_id import AddId
  File "/opt/NeMo-Curator/nemo_curator/modules/add_id.py", line 21, in <module>
    from nemo_curator.datasets import DocumentDataset
  File "/opt/NeMo-Curator/nemo_curator/datasets/__init__.py", line 15, in <module>
    from .doc_dataset import DocumentDataset
  File "/opt/NeMo-Curator/nemo_curator/datasets/doc_dataset.py", line 19, in <module>
    from nemo_curator.utils.distributed_utils import read_data, write_to_disk
  File "/opt/NeMo-Curator/nemo_curator/utils/distributed_utils.py", line 32, in <module>
    cudf = gpu_only_import("cudf")
  File "/opt/NeMo-Curator/nemo_curator/utils/import_utils.py", line 347, in gpu_only_import
    return safe_import(
  File "/opt/NeMo-Curator/nemo_curator/utils/import_utils.py", line 261, in safe_import
    return importlib.import_module(module)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/usr/local/lib/python3.10/dist-packages/cudf/__init__.py", line 12, in <module>
    import cupy
  File "/usr/local/lib/python3.10/dist-packages/cupy/__init__.py", line 19, in <module>
    raise ImportError(f'''
ImportError: 
================================================================
Failed to import CuPy.

If you installed CuPy via wheels (cupy-cudaXXX or cupy-rocm-X-X), make sure that the package matches with the version of CUDA or ROCm installed.

On Linux, you may need to set LD_LIBRARY_PATH environment variable depending on how you installed CUDA/ROCm.
On Windows, try setting CUDA_PATH environment variable.

Check the Installation Guide for details:
  https://docs.cupy.dev/en/latest/install.html

Original error:
  ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
================================================================

Steps/Code to reproduce bug

  1. Install the GPU version of NeMo Curator, or use the NeMo Framework container.

  2. Run import nemo_curator on a CPU-only node or machine.
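
Before reproducing, it can help to confirm that the machine really lacks the CUDA driver library named in the traceback. The following is a minimal sketch, not part of NeMo Curator: the helper name cuda_driver_available is hypothetical, and it only checks whether the dynamic loader can open libcuda.so.1, not whether a usable GPU is present.

```python
import ctypes


def cuda_driver_available(lib_name: str = "libcuda.so.1") -> bool:
    """Return True if the dynamic loader can open the given shared library.

    Hypothetical helper for illustration; the default name is the library
    reported as missing in the traceback above.
    """
    try:
        ctypes.CDLL(lib_name)
    except OSError:
        # Raised when the shared object cannot be found or loaded.
        return False
    return True


# On a CPU-only node without the NVIDIA driver, this is expected to be False.
print(cuda_driver_available())
```

If this prints False, step 2 should hit the ImportError shown above.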

Expected behavior

The GPU version should still work on CPU-only nodes for steps that do not require a GPU (e.g., add id).
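
One way to get this behavior is to defer the failure from import time to first use. The sketch below is an illustration under stated assumptions, not the implementation in nemo_curator/utils/import_utils.py and not necessarily what the linked fix does: optional_import and MissingModule are hypothetical names, and only ImportError is handled.

```python
import importlib
from types import ModuleType
from typing import Union


class MissingModule:
    """Placeholder returned when a module cannot be imported (hypothetical)."""

    def __init__(self, name: str, error: Exception) -> None:
        self._name = name
        self._error = error

    def __getattr__(self, attr: str):
        # Private and dunder lookups behave normally so that copying or
        # introspection does not trigger the deferred ImportError.
        if attr.startswith("_"):
            raise AttributeError(attr)
        raise ImportError(
            f"{self._name} is unavailable, so {self._name}.{attr} cannot be used"
        ) from self._error


def optional_import(name: str) -> Union[ModuleType, MissingModule]:
    """Import a module, or return a placeholder that fails on first use."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # Covers both a missing package and a package that raises ImportError
        # while loading, as CuPy does above when libcuda.so.1 is absent.
        return MissingModule(name, exc)


# Importing no longer fails on a CPU-only node; only GPU code paths do.
cudf = optional_import("cudf")
```

With this pattern, a CPU-only step never touches cudf and runs normally, while a GPU step raises an ImportError that chains the original error at the point where a cudf attribute is first accessed.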
