ImportError: cannot import name '_TORCH_GREATER_EQUAL_2_0' from 'lightning_fabric.utilities.imports' (/usr/local/lib/python3.10/dist-packages/lightning_fabric/utilities/imports.py) #10478

Closed
Desperadoze opened this issue Sep 13, 2024 · 2 comments
Labels: bug, stale

Describe the bug

After pulling nvcr.io/nvidia/nemo:24.07 with docker pull, I tried to run pretraining with /opt/NeMo/examples/nlp/language_modeling/megatron_gpt_pretraining.py, but it fails with:

ImportError: cannot import name '_TORCH_GREATER_EQUAL_2_0' from 'lightning_fabric.utilities.imports' (/usr/local/lib/python3.10/dist-packages/lightning_fabric/utilities/imports.py)

The same training script runs without this error on the NeMo 24.03 image.
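Since the error only appears after switching images, comparing the installed package versions in both containers is a reasonable first check. The sketch below is not from the original report. It assumes the PyPI distribution names `torch`, `pytorch-lightning`, `lightning-fabric`, and `lightning`, and it assumes that `pytorch-lightning` and `lightning-fabric` are normally released with matching version numbers, so a mismatch would be one plausible cause. Nothing in this thread confirms that this is the cause here.

```python
from importlib import metadata

# Distribution names as published on PyPI (an assumption; which of them are
# present in a given NeMo image is not confirmed by this thread).
PACKAGES = ("torch", "pytorch-lightning", "lightning-fabric", "lightning")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def same_minor(a, b):
    """True when two version strings share the same major.minor prefix."""
    return a.split(".")[:2] == b.split(".")[:2]


if __name__ == "__main__":
    found = installed_versions()
    for name, version in found.items():
        print(f"{name}: {version or 'not installed'}")
    pl, fabric = found["pytorch-lightning"], found["lightning-fabric"]
    if pl and fabric and not same_minor(pl, fabric):
        print("pytorch-lightning and lightning-fabric differ in major.minor")
```

Running this inside both the 24.03 and 24.07 containers and diffing the output would show whether the two Lightning packages moved independently between images.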

Steps/Code to reproduce bug

srun --output /xxxxxxxx.out --error xxxxxxxx.err \
  --container-image dockerd://nvcr.io/nvidia/nemo:24.07 \
  --container-mounts /gpfsprd/:/gpfsprd/,/apps/idx_files/:/apps/idx_files/ \
  --no-container-mount-home --mpi=pmi2 --container-writable bash -c "
CUDA_DEVICE_MAX_CONNECTIONS=1 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -u /opt/NeMo/examples/nlp/language_modeling/megatron_gpt_pretraining.py \
  --config-path=xxxxxxxx \
  --config-name=******.yaml
"

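Expected behavior

The script should start pretraining, as it does with the 24.03 image. Instead, it fails with this traceback:
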
Traceback (most recent call last):
  File "/opt/NeMo/examples/nlp/language_modeling/megatron_gpt_pretraining.py", line 23, in <module>
    from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
  File "/opt/NeMo/nemo/collections/nlp/__init__.py", line 15, in <module>
    from nemo.collections.nlp import data, losses, models, modules
  File "/opt/NeMo/nemo/collections/nlp/data/__init__.py", line 15, in <module>
    from nemo.collections.nlp.data.data_utils import *
  File "/opt/NeMo/nemo/collections/nlp/data/data_utils/__init__.py", line 15, in <module>
    from nemo.collections.nlp.data.data_utils.data_preprocessing import *
  File "/opt/NeMo/nemo/collections/nlp/data/data_utils/data_preprocessing.py", line 28, in <module>
    from nemo.utils import logging
  File "/opt/NeMo/nemo/utils/__init__.py", line 32, in <module>
    from nemo.utils.lightning_logger_patch import add_memory_handlers_to_pl_logger
  File "/opt/NeMo/nemo/utils/lightning_logger_patch.py", line 18, in <module>
    import pytorch_lightning as pl
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/__init__.py", line 27, in <module>
    from pytorch_lightning.callbacks import Callback  # noqa: E402
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/callbacks/__init__.py", line 29, in <module>
    from pytorch_lightning.callbacks.pruning import ModelPruning
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/callbacks/pruning.py", line 32, in <module>
    from pytorch_lightning.core.module import LightningModule
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/core/__init__.py", line 16, in <module>
    from pytorch_lightning.core.module import LightningModule
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/core/module.py", line 63, in <module>
    from pytorch_lightning.trainer import call
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/__init__.py", line 17, in <module>
    from pytorch_lightning.trainer.trainer import Trainer
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 38, in <module>
    from lightning_fabric.utilities.imports import _TORCH_GREATER_EQUAL_2_0
ImportError: cannot import name '_TORCH_GREATER_EQUAL_2_0' from 'lightning_fabric.utilities.imports' (/usr/local/lib/python3.10/dist-packages/lightning_fabric/utilities/imports.py)
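The traceback shows that pytorch_lightning/trainer/trainer.py expects a name that the installed lightning_fabric/utilities/imports.py does not provide. A minimal way to confirm this directly, without going through the NeMo import chain, is to probe the module for the symbol. The helper `has_symbol` below is a hypothetical name introduced for illustration; it uses only the standard library.

```python
import importlib


def has_symbol(module_name, symbol):
    """Return True/False if the module imports, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, symbol)


if __name__ == "__main__":
    # Module path and symbol name are taken from the traceback above.
    result = has_symbol("lightning_fabric.utilities.imports", "_TORCH_GREATER_EQUAL_2_0")
    print(result)
```

A result of `False` in the 24.07 container and `True` in the 24.03 container would point to the two images shipping different lightning_fabric versions, rather than to a problem in the training script or config.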

Environment overview (please complete the following information)

Docker version 24.0.5, build ced0996
docker pull nvcr.io/nvidia/nemo:24.07

Additional context

The same error occurs on H100 and A800 GPUs with CUDA 12.2, 12.4, and 12.6.

Desperadoze added the bug label on Sep 13, 2024

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions bot added the stale label on Oct 26, 2024

github-actions bot commented Nov 2, 2024

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions bot closed this as not planned on Nov 2, 2024