Using pytorch-lightning with ray[tune] doesn't work on toy MNIST example #10407

Closed
maxmax1992 opened this issue Nov 8, 2021 · 3 comments
Labels: bug (Something isn't working), help wanted (Open to be worked on), priority: 1 (Medium priority task)

Comments

@maxmax1992

🐛 Bug

pytorch-lightning 1.5 breaks when used with the ray[tune] hyperparameter tuning library. Downgrading to pl 1.4.9 makes it work again.

The stack trace:

(pid=8695)   File "/home/maxim/anaconda3/lib/python3.8/signal.py", line 47, in signal
(pid=8695)     handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
(pid=8695) ValueError: signal only works in main thread
  0%|          | 0/9912422 [00:00<?, ?it/s]
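
For readers unfamiliar with this error: the ValueError comes from the Python standard library, not from Lightning or Ray. `signal.signal` refuses to install a handler when called from any thread other than the main one. Below is a minimal sketch that reproduces the message with the standard library only. The helper names `try_register_handler` and `run_in_worker_thread` are made up for illustration, and the sketch assumes (it does not show) that the trainer ends up running outside the main thread in the Ray Tune setup.

```python
import signal
import threading


def try_register_handler():
    """Try to install a SIGTERM handler.

    Returns None on success, or the ValueError message on failure.
    """
    try:
        previous = signal.signal(signal.SIGTERM, signal.SIG_DFL)
        # Restore whatever was installed before, if Python knows about it.
        if previous is not None:
            signal.signal(signal.SIGTERM, previous)
        return None
    except ValueError as err:
        return str(err)


def run_in_worker_thread():
    """Run try_register_handler in a non-main thread and return its result."""
    result = {}

    def target():
        result["error"] = try_register_handler()

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return result["error"]


main_thread_error = try_register_handler()
worker_thread_error = run_in_worker_thread()

print("called from this thread:", main_thread_error)
print("called from worker thread:", worker_thread_error)
```

When the script is run directly, the first call succeeds and the second reports that signal only works in the main thread (the exact wording varies slightly between Python versions).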

To Reproduce

Run the example from the official Ray documentation with pytorch-lightning 1.5:
https://docs.ray.io/en/latest/tune/examples/mnist_ptl_mini.html

Expected behavior

I would expect it not to break with the latest pytorch-lightning version

Environment

output of conda list (freshly installed env for python 3.8.8)

_libgcc_mutex             0.1                        main
_openmp_mutex             4.5                       1_gnu
blas                      1.0                         mkl
bzip2                     1.0.8                h7b6447c_0
ca-certificates           2021.10.26           h06a4308_2
certifi                   2021.10.8        py38h06a4308_0
cudatoolkit               11.3.1               h2bc3f7f_2
ffmpeg                    4.3                  hf484d3e_0    pytorch
freetype                  2.11.0               h70c0345_0
giflib                    5.2.1                h7b6447c_0
gmp                       6.2.1                h2531618_2
gnutls                    3.6.15               he1e5248_0
intel-openmp              2021.4.0          h06a4308_3561
jpeg                      9d                   h7f8727e_0
lame                      3.100                h7b6447c_0
lcms2                     2.12                 h3be6417_0
ld_impl_linux-64          2.35.1               h7274673_9
libffi                    3.3                  he6710b0_2
libgcc-ng                 9.3.0               h5101ec6_17
libgomp                   9.3.0               h5101ec6_17
libiconv                  1.15                 h63c8f33_5
libidn2                   2.3.2                h7f8727e_0
libpng                    1.6.37               hbc83047_0
libstdcxx-ng              9.3.0               hd4cf53a_17
libtasn1                  4.16.0               h27cfd23_0
libtiff                   4.2.0                h85742a9_0
libunistring              0.9.10               h27cfd23_0
libuv                     1.40.0               h7b6447c_0
libwebp                   1.2.0                h89dd481_0
libwebp-base              1.2.0                h27cfd23_0
lz4-c                     1.9.3                h295c915_1
mkl                       2021.4.0           h06a4308_640
mkl-service               2.4.0            py38h7f8727e_0
mkl_fft                   1.3.1            py38hd3c417c_0
mkl_random                1.2.2            py38h51133e4_0
ncurses                   6.3                  heee7806_1
nettle                    3.7.3                hbbd107a_1
numpy                     1.21.2           py38h20f2e39_0
numpy-base                1.21.2           py38h79a1101_0
olefile                   0.46               pyhd3eb1b0_0
openh264                  2.1.0                hd408876_0
openssl                   1.1.1l               h7f8727e_0
pillow                    8.4.0            py38h5aabda8_0
pip                       21.2.4           py38h06a4308_0
python                    3.8.8                hdb3f193_5
pytorch                   1.10.0          py3.8_cuda11.3_cudnn8.2.0_0    pytorch
pytorch-mutex             1.0                        cuda    pytorch
readline                  8.1                  h27cfd23_0
setuptools                58.0.4           py38h06a4308_0
six                       1.16.0             pyhd3eb1b0_0
sqlite                    3.36.0               hc218d9a_0
tk                        8.6.11               h1ccaba5_0
torchaudio                0.10.0               py38_cu113    pytorch
torchvision               0.11.1               py38_cu113    pytorch
typing_extensions         3.10.0.2           pyh06a4308_0
wheel                     0.37.0             pyhd3eb1b0_1
xz                        5.2.5                h7b6447c_0
zlib                      1.2.11               h7b6447c_3
zstd                      1.4.9                haebb681_0

Additional context

The error might be in the data loading or something. Any help?

maxmax1992 added the "bug" and "help wanted" labels on Nov 8, 2021
tchaton added the "priority: 1" label on Nov 8, 2021
kaushikb11 self-assigned this on Nov 8, 2021
@immanuelweber

Hi, I just hit the same issue after moving from pl 1.4.9 to 1.5.0. It happens with both ray 1.6.0 and ray 1.8.0.

@kaushikb11
Contributor

kaushikb11 commented Nov 15, 2021

The issue is related to the addition of SignalConnector in #9566. Looking into it.
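
One common way to avoid this class of error is to register signal handlers only when the code is running in the main thread. The sketch below illustrates that general pattern with the standard library. It is an assumption about how such a guard could look, not the change made in Lightning, and `register_signal_handler_if_main_thread` is a hypothetical helper, not part of the SignalConnector API.

```python
import signal
import threading


def register_signal_handler_if_main_thread(signum, handler):
    """Hypothetical helper: install `handler` only in the main thread.

    Returns True if the handler was installed, False if it was skipped.
    """
    if threading.current_thread() is not threading.main_thread():
        # signal.signal would raise ValueError here, so skip registration.
        return False
    signal.signal(signum, handler)
    return True


def call_from_worker_thread():
    """Call the guard from a non-main thread and return what it reports."""
    outcome = {}

    def target():
        outcome["installed"] = register_signal_handler_if_main_thread(
            signal.SIGTERM, signal.SIG_DFL
        )

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return outcome["installed"]


installed_in_worker = call_from_worker_thread()
print("handler installed from worker thread:", installed_in_worker)
```

The trade-off is that the handler is silently not installed in worker threads, so any behaviour that depends on it is unavailable there.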

@kaushikb11
Contributor

This issue has been resolved with #10610 and the examples need to be updated.

amogkam added a commit to ray-project/ray that referenced this issue Feb 12, 2022
To help resolve the issues users are facing with running Lightning examples with Ray Tune Lightning-AI/pytorch-lightning#10407

Co-authored-by: Amog Kamsetty <[email protected]>
simonsays1980 pushed a commit to simonsays1980/ray that referenced this issue Feb 27, 2022
To help resolve the issues users are facing with running Lightning examples with Ray Tune Lightning-AI/pytorch-lightning#10407

Co-authored-by: Amog Kamsetty <[email protected]>