Once I run the training with the following command:

```
CUDA_VISIBLE_DEVICES=0,1
randport=$(shuf -i8000-9999 -n1)  # Generate a random port number
python -u main.py \
    --dist-url "tcp://127.0.0.1:${randport}" --dist-backend 'nccl' \
    --multiprocessing-distributed --world-size 1 --rank 0 \
    --dataset=cc3m --val-dataset=cc3m \
    --exp-name='gill_exp' --image-dir='data/' --log-base-dir='runs/' \
    --opt-version='/opt-6.7b' \
    --visual-model /checkpoints/clip-vit-large-patch14 \
    --precision='bf16' --print-freq=100
```
the server connection is reset and the training process is killed, exiting with the following error message:
```
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGKILL
UserWarning: resource_tracker: There appear to be 2 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
```
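For context, a bare SIGKILL with leaked-semaphore warnings (and no Python traceback) often means the Linux kernel's OOM killer terminated a worker, e.g. while loading the OPT-6.7B weights. A minimal check, assuming a standard Linux host — these commands are generic diagnostics, not part of this repository:

```shell
# Look for recent OOM-killer records in the kernel log
# (may print nothing without sufficient permissions).
dmesg -T 2>/dev/null | grep -iE 'killed process|out of memory' | tail -n 5

# On systemd hosts, journalctl is an alternative:
# journalctl -k --no-pager | grep -i 'out of memory' | tail -n 5

# Show current memory and swap headroom.
free -h
```

If an OOM record appears, reducing the per-GPU batch size or adding swap/host RAM is the usual next step.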