
Change base image to nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04 #12

Merged (3 commits into main on Nov 5, 2023)

Conversation

aoirint (Owner) commented Nov 3, 2023

aoirint changed the title from "Switch base image to nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04" to "Change base image to nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04" on Nov 3, 2023
aoirint (Owner, Author) commented Nov 3, 2023

Not working yet due to an error from the bitsandbytes library. It may be related to nvrtc.so in CUDA 11.8.
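
To narrow this down, a quick check is whether the new runtime base image ships any nvrtc library at all (a minimal sketch, assuming the usual /usr/local/cuda/lib64 layout of the nvidia/cuda images; not something verified as part of this PR):

# Hedged check: list any nvrtc libraries bundled in the runtime base image.
$ sudo docker run --rm nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04 \
  bash -c 'ls -l /usr/local/cuda/lib64/ | grep -i nvrtc'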

(This is just an experimental training log; ignore the missing-caption-file warning.)

$ sudo docker build -t aoirint/sd_scripts .
$ sudo docker run --rm --gpus all \
  -v "./base_model:/base_model" \
  -v "./work:/work" \
  -v "./cache/huggingface/hub:/home/user/.cache/huggingface/hub" \
  aoirint/sd_scripts \
  train_network.py \
  --config_file /work/train_config/train_20231103.1/config.toml
2023-11-03 09:57:13.728649: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-03 09:57:13.856718: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-11-03 09:57:14.544165: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-11-03 09:57:14.544257: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-11-03 09:57:14.544266: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-11-03 09:57:17.093996: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-03 09:57:17.218575: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-11-03 09:57:17.889476: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-11-03 09:57:17.889556: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-11-03 09:57:17.889567: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Loading settings from /work/train_config/train_20231103.1/config.toml...
/work/train_config/train_20231103.1/config
prepare tokenizer
update token length: 150
Using DreamBooth method.
prepare images.
found directory /work/my_dataset-20230715.1/train_img/10_shs girl contains 51 image files
No caption file found for 51 images. Training will continue without captions for these images. If class token exists, it will be used. / 51枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を続行します。class tokenが存在する場合はそれを使います。
/work/my_dataset-20230715.1/train_img/10_shs girl/0001.png
/work/my_dataset-20230715.1/train_img/10_shs girl/0002.png
/work/my_dataset-20230715.1/train_img/10_shs girl/0003.png
/work/my_dataset-20230715.1/train_img/10_shs girl/0004.png
/work/my_dataset-20230715.1/train_img/10_shs girl/0005.png
/work/my_dataset-20230715.1/train_img/10_shs girl/0006.png... and 46 more
found directory /work/my_dataset-20230715.1/reg_img/1_1girl contains 500 image files
No caption file found for 500 images. Training will continue without captions for these images. If class token exists, it will be used. / 500枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を続行します。class tokenが存在する場合はそれを使います。
/work/my_dataset-20230715.1/reg_img/1_1girl/transparent_1.png
/work/my_dataset-20230715.1/reg_img/1_1girl/transparent_10.png
/work/my_dataset-20230715.1/reg_img/1_1girl/transparent_100.png
/work/my_dataset-20230715.1/reg_img/1_1girl/transparent_101.png
/work/my_dataset-20230715.1/reg_img/1_1girl/transparent_102.png
/work/my_dataset-20230715.1/reg_img/1_1girl/transparent_103.png... and 495 more
510 train images with repeating.
500 reg images.
[Dataset 0]
  batch_size: 2
  resolution: (512, 512)
  enable_bucket: True
  min_bucket_reso: 320
  max_bucket_reso: 960
  bucket_reso_steps: 64
  bucket_no_upscale: False

  [Subset 0 of Dataset 0]
    image_dir: "/work/my_dataset-20230715.1/train_img/10_shs girl"
    image_count: 51
    num_repeats: 10
    shuffle_caption: False
    keep_tokens: 0
    caption_dropout_rate: 0.05
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: shs girl
    caption_extension: .txt

  [Subset 1 of Dataset 0]
    image_dir: "/work/my_dataset-20230715.1/reg_img/1_1girl"
    image_count: 500
    num_repeats: 1
    shuffle_caption: False
    keep_tokens: 0
    caption_dropout_rate: 0.05
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: True
    class_tokens: 1girl
    caption_extension: .txt


[Dataset 0]
loading image sizes.
100%|██████████| 551/551 [00:00<00:00, 6029.49it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (320, 704), count: 20
bucket 1: resolution (320, 768), count: 20
bucket 2: resolution (384, 640), count: 60
bucket 3: resolution (448, 576), count: 140
bucket 4: resolution (512, 512), count: 780
mean ar error (without repeats): 0.002443827835159254
preparing accelerator
Using accelerator 0.15.0 or above.
loading model for process 0/1
load StableDiffusion checkpoint: /base_model/wd-1-5-beta2-fp32.safetensors
/home/user/.local/lib/python3.10/site-packages/safetensors/torch.py:98: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
loading u-net: <All keys matched successfully>
loading vae: <All keys matched successfully>
loading text encoder: <All keys matched successfully>
CrossAttention.forward has been replaced to enable xformers.
import network module: lycoris.kohya
[Dataset 0]
caching latents.
0it [00:00, ?it/s]
Using rank adaptation algo: full
Disable conv layer
Use Dropout value: 0.0
Create LyCORIS Module
create LyCORIS for Text Encoder: 138 modules.
Create LyCORIS Module
create LyCORIS for U-Net: 256 modules.
module type table: {'FullModule': 330, 'NormModule': 64}
enable LyCORIS for text encoder
enable LyCORIS for U-Net
preparing optimizer, data loader etc.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
/home/user/.local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/nvidia/lib64'), PosixPath('/home/user/.local/lib/python3.10/site-packages/cv2/../../lib64')}
  warn(
/home/user/.local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:105: UserWarning: /home/user/.local/lib/python3.10/site-packages/cv2/../../lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain libcudart.so as expected! Searching further paths...
  warn(
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
CUDA SETUP: Loading binary /home/user/.local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
/home/user/.local/lib/python3.10/site-packages/bitsandbytes/cextension.py:48: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
  warn(
use 8-bit AdamW optimizer | {'weight_decay': 0.1, 'betas': (0.9, 0.99)}
override steps. steps for 2 epochs is / 指定エポックまでのステップ数: 1020
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 510
  num reg images / 正則化画像の数: 500
  num batches per epoch / 1epochのバッチ数: 510
  num epochs / epoch数: 2
  batch size per device / バッチサイズ: 2
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 1020
steps:   0%|          | 0/1020 [00:00<?, ?it/s]
epoch 1/2
/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py:459: UserWarning: Applied workaround for CuDNN issue, install nvrtc.so (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:80.)
  return F.conv2d(input, weight, bias, self.stride,
Blocksparse is not available: the current GPU does not expose Tensor cores
Traceback (most recent call last):
  File "/code/sd-scripts/train_network.py", line 873, in <module>
    train(args)
  File "/code/sd-scripts/train_network.py", line 688, in train
    optimizer.step()
  File "/home/user/.local/lib/python3.10/site-packages/accelerate/optimizer.py", line 134, in step
    self.scaler.step(self.optimizer, closure)
  File "/home/user/.local/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 374, in step
    retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 290, in _maybe_opt_step
    retval = optimizer.step(*args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 69, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/torch/optim/optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py", line 265, in step
    self.update_step(group, p, gindex, pindex)
  File "/home/user/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py", line 506, in update_step
    F.optimizer_update_8bit_blockwise(
  File "/home/user/.local/lib/python3.10/site-packages/bitsandbytes/functional.py", line 858, in optimizer_update_8bit_blockwise
    str2optimizer8bit_blockwise[optimizer_name][0](
NameError: name 'str2optimizer8bit_blockwise' is not defined
steps:   0%|          | 0/1020 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/home/user/.local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/user/.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/home/user/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "/home/user/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/python/bin/python3.10', 'train_network.py', '--config_file', '/work/train_config/train_20231103.1/config.toml']' returned non-zero exit status 1.
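
The failure itself is bitsandbytes falling back to its CPU-only build: it never finds libcudart.so on LD_LIBRARY_PATH, loads libbitsandbytes_cpu.so, and the 8-bit optimizer symbols (str2optimizer8bit_blockwise) are therefore never defined. A hedged way to see where libcudart.so actually ends up inside the built image (overriding the entrypoint and the /usr/local search path are assumptions about the image's internals):

# Hedged check: locate libcudart inside the image and print the library search path.
$ sudo docker run --rm --gpus all --entrypoint bash aoirint/sd_scripts \
  -c 'find /usr/local -name "libcudart.so*" 2>/dev/null; echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"'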

aoirint (Owner, Author) commented Nov 4, 2023

I changed this repository to manage all of its Python library dependencies itself. This fixes the problem that some library versions were left unpinned by relying on sd-scripts' own requirements.txt, and it lets the repository update libraries on its own side while working on this PR.
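
One simple way to get a fully pinned, repository-managed dependency list is to freeze a known-good environment (a sketch only; the file name and the entrypoint override are assumptions for illustration, not necessarily what this PR does):

# Hedged sketch: dump exact installed versions from the image into a repo-managed file.
$ sudo docker run --rm --entrypoint pip aoirint/sd_scripts freeze > requirements.txt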

aoirint (Owner, Author) commented Nov 5, 2023

aoirint marked this pull request as ready for review on November 5, 2023 at 05:52
aoirint (Owner, Author) commented Nov 5, 2023

Updating to bitsandbytes 0.41.1 made the error go away, so this looks good.
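
For reference, the change boils down to pinning the newer bitsandbytes and re-checking the import (a sketch; the exact Dockerfile/requirements edit made in this PR is not reproduced here):

# Hedged sketch: upgrade bitsandbytes and confirm the GPU build is picked up.
# A failed CUDA setup typically surfaces as import-time warnings about
# libbitsandbytes_cpu.so, as in the log above.
$ pip install --upgrade bitsandbytes==0.41.1
$ python -c "import bitsandbytes"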

aoirint merged commit 0f20c7b into main on Nov 5, 2023
aoirint deleted the patch-cuda-runtime-base-image branch on November 5, 2023 at 06:04
Successfully merging this pull request may close this issue: Switching to CUDA runtime base image for size reduction