
RTX 3000 series broken compatibility #32

Closed
JulianPinzaru opened this issue Nov 10, 2020 · 38 comments


JulianPinzaru commented Nov 10, 2020

I tried to install the NVIDIA driver (455) myself on Ubuntu 18.04 with Python 3.7 and TensorFlow 1.14 (also tried 1.15).
It always said it couldn't find a GPU when trying to start training (or threw other errors, such as failing to import cublas.10 libraries while I had CUDA 11 installed instead). I have an RTX 3090 Founders Edition GPU.
I tried different approaches, reinstalling things, and wasted more than 10 hours; it never worked for me. It did work on my Titan RTX, though, on a few different computer rigs.
Finally, since the maintainers claimed it works on their end for RTX 3000, I thought I could try their Docker container.
It didn't work initially; then I realized I had a few more steps to do, so I installed nvidia-docker2 (nvidia-container-toolkit), thinking it would certainly work. Unfortunately, it errors out again:

Output directory: ./results/00015-jjl_1024-mirror-24gb-gpu-bg-resumeffhq1024
Training data: ./datasets/jjl_1024
Training length: 25000 kimg
Resolution: 1024
Number of GPUs: 1

Creating output directory...
Loading training set...
Image shape: [3, 1024, 1024]
Label shape: [0]

Constructing networks...
Setting up TensorFlow plugin "fused_bias_act.cu": Compiling... Failed!
Traceback (most recent call last):
File "train.py", line 591, in
main()
File "train.py", line 583, in main
run_training(**vars(args))
File "train.py", line 473, in run_training
training_loop.training_loop(**training_options)
File "/var/www/training/training_loop.py", line 123, in training_loop
Gs = G.clone('Gs')
File "/var/www/dnnlib/tflib/network.py", line 457, in clone
net.copy_vars_from(self)
File "/var/www/dnnlib/tflib/network.py", line 490, in copy_vars_from
src_net._get_vars()
File "/var/www/dnnlib/tflib/network.py", line 297, in _get_vars
self._vars = OrderedDict(self._get_own_vars())
File "/var/www/dnnlib/tflib/network.py", line 286, in _get_own_vars
self._init_graph()
File "/var/www/dnnlib/tflib/network.py", line 151, in _init_graph
out_expr = self._build_func(*self._input_templates, **build_kwargs)
File "/var/www/training/networks.py", line 231, in G_main
num_layers = components.synthesis.input_shape[1]
File "/var/www/dnnlib/tflib/network.py", line 232, in input_shape
return self.input_shapes[0]
File "/var/www/dnnlib/tflib/network.py", line 219, in input_shapes
self._input_shapes = [t.shape.as_list() for t in self.input_templates]
File "/var/www/dnnlib/tflib/network.py", line 267, in input_templates
self._init_graph()
File "/var/www/dnnlib/tflib/network.py", line 151, in _init_graph
out_expr = self._build_func(*self._input_templates, **build_kwargs)
File "/var/www/training/networks.py", line 439, in G_synthesis
x = layer(x, layer_idx=0, fmaps=nf(1), kernel=3)
File "/var/www/training/networks.py", line 392, in layer
x = modulated_conv2d_layer(x, dlatents_in[:, layer_idx], fmaps=fmaps, kernel=kernel, up=up, resample_kernel=resample_kernel, fused_modconv=fused_modconv)
File "/var/www/training/networks.py", line 105, in modulated_conv2d_layer
s = apply_bias_act(s, bias_var='mod_bias', trainable=trainable) + 1 # [BI] Add bias (initially 1).
File "/var/www/training/networks.py", line 50, in apply_bias_act
return fused_bias_act(x, b=tf.cast(b, x.dtype), act=act, gain=gain, clamp=clamp)
File "/var/www/dnnlib/tflib/ops/fused_bias_act.py", line 72, in fused_bias_act
return impl_dict[impl](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain, clamp=clamp)
File "/var/www/dnnlib/tflib/ops/fused_bias_act.py", line 132, in _fused_bias_act_cuda
cuda_op = _get_plugin().fused_bias_act
File "/var/www/dnnlib/tflib/ops/fused_bias_act.py", line 18, in _get_plugin
return custom_ops.get_plugin(os.path.splitext(file)[0] + '.cu')
File "/var/www/dnnlib/tflib/custom_ops.py", line 159, in get_plugin
_run_cmd(nvcc_cmd + ' "%s" --shared -o "%s" --keep --keep-dir "%s"' % (cuda_file, tmp_file, tmp_dir))
File "/var/www/dnnlib/tflib/custom_ops.py", line 69, in _run_cmd
raise RuntimeError('NVCC returned an error. See below for full command line and output log:\n\n%s\n\n%s' % (cmd, output))
RuntimeError: NVCC returned an error. See below for full command line and output log:

nvcc --compiler-options '-fPIC' --compiler-options '-I/usr/local/lib/python3.6/dist-packages/tensorflow/include -D_GLIBCXX_USE_CXX11_ABI=0' --linker-options '-L/usr/local/lib/python3.6/dist-packages/tensorflow -l:libtensorflow_framework.so.1' --gpu-architecture=sm_86 --use_fast_math --disable-warnings --include-path "/usr/local/lib/python3.6/dist-packages/tensorflow/include" --include-path "/usr/local/lib/python3.6/dist-packages/tensorflow/include/external/protobuf_archive/src" --include-path "/usr/local/lib/python3.6/dist-packages/tensorflow/include/external/com_google_absl" --include-path "/usr/local/lib/python3.6/dist-packages/tensorflow/include/external/eigen_archive" 2>&1 "/var/www/dnnlib/tflib/ops/fused_bias_act.cu" --shared -o "/tmp/tmp4dn1nm6o/fused_bias_act_tmp.so" --keep --keep-dir "/tmp/tmp4dn1nm6o"

nvcc fatal : Value 'sm_86' is not defined for option 'gpu-architecture'

By googling it, I found that similar errors (e.g. with sm_75) occur when there are code/CUDA/driver compatibility issues. At least that's what people say.
Please help with a decent working container version at least.
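
For anyone hitting the same error, a quick sanity check (a hypothetical helper, not part of the repo) is to ask the nvcc visible inside the container whether it knows about sm_86 at all; nvcc from CUDA 11.0 or older does not, which produces exactly this "Value 'sm_86' is not defined" failure:

# Hypothetical check, not part of stylegan2-ada: the nvcc that custom_ops.py invokes
# must come from CUDA 11.1+ to accept --gpu-architecture=sm_86 (Ampere / RTX 3090).
import subprocess

print(subprocess.check_output(['nvcc', '--version']).decode())

help_text = subprocess.check_output(['nvcc', '--help']).decode()
print('sm_86 listed in nvcc --help:', 'sm_86' in help_text)  # expect True on CUDA 11.1+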


9of9 commented Nov 15, 2020

Also having issues with an RTX 3090. I'm running StyleGAN2-ADA on Windows with tensorflow-gpu 1.14. I haven't attempted training yet, but I'm seeing strange behaviour when attempting inference:
It takes a very long time to spin up inference (~10 minutes on the metfaces example), at which point the first image comes out correct, and then in quick succession the remaining seeds in the set all get rendered out as RGB noise.

I'm not even sure what the problem could be specifically, since it gives no errors, seems to succeed with compilation, and is clearly capable of some inference. Perhaps weird fallback behaviour?


nurpax commented Nov 16, 2020

As far as I know, you will need TensorFlow 1.x built against CUDA 11.1. The TensorFlow build needs to enable compute architecture sm_86.

This was discussed for Linux containers here: #10
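
For reference, a quick way to confirm what compute capability the TensorFlow build actually sees (a sketch using TF1's device_lib, not something shipped with the repo):

# Sketch: list the GPUs TensorFlow 1.x can see and their compute capability.
# An RTX 3090 should report "compute capability: 8.6", which is where the
# --gpu-architecture=sm_86 flag in the failing nvcc command above comes from.
from tensorflow.python.client import device_lib

for dev in device_lib.list_local_devices():
    if dev.device_type == 'GPU':
        print(dev.name, '->', dev.physical_device_desc)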

@JulianPinzaru (Author)

@nurpax Thank you very much for your help! It looks like I had missed the latest Dockerfile you pushed there.
I was able to start the training in that new container (nvcr.io/nvidia/tensorflow:20.10-tf1-py3) and it seems to work 👍
I will give more feedback tomorrow; I am letting it train overnight to see if it runs into any issues. Hopefully it will be fine. :)


nurpax commented Nov 17, 2020

I'm glad you got it working @JulianPinzaru! Feedback on how it goes after your overnight training is welcome; it's good to get some confirmation that this is working as intended.


JulianPinzaru commented Nov 17, 2020

@nurpax
I appreciate your feedback.
I ran it overnight as mentioned, but it didn't get anywhere because it got stuck calculating FID for some reason.

Downloading https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/metrics/inception_v3_features.pkl ... done
Calculating real image statistics for fid50k_full...

It might be some sort of bug... I disabled it with --metrics=none and started over, and it appeared to be computing fine, until I realized the computation time was increasing with every tick.

[screenshot]

I asked other people in my Slack channel; someone using the same Docker image reported a much longer training time per tick with 2 GPUs (RTX 3090) than with 1 GPU (RTX 2080). I was also expecting it to run much quicker given the improvements made in this stylegan2-ada repo. In fact, I previously had 2 RTX Titans and it ran really quickly on the previous stylegan2 repo (512-resolution images at 9-10 minutes per tick), while here I'm already over 48 minutes per tick and don't really know what to expect.
I am training on a 10k-image dataset, which is well above the 1-2k images that ADA is designed for, so I don't think I'm overfitting the model, but I noticed it keeps increasing the aug value.

Attaching the details of the 2-GPU setup the other person was trying to run on:

[screenshot]


JulianPinzaru commented Nov 18, 2020

UPDATED
Investigating the problem. It might be caused by the forked repo I used (https://github.com/dvschultz/stylegan2-ada).
I will be testing overnight with the original repo, without any of those modifications.
Thank you for not rushing into the aforementioned problems; I appreciate your time and I hope it was just my mistake.

@nerdyrodent

Just got a 3090 and the same thing is happening here. Ticks were stable on a 1070 (30 minutes), but after switching to the Docker image it gets slower each tick.


JulianPinzaru commented Nov 18, 2020

UPDATED

People are mentioning that with augmentation enabled it takes them up to 14 minutes per tick on Google Colab (premium)... There might be something wrong with it.

I noticed that with augmentation disabled, a tick takes up to 5 minutes 30 seconds, which looks reasonable and is what I would expect. But there seems to be a huge unexplained overhead when augmentation is enabled. I attached screenshots for all three cases I tried:

python train.py --outdir ./results --snap=2 --data=./datasets/jjl_512 --augpipe="bg" --mirror=True --resume="ffhq512" --metrics=none
[screenshot]

python train.py --outdir ./results --snap=2 --data=./datasets/jjl_512 --augpipe="bg" --mirror=True --resume="ffhq512" --metrics=none --p 0.25
[screenshot]

python train.py --outdir ./results --snap=2 --data=./datasets/jjl_512 --mirror=True --resume="ffhq512" --metrics=none --aug="noaug"
[screenshot]

I thought fixed augmentation wouldn't cause any timing issues, but it turned out to be ~55 minutes per tick as well.
@nurpax Do you have any suggestions that would help me investigate further and give you more feedback/details about this issue?
Thank you in advance

@nerdyrodent

Changed my aug pipeline to just filter, and yes, it's back down to 5 minutes a tick. I'll have to leave it running longer to see if that time increases. I'd guess it's b, g, or c causing the issues?


nurpax commented Nov 19, 2020

Thanks for the comments @JulianPinzaru and @nerdyrodent! All our training was done on TF 1.14 (with a TensorFlow that I built from source) using 8xGPU DGX-1 machines and we have not experienced this problem in our training. But quite clearly there is something wrong as the above comments indicate.

In the interest of clarity, can you summarize what is known to work and what is known to not work? (Ideally with unmodified stylegan2-ada code, if possible.) There are many variables: augmentation or not, 1 or 2 GPUs, RTX 3090 or Volta, container url/tag etc. I'm not making any promises for a quick fix, but if we can clearly pin down the working and broken configurations, it'll be much easier for us to look into this.

I can tabulate working/broken configs in this comment and edit as I get feedback from you.

Known to work

  • TF 1.14/DGX-1 8xGPU/CUDA 10.0 (no problems with augmentation on or off)

Known broken

  • TODO Janne to fill this in based on feedback.


nerdyrodent commented Nov 19, 2020

Ubuntu 20.04
Driver Version: 455.38
CUDA Version: 11.1
GeForce RTX 3090 FE
Docker: stylegan2ada:latest (nvcr.io/nvidia/tensorflow:20.10-tf1-py3 base image)

Works as expected:
sudo docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --gpus all -it --rm -v `pwd`:/scratch --user $(id -u):$(id -g) stylegan2ada:latest bash -c "(cd /scratch && DNNLIB_CACHE_DIR=/scratch/.cache python train.py --outdir=./training-runs --gpus=1 --data=./datasets/paintings --mirror=1 --cfg=stylegan2-24gb --snap=6 --metrics=none --augpipe=filter --resume=training-runs/00039-paintings-mirror-stylegan2-24gb-filter-resumecustom/network-snapshot-000090.pkl)"

Doesn't work as expected (gets slower). Only augpipe changed:
sudo docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --gpus all -it --rm -v `pwd`:/scratch --user $(id -u):$(id -g) stylegan2ada:latest bash -c "(cd /scratch && DNNLIB_CACHE_DIR=/scratch/.cache python train.py --outdir=./training-runs --gpus=1 --data=./datasets/paintings --mirror=1 --cfg=stylegan2-24gb --snap=6 --metrics=none --augpipe=bgc --resume=training-runs/00039-paintings-mirror-stylegan2-24gb-filter-resumecustom/network-snapshot-000090.pkl)"

(My config uses map=8 rather than map=2 from auto)

Update:
Tried the new 20.11-tf1-py3 Docker image - the issue remains.
Also noticed an audible difference with the GPU: the fans don't kick in with any augpipe that includes blit or geom. Mem ctrl% frequently drops to 1%, well below the typical 62% seen when NOT using blit or geom.
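
For anyone who wants to log this instead of watching it, a small sketch of my own (assumes nvidia-smi is on the PATH; the query fields are standard nvidia-smi options):

# Log GPU and memory-controller utilisation once per second during a training run,
# to catch the dips described above. Hypothetical helper, not part of the repo.
# Stop it with Ctrl-C.
import subprocess

subprocess.run([
    'nvidia-smi',
    '--query-gpu=timestamp,utilization.gpu,utilization.memory,memory.used',
    '--format=csv,noheader',
    '-l', '1',
])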


JulianPinzaru commented Nov 20, 2020

OS: Ubuntu 18.04
Driver Version: 455.38
CUDA: Version: 11.1
GPU: GeForce RTX 3090 FE
RAM: 32GB DDR4
CPU: I-7 10700k
Docker: stylegan2ada:latest (nvcr.io/nvidia/tensorflow:20.10-tf1-py3 base image)
Docker build command: docker build -f Dockerfile -t stylegan2-ada-original:tf15 .
Docker run command 1: docker run --gpus all -it --rm --memory="28g" -v `pwd`:/workspace/stylegan2-ada stylegan2-ada-original:tf15 /bin/bash
Docker run command 2: nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --gpus all -it --rm -v `pwd`:/workspace/stylegan2-ada stylegan2-ada-original:tf15 /bin/bash

During testing I noticed that VIRT memory goes up to 61 GB; maybe that's one of the effects of heavily loaded augmentation.
[screenshot]

I ran more experiments with augmentation OFF vs augmentation ON, and used the Python time library to record the critical operations that take longer than usual. I used Docker command 1 and Docker command 2 mentioned above for the tests. The screenshots, though, are from Docker command 2 (using nvidia-docker with the suggested params to start the container).

I started by recording each iteration of the training loop, specifically the "run training ops" for loop. Posting the screenshot below to explain what I did.
[screenshot]

Next, I used the following command to train a NON-AUGMENTED model:

python train.py --outdir ./results --data=./datasets/jjl_512 --gpus=1 --snap=2 --mirror=True --metrics=none --aug="noaug" --resume="ffhq512"

For augmentation I used fixed aug to emphasize the time it takes (it takes approximately the same time as with dynamic aug strength growth; I posted screenshots of the ticks in my previous comment).

python train.py --outdir ./results --data=./datasets/jjl_512 --gpus=1 --snap=2 --mirror=True --metrics=none --augpipe="bg" --aug="fixed" --p=0.35 --resume="ffhq512"

AUG OFF
[screenshot]

AUG ON
[screenshot]

Then, using the same approach, I measured the time for the following lines of code (individually):
248, 249-250, 251, 252-253 (the numbers correspond to the code screenshot I posted above)

Screenshots for the aforementioned lines, respectively:

AUG OFF
[screenshot]

AUG ON
[screenshot]

I also tested the timing for these:
aug.run_validation(minibatch_size=minibatch_size)
and
aug.tune(minibatch_size * minibatch_repeats)
but they didn't seem to have any impact on the running time.

@nurpax please let me know if this was helpful, or if you would like me to benchmark the higher-level augmentation options that rely on these lower-level ones.
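
For completeness, a minimal sketch of the kind of wall-clock instrumentation I'm describing (the helper name is mine, not from the repo; the op names in the comments are only illustrative of the session.run calls on the lines listed above):

# Hypothetical timing helper wrapped around the ops run inside training_loop.py.
import time

def timed(label, fn, *args, **kwargs):
    start = time.time()
    result = fn(*args, **kwargs)
    print('%-24s %7.3f s' % (label, time.time() - start))
    return result

# e.g. inside the "run training ops" loop, wrap each tflib.run call:
#   timed('data_fetch', tflib.run, some_data_fetch_op)            # names illustrative
#   timed('train_step', tflib.run, some_training_ops, feed_dict)  # names illustrative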


JulianPinzaru commented Nov 20, 2020

UPDATED
I don't have access to a Volta GPU, so I tested the following configs on my RTX 3090:

'blit', 'geom', 'color', 'filter', 'noise', 'cutout', 'bg', 'bgc', 'bgcf', 'bgcfn', 'bgcfnc'
with --aug="fixed" --p=0.25

Timing for "while not done" loop ( iteration approx average ):

  • noaug - Iteration TIME: ~ 2.5 seconds
  • blit - Iteration TIME: ~ 15.0 seconds
  • geom - Iteration TIME: ~ 29.0 seconds
  • color - Iteration TIME: ~ 2.7 seconds
  • filter - Iteration TIME: ~ 2.9 seconds
  • noise - Iteration TIME: ~ 2.6 seconds
  • cutout - Iteration TIME: ~ 2.6 seconds
  • bg - Iteration TIME: ~ 28.0 seconds
  • bgc - Iteration TIME: ~ 29.0 seconds (with spikes up to 38 seconds)
  • bgcf - Iteration TIME: ~ 29.0 seconds
  • bgcfn - Iteration TIME: ~ 29.0 seconds
  • bgcfnc - Iteration TIME: ~ 29.0 seconds

In reality, when aug is not fixed, it grows to 0.4 and pushes those times even higher.
I noticed some of them have spikes; I didn't wait more than 3 iterations on each to see which others spike as well, though, as I don't think it would be meaningful.
So I only noticed blit and geom having issues (that's why I covered them in more detail above).

@nerdyrodent

Did some quick and dirty testing with the augpipeline_specs. I'd been using filter, noise and cutout without issue, so I used those as a base. I've also observed the GPU mem ctrl% drop below 10% when things are going slowly. Additionally, the card is typically audible when working as expected.

These appeared to have the largest performance impact, based only on GPU mem usage graphs:
xint=1
rotate=1
aniso=1
xfrac=1

Example aug pipeline specs tests:
'nr': dict(imgfilter=1, noise=1, cutout=1, xflip=1, rotate90=1, scale=1),
First tick was 8 mins after tick 0, so still a bit slower than the base. GPU memory use sometimes dips below 10%. Didn't test longer to see if it got slower.

This test:
'nr': dict(imgfilter=1, noise=1, cutout=1, xflip=1, rotate90=1),
First tick was 6 mins after tick 0, memory load steady. Second tick was also the same speed.
I'm still running this one, but I expect timings to remain steady.
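
For anyone who wants to reproduce this, a sketch of how such a custom pipeline can be wired in (the dict below mirrors the augpipe_specs table in train.py as I understand it; the 'nr' entry is the custom one, and the exact contents may differ between forks):

# In train.py, augpipe_specs maps the --augpipe name to the kwargs passed to
# training.augment.AugmentPipe. Adding a custom entry avoids the slow geometric
# ops (xint, scale, rotate, aniso, xfrac) reported above.
augpipe_specs = {
    'blit':   dict(xflip=1, rotate90=1, xint=1),
    'geom':   dict(scale=1, rotate=1, aniso=1, xfrac=1),
    'filter': dict(imgfilter=1),
    'noise':  dict(noise=1),
    'cutout': dict(cutout=1),
    'nr':     dict(imgfilter=1, noise=1, cutout=1, xflip=1, rotate90=1),  # custom
}
# Then select it on the command line:  python train.py ... --augpipe=nr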

@JulianPinzaru (Author)

@nerdyrodent I also have a feeling that it is somehow overusing disk storage (SSD) rather than RAM or VRAM; there's a bottleneck somewhere and it's not clear where. Maybe it's due to how it handles the CUDA_CACHE directory, I'm not sure.

@nerdyrodent

I've removed Docker from the equation and I'm still seeing the same behaviour, using NVIDIA TensorFlow r1.15.4+nv20.11 (via pip) + cuda_11.1.1_455.32.00.

Another thing I've noticed is that one CPU core will sit at 100% for a while, which it doesn't do when not using blit or geom. Now I'm really confused.

@JulianPinzaru (Author)

@nurpax
Hello! Any updates? 🙂

@JulianPinzaru (Author)

@nurpax
Is there any plan to help us out here?
Not trying to rush you, just pinging to see if anyone is reading this issue and its posts.


nurpax commented Dec 10, 2020

@JulianPinzaru Hi! I'm following some of the posts (incl. this one) but alas, we don't have any new updates on this one. This problem does not actively impact our ongoing research projects and we're a small team of researchers with limited time.

I will try to update if we find something that may apply here -- but I also prefer not to post updates unless we have fairly high confidence that the ideas/fixes actually help.

Can you check the exact version of libcudnn that TensorFlow is using?

Reading through the comments: so there is a massive slowdown and a memory leak when using any of the --augpipe b* variants? It almost looks like some operations are falling back to the CPU when we'd expect them to run on the GPU.
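
To answer the libcudnn question from inside the container, one hedged option (not an official utility; TensorFlow may of course link a different copy than the one the loader finds first):

# Load the cuDNN shared library and ask it for its version number, e.g. 8004 == cuDNN 8.0.4.
import ctypes

for name in ('libcudnn.so.8', 'libcudnn.so.7', 'libcudnn.so'):
    try:
        print(name, '->', ctypes.CDLL(name).cudnnGetVersion())
        break
    except OSError:
        continue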


sanmeow commented Dec 11, 2020

I'm facing the same issue too, so I only use noaug on my RTX 3090. I wish this could be fixed. For now, the only way is to use my 2080 Ti for ADA instead, among my different GPUs.


nurpax commented Dec 18, 2020

A heads up on TensorFlow 1.x, RTX 3090 support and StyleGAN2-ADA. Our research group is in the process of switching to PyTorch and StyleGAN2 ADA will be our last project written in TensorFlow.

We have ported StyleGAN2 ADA to PyTorch and plan on releasing this new codebase as the official StyleGAN2 ADA PyTorch implementation. We hope to release the PyTorch port sometime in January 2021.

We expect the problems discussed in this GitHub issue to disappear as we transition to CUDA 11, cuDNN 8.0.x and the latest PyTorch release.


johndpope commented Dec 18, 2020

Thanks for the heads up. If there were an alpha branch - unsupported - that would be super. In the meantime, I have to piece things together using this repo: https://github.com/rosinality/stylegan2-pytorch

UPDATE - PyTorch has yet to release CUDA 11.1 binaries:
pytorch/pytorch#45021

Might have to swap in an old GPU to get some work done.

UPDATE 2 - the TensorFlow Docker container just slows down irrespective of the neural-net stuff.
I made a PR to update to the latest Docker container; the current one is a few months old - #51

@JulianPinzaru (Author)

> A heads up on TensorFlow 1.x, RTX 3090 support and StyleGAN2-ADA. Our research group is in the process of switching to PyTorch and StyleGAN2 ADA will be our last project written in TensorFlow.
>
> We have ported StyleGAN2 ADA to PyTorch and plan on releasing this new codebase as the official StyleGAN2 ADA PyTorch implementation. We hope to release the PyTorch port sometime in January 2021.
>
> We expect the problems discussed in this GitHub issue to disappear as we transition to CUDA 11, cuDNN 8.0.x and the latest PyTorch release.

Salivating to see that one on PyTorch! :) I bet it is a future-proof decision, as TF 1.15 is not maintained anymore and there is a big PyTorch community eager to get their hands on the 3000 series as well.
Thanks for the heads up!


BartWMK commented Dec 24, 2020

I do not expect the PyTorch CUDA 11 / cuDNN 8.0.x version to resolve what seems to be an RTX 3000 series driver issue.

I'm using CUDA 11 / cuDNN 8.0.x (the most recent TF1/CUDA 11 Docker image from NVIDIA, with the latest cuDNN release installed).

The problem is in the use of 2 tf functions in augment.py:
tf.nn.depthwise_conv2d_backprop_input
and
tf.nn.depthwise_conv2d
in this block:
# Execute geometric transformations.

Disabling these 4 filtered up/downscale invocations not only removes the dramatic performance impact but also removes an approx. 1 GB/h (at res=512) memory leak seemingly triggered by the code behind these operations.

As I said, I noticed the same behavior in a PyTorch implementation (lucidrains/stylegan2-pytorch, although I did not trace it down to the offending operation), so it seems to point towards a cuDNN and/or driver issue. Other people appear to be seeing the same, also with PyTorch: https://github.com/pytorch/pytorch/issues/47039

For those in the mood for a short-term workaround: replace the relevant calls with a less fancy (unfiltered?) scaling not using depthwise convolutions.

Do note that, in general, depthwise convolutions don't scale well (Gholami et al., https://arxiv.org/pdf/1803.10615.pdf), so I'm not expecting miracles, but the current performance penalty and memory leak seem a bit excessive. 2080 Ti-level performance should be possible on a 3090.
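
To make the suggested workaround concrete, here is a rough sketch (my own, untested against the repo; the function names are made up, and it assumes the NCHW layout used in training/augment.py) of replacing the filtered 2x up/downsampling with unfiltered equivalents that avoid depthwise convolutions:

# Crude substitutes for the filtered up/downsample steps in the geometric-transform
# block of training/augment.py. These trade filtering quality for Ampere-friendly ops.
import tensorflow as tf

def unfiltered_upsample_2x(x):                       # x: [N, C, H, W]
    _, c, h, w = x.shape.as_list()
    x = tf.reshape(x, [-1, c, h, 1, w, 1])
    x = tf.tile(x, [1, 1, 1, 2, 1, 2])               # nearest-neighbour 2x upsample
    return tf.reshape(x, [-1, c, h * 2, w * 2])

def unfiltered_downsample_2x(x):                     # average pool instead of a filtered downsample
    return tf.nn.avg_pool(x, ksize=[1, 1, 2, 2], strides=[1, 1, 2, 2],
                          padding='VALID', data_format='NCHW')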


johndpope commented Dec 25, 2020

Hi Bart, feel free to throw up whatever code you have as a gist (it might help other people troubleshooting) - https://gist.github.com/BartWMK - then we can switch in your sample and see things more clearly.

UPDATE - supposedly it's possible to get the 3090 working without Docker on Ubuntu. (I hit a wall with zsh - it doesn't correctly find the TensorFlow packages; just use bash.) I recommend using Timeshift to snapshot/back up your working system before doing any brain surgery, and Pop!_OS to get the NVIDIA drivers up and running out of the box.

From @dbkinghorn
https://www.pugetsystems.com/labs/hpc/How-To-Install-TensorFlow-1-15-for-NVIDIA-RTX30-GPUs-without-docker-or-CUDA-install-2005/

But I get the same error - Value 'sm_86' is not defined for option 'gpu-architecture'. Can anyone get this working locally without Docker?
dbkinghorn/NGC-TF1-nvidia-examples#1

Side note - I found this stylegan2 PyTorch code in @GreenLimeSia's repo (it seems pretty polished; it's a function-by-function port with documented code and already handles all the TensorFlow 1 pkl migrations):
https://github.com/GreenLimeSia/GenEdi/tree/master/stylegan2

(seems feature complete)
stylegan2/
    __init__.py
    loss_fns.py
    models.py
    modules.py
    project.py
    train.py
    utils.py

run_convert_from_tf.py
run_gui_interactive_local.py

[screenshot: Screenshot_2020-12-27_08-12-00]

I got this to work using PyTorch with CUDA 11.0 (even though the 11.1 binaries are not released yet).

TensorFlow 2 + stylegan2
It seems @k-l-lambda has solved the compatibility problems with TensorFlow 2 + stylegan2 using existing code from this repo - https://github.com/johndpope/stylegan-web

A lot of the code gets around the compatibility problems by running in compatibility mode. NVIDIA - this decorator-style code to get to TensorFlow 2 would help out a lot more than a PyTorch port; there are so many libraries hanging off this stylegan2-ada repo.

import tensorflow.compat.v1 as tensorflow
tf = tensorflow
tf.disable_v2_behavior()

UPDATE - TensorFlow 2 still gets
nvcc fatal : Value 'sm_86' is not defined for option 'gpu-architecture'
https://github.com/johndpope/stylegan2-ada

python3 -c 'import tensorflow as tf; print(tf.__version__)' 
2.5.0-dev20201218

Seems like NVIDIA bumped the CUDA toolkit to 11.2 on December 17th, so maybe just getting the latest toolkit will fix everything - https://developer.nvidia.com/cuda-downloads

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0

Allowed values for this option: 'compute_35','compute_37','compute_50',
'compute_52','compute_53','compute_60','compute_61','compute_62','compute_70',
'compute_72','compute_75','compute_80','lto_35','lto_37','lto_50','lto_52',
'lto_53','lto_60','lto_61','lto_62','lto_70','lto_72','lto_75','lto_80',
'sm_35','sm_37','sm_50','sm_52','sm_53','sm_60','sm_61','sm_62','sm_70',
'sm_72','sm_75','sm_80'.

| NVIDIA-SMI 455.38       Driver Version: 455.38       CUDA Version: 11.1   
uname -r
5.8.0-7630-generic

@k-l-lambda

@johndpope You mentioned me, and this is my key commit for tf2 compatibility: k-l-lambda/stylegan-web@6be1a4f

Hope it helps.


johndpope commented Dec 28, 2020

Success! Got it working without Docker.
I removed the 455 driver and rebooted:
https://linuxconfig.org/how-to-uninstall-the-nvidia-drivers-on-ubuntu-20-04-focal-fossa-linux
Then I installed the CUDA toolkit 11.2 using the download from the site:
sudo sh cuda_11.2.0_460.27.04_linux.run

NVIDIA-SMI 460.27.04 Driver Version: 460.27.04 CUDA Version: 11.2

I threw these into my ~/.zshrc file:

export PATH=/usr/local/cuda-11.2/bin:${PATH}
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64:${LD_LIBRARY_PATH}

New terminal window sanity check:

which nvcc 
/usr/local/cuda-11.2/bin/nvcc

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0

This fork has the TensorFlow 2 fixes (compatibility mode):
https://github.com/johndpope/stylegan2-ada

(The machine is running a bit slow and Chrome is crashing unusually often, so beware.)
I recommend using Timeshift if you need to rewind config settings.

thanks again @k-l-lambda


mdvorsky commented Dec 29, 2020

Upgrading the BASE_IMAGE in the Dockerfile (#51) fixed the issue for me. The new 20.12 Docker image contains cuDNN 8.0.5, which according to the release notes contains significant performance improvements for the RTX 3090. (The current image, 20.10, uses cuDNN 8.0.4.)

@Emperornero

I'm using Windows, and the disjointed nature of this issue thread is a bit hard to follow. From what I'm seeing, someone here has fixed the issues with the 3090, but I'm unsure what exactly was done.

Does someone have a definitive fix for using the 30 series with this? I bought a 3090 specifically for custom StyleGANs, only to find I can't train them because of this compatibility issue.


johndpope commented Jan 5, 2021

Use the latest NVIDIA driver (460) + the CUDA 11.2 toolkit - https://developer.nvidia.com/cuda-downloads - on the host.
Use the latest Docker container (there's going to be a January release any day) - #51.
If you want to run this codebase on a 3090 directly on your machine - it won't work without TensorFlow 1.
There's no plan to support TensorFlow 2 - everything is moving to PyTorch.

However, adding a few lines to enable compatibility mode for TensorFlow 2 does get it working.
It's frankly pretty miserable that NVIDIA won't add this.
https://github.com/johndpope/stylegan2-ada/tree/main

# this line
import tensorflow as tf

# becomes these lines
import tensorflow.compat.v1 as tensorflow
tf = tensorflow
tf.disable_v2_behavior()

A more elaborate fork is here:
https://github.com/johndpope/stylegan2-ada/tree/digressions

@JulianPinzaru (Author)

@nurpax Hi! Is there any chance of seeing stylegan2-ada on PyTorch any time soon? Thanks


nurpax commented Feb 1, 2021

@JulianPinzaru YES!

We just published the repo; find your bits at: https://github.com/NVlabs/stylegan2-ada-pytorch

I haven't tested the code on RTX 3090 myself. Pretty sure it will require CUDA 11.1 to run and might break on CUDA 11.0. I will be looking into RTX 3090 support this week.
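
For anyone trying the PyTorch port on a 3090, a quick hedged sanity check (not from the new repo; torch.cuda.get_arch_list needs a reasonably recent PyTorch build):

# Confirm the installed torch build was compiled against CUDA 11.1+ and ships
# kernels for compute capability 8.6 (sm_86).
import torch

print('torch', torch.__version__, '| CUDA', torch.version.cuda,
      '| cuDNN', torch.backends.cudnn.version())
print('device capability:', torch.cuda.get_device_capability(0))  # expect (8, 6)
print('compiled arch list:', torch.cuda.get_arch_list())          # should include 'sm_86'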

@johndpope

Yes. Again, it's not really a 'conversion', just running in compatibility mode. There is a way to convert the code, but I didn't go down that route.

@johndpope

Sorry @Thunder003, I won't be able to be much more help here. TensorFlow is kinda dead to me now.


AlirezaParchami commented Jan 18, 2022

I also have an odd issue with StyleGAN2 (the official TensorFlow implementation) on an RTX 3090 in Windows, at the very first stage of running the generator for a test.
The GPU driver is up to date, TensorFlow 1.14 is in use, and I have tested both CUDA 11.1 and CUDA 11.5 with the code!
There is no error or exception, but run_generator.py takes around 30 minutes to generate 11 images, and the images come out as pure noise, using the stylegan2-ffhq-config-f network (as you can see below).
I tested the same setup on a GTX 960 and on another cluster; it took only 2 minutes to generate and worked well!

Is there any solution or fix for this issue?!

[image]


JulianPinzaru commented Jan 21, 2022

> Is there any solution or fix for this issue?!

I don't think you should use the TensorFlow implementation. Just go for the NVlabs PyTorch StyleGAN2 (or 3). It works fine on the 3000 series. It's also somewhat compatible with older TF-trained network pkls (if I am not mistaken).


winssk commented May 9, 2022

> Is there any solution or fix for this issue?!

Hi! Is there any update or solution for this issue?

@jannehellsten

The recommended fix is to switch to either https://github.com/NVlabs/stylegan3 or https://github.com/NVlabs/stylegan2-ada-pytorch, both of which are known to work on new hardware and recent versions of PyTorch.
