
RTX 3000 series broken compatibility #32

Closed
JulianPinzaru opened this issue Nov 10, 2020 · 38 comments


JulianPinzaru commented Nov 10, 2020

I tried to install the NVIDIA driver (455) myself on Ubuntu 18.04 with Python 3.7 and TensorFlow 1.14 (also tried 1.15).
It always said it couldn't find a GPU when trying to start training (or threw other errors, such as failing to import cublas.10 libraries while I had CUDA 11 installed instead). I have an RTX 3090 Founders Edition GPU.
I tried different approaches, reinstalling things, and wasted more than 10 hours; it never worked for me. It did work on my Titan RTX, though, on a few different computer rigs.
Finally, since the maintainers claimed it works on their end for RTX 3000, I thought I could try their Docker container.
It didn't work initially; then I realized I had a few more steps to do, so I installed nvidia-docker2 (nvidia-container-toolkit), thinking it would certainly work. Unfortunately, it errors out again:

Output directory: ./results/00015-jjl_1024-mirror-24gb-gpu-bg-resumeffhq1024
Training data: ./datasets/jjl_1024
Training length: 25000 kimg
Resolution: 1024
Number of GPUs: 1

Creating output directory...
Loading training set...
Image shape: [3, 1024, 1024]
Label shape: [0]

Constructing networks...
Setting up TensorFlow plugin "fused_bias_act.cu": Compiling... Failed!
Traceback (most recent call last):
File "train.py", line 591, in
main()
File "train.py", line 583, in main
run_training(**vars(args))
File "train.py", line 473, in run_training
training_loop.training_loop(**training_options)
File "/var/www/training/training_loop.py", line 123, in training_loop
Gs = G.clone('Gs')
File "/var/www/dnnlib/tflib/network.py", line 457, in clone
net.copy_vars_from(self)
File "/var/www/dnnlib/tflib/network.py", line 490, in copy_vars_from
src_net._get_vars()
File "/var/www/dnnlib/tflib/network.py", line 297, in _get_vars
self._vars = OrderedDict(self._get_own_vars())
File "/var/www/dnnlib/tflib/network.py", line 286, in _get_own_vars
self._init_graph()
File "/var/www/dnnlib/tflib/network.py", line 151, in _init_graph
out_expr = self._build_func(*self._input_templates, **build_kwargs)
File "/var/www/training/networks.py", line 231, in G_main
num_layers = components.synthesis.input_shape[1]
File "/var/www/dnnlib/tflib/network.py", line 232, in input_shape
return self.input_shapes[0]
File "/var/www/dnnlib/tflib/network.py", line 219, in input_shapes
self._input_shapes = [t.shape.as_list() for t in self.input_templates]
File "/var/www/dnnlib/tflib/network.py", line 267, in input_templates
self._init_graph()
File "/var/www/dnnlib/tflib/network.py", line 151, in _init_graph
out_expr = self._build_func(*self._input_templates, **build_kwargs)
File "/var/www/training/networks.py", line 439, in G_synthesis
x = layer(x, layer_idx=0, fmaps=nf(1), kernel=3)
File "/var/www/training/networks.py", line 392, in layer
x = modulated_conv2d_layer(x, dlatents_in[:, layer_idx], fmaps=fmaps, kernel=kernel, up=up, resample_kernel=resample_kernel, fused_modconv=fused_modconv)
File "/var/www/training/networks.py", line 105, in modulated_conv2d_layer
s = apply_bias_act(s, bias_var='mod_bias', trainable=trainable) + 1 # [BI] Add bias (initially 1).
File "/var/www/training/networks.py", line 50, in apply_bias_act
return fused_bias_act(x, b=tf.cast(b, x.dtype), act=act, gain=gain, clamp=clamp)
File "/var/www/dnnlib/tflib/ops/fused_bias_act.py", line 72, in fused_bias_act
return impl_dict[impl](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain, clamp=clamp)
File "/var/www/dnnlib/tflib/ops/fused_bias_act.py", line 132, in _fused_bias_act_cuda
cuda_op = _get_plugin().fused_bias_act
File "/var/www/dnnlib/tflib/ops/fused_bias_act.py", line 18, in _get_plugin
return custom_ops.get_plugin(os.path.splitext(file)[0] + '.cu')
File "/var/www/dnnlib/tflib/custom_ops.py", line 159, in get_plugin
_run_cmd(nvcc_cmd + ' "%s" --shared -o "%s" --keep --keep-dir "%s"' % (cuda_file, tmp_file, tmp_dir))
File "/var/www/dnnlib/tflib/custom_ops.py", line 69, in _run_cmd
raise RuntimeError('NVCC returned an error. See below for full command line and output log:\n\n%s\n\n%s' % (cmd, output))
RuntimeError: NVCC returned an error. See below for full command line and output log:

nvcc --compiler-options '-fPIC' --compiler-options '-I/usr/local/lib/python3.6/dist-packages/tensorflow/include -D_GLIBCXX_USE_CXX11_ABI=0' --linker-options '-L/usr/local/lib/python3.6/dist-packages/tensorflow -l:libtensorflow_framework.so.1' --gpu-architecture=sm_86 --use_fast_math --disable-warnings --include-path "/usr/local/lib/python3.6/dist-packages/tensorflow/include" --include-path "/usr/local/lib/python3.6/dist-packages/tensorflow/include/external/protobuf_archive/src" --include-path "/usr/local/lib/python3.6/dist-packages/tensorflow/include/external/com_google_absl" --include-path "/usr/local/lib/python3.6/dist-packages/tensorflow/include/external/eigen_archive" 2>&1 "/var/www/dnnlib/tflib/ops/fused_bias_act.cu" --shared -o "/tmp/tmp4dn1nm6o/fused_bias_act_tmp.so" --keep --keep-dir "/tmp/tmp4dn1nm6o"

nvcc fatal : Value 'sm_86' is not defined for option 'gpu-architecture'

By googling it, I found that similar errors (e.g. with sm_75) occur when there are code/CUDA/driver compatibility issues. At least that's what people say.
Please help with a decent working container version at least.
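
For anyone hitting the same error, a quick sanity check (a hypothetical helper, not part of the repo) is to ask the nvcc visible inside the container whether it knows about sm_86 at all; nvcc from CUDA 11.0 or older does not, which produces exactly this "Value 'sm_86' is not defined" failure:

# Hypothetical check, not part of stylegan2-ada: the nvcc that custom_ops.py invokes
# must come from CUDA 11.1+ to accept --gpu-architecture=sm_86 (Ampere / RTX 3090).
import subprocess

print(subprocess.check_output(['nvcc', '--version']).decode())

help_text = subprocess.check_output(['nvcc', '--help']).decode()
print('sm_86 listed in nvcc --help:', 'sm_86' in help_text)  # expect True on CUDA 11.1+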


9of9 commented Nov 15, 2020

Also having issues with an RTX 3090. I'm running StyleGAN2-ADA on Windows with tensorflow-gpu 1.14. I haven't attempted training yet, but I'm seeing strange behaviour when attempting inference:
It takes a very long time to spin up inference (~10 minutes on the metfaces example), at which point the first image comes out correct, and then in quick succession the remaining seeds in the set all get rendered out as RGB noise.

I'm not even sure what the problem could be specifically, since it gives no errors, seems to succeed with compilation, and is clearly capable of some inference. Perhaps weird fallback behaviour?


nurpax commented Nov 16, 2020

As far as I know, you will need TensorFlow 1.x built against CUDA 11.1. The TensorFlow build needs to enable compute architecture sm_86.

This was discussed for Linux containers here: #10
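
For reference, a quick way to confirm what compute capability the TensorFlow build actually sees (a sketch using TF1's device_lib, not something shipped with the repo):

# Sketch: list the GPUs TensorFlow 1.x can see and their compute capability.
# An RTX 3090 should report "compute capability: 8.6", which is where the
# --gpu-architecture=sm_86 flag in the failing nvcc command above comes from.
from tensorflow.python.client import device_lib

for dev in device_lib.list_local_devices():
    if dev.device_type == 'GPU':
        print(dev.name, '->', dev.physical_device_desc)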

@JulianPinzaru (Author)

@nurpax Thank you very much for your help! It looks like I had missed the latest Dockerfile you pushed there.
I was able to start the training in that new container (nvcr.io/nvidia/tensorflow:20.10-tf1-py3) and it seems to work 👍
I will give more feedback tomorrow; I am letting it train overnight to see if it runs into any issues. Hopefully it will be fine. :)


nurpax commented Nov 17, 2020

I'm glad you got it working @JulianPinzaru! Feedback on how it goes after your overnight training is welcome; it's good to get some confirmation that this is working as intended.


JulianPinzaru commented Nov 17, 2020

@nurpax
I appreciate your feedback.
I ran it overnight as mentioned, but it didn't get anywhere because it got stuck calculating FID for some reason.

Downloading https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/metrics/inception_v3_features.pkl ... done
Calculating real image statistics for fid50k_full...

It might be some sort of bug... I disabled it with --metrics=none and started over, and it appeared to be computing fine, until I realized the computation time was increasing with every tick.

[screenshot]

I asked other people in my Slack channel; someone using the same Docker image reported a much longer training time per tick with 2 GPUs (RTX 3090) than with 1 GPU (RTX 2080). I was also expecting it to run much quicker given the improvements made in this stylegan2-ada repo. In fact, I previously had 2 RTX Titans and it ran really quickly on the previous stylegan2 repo (512-resolution images at 9-10 minutes per tick), while here I'm already over 48 minutes per tick and don't really know what to expect.
I am training on a 10k-image dataset, which is well above the 1-2k images that ADA is designed for, so I don't think I'm overfitting the model, but I noticed it keeps increasing the aug value.

Attaching the details of the 2-GPU setup the other person was trying to run on:

[screenshot]


JulianPinzaru commented Nov 18, 2020

UPDATED
Investigating the problem. It might be caused by the forked repo I used (https://github.com/dvschultz/stylegan2-ada).
I will be testing overnight with the original repo, without any of those modifications.
Thank you for not rushing into the aforementioned problems; I appreciate your time and I hope it was just my mistake.

@nerdyrodent

Just got a 3090 and the same thing is happening here. Ticks were stable on a 1070 (30 minutes), but after switching to the Docker image it gets slower each tick.


JulianPinzaru commented Nov 18, 2020

UPDATED

People are mentioning that with augmentation enabled it takes them up to 14 minutes per tick on Google Colab (premium)... There might be something wrong with it.

I noticed that with augmentation disabled, a tick takes up to 5 minutes 30 seconds, which looks reasonable and is what I would expect. But there seems to be a huge unexplained overhead when augmentation is enabled. I attached screenshots for all three cases I tried:

python train.py --outdir ./results --snap=2 --data=./datasets/jjl_512 --augpipe="bg" --mirror=True --resume="ffhq512" --metrics=none
[screenshot]

python train.py --outdir ./results --snap=2 --data=./datasets/jjl_512 --augpipe="bg" --mirror=True --resume="ffhq512" --metrics=none --p 0.25
[screenshot]

python train.py --outdir ./results --snap=2 --data=./datasets/jjl_512 --mirror=True --resume="ffhq512" --metrics=none --aug="noaug"
[screenshot]

I thought fixed augmentation wouldn't cause any timing issues, but it turned out to be ~55 minutes per tick as well.
@nurpax Do you have any suggestions that would help me investigate further and give you more feedback/details about this issue?
Thank you in advance

@nerdyrodent

Changed my aug pipeline to just filter, and yes, it's back down to 5 minutes a tick. I'll have to leave it running longer to see if that time increases. I'd guess it's b, g, or c causing the issues?


nurpax commented Nov 19, 2020

Thanks for the comments @JulianPinzaru and @nerdyrodent! All our training was done on TF 1.14 (with a TensorFlow that I built from source) using 8xGPU DGX-1 machines and we have not experienced this problem in our training. But quite clearly there is something wrong as the above comments indicate.

In the interest of clarity, can you summarize what is known to work and what is known to not work? (Ideally with unmodified stylegan2-ada code, if possible.) There are many variables: augmentation or not, 1 or 2 GPUs, RTX 3090 or Volta, container url/tag etc. I'm not making any promises for a quick fix, but if we can clearly pin down the working and broken configurations, it'll be much easier for us to look into this.

I can tabulate working/broken configs in this comment and edit as I get feedback from you.

Known to work

  • TF 1.14/DGX-1 8xGPU/CUDA 10.0 (no problems with augmentation on or off)

Known broken

  • TODO Janne to fill this in based on feedback.


nerdyrodent commented Nov 19, 2020

Ubuntu 20.04
Driver Version: 455.38
CUDA Version: 11.1
GeForce RTX 3090 FE
Docker: stylegan2ada:latest (nvcr.io/nvidia/tensorflow:20.10-tf1-py3 base image)

Works as expected:
sudo docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --gpus all -it --rm -v `pwd`:/scratch --user $(id -u):$(id -g) stylegan2ada:latest bash -c "(cd /scratch && DNNLIB_CACHE_DIR=/scratch/.cache python train.py --outdir=./training-runs --gpus=1 --data=./datasets/paintings --mirror=1 --cfg=stylegan2-24gb --snap=6 --metrics=none --augpipe=filter --resume=training-runs/00039-paintings-mirror-stylegan2-24gb-filter-resumecustom/network-snapshot-000090.pkl)"

Doesn't work as expected (gets slower). Only augpipe changed:
sudo docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --gpus all -it --rm -v `pwd`:/scratch --user $(id -u):$(id -g) stylegan2ada:latest bash -c "(cd /scratch && DNNLIB_CACHE_DIR=/scratch/.cache python train.py --outdir=./training-runs --gpus=1 --data=./datasets/paintings --mirror=1 --cfg=stylegan2-24gb --snap=6 --metrics=none --augpipe=bgc --resume=training-runs/00039-paintings-mirror-stylegan2-24gb-filter-resumecustom/network-snapshot-000090.pkl)"

(My config uses map=8 rather than map=2 from auto)

Update:
Tried the new 20.11-tf1-py3 Docker image - the issue remains.
Also noticed an audible difference with the GPU: the fans don't kick in with any augpipe that includes blit or geom. Mem ctrl% frequently drops to 1%, well below the typical 62% seen when NOT using blit or geom.
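
For anyone who wants to log this instead of watching it, a small sketch of my own (assumes nvidia-smi is on the PATH; the query fields are standard nvidia-smi options):

# Log GPU and memory-controller utilisation once per second during a training run,
# to catch the dips described above. Hypothetical helper, not part of the repo.
# Stop it with Ctrl-C.
import subprocess

subprocess.run([
    'nvidia-smi',
    '--query-gpu=timestamp,utilization.gpu,utilization.memory,memory.used',
    '--format=csv,noheader',
    '-l', '1',
])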


JulianPinzaru commented Nov 20, 2020

OS: Ubuntu 18.04
Driver Version: 455.38
CUDA: Version: 11.1
GPU: GeForce RTX 3090 FE
RAM: 32GB DDR4
CPU: I-7 10700k
Docker: stylegan2ada:latest (nvcr.io/nvidia/tensorflow:20.10-tf1-py3 base image)
Docker build command: docker build -f Dockerfile -t stylegan2-ada-original:tf15 .
Docker run command 1: docker run --gpus all -it --rm --memory="28g" -v `pwd`:/workspace/stylegan2-ada stylegan2-ada-original:tf15 /bin/bash
Docker run command 2: nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --gpus all -it --rm -v `pwd`:/workspace/stylegan2-ada stylegan2-ada-original:tf15 /bin/bash

During testing I noticed that VIRT memory goes up to 61 GB; maybe that's one of the effects of heavily loaded augmentation.
[screenshot]

I ran more experiments with augmentation OFF vs augmentation ON, and used the Python time library to record the critical operations that take longer than usual. I used Docker command 1 and Docker command 2 mentioned above for the tests. The screenshots, though, are from Docker command 2 (using nvidia-docker with the suggested params to start the container).

I started by recording each iteration of the training loop, specifically the "run training ops" for loop. Posting the screenshot below to explain what I did.
[screenshot]

Next, I used the following command to train a NON-AUGMENTED model:

python train.py --outdir ./results --data=./datasets/jjl_512 --gpus=1 --snap=2 --mirror=True --metrics=none --aug="noaug" --resume="ffhq512"

For augmentation I used fixed aug to emphasize the time it takes (it takes approximately the same time as with dynamic aug strength growth; I posted screenshots of the ticks in my previous comment).

python train.py --outdir ./results --data=./datasets/jjl_512 --gpus=1 --snap=2 --mirror=True --metrics=none --augpipe="bg" --aug="fixed" --p=0.35 --resume="ffhq512"

AUG OFF
[screenshot]

AUG ON
[screenshot]

Then, using the same approach, I measured the time for the following lines of code (individually):
248, 249-250, 251, 252-253 (the numbers correspond to the code screenshot I posted above)

Screenshots for the aforementioned lines, respectively:

AUG OFF
[screenshot]

AUG ON
[screenshot]

I also tested the timing for these:
aug.run_validation(minibatch_size=minibatch_size)
and
aug.tune(minibatch_size * minibatch_repeats)
but they didn't seem to have any impact on the running time.

@nurpax please let me know if this was helpful, or if you would like me to benchmark the higher-level augmentation options that rely on these lower-level ones.
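
For completeness, a minimal sketch of the kind of wall-clock instrumentation I'm describing (the helper name is mine, not from the repo; the op names in the comments are only illustrative of the session.run calls on the lines listed above):

# Hypothetical timing helper wrapped around the ops run inside training_loop.py.
import time

def timed(label, fn, *args, **kwargs):
    start = time.time()
    result = fn(*args, **kwargs)
    print('%-24s %7.3f s' % (label, time.time() - start))
    return result

# e.g. inside the "run training ops" loop, wrap each tflib.run call:
#   timed('data_fetch', tflib.run, some_data_fetch_op)            # names illustrative
#   timed('train_step', tflib.run, some_training_ops, feed_dict)  # names illustrative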


JulianPinzaru commented Nov 20, 2020

UPDATED
I don't have access to a Volta GPU, so I tested the following configs on my RTX 3090:

'blit', 'geom', 'color', 'filter', 'noise', 'cutout', 'bg', 'bgc', 'bgcf', 'bgcfn', 'bgcfnc'
with --aug="fixed" --p=0.25

Timing for "while not done" loop ( iteration approx average ):

  • noaug - Iteration TIME: ~ 2.5 seconds
  • blit - Iteration TIME: ~ 15.0 seconds
  • geom - Iteration TIME: ~ 29.0 seconds
  • color - Iteration TIME: ~ 2.7 seconds
  • filter - Iteration TIME: ~ 2.9 seconds
  • noise - Iteration TIME: ~ 2.6 seconds
  • cutout - Iteration TIME: ~ 2.6 seconds
  • bg - Iteration TIME: ~ 28.0 seconds
  • bgc - Iteration TIME: ~ 29.0 seconds (with spikes up to 38 seconds)
  • bgcf - Iteration TIME: ~ 29.0 seconds
  • bgcfn - Iteration TIME: ~ 29.0 seconds
  • bgcfnc - Iteration TIME: ~ 29.0 seconds

In reality, when aug is not fixed, it grows to 0.4 and pushes those times even higher.
I noticed some of them have spikes; I didn't wait more than 3 iterations on each to see which others spike as well, though, as I don't think it would be meaningful.
So I only noticed blit and geom having issues (that's why I covered them in more detail above).

@nerdyrodent

Did some quick and dirty testing with the augpipeline_specs. I'd been using filter, noise and cutout without issue, so I used those as a base. I've also observed the GPU mem ctrl% drop below 10% when things are going slowly. Additionally, the card is typically audible when working as expected.

These appeared to have the largest performance impact, based only on GPU mem usage graphs:
xint=1
rotate=1
aniso=1
xfrac=1

Example aug pipeline specs tests:
'nr': dict(imgfilter=1, noise=1, cutout=1, xflip=1, rotate90=1, scale=1),
First tick was 8 mins after tick 0, so still a bit slower than the base. GPU memory use sometimes dips below 10%. Didn't test longer to see if it got slower.

This test:
'nr': dict(imgfilter=1, noise=1, cutout=1, xflip=1, rotate90=1),
First tick was 6 mins after tick 0, memory load steady. Second tick was also the same speed.
I'm still running this one, but I expect timings to remain steady.
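
For anyone who wants to reproduce this, a sketch of how such a custom pipeline can be wired in (the dict below mirrors the augpipe_specs table in train.py as I understand it; the 'nr' entry is the custom one, and the exact contents may differ between forks):

# In train.py, augpipe_specs maps the --augpipe name to the kwargs passed to
# training.augment.AugmentPipe. Adding a custom entry avoids the slow geometric
# ops (xint, scale, rotate, aniso, xfrac) reported above.
augpipe_specs = {
    'blit':   dict(xflip=1, rotate90=1, xint=1),
    'geom':   dict(scale=1, rotate=1, aniso=1, xfrac=1),
    'filter': dict(imgfilter=1),
    'noise':  dict(noise=1),
    'cutout': dict(cutout=1),
    'nr':     dict(imgfilter=1, noise=1, cutout=1, xflip=1, rotate90=1),  # custom
}
# Then select it on the command line:  python train.py ... --augpipe=nr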

@JulianPinzaru (Author)

@nerdyrodent I also have a feeling that it is somehow overusing disk storage (SSD) rather than RAM or VRAM; there's a bottleneck somewhere and it's not clear where. Maybe it's due to how it handles the CUDA_CACHE directory, I'm not sure.

@nerdyrodent

I've removed Docker from the equation and I'm still seeing the same behaviour, using NVIDIA TensorFlow r1.15.4+nv20.11 (via pip) + cuda_11.1.1_455.32.00.

Another thing I've noticed is that one CPU core will sit at 100% for a while, which it doesn't do when not using blit or geom. Now I'm really confused.

@JulianPinzaru (Author)

@nurpax
Hello! Any updates? 🙂

@JulianPinzaru (Author)

@nurpax
Is there any plan to help us out here?
Not trying to rush you, just pinging to see if anyone is reading this issue and its posts.


nurpax commented Dec 10, 2020

@JulianPinzaru Hi! I'm following some of the posts (incl. this one) but alas, we don't have any new updates on this one. This problem does not actively impact our ongoing research projects and we're a small team of researchers with limited time.

I will try to update if we find something that may apply here -- but I also prefer not to post updates unless we have fairly high confidence that the ideas/fixes actually help.

Can you check the exact version of libcudnn that TensorFlow is using?

Reading through the comments: so there is a massive slowdown and a memory leak when using any of the --augpipe b* variants? It almost looks like some operations are falling back to the CPU when we'd expect them to run on the GPU.
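
To answer the libcudnn question from inside the container, one hedged option (not an official utility; TensorFlow may of course link a different copy than the one the loader finds first):

# Load the cuDNN shared library and ask it for its version number, e.g. 8004 == cuDNN 8.0.4.
import ctypes

for name in ('libcudnn.so.8', 'libcudnn.so.7', 'libcudnn.so'):
    try:
        print(name, '->', ctypes.CDLL(name).cudnnGetVersion())
        break
    except OSError:
        continue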


sanmeow commented Dec 11, 2020

I'm facing the same issue too, so I only use noaug on my RTX 3090. I wish this could be fixed. For now, the only way is to use my 2080 Ti for ADA instead, among my different GPUs.


nurpax commented Dec 18, 2020

A heads up on TensorFlow 1.x, RTX 3090 support and StyleGAN2-ADA. Our research group is in the process of switching to PyTorch and StyleGAN2 ADA will be our last project written in TensorFlow.

We have ported StyleGAN2 ADA to PyTorch and plan on releasing this new codebase as the official StyleGAN2 ADA PyTorch implementation. We hope to release the PyTorch port sometime in January 2021.

We expect the problems discussed in this GitHub issue to disappear as we transition to CUDA 11, cuDNN 8.0.x and the latest PyTorch release.


johndpope commented Dec 18, 2020

Thanks for the heads up. If there were an alpha branch - unsupported - that would be super. In the meantime, I have to piece things together using this repo: https://github.com/rosinality/stylegan2-pytorch

UPDATE - PyTorch has yet to release CUDA 11.1 binaries:
pytorch/pytorch#45021

Might have to swap in an old GPU to get some work done.

UPDATE 2 - the TensorFlow Docker container just slows down irrespective of the neural-net stuff.
I made a PR to update to the latest Docker container; the current one is a few months old - #51

@JulianPinzaru (Author)

> A heads up on TensorFlow 1.x, RTX 3090 support and StyleGAN2-ADA. Our research group is in the process of switching to PyTorch and StyleGAN2 ADA will be our last project written in TensorFlow.
>
> We have ported StyleGAN2 ADA to PyTorch and plan on releasing this new codebase as the official StyleGAN2 ADA PyTorch implementation. We hope to release the PyTorch port sometime in January 2021.
>
> We expect the problems discussed in this GitHub issue to disappear as we transition to CUDA 11, cuDNN 8.0.x and the latest PyTorch release.

Salivating to see that one on PyTorch! :) I bet it is a future-proof decision, as TF 1.15 is not maintained anymore and there is a big PyTorch community eager to get their hands on the 3000 series as well.
Thanks for the heads up!


BartWMK commented Dec 24, 2020

I do not expect the PyTorch CUDA 11 / cuDNN 8.0.x version to resolve what seems to be an RTX 3000 series driver issue.

I'm using CUDA 11 / cuDNN 8.0.x (the most recent TF1/CUDA 11 Docker image from NVIDIA, with the latest cuDNN release installed).

The problem is in the use of 2 tf functions in augment.py:
tf.nn.depthwise_conv2d_backprop_input
and
tf.nn.depthwise_conv2d
in this block:
# Execute geometric transformations.

Disabling these 4 filtered up/downscale invocations not only removes the dramatic performance impact but also removes an approx. 1 GB/h (at res=512) memory leak seemingly triggered by the code behind these operations.

As I said, I noticed the same behavior in a PyTorch implementation (lucidrains/stylegan2-pytorch, although I did not trace it down to the offending operation), so it seems to point towards a cuDNN and/or driver issue. Other people appear to be seeing the same, also with PyTorch: https://github.com/pytorch/pytorch/issues/47039

For those in the mood for a short-term workaround: replace the relevant calls with a less fancy (unfiltered?) scaling not using depthwise convolutions.

Do note that, in general, depthwise convolutions don't scale well (Gholami et al., https://arxiv.org/pdf/1803.10615.pdf), so I'm not expecting miracles, but the current performance penalty and memory leak seem a bit excessive. 2080 Ti-level performance should be possible on a 3090.
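
To make the suggested workaround concrete, here is a rough sketch (my own, untested against the repo; the function names are made up, and it assumes the NCHW layout used in training/augment.py) of replacing the filtered 2x up/downsampling with unfiltered equivalents that avoid depthwise convolutions:

# Crude substitutes for the filtered up/downsample steps in the geometric-transform
# block of training/augment.py. These trade filtering quality for Ampere-friendly ops.
import tensorflow as tf

def unfiltered_upsample_2x(x):                       # x: [N, C, H, W]
    _, c, h, w = x.shape.as_list()
    x = tf.reshape(x, [-1, c, h, 1, w, 1])
    x = tf.tile(x, [1, 1, 1, 2, 1, 2])               # nearest-neighbour 2x upsample
    return tf.reshape(x, [-1, c, h * 2, w * 2])

def unfiltered_downsample_2x(x):                     # average pool instead of a filtered downsample
    return tf.nn.avg_pool(x, ksize=[1, 1, 2, 2], strides=[1, 1, 2, 2],
                          padding='VALID', data_format='NCHW')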


johndpope commented Dec 25, 2020

Hi Bart, feel free to throw up whatever code you have as a gist (it might help other people troubleshooting) - https://gist.github.com/BartWMK - then we can switch in your sample and see things more clearly.

UPDATE - supposedly it's possible to get the 3090 working without Docker on Ubuntu. (I hit a wall with zsh - it doesn't correctly find the TensorFlow packages; just use bash.) I recommend using Timeshift to snapshot/back up your working system before doing any brain surgery, and Pop!_OS to get the NVIDIA drivers up and running out of the box.

From @dbkinghorn
https://www.pugetsystems.com/labs/hpc/How-To-Install-TensorFlow-1-15-for-NVIDIA-RTX30-GPUs-without-docker-or-CUDA-install-2005/

But I get the same error - Value 'sm_86' is not defined for option 'gpu-architecture'. Can anyone get this working locally without Docker?
dbkinghorn/NGC-TF1-nvidia-examples#1

Side note - I found this stylegan2 PyTorch code in @GreenLimeSia's repo (it seems pretty polished; it's a function-by-function port with documented code and already handles all the TensorFlow 1 pkl migrations):
https://github.com/GreenLimeSia/GenEdi/tree/master/stylegan2

(seems feature complete)
stylegan2/
    __init__.py
    loss_fns.py
    models.py
    modules.py
    project.py
    train.py
    utils.py

run_convert_from_tf.py
run_gui_interactive_local.py

[screenshot: Screenshot_2020-12-27_08-12-00]

I got this to work using PyTorch with CUDA 11.0 (even though the 11.1 binaries are not released yet).

TensorFlow 2 + stylegan2
It seems @k-l-lambda has solved the compatibility problems with TensorFlow 2 + stylegan2 using existing code from this repo - https://github.com/johndpope/stylegan-web

A lot of the code gets around the compatibility problems by running in compatibility mode. NVIDIA - this decorator-style code to get to TensorFlow 2 would help out a lot more than a PyTorch port; there are so many libraries hanging off this stylegan2-ada repo.

import tensorflow.compat.v1 as tensorflow
tf = tensorflow
tf.disable_v2_behavior()

UPDATE - TensorFlow 2 still gets
nvcc fatal : Value 'sm_86' is not defined for option 'gpu-architecture'
https://github.com/johndpope/stylegan2-ada

python3 -c 'import tensorflow as tf; print(tf.__version__)' 
2.5.0-dev20201218

Seems like NVIDIA bumped the CUDA toolkit to 11.2 on December 17th, so maybe just getting the latest toolkit will fix everything - https://developer.nvidia.com/cuda-downloads

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0

Allowed values for this option: 'compute_35','compute_37','compute_50',
'compute_52','compute_53','compute_60','compute_61','compute_62','compute_70',
'compute_72','compute_75','compute_80','lto_35','lto_37','lto_50','lto_52',
'lto_53','lto_60','lto_61','lto_62','lto_70','lto_72','lto_75','lto_80',
'sm_35','sm_37','sm_50','sm_52','sm_53','sm_60','sm_61','sm_62','sm_70',
'sm_72','sm_75','sm_80'.

| NVIDIA-SMI 455.38       Driver Version: 455.38       CUDA Version: 11.1   
uname -r
5.8.0-7630-generic

@k-l-lambda

@johndpope You mentioned me, and this is my key commit for tf2 compatibility: k-l-lambda/stylegan-web@6be1a4f

Hope it helps.


johndpope commented Dec 28, 2020

Success! Got it working without Docker.
I removed the 455 driver and rebooted:
https://linuxconfig.org/how-to-uninstall-the-nvidia-drivers-on-ubuntu-20-04-focal-fossa-linux
Then I installed the CUDA toolkit 11.2 using the download from the site:
sudo sh cuda_11.2.0_460.27.04_linux.run

NVIDIA-SMI 460.27.04 Driver Version: 460.27.04 CUDA Version: 11.2

I threw these into my ~/.zshrc file:

export PATH=/usr/local/cuda-11.2/bin:${PATH}
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64:${LD_LIBRARY_PATH}

New terminal window sanity check:

which nvcc 
/usr/local/cuda-11.2/bin/nvcc

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0

This fork has the TensorFlow 2 fixes (compatibility mode):
https://github.com/johndpope/stylegan2-ada

(The machine is running a bit slow and Chrome is crashing unusually often, so beware.)
I recommend using Timeshift if you need to rewind config settings.

thanks again @k-l-lambda


mdvorsky commented Dec 29, 2020

Upgrading the BASE_IMAGE in the Dockerfile (#51) fixed the issue for me. The new 20.12 Docker image contains cuDNN 8.0.5, which according to the release notes contains significant performance improvements for the RTX 3090. (The current image, 20.10, uses cuDNN 8.0.4.)

@Emperornero

I'm using Windows, and the disjointed nature of this issue thread is a bit hard to follow. From what I'm seeing, someone here has fixed the issues with the 3090, but I'm unsure what exactly was done.

Does someone have a definitive fix for using the 30 series with this? I bought a 3090 specifically for custom StyleGANs, only to find I can't train them because of this compatibility issue.


johndpope commented Jan 5, 2021

Use the latest NVIDIA driver (460) + the CUDA 11.2 toolkit - https://developer.nvidia.com/cuda-downloads - on the host.
Use the latest Docker container (there's going to be a January release any day) - #51.
If you want to run this codebase on a 3090 directly on your machine - it won't work without TensorFlow 1.
There's no plan to support TensorFlow 2 - everything is moving to PyTorch.

However, adding a few lines to enable compatibility mode for TensorFlow 2 does get it working.
It's frankly pretty miserable that NVIDIA won't add this.
https://github.com/johndpope/stylegan2-ada/tree/main

# this line
import tensorflow as tf

# becomes these lines
import tensorflow.compat.v1 as tensorflow
tf = tensorflow
tf.disable_v2_behavior()

A more elaborate fork is here:
https://github.com/johndpope/stylegan2-ada/tree/digressions

@JulianPinzaru (Author)

@nurpax Hi! Is there any chance of seeing stylegan2-ada on PyTorch any time soon? Thanks


nurpax commented Feb 1, 2021

@JulianPinzaru YES!

We just published the repo; find your bits at: https://github.com/NVlabs/stylegan2-ada-pytorch

I haven't tested the code on RTX 3090 myself. Pretty sure it will require CUDA 11.1 to run and might break on CUDA 11.0. I will be looking into RTX 3090 support this week.
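
For anyone trying the PyTorch port on a 3090, a quick hedged sanity check (not from the new repo; torch.cuda.get_arch_list needs a reasonably recent PyTorch build):

# Confirm the installed torch build was compiled against CUDA 11.1+ and ships
# kernels for compute capability 8.6 (sm_86).
import torch

print('torch', torch.__version__, '| CUDA', torch.version.cuda,
      '| cuDNN', torch.backends.cudnn.version())
print('device capability:', torch.cuda.get_device_capability(0))  # expect (8, 6)
print('compiled arch list:', torch.cuda.get_arch_list())          # should include 'sm_86'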

@johndpope

Yes. Again, it's not really a 'conversion', just running in compatibility mode. There is a way to convert the code, but I didn't go down that route.

@johndpope

Sorry @Thunder003, I won't be able to be much more help here. TensorFlow is kinda dead to me now.


AlirezaParchami commented Jan 18, 2022

I also have an odd issue with StyleGAN2 (the official TensorFlow implementation) on an RTX 3090 in Windows, at the very first stage of running the generator for a test.
The GPU driver is up to date, TensorFlow 1.14 is in use, and I have tested both CUDA 11.1 and CUDA 11.5 with the code!
There is no error or exception, but run_generator.py takes around 30 minutes to generate 11 images, and the images come out as pure noise, using the stylegan2-ffhq-config-f network (as you can see below).
I tested the same setup on a GTX 960 and on another cluster; it took only 2 minutes to generate and worked well!

Is there any solution or fix for this issue?!

[image]


JulianPinzaru commented Jan 21, 2022

> Is there any solution or fix for this issue?!

I don't think you should use the TensorFlow implementation. Just go for the NVlabs PyTorch StyleGAN2 (or 3). It works fine on the 3000 series. It's also somewhat compatible with older TF-trained network pkls (if I am not mistaken).


winssk commented May 9, 2022

> Is there any solution or fix for this issue?!

Hi! Is there any update or solution for this issue?

@jannehellsten

The recommended fix is to switch to either https://github.com/NVlabs/stylegan3 or https://github.com/NVlabs/stylegan2-ada-pytorch, both of which are known to work on new hardware and recent versions of PyTorch.
