Check failed: error == cudaSuccess (77 vs. 0) #598
Hi @gaozunqi can you check the versions of your tools:
If you have those versions, can you create an issue on NVIDIA/Caffe with details on your network topology and system (GPU, etc.)?
When I run 'dpkg -s libcudnn4' and 'dpkg -s caffe-nv', it says those packages are not installed (I don't know why; maybe I did not install them from *.deb packages?),
and my Caffe fork is on the master branch.
@gheinrich Hello, I have re-installed DIGITS and NVIDIA's Caffe fork following the docs in DIGITS, and it works now, thank you. The accuracy has values, but the loss is still -nan.
@lukeyeager Hi, could you help me with the problem above?
Hi @gaozunqi, what type of neural network are you training? Have you enabled mean image subtraction?
I want to train the task in "http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html", using the Flickr Style data and the net from the official Caffe example.
@gheinrich I also trained a MNIST model; that one is fine, no problems occurred.
@gaozunqi I notice this example is using a lower initial learning rate (0.001) than the default in DIGITS (0.01). You might want to try that learning rate, or even lower if the loss keeps diverging.
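For reference, the lower initial rate is set through the solver configuration. A minimal excerpt of what a Caffe solver.prototxt for this fine-tuning setup might contain (other solver fields omitted; in DIGITS you would set the same value through the "Base Learning Rate" field instead):

```protobuf
# Solver excerpt: lower the initial learning rate from the
# DIGITS default of 0.01 to the Flickr Style example's 0.001.
base_lr: 0.001
```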
@gheinrich Yeah! You are right! Thanks.
During learning, at the end of back propagation you will know the direction in which you need to move in order to reduce the loss. However, since you are doing batched learning you don't want to move exactly to the target, since that target might suit the particular batch you're learning on but not the others. So you want to make a small step in the right direction so that after learning from many batches you will get closer to a solution that fits the entire dataset. The learning rate is a measure of how large a step you're making. If the learning rate is too high then you will make large steps which might get you further from the optimal target. This is similar to playing golf: if you're close to the hole but push the ball too hard, the ball might end up further from the hole than it was initially. If you keep pushing too hard you will end up infinitely far away from the hole.
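The golf analogy can be made concrete with a toy example (plain Python, not Caffe's actual solver code): gradient descent on f(x) = x², whose gradient is 2x. With a small learning rate the iterate converges to the minimum at 0; with a learning rate above 1 every step overshoots and the iterate diverges.

```python
# Toy gradient descent on f(x) = x**2 (gradient 2*x).
# A small learning rate converges; a too-large one overshoots
# the minimum on every step and diverges.
def descend(lr, x=1.0, steps=20):
    for _ in range(steps):
        x = x - lr * 2 * x  # step against the gradient
    return x

print(descend(0.1))  # small steps: approaches the minimum at 0
print(descend(1.5))  # steps too large: moves further away each time
```

With lr=0.1 each step multiplies x by 0.8, so it shrinks toward 0; with lr=1.5 each step multiplies x by -2, so its magnitude doubles every iteration, which is exactly the "further from the hole than it was initially" behavior.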
See http://caffe.berkeleyvision.org/tutorial/solver.html for information on the different types of solvers in Caffe.
@gheinrich Thanks for your patience. I now understand the theory you explained in paragraphs 2, 3, and 4, but I still don't know why the loss = -nan.
If the loss keeps increasing because you're using too high a learning rate, eventually it will become larger than any number that can be represented by a floating-point variable. At that point it overflows to infinity, and further arithmetic on infinity (such as inf - inf) produces NaN, which is what shows up as -nan.
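This overflow behavior is easy to reproduce in plain Python (a toy demonstration, not Caffe code): the largest representable double is about 1.8e308, and once a value passes it, it becomes inf; undefined operations on inf then yield nan.

```python
# Toy demonstration of a diverging loss overflowing to inf,
# and arithmetic on inf producing nan.
loss = 1e308
loss = loss * 10     # exceeds the largest double (~1.8e308): overflows to inf
print(loss)          # inf
print(loss - loss)   # nan (inf - inf is undefined)
```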
@gheinrich OK, your explanation is good. Thank you! :)
Everything is OK, but this happened... (the message below is copied from the 'caffe_output.log' file)
I0224 18:20:47.449920 6492 solver.cpp:314] Iteration 0, Testing net (#0)
F0224 18:20:47.715102 6492 cudnn_conv_layer.cu:56] Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered
*** Check failure stack trace: ***
@ 0x7f18f808cea4 (unknown)
@ 0x7f18f808cdeb (unknown)
@ 0x7f18f808c7bf (unknown)
@ 0x7f18f808fa35 (unknown)
@ 0x7f18f8a1c119 caffe::CuDNNConvolutionLayer<>::Forward_gpu()
@ 0x7f18f88fc002 caffe::Net<>::ForwardFromTo()
@ 0x7f18f88fc127 caffe::Net<>::ForwardPrefilled()
@ 0x7f18f89fa923 caffe::Solver<>::Test()
@ 0x7f18f89fb0a6 caffe::Solver<>::TestAll()
@ 0x7f18f8a0319f caffe::Solver<>::Step()
@ 0x7f18f8a03e7e caffe::Solver<>::Solve()
@ 0x408602 train()
@ 0x4052eb main
@ 0x7f18f758ca40 (unknown)
@ 0x4059b9 _start
@ (nil) (unknown)