Segfault during caffe::init #3788
Comments
I'm using Cuda_7.5
It appears that during Caffe::set_mode, the compiled code is writing mode_ into random_generator_. gdb output: I have gdb 4.8.4. (gdb) bt
Found the problem. I had CPU_ONLY defined in my application header, so my application and the library had different definitions of the Caffe class.
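To illustrate why a mismatched CPU_ONLY produces exactly this symptom, here is a minimal sketch. It assumes the CUDA handle members are compiled out of the class when CPU_ONLY is defined, and that set_mode is inlined into the application from the header. `LibLayout`, `AppLayout`, `FakeSharedPtr` and `px_after_app_set_mode` are hypothetical stand-ins, not Caffe code; the member names are taken from the gdb dump in this issue.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for the shared pointer member: two pointers (px, pn), as in the gdb dump.
struct FakeSharedPtr { void* px; void* pn; };

enum Brew { CPU, GPU };

// Layout as the library sees it (assumed: built without CPU_ONLY).
struct LibLayout {
  void* cublas_handle_;
  void* curand_generator_;
  FakeSharedPtr random_generator_;
  Brew mode_;
};

// Layout as the application sees it (assumed: built with CPU_ONLY, no CUDA handles).
struct AppLayout {
  FakeSharedPtr random_generator_;
  Brew mode_;
};

// Mimics an inlined set_mode(GPU) compiled against AppLayout but applied to an
// object that the library created as LibLayout. memcpy is used so the sketch
// itself stays free of undefined behaviour.
inline std::uintptr_t px_after_app_set_mode() {
  LibLayout obj{};  // all members zero-initialised
  Brew mode = GPU;
  std::memcpy(reinterpret_cast<unsigned char*>(&obj) + offsetof(AppLayout, mode_),
              &mode, sizeof mode);
  // On a little-endian machine the low bytes of px now hold the value of GPU.
  return reinterpret_cast<std::uintptr_t>(obj.random_generator_.px);
}
```

Under these assumptions, the application's idea of where mode_ lives coincides with the library's random_generator_.px, so setting GPU mode (value 1) leaves px = 0x1 and the real mode_ still reads CPU, which matches the dump below. Building the application with the same CPU_ONLY setting as the library keeps both layouts identical.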
I'm using caffe-rc3 on Ubuntu. The Caffe tests pass and the mnist sample runs perfectly. I have a trained net with its net and weight files. Everything works perfectly in CPU mode, but GPU mode crashes. I've spent a few hours with gdb: the crash happens when caffe_rng_uniform() calls caffe_rng() and rng_stream returns 0x1, a bad pointer.
16 inline rng_t* caffe_rng() {
17 return static_cast<caffe::rng_t*>(Caffe::rng_stream().generator());
18 }
The random_generator_ pointer is 0x1, which causes the crash when it is dereferenced:
(gdb) p *caffe::thread_instance_.get()
$49 = {cublas_handle_ = 0x4df9160, curand_generator_ = 0x4dfab10, random_generator_ = {px = 0x1, pn = {pi_ = 0x0}},
mode_ = caffe::Caffe::CPU, solver_count_ = 1, root_solver_ = true}
However, Caffe::Get() has a good pointer. It seems like the thread-specific data and the singleton data are different. I can't figure out why.
(gdb) p *caffe::Caffe::Get().random_generator_
$46 = (caffe::Caffe::RNG &) @0x4df9160: {generator_ = {px = 0x7fffffff00000200, pn = {pi_ = 0xffff0000ffff}}}
Backtrace:
(gdb) bt
#0 caffe::caffe_rng () at ./include/caffe/util/rng.hpp:17
#1 0x00007ffff723f833 in caffe::caffe_rng_uniform (n=81536, a=-0.0686263517, b=0.0686263517, r=0x201200000)
#2 0x00007ffff716ae72 in caffe::XavierFiller::Fill (this=0x60a4fb0, blob=0x60a47f0) at ./include/caffe/filler.hpp:161
#3 0x00007ffff71f7d82 in caffe::BaseConvolutionLayer::LayerSetUp (this=0x60a0620,
#4 0x00007ffff7195c33 in caffe::CuDNNConvolutionLayer::LayerSetUp (this=0x60a0620,
#5 0x00007ffff7155548 in caffe::Layer::SetUp (this=0x60a0620, bottom=std::vector of length 1, capacity 1 = {...},
#6 0x00007ffff7295246 in caffe::Net::Init (this=0x4e4a890, in_param=...) at src/caffe/net.cpp:148
#7 0x00007ffff72939e0 in caffe::Net::Net (this=0x4e4a890, param_file="/home/ubuntu/linux/gtpmfgo//golast19.prototxt",
#8 0x00000000004fb4c6 in caffe_init (path=0x7fffffffd330 "/home/ubuntu/linux/gtpmfgo/", use_gpu=1) at ../src/caffecnn.cpp:63
#9 0x00000000004de674 in uct_init_all (cwd=0x7fffffffd330 "/home/ubuntu/linux/gtpmfgo/", max_memory=965, max_threads=1, use_gpu=1)
#10 0x000000000048cbbc in init_mfgo (cwd=0x7fffffffd330 "/home/ubuntu/linux/gtpmfgo/", max_memory=965, max_threads=1, use_gpu=1)
#11 0x00000000004fa301 in main (argc=6, argv=0x7fffffffe5d8) at gtpmfgo.cpp:1545
My code invoking Caffe (use_gpu is true):
int caffe_init(const char *path, int use_gpu) {
#ifdef HAVE_CAFFE