Segfault during caffe::init #3788

Closed
dfotland opened this issue Mar 8, 2016 · 3 comments

dfotland commented Mar 8, 2016

I'm using caffe-rc3 on Ubuntu. The Caffe tests pass and the mnist sample runs perfectly. I have a trained net with a net definition file and a weights file. Everything works perfectly in CPU mode, but GPU mode crashes. I've spent a few hours with gdb: the crash happens when caffe_rng_uniform() calls caffe_rng() and the generator obtained from rng_stream is 0x1, a bad pointer.

    // include/caffe/util/rng.hpp, lines 16-18
    inline rng_t* caffe_rng() {
      return static_cast<caffe::rng_t*>(Caffe::rng_stream().generator());
    }

The random_generator_ pointer is 0x1, which causes the crash when it is dereferenced:

(gdb) p *caffe::thread_instance_.get()
$49 = {cublas_handle_ = 0x4df9160, curand_generator_ = 0x4dfab10, random_generator_ = {px = 0x1, pn = {pi_ = 0x0}},
mode_ = caffe::Caffe::CPU, solver_count_ = 1, root_solver_ = true}

However, Caffe::Get() has a good pointer. It seems like the thread-specific data and the singleton data are different. I can't figure out why.

(gdb) p *caffe::Caffe::Get().random_generator_
$46 = (caffe::Caffe::RNG &) @0x4df9160: {generator_ = {px = 0x7fffffff00000200, pn = {pi_ = 0xffff0000ffff}}}
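
For reference, the backtrace in the later comment shows Caffe::Get() (src/caffe/common.cpp:17-19) calling reset on a boost::thread_specific_ptr and then returning what it holds, so on a given thread Get() and thread_instance_ would normally refer to one and the same object. A minimal sketch of that lazy per-thread pattern follows. It uses C++11 thread_local and a hypothetical State struct as stand-ins, not Boost or the real Caffe class:

```cpp
#include <memory>

// Hypothetical stand-in for the per-thread state held by caffe::Caffe.
struct State {
  int mode = 0;
};

// Stand-in for boost::thread_specific_ptr<Caffe>: one slot per thread.
static thread_local std::unique_ptr<State> thread_instance;

// Lazily create this thread's instance on first use, then keep returning it.
State& Get() {
  if (!thread_instance) {
    thread_instance.reset(new State());
  }
  return *thread_instance;
}
```

With this pattern, two calls to Get() on the same thread return the same object, so two different-looking views of it in a debugger point at how the object is being interpreted rather than at two separate instances.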

Backtrace:
(gdb) bt
#0 caffe::caffe_rng () at ./include/caffe/util/rng.hpp:17
#1 0x00007ffff723f833 in caffe::caffe_rng_uniform (n=81536, a=-0.0686263517, b=0.0686263517, r=0x201200000)
at src/caffe/util/math_functions.cpp:252
#2 0x00007ffff716ae72 in caffe::XavierFiller::Fill (this=0x60a4fb0, blob=0x60a47f0) at ./include/caffe/filler.hpp:161
#3 0x00007ffff71f7d82 in caffe::BaseConvolutionLayer::LayerSetUp (this=0x60a0620,
bottom=std::vector of length 1, capacity 1 = {...}, top=std::vector of length 1, capacity 1 = {...})
at src/caffe/layers/base_conv_layer.cpp:170
#4 0x00007ffff7195c33 in caffe::CuDNNConvolutionLayer::LayerSetUp (this=0x60a0620,
bottom=std::vector of length 1, capacity 1 = {...}, top=std::vector of length 1, capacity 1 = {...})
at src/caffe/layers/cudnn_conv_layer.cpp:20
#5 0x00007ffff7155548 in caffe::Layer::SetUp (this=0x60a0620, bottom=std::vector of length 1, capacity 1 = {...},
top=std::vector of length 1, capacity 1 = {...}) at ./include/caffe/layer.hpp:71
#6 0x00007ffff7295246 in caffe::Net::Init (this=0x4e4a890, in_param=...) at src/caffe/net.cpp:148
#7 0x00007ffff72939e0 in caffe::Net::Net (this=0x4e4a890, param_file="/home/ubuntu/linux/gtpmfgo//golast19.prototxt",
phase=caffe::TEST, root_net=0x0) at src/caffe/net.cpp:36
#8 0x00000000004fb4c6 in caffe_init (path=0x7fffffffd330 "/home/ubuntu/linux/gtpmfgo/", use_gpu=1) at ../src/caffecnn.cpp:63
#9 0x00000000004de674 in uct_init_all (cwd=0x7fffffffd330 "/home/ubuntu/linux/gtpmfgo/", max_memory=965, max_threads=1, use_gpu=1)
at ../src/uct.c:465
#10 0x000000000048cbbc in init_mfgo (cwd=0x7fffffffd330 "/home/ubuntu/linux/gtpmfgo/", max_memory=965, max_threads=1, use_gpu=1)
at ../src/G2init.c:112
#11 0x00000000004fa301 in main (argc=6, argv=0x7fffffffe5d8) at gtpmfgo.cpp:1545

My code invoking Caffe (use_gpu is true):

int caffe_init(const char *path, int use_gpu) {

#ifdef HAVE_CAFFE

    int argc = 2;
    char *fake_args[] = { "gtpmfgo", "ManyFaces" };
    char **argv = fake_args;
    GlobalInit(&argc, &argv);
    if (use_gpu) {
            Caffe::set_mode(Caffe::GPU);
            Caffe::SetDevice(0);
            Caffe::DeviceQuery();
    }
    else {
            Caffe::set_mode(Caffe::CPU);
    }

    if (caffe_test_net != NULL) delete caffe_test_net;
    string file_path = path;
    file_path += "/";
    caffe_test_net = new Net<float>(file_path + filename_net, TEST);
    caffe_test_net->CopyTrainedLayersFrom(file_path + filename_parameters);

dfotland commented Mar 9, 2016

I'm using CUDA 7.5.

dfotland commented

It appears that during Caffe::set_mode, the compiled code writes the value of mode_ into random_generator_. I have gdb 4.8.4. gdb output:

(gdb) bt
#0 boost::detail::shared_count::~shared_count (this=0x7fffffffb868, in_chrg=<optimized out>)
at /usr/include/boost/smart_ptr/detail/shared_count.hpp:371
#1 0x00007ffff7222d56 in boost::shared_ptr<boost::detail::tss_cleanup_function>::~shared_ptr (this=0x7fffffffb860,
in_chrg=<optimized out>) at /usr/include/boost/smart_ptr/shared_ptr.hpp:328
#2 0x00007ffff7222f69 in boost::thread_specific_ptr<caffe::Caffe>::reset (
this=0x7ffff7bb9db0 <caffe::thread_instance_>, new_value=0x1173990) at /usr/include/boost/thread/tss.hpp:105
#3 0x00007ffff7221001 in caffe::Caffe::Get () at src/caffe/common.cpp:17
#4 0x00000000004fbe05 in caffe::Caffe::set_mode (mode=caffe::Caffe::GPU)
at /home/ubuntu/linux/caffe-rc3/include/caffe/common.hpp:148
#5 0x00000000004fb334 in caffe_init (path=0x7fffffffd340 "/home/ubuntu/linux/gtpmfgo/", use_gpu=1)
at ../src/caffecnn.cpp:54
#6 0x00000000004de5b4 in uct_init_all (cwd=0x7fffffffd340 "/home/ubuntu/linux/gtpmfgo/", max_memory=965,
max_threads=64, use_gpu=1) at ../src/uct.c:465
#7 0x000000000048cafc in init_mfgo (cwd=0x7fffffffd340 "/home/ubuntu/linux/gtpmfgo/", max_memory=965, max_threads=64,
use_gpu=1) at ../src/G2init.c:112
#8 0x00000000004fa241 in main (argc=4, argv=0x7fffffffe5e8) at gtpmfgo.cpp:1545
(gdb) n
375 }
(gdb) s
boost::thread_specific_ptr<caffe::Caffe>::reset (this=0x7ffff7bb9db0 <caffe::thread_instance_>, new_value=0x1173990)
at /usr/include/boost/thread/tss.hpp:107
107 }
(gdb) s
caffe::Caffe::Get () at src/caffe/common.cpp:19
19 return *(thread_instance_.get());
(gdb) p thread_instance_.get()
$34 = (caffe::Caffe *) 0x1173990
(gdb) d
(gdb) p thread_instance_.get()->random_generator_
$35 = {px = 0x0, pn = {pi_ = 0x0}}
(gdb) s
boost::thread_specific_ptr<caffe::Caffe>::get (this=0x7ffff7bb9db0 <caffe::thread_instance_>)
at /usr/include/boost/thread/tss.hpp:84
84 return static_cast<T*>(detail::get_tss_data(this));
(gdb) p thread_instance_.get()->random_generator_
No symbol "thread_instance_" in current context.
(gdb) p caffe::thread_instance_.get()->random_generator_
$36 = {px = 0x0, pn = {pi_ = 0x0}}
(gdb) s
85 }
(gdb) s
caffe::Caffe::Get () at src/caffe/common.cpp:20
20 }
(gdb) s
caffe_init (path=0x7fffffffd340 "/home/ubuntu/linux/gtpmfgo/", use_gpu=1) at ../src/caffecnn.cpp:60
60 string file_path = path;
(gdb) p caffe::thread_instance_.get()->random_generator_
$37 = {px = 0x1, pn = {pi_ = 0x0}}
(gdb) p *caffe::thread_instance_.get()
$38 = {cublas_handle_ = 0x4840a30, curand_generator_ = 0x53064e0, random_generator_ = {px = 0x1, pn = {pi_ = 0x0}},
mode_ = caffe::Caffe::CPU, solver_count_ = 1, root_solver_ = true}
(gdb) s
61 file_path += "/";
(gdb) p *caffe::thread_instance_.get()
$39 = {cublas_handle_ = 0x4840a30, curand_generator_ = 0x53064e0, random_generator_ = {px = 0x1, pn = {pi_ = 0x0}},
mode_ = caffe::Caffe::CPU, solver_count_ = 1, root_solver_ = true}
(gdb) info threads
Id Target Id Frame
3 Thread 0x7fffcf3ff700 (LWP 1757) "gtpmfgo" pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
2 Thread 0x7fffd490d700 (LWP 1756) "gtpmfgo" 0x00007ffff613a12d in poll ()
at ../sysdeps/unix/syscall-template.S:81
* 1 Thread 0x7ffff7fa5a40 (LWP 1752) "gtpmfgo" caffe_init (path=0x7fffffffd340 "/home/ubuntu/linux/gtpmfgo/",
use_gpu=1) at ../src/caffecnn.cpp:61
(gdb) p caffe::Caffe::MODE_GPU
There is no field named MODE_GPU
(gdb) p caffe::Caffe::GPU
$40 = caffe::Caffe::GPU
(gdb) p/x caffe::Caffe::GPU
$41 = 0x1
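
The session shows random_generator_.px going from 0x0 to 0x1 across the set_mode call, while Caffe::GPU prints as 0x1 and mode_ still reads CPU. One mechanism that would produce this pattern is code compiled against a shorter view of the class storing mode_ at an offset where the longer layout keeps random_generator_. The sketch below simulates that. LibraryView and AppView are hypothetical stand-ins, not Caffe types: the member order of LibraryView follows the gdb print above, AppView is an assumed view without the two handle members, and a little-endian machine is assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

enum Brew { CPU = 0, GPU = 1 };

// Stand-in for the layout gdb printed: two handles, a shared_ptr (px, pn), then mode_.
struct LibraryView {
  void* cublas_handle_;
  void* curand_generator_;
  void* px;
  void* pn;
  Brew mode_;
};

// Assumed view of the same class compiled without the two handle members.
struct AppView {
  void* px;
  void* pn;
  Brew mode_;
};

static_assert(sizeof(std::uintptr_t) == sizeof(void*), "sketch assumes pointer-sized uintptr_t");

struct Result {
  std::uintptr_t px_bits;      // raw bits of px after the write
  Brew mode_seen_by_library;   // mode_ as read through the longer layout
};

// Store `mode` where AppView believes mode_ lives, inside an object that
// actually has the LibraryView layout, then report what the library would see.
Result simulate_set_mode(Brew mode) {
  LibraryView obj = {};  // zero-initialised, like px = 0x0 before set_mode
  unsigned char* raw = reinterpret_cast<unsigned char*>(&obj);
  std::memcpy(raw + offsetof(AppView, mode_), &mode, sizeof mode);

  Result r = {};
  std::memcpy(&r.px_bits, &obj.px, sizeof obj.px);
  r.mode_seen_by_library = obj.mode_;
  return r;
}
```

On a little-endian machine, simulate_set_mode(GPU) leaves px_bits at 1 and the library-side mode_ at CPU, which is the same combination as in $38 above.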

dfotland commented

Found the problem. I had CPU_ONLY defined in my application header but the library was built without it, so my application and the library had different definitions of the Caffe class.
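
The direct fix is to build the application and the library with the same CPU_ONLY setting. Beyond that, a cheap startup check can make this kind of mismatch fail loudly instead of silently corrupting memory. The sketch below is hypothetical throughout: BuildConfig and describe_mismatch are illustrative names, not part of Caffe, and nothing in this thread suggests Caffe exports such a check.

```cpp
#include <cstddef>
#include <string>

// Hypothetical record of the settings that affect the shared class layout.
struct BuildConfig {
  bool cpu_only;           // whether CPU_ONLY was defined on this side
  std::size_t state_size;  // sizeof the shared class as seen by this side
};

// Returns an empty string when both sides agree, otherwise a description
// of every difference found.
std::string describe_mismatch(const BuildConfig& app, const BuildConfig& lib) {
  std::string msg;
  if (app.cpu_only != lib.cpu_only) {
    msg += "CPU_ONLY differs between application and library; ";
  }
  if (app.state_size != lib.state_size) {
    msg += "sizeof(shared class) differs: app=" + std::to_string(app.state_size) +
           " lib=" + std::to_string(lib.state_size);
  }
  return msg;
}
```

In a real build, the application-side BuildConfig would be filled in by an inline function in the shared header (so it picks up the application's macros), and the library-side one by a function compiled into the library. The application would then abort at startup if describe_mismatch returns a non-empty string.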
