OSv crashing when using parallel netperf connections #728

Open
benoit-canet opened this issue Feb 16, 2016 · 0 comments

With a Fedora 23 host and OSv commit a0cf9fa.

On the host:
./src/netserver -D -d

OSv launch:

[benoit@localhost osv]$ ./scripts/run.py  -d -v -e "/tools/netperf.so -H 192.168.77.5 -t TCP_STREAM -l 30 -- -m 1400 & /tools/netperf.so -H 192.168.77.5 -t TCP_STREAM -l 30 -- -m 1400  & /tools/netperf.so -H 192.168.77.5 -t TCP_STREAM -l 30 -- -m 1400 " -c 4 
OSv v0.24-67-ga0cf9fa
eth0: 192.168.122.15
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.77.5 (192.168) port 0 AF_INET
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.77.5 (192.168) port 0 AF_INET
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.77.5 (192.168) port 0 AF_INET
page fault outside application, addr: 0x0000000000000000
[registers]
RIP: 0x00000000006293bc <sched::wait_object<waitqueue>::arm()+64>
RFL: 0x0000000000010206  CS:  0x0000000000000008  SS:  0x0000000000000010
RAX: 0x0000000000000000  RBX: 0x0000000000000001  RCX: 0x00002000002ff630  RDX: 0x00002000002ff650
RSI: 0x00002000002ff630  RDI: 0x00002000002ff640  RBP: 0x00002000002ff520  R8:  0x00002000002ff62f
R9:  0x000000000000111c  R10: 0x0000000000000036  R11: 0x0000000000000001  R12: 0x0000000000000000
R13: 0x000000000000111c  R14: 0x0000000000000036  R15: 0x0000000000000001  RSP: 0x00002000002ff520
Aborted

[backtrace]
0x000000000022d8aa <abort(char const*, ...)+249>
0x00000000003c83c1 <???+3965889>
0x00000000003c8560 <mmu::vm_fault(unsigned long, exception_frame*)+350>
0x000000000048d3fd <page_fault+315>
0x000000000048c286 <???+4768390>
0x0000000000243ff4 <void sched::arm<sched::wait_object<waitqueue>, sched::wait_object<sched::timer>, sched::wait_object<signal_catcher> >(sched::wait_object<waitqueue>&, sched::wait_object<sched::timer>&, sched::wait_object<signal_catcher>&)+31>
0x0000000000243e43 <void sched::arm<sched::wait_object<net_channel>, sched::wait_object<waitqueue>, sched::wait_object<sched::timer>, sched::wait_object<signal_catcher> >(sched::wait_object<net_channel>&, sched::wait_object<waitqueue>&, sched::wait_object<sched::timer>&, sched::wait_object<signal_catcher>&)+58>
0x0000000000243ae2 <void sched::thread::do_wait_for<lockfree::mutex, sched::wait_object<net_channel>, sched::wait_object<waitqueue>, sched::wait_object<sched::timer>, sched::wait_object<signal_catcher> >(lockfree::mutex&, sched::wait_object<net_channel>&&, sched::wait_object<waitqueue>&&, sched::wait_object<sched::timer>&&, sched::wait_object<signal_catcher>&&)+84>
0x0000000000243830 <void sched::thread::wait_for<net_channel&, waitqueue&, sched::timer&, signal_catcher&>(lockfree::mutex&, net_channel&, waitqueue&, sched::timer&, signal_catcher&)+150>
0x00000000002435b6 <int sbwait_tmo<osv::clock::uptime>(socket*, sockbuf*, boost::optional<std::chrono::time_point<osv::clock::uptime, osv::clock::uptime::duration> >)+273>
0x0000000000254a9b <socket_file::poll_sync(pollfd&, boost::optional<std::chrono::time_point<osv::clock::uptime, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > >)+391>
0x000000000060eef6 <???+6352630>
0x000000000060efe4 <poll+155>
0x0000000000613aee <select+961>
0x0000100000c0dfeb <???+12640235>
0x0000100000c0e1ef <???+12640751>
0x0000100000c2488a <???+12732554>
0x0000100000c27f87 <???+12746631>
0x0000100000c0b264 <???+12628580>
0x000000000063e81b <osv::application::run_main(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, char**)+713>
0x000000000063ea6f <osv::application::run_main()+329>
0x000000000063e51e <osv::application::main()+108>
0x000000000063e121 <???+6545697>
0x000000000063e146 <???+6545734>
0x000000000069a240 <???+6922816>
0x000000000069c7bb <???+6932411>
0x000000000044fc6b <std::function<void ()>::operator()() const+49>
0x00000000005c1f47 <sched::thread::main()+27>
0x00000000005be765 <thread_main_c+38>
0x000000000048d205 <???+4772357>

In GDB:

    (gdb) info thread
      Id   Target Id         Frame 
      4    Thread 4 (CPU#3 [halted ]) processor::cli_hlt () at arch/x64/processor.hh:248
      3    Thread 3 (CPU#2 [halted ]) processor::cli_hlt () at arch/x64/processor.hh:248
      2    Thread 2 (CPU#1 [halted ]) processor::cli_hlt () at arch/x64/processor.hh:248
    * 1    Thread 1 (CPU#0 [halted ]) processor::cli_hlt () at arch/x64/processor.hh:248
    (gdb) bt
    #0  processor::cli_hlt () at arch/x64/processor.hh:248
    #1  0x000000000048f784 in nmi (ef=0xffff80000209a068) at arch/x64/exceptions.cc:294
    #2  <signal handler called>
    #3  processor::sti_hlt () at arch/x64/processor.hh:252
    #4  0x00000000005c9ce3 in arch::wait_for_interrupt () at arch/x64/arch.hh:43
    #5  0x00000000005bf567 in sched::cpu::do_idle (this=0xffff800001b13040) at core/sched.cc:403
    #6  0x00000000005bf60d in sched::cpu::idle (this=0xffff800001b13040) at core/sched.cc:422
    #7  0x00000000005be9bb in sched::cpu::<lambda()>::operator()(void) const (__closure=0xffff800002095070) at core/sched.cc:164
    #8  0x00000000005c5d26 in std::_Function_handler<void(), sched::cpu::init_idle_thread()::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/5.3.1/functional:1871
    #9  0x000000000044fc6c in std::function<void ()>::operator()() const (this=0xffff800002095070) at /usr/include/c++/5.3.1/functional:2271
    #10 0x00000000005c1f48 in sched::thread::main (this=0xffff800002095040) at core/sched.cc:1051
    #11 0x00000000005be766 in sched::thread_main_c (t=0xffff800002095040) at arch/x64/arch-switch.hh:164
    #12 0x000000000048d206 in thread_main () at arch/x64/entry.S:113
    (gdb) thread 2
    [Switching to thread 2 (Thread 2)]
    #0  processor::cli_hlt () at arch/x64/processor.hh:248
    248     }
    (gdb) bt
    #0  processor::cli_hlt () at arch/x64/processor.hh:248
    #1  0x0000000000209c0e in arch::halt_no_interrupts () at arch/x64/arch.hh:48
    #2  0x000000000049d6bb in osv::halt () at arch/x64/power.cc:24
    #3  0x000000000022d8d1 in abort (fmt=0xa0c17d "Aborted\n") at runtime.cc:132
    #4  0x000000000022d7b1 in abort () at runtime.cc:96
    #5  0x00000000003c83c2 in mmu::vm_sigsegv (addr=0, ef=0xffff800003fe9078) at core/mmu.cc:1315
    #6  0x00000000003c8561 in mmu::vm_fault (addr=0, ef=0xffff800003fe9078) at core/mmu.cc:1337
    #7  0x000000000048d3fe in page_fault (ef=0xffff800003fe9078) at arch/x64/mmu.cc:38
    #8  <signal handler called>
    #9  0x00000000006293bc in sched::wait_object<waitqueue>::arm (this=0x2000002ff640) at core/waitqueue.cc:24
    #10 0x0000000000243ff5 in sched::arm<sched::wait_object<waitqueue>, sched::wait_object<sched::timer>, sched::wait_object<signal_catcher> > (first=...) at include/osv/sched.hh:1088
    #11 0x0000000000243e44 in sched::arm<sched::wait_object<net_channel>, sched::wait_object<waitqueue>, sched::wait_object<sched::timer>, sched::wait_object<signal_catcher> > (first=...)
        at include/osv/sched.hh:1089
    #12 0x0000000000243ae3 in sched::thread::do_wait_for<lockfree::mutex, sched::wait_object<net_channel>, sched::wait_object<waitqueue>, sched::wait_object<sched::timer>, sched::wait_object<signal_catcher> >(lockfree::mutex&, sched::wait_object<net_channel>&&, sched::wait_object<waitqueue>&&, sched::wait_object<sched::timer>&&, sched::wait_object<signal_catcher>&&) (mtx=...) at include/osv/sched.hh:1133
    #13 0x0000000000243831 in sched::thread::wait_for<net_channel&, waitqueue&, sched::timer&, signal_catcher&> (mtx=...) at include/osv/sched.hh:1159
    #14 0x00000000002435b7 in sbwait_tmo<osv::clock::uptime> (so=0xffffa00003e58e00, sb=0xffffa00003e58e90, timeout=...) at bsd/sys/kern/uipc_sockbuf.cc:154
    #15 0x0000000000254a9c in socket_file::poll_sync (this=0xffffa00003e57400, pfd=..., timeout=...) at bsd/sys/kern/sys_socket.cc:287
    #16 0x000000000060eef7 in poll_one (pfd=..., timeout=...) at core/poll.cc:343
    #17 0x000000000060efe5 in poll (_pfd=0xffff800004276040, _nfds=1, _timeout=120000) at core/poll.cc:360
    #18 0x0000000000613aef in select (nfds=1024, readfds=0x2000002ff990, writefds=0x0, exceptfds=0x0, timeout=0x2000002ff980) at core/select.cc:110
    #19 0x0000100000c0dfec in ?? ()
    #20 0x00002000002ff950 in ?? ()
    #21 0x0000000000000012 in ?? ()
    #22 0x0000000000000078 in ?? ()
    #23 0x0000000000000000 in ?? ()
    (gdb) thread 3
    [Switching to thread 3 (Thread 3)]
    #0  processor::cli_hlt () at arch/x64/processor.hh:248
    248     }
    (gdb) bt
    #0  processor::cli_hlt () at arch/x64/processor.hh:248
    #1  0x000000000048f784 in nmi (ef=0xffff8000020f0068) at arch/x64/exceptions.cc:294
    #2  <signal handler called>
    #3  processor::sti_hlt () at arch/x64/processor.hh:252
    #4  0x00000000005c9ce3 in arch::wait_for_interrupt () at arch/x64/arch.hh:43
    #5  0x00000000005bf567 in sched::cpu::do_idle (this=0xffff800001b7f040) at core/sched.cc:403
    #6  0x00000000005bf60d in sched::cpu::idle (this=0xffff800001b7f040) at core/sched.cc:422
    #7  0x00000000005be9bb in sched::cpu::<lambda()>::operator()(void) const (__closure=0xffff8000020eb070) at core/sched.cc:164
    #8  0x00000000005c5d26 in std::_Function_handler<void(), sched::cpu::init_idle_thread()::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/5.3.1/functional:1871
    #9  0x000000000044fc6c in std::function<void ()>::operator()() const (this=0xffff8000020eb070) at /usr/include/c++/5.3.1/functional:2271
    #10 0x00000000005c1f48 in sched::thread::main (this=0xffff8000020eb040) at core/sched.cc:1051
    #11 0x00000000005be766 in sched::thread_main_c (t=0xffff8000020eb040) at arch/x64/arch-switch.hh:164
    #12 0x000000000048d206 in thread_main () at arch/x64/entry.S:113
    (gdb) thread 4
    [Switching to thread 4 (Thread 4)]
    #0  processor::cli_hlt () at arch/x64/processor.hh:248
    248     }
    (gdb) bt
    #0  processor::cli_hlt () at arch/x64/processor.hh:248
    #1  0x000000000048f784 in nmi (ef=0xffff80000212b068) at arch/x64/exceptions.cc:294
    #2  <signal handler called>
    #3  processor::sti_hlt () at arch/x64/processor.hh:252
    #4  0x00000000005c9ce3 in arch::wait_for_interrupt () at arch/x64/arch.hh:43
    #5  0x00000000005bf567 in sched::cpu::do_idle (this=0xffff800001bb4040) at core/sched.cc:403
    #6  0x00000000005bf60d in sched::cpu::idle (this=0xffff800001bb4040) at core/sched.cc:422
    #7  0x00000000005be9bb in sched::cpu::<lambda()>::operator()(void) const (__closure=0xffff800002126070) at core/sched.cc:164
    #8  0x00000000005c5d26 in std::_Function_handler<void(), sched::cpu::init_idle_thread()::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/5.3.1/functional:1871
    #9  0x000000000044fc6c in std::function<void ()>::operator()() const (this=0xffff800002126070) at /usr/include/c++/5.3.1/functional:2271
    #10 0x00000000005c1f48 in sched::thread::main (this=0xffff800002126040) at core/sched.cc:1051
    #11 0x00000000005be766 in sched::thread_main_c (t=0xffff800002126040) at arch/x64/arch-switch.hh:164
    #12 0x000000000048d206 in thread_main () at arch/x64/entry.S:113

Maybe per-socket state is not properly isolated when several connections run in parallel, or something along those lines.
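
To check whether the crash depends on the number of parallel connections, the launch command from this report can be generated for any count. Below is a minimal sketch, not part of OSv: the helper names `netperf_command` and `run_py_args` are hypothetical. It uses only the `run.py` flags and netperf options that already appear in the command above, and it assumes, as that command does, that `&`-separated programs in the `-e` string run concurrently.

```python
import shlex


def netperf_command(host, duration=30, msg_size=1400):
    """Return one netperf invocation with the options used in this report."""
    return (
        f"/tools/netperf.so -H {host} -t TCP_STREAM "
        f"-l {duration} -- -m {msg_size}"
    )


def run_py_args(host, connections, cpus=4):
    """Build the ./scripts/run.py argument list for N parallel netperf runs.

    Hypothetical helper: it only reassembles the command line shown in the
    report, with the number of '&'-separated netperf instances made variable.
    """
    if connections < 1:
        raise ValueError("connections must be >= 1")
    # Join the instances with '&', mirroring the command line in the report.
    cmdline = " & ".join(netperf_command(host) for _ in range(connections))
    return ["./scripts/run.py", "-d", "-v", "-e", cmdline, "-c", str(cpus)]


if __name__ == "__main__":
    # Print shell-quoted commands for 1, 2 and 3 parallel connections.
    for n in (1, 2, 3):
        print(shlex.join(run_py_args("192.168.77.5", n)))
```

Running the printed commands one after another from the OSv source tree should show whether a single connection survives and at which count the page fault starts to appear.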
