condvar_wake_* optimization for no waiters #21

nyh · 2013-07-16T09:16:24Z

Suggestion made by Avi:

Currently condvar_wake_one()/all() take the internal mutex before checking if there's anybody to wake. This is inefficient in the interesting case where wake() is called many times on a frequent event that often nobody is waiting on.

wake() should check if the list is empty without the lock, and only take the lock when it isn't empty. Of course, this should be done correctly, with the correct memory barriers as needed.

nyh · 2013-07-18T14:29:02Z

Done. See commits 3509b19 and 0080ee6

Starting up Cassandra with debug memory allocator GPFs as follows: Breakpoint 1, abort () at ../../runtime.cc:85 85 { (gdb) bt #0 abort () at ../../runtime.cc:85 #1 0x0000000000375812 in osv::generate_signal (siginfo=..., ef=ef@entry=0xffffc0003ffe3008) at ../../libc/signal.cc:40 #2 0x000000000037587c in osv::handle_segmentation_fault (addr=addr@entry=18446708889768681440, ef=ef@entry=0xffffc0003ffe3008) at ../../libc/signal.cc:55 #3 0x00000000002fba02 in page_fault (ef=0xffffc0003ffe3008) at ../../core/mmu.cc:876 #4 <signal handler called> #5 dbg::realloc (v=v@entry=0xffffe00019b3e000, size=size@entry=16) at ../../core/mempool.cc:846 #6 0x000000000032654c in realloc (obj=0xffffe00019b3e000, size=16) at ../../core/mempool.cc:870 #7 0x0000100000627743 in ?? () #8 0x00002000001fe770 in ?? () #9 0x00002000001fe780 in ?? () #10 0x00002000001fe710 in ?? () #11 0x00002000001fe700 in ?? () #12 0xffffe000170e8000 in ?? () #13 0x0000000200000001 in ?? () #14 0x0000000000000020 in ?? () #15 0x00002000001ffe70 in ?? () #16 0xffffe000170e0004 in ?? () #17 0x000000000036f361 in strcpy (dest=0x100001087420 "", src=<optimized out>) at ../../libc/string/strcpy.c:8 #18 0x0000100000629b53 in ?? () #19 0xffffe00019b22000 in ?? () #20 0x0000000000000001 in ?? () #21 0x0000000000000000 in ?? () The problem was introduced in commit 1ea5672 ("memory: let the debug allocator mimic the standard allocator more closely") which forgot to convert realloc() to use 'pad_before'.

If a crashed OSv guest is restarted, ZFS mount causes a GPF in early startup: VFS: mounting zfs at /usr zfs: mounting osv/usr from device /dev/vblk1 Aborted GDB backtrace points finger at zfs_rmnode(): #0 processor::halt_no_interrupts () at ../../arch/x64/processor.hh:212 #1 0x00000000003e7f2a in osv::halt () at ../../core/power.cc:20 #2 0x000000000021cdd4 in abort (msg=0x636df0 "Aborted\n") at ../../runtime.cc:95 #3 0x000000000021cda2 in abort () at ../../runtime.cc:86 #4 0x000000000044c149 in osv::generate_signal (siginfo=..., ef=0xffffc0003ffe7008) at ../../libc/signal.cc:44 #5 0x000000000044c220 in osv::handle_segmentation_fault (addr=72, ef=0xffffc0003ffe7008) at ../../libc/signal.cc:55 #6 0x0000000000366df3 in page_fault (ef=0xffffc0003ffe7008) at ../../core/mmu.cc:876 #7 <signal handler called> #8 0x0000000000345eaa in zfs_rmnode (zp=0xffffc0003d1de400) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:611 #9 0x000000000035650c in zfs_zinactive (zp=0xffffc0003d1de400) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1355 #10 0x0000000000345be1 in zfs_unlinked_drain (zfsvfs=0xffffc0003ddfe000) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:523 #11 0x000000000034f45c in zfsvfs_setup (zfsvfs=0xffffc0003ddfe000, mounting=true) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:881 #12 0x000000000034f7a4 in zfs_domount (vfsp=0xffffc0003de02000, osname=0x6b14cb "osv/usr") at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1016 #13 0x000000000034f98c in zfs_mount (mp=0xffffc0003de02000, dev=0x6b14d7 "/dev/vblk1", flags=0, data=0x6b14cb) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1415 #14 0x0000000000406852 in sys_mount (dev=0x6b14d7 "/dev/vblk1", dir=0x6b14a3 "/usr", fsname=0x6b14d3 "zfs", flags=0, data=0x6b14cb) at ../../fs/vfs/vfs_mount.c:171 #15 0x00000000003eff97 in mount_usr () at ../../fs/vfs/main.cc:1415 #16 0x0000000000203a89 in do_main_thread (_args=0xffffc0003fe9ced0) at ../../loader.cc:215 #17 0x0000000000448575 in pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, pthread_private::thread_attr const*)::{lambda()#1}::operator()() const () at ../../libc/pthread.cc:59 #18 0x00000000004499d3 in std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::__lambda0>::_M_invoke(const std::_Any_data &) (__functor=...) at ../../external/gcc.bin/usr/include/c++/4.8.1/functional:2071 #19 0x000000000037e602 in std::function<void ()>::operator()() const (this=0xffffc0003e170038) at ../../external/gcc.bin/usr/include/c++/4.8.1/functional:2468 #20 0x00000000003bae3e in sched::thread::main (this=0xffffc0003e170010) at ../../core/sched.cc:581 #21 0x00000000003b8c92 in sched::thread_main_c (t=0xffffc0003e170010) at ../../arch/x64/arch-switch.hh:133 #22 0x0000000000399c8e in thread_main () at ../../arch/x64/entry.S:101 The problem is that ZFS tries to check if the znode is an attribute directory and trips over zp->z_vnode being NULL. However, as explained in commit b7ee91e ("zfs: port vop_lookup"), we don't even support extended attributes so drop the check completely for OSv.

Fix mincore() to deal with unmapped addresses like msync() does. This fixes a SIGSEGV in libunwind's access_mem() when leak detector is enabled: (gdb) bt #0 page_fault (ef=0xffffc0003ffe7008) at ../../core/mmu.cc:871 #1 <signal handler called> #2 ContiguousSpace::block_start_const (this=<optimized out>, p=0x77d2f3968) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/oops/oop.inline.hpp:411 #3 0x00001000008ae16c in GenerationBlockStartClosure::do_space (this=0x2000001f9100, s=<optimized out>) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/memory/generation.cpp:242 #4 0x00001000007f097c in DefNewGeneration::space_iterate (this=0xffffc0003fb68c00, blk=0x2000001f9100, usedOnly=<optimized out>) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/memory/defNewGeneration.cpp:480 #5 0x00001000008aca0e in Generation::block_start (this=<optimized out>, p=<optimized out>) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/memory/generation.cpp:251 #6 0x0000100000b06d2f in os::print_location (st=st@entry=0x2000001f9560, x=32165017960, verbose=verbose@entry=false) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/runtime/os.cpp:868 #7 0x0000100000b11b5b in os::print_register_info (st=0x2000001f9560, context=0x2000001f9740) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:839 #8 0x0000100000c6cde8 in VMError::report (this=0x2000001f9610, st=st@entry=0x2000001f9560) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:551 #9 0x0000100000c6da3b in VMError::report_and_die (this=this@entry=0x2000001f9610) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:984 #10 0x0000100000b1109f in JVM_handle_linux_signal (sig=11, info=0x2000001f9bb8, ucVoid=0x2000001f9740, abort_if_unrecognized=<optimized out>) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:528 #11 0x000000000039f242 in call_signal_handler (frame=0x2000001f9b10) at ../../arch/x64/signal.cc:69 #12 <signal handler called> #13 0x000000000057d721 in access_mem () #14 0x000000000057cb1d in dwarf_get () #15 0x000000000057ce51 in _ULx86_64_step () #16 0x00000000004315fd in backtrace (buffer=0x1ff9d80 <memory::alloc_tracker::remember(void*, int)::bt>, size=20) at ../../libc/misc/backtrace.cc:16 #17 0x00000000003b8d99 in memory::alloc_tracker::remember (this=0x1777ae0 <memory::tracker>, addr=0xffffc0004508de00, size=54) at ../../core/alloctracker.cc:59 #18 0x00000000003b0504 in memory::tracker_remember (addr=0xffffc0004508de00, size=54) at ../../core/mempool.cc:43 #19 0x00000000003b2152 in std_malloc (size=54) at ../../core/mempool.cc:723 #20 0x00000000003b259c in malloc (size=54) at ../../core/mempool.cc:856 #21 0x0000100001615e4c in JNU_GetStringPlatformChars (env=env@entry=0xffffc0003a4dc1d8, jstr=jstr@entry=0xffffc0004591b800, isCopy=isCopy@entry=0x0) at ../../../src/share/native/common/jni_util.c:801 #22 0x000010000161ada6 in Java_java_io_UnixFileSystem_getBooleanAttributes0 (env=0xffffc0003a4dc1d8, this=<optimized out>, file=<optimized out>) at ../../../src/solaris/native/java/io/UnixFileSystem_md.c:111 #23 0x000020000021ed8e in ?? () #24 0x00002000001faa58 in ?? () #25 0x00002000001faac0 in ?? () #26 0x00002000001faa50 in ?? () #27 0x0000000000000000 in ?? () Spotted by Avi Kivity.

nyh closed this as completed Jul 18, 2013

gleb-cloudius mentioned this issue Feb 7, 2014

elf loader access mmaped elf image with preemption disabled #196

Closed

asias mentioned this issue May 6, 2014

tomcat NIO connector fails #293

Closed

avikivity mentioned this issue Jul 17, 2014

Partial balloon copy failure #398

Open

raphaelsc mentioned this issue Aug 7, 2014

epoll_wait() causes aborts when socket is closed concurrently #424

Closed

vladzcloudius mentioned this issue Aug 31, 2014

net: vm_fault assert comming from route_cache::lookup() #480

Closed

nyh mentioned this issue May 28, 2015

Java image compiled on Fedora 22 doesn't work #635

Closed

benoit-canet mentioned this issue Feb 16, 2016

OSv crashing when using parallel netperf connections #728

Open

yuwang888 mentioned this issue Jan 2, 2018

reschedule_from_interrupt assert(sched::exception_depth <= 1) when run specjbb2015 #933

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

condvar_wake_* optimization for no waiters #21

condvar_wake_* optimization for no waiters #21

nyh commented Jul 16, 2013

nyh commented Jul 18, 2013

condvar_wake_* optimization for no waiters #21

condvar_wake_* optimization for no waiters #21

Comments

nyh commented Jul 16, 2013

nyh commented Jul 18, 2013