Add some spinning to mutex_lock() #25

Open
nyh opened this issue Jul 24, 2013 · 1 comment

Comments

nyh (Contributor) commented Jul 24, 2013

When very short sections of code (e.g., setting a couple of variables) are protected by mutex_lock()/mutex_unlock() and the lock is busy, it is often more efficient to simply try locking again for a short while instead of doing a context switch - the other thread will often have finished already.

Some implementation ideas:

  1. lockfree::mutex::lock() can, after incrementing count, spin for a short while as long as count is still > 1, and whenever we see count == 1 during that spin, try to take the lock (by taking the "handoff" left by unlock(), which knows we are trying to get the lock because we incremented count). And don't forget the "pause" instruction during that spin :-)
  2. Probably a better alternative is, instead of checking count == 1 as above, to check the handoff directly (see the example in try_lock()), without considering count at all. The reason is that two threads spinning in lock() will leave count == 2 and neither will ever see count == 1, yet a handoff will still occur, because unlock() sees count == 2 with nobody on the queue. A sketch of this variant follows the list.
  3. A third alternative is not to use the handoff mechanism at all: as in try_lock()'s code, don't increment count and just loop on a CAS, trying to grab the lock when nobody holds it. I'm not sure whether this alternative is better or worse than the previous one. The loop is slower, but we don't force unlock() to use the handoff mechanism. It also changes how lock() behaves in the uncontended case - previously it started with a fetch_add, and now it would start with a compare_exchange (I don't know which is more efficient).
  4. Don't forget NOT to spin on single-CPU machines - it can't help :-)
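
To make idea 2 concrete, here is a rough sketch of the spin loop. This is not the actual lockfree::mutex code - the count/handoff members and the spin_limit constant are stand-ins modeled on the description above:

  #include <atomic>

  static constexpr unsigned spin_limit = 100;    // placeholder tuning knob, see discussion below

  struct spinning_lock_sketch {
      std::atomic<int> count{0};           // lockers + waiters, as described above
      std::atomic<unsigned> handoff{0};    // nonzero when unlock() offers the lock

      // Spin briefly, polling the handoff word; return true if we grabbed the
      // lock this way, false if we should fall back to the sleeping slow path.
      bool try_spin() {
          for (unsigned i = 0; i < spin_limit; ++i) {
              unsigned h = handoff.load(std::memory_order_acquire);
              if (h && handoff.compare_exchange_strong(h, 0)) {
                  return true;              // consumed the handoff - the lock is ours
              }
              __builtin_ia32_pause();       // the "pause" instruction from idea 1 (x86, GCC builtin)
          }
          return false;
      }
  };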

An open question is how long to spin before resorting to sleeping. Should the amount of spinning depend on the number of processors, the lock's previous history, other heuristics, or be explicitly configured at compile time by the programmer for each use? Some thoughts on this, raised in a long email discussion we had:

  1. My original thinking was that the programmer knows when critical sections are short, so in that case they can use a different method, say lock_expect_quick(nspins) or lock_expect_quick(nanoseconds). The spin duration should be determined by the critical section's length, by the time it takes to do a context switch, or both (?).
  2. Avi suggested handling the above in a more OO manner, adding more mutex types that keep mutex's original lock()/unlock() method names so that lock_guard<> and friends continue to work, e.g., spinning_mutex. The different mutex types can share code through a template (a sketch of this pattern follows the list), e.g.,
typedef basic_mutex<contention_strategy_spin<50>> spinning_mutex;
typedef basic_mutex<contention_strategy_wait> io_mutex;
typedef basic_mutex<contention_strategy_adaptive> mutex;
  3. Glauber suggested looking at the queue's length to see whether other threads are already waiting, and if so, not spinning, since we don't stand much chance (assuming the spin length we chose is roughly the time it takes one critical section to finish - not more). Getting the queue length is not easy, but we do have "count" (which, however, includes not only the queue length but also the number of concurrent lock()s which haven't yet managed to put themselves on the queue), and we also know whether the queue is empty.
  4. Dor and Guy thought that spinning should be adaptive, i.e., take into account how long we spun in previous lock attempts and whether they succeeded. E.g., if 100 spins were usually enough, don't spin more than that even if our maximum spin time (perhaps related to the context-switch cost) is 1000 spins - previous experience showed that if 100 spins didn't get us the lock, waiting 1000 won't help either.
  5. Dor and Guy thought that the spin time should be related to the number of vcpus - no spinning on one cpu (of course), spin for time x on 2 vcpus, for x*1.5 or x*log2(4) on 4 vcpus, etc. Glauber and I thought the spin time should not depend on the number of CPUs.
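
To illustrate Avi's suggestion, here is a minimal, self-contained sketch of the strategy-template pattern. It is not OSv code: the blocking path is approximated with yield(), and the strategy interface (a single wait(attempt) hook) is an assumption made just for this example:

  #include <atomic>
  #include <thread>
  #include <mutex>

  template <unsigned N>
  struct contention_strategy_spin {
      // Spin (with "pause") up to N times, then give the CPU away.
      static void wait(unsigned attempt) {
          if (attempt < N) {
              __builtin_ia32_pause();
          } else {
              std::this_thread::yield();    // stand-in for the real sleeping path
          }
      }
  };

  struct contention_strategy_wait {
      // Never spin: go straight to "blocking" (approximated here by yield()).
      static void wait(unsigned) { std::this_thread::yield(); }
  };

  template <typename Strategy>
  class basic_mutex {
      std::atomic<bool> _locked{false};
  public:
      void lock() {
          for (unsigned attempt = 0; _locked.exchange(true, std::memory_order_acquire); ++attempt) {
              Strategy::wait(attempt);      // contention policy chosen by the template parameter
          }
      }
      void unlock() { _locked.store(false, std::memory_order_release); }
  };

  typedef basic_mutex<contention_strategy_spin<50>> spinning_mutex;
  typedef basic_mutex<contention_strategy_wait> io_mutex;

  int main() {
      spinning_mutex m;
      std::lock_guard<spinning_mutex> guard(m);    // lock_guard<> and friends still work
      return 0;
  }

An adaptive strategy as in the typedefs above would slot into the same interface, keeping per-lock spin history in the strategy object instead of in static functions.
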
penberg pushed a commit that referenced this issue Aug 26, 2013
Fix mincore() to deal with unmapped addresses like msync() does.

This fixes a SIGSEGV in libunwind's access_mem() when the leak detector
is enabled:

   (gdb) bt
  #0  page_fault (ef=0xffffc0003ffe7008) at ../../core/mmu.cc:871
  #1  <signal handler called>
  #2  ContiguousSpace::block_start_const (this=<optimized out>, p=0x77d2f3968)
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/oops/oop.inline.hpp:411
  #3  0x00001000008ae16c in GenerationBlockStartClosure::do_space (this=0x2000001f9100, s=<optimized out>)
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/memory/generation.cpp:242
  #4  0x00001000007f097c in DefNewGeneration::space_iterate (this=0xffffc0003fb68c00, blk=0x2000001f9100, usedOnly=<optimized out>)
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/memory/defNewGeneration.cpp:480
  #5  0x00001000008aca0e in Generation::block_start (this=<optimized out>, p=<optimized out>)
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/memory/generation.cpp:251
  #6  0x0000100000b06d2f in os::print_location (st=st@entry=0x2000001f9560, x=32165017960, verbose=verbose@entry=false)
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/runtime/os.cpp:868
  #7  0x0000100000b11b5b in os::print_register_info (st=0x2000001f9560, context=0x2000001f9740)
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:839
  #8  0x0000100000c6cde8 in VMError::report (this=0x2000001f9610, st=st@entry=0x2000001f9560)
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:551
  #9  0x0000100000c6da3b in VMError::report_and_die (this=this@entry=0x2000001f9610)
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:984
  #10 0x0000100000b1109f in JVM_handle_linux_signal (sig=11, info=0x2000001f9bb8, ucVoid=0x2000001f9740,
      abort_if_unrecognized=<optimized out>)
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:528
  #11 0x000000000039f242 in call_signal_handler (frame=0x2000001f9b10) at ../../arch/x64/signal.cc:69
  #12 <signal handler called>
  #13 0x000000000057d721 in access_mem ()
  #14 0x000000000057cb1d in dwarf_get ()
  #15 0x000000000057ce51 in _ULx86_64_step ()
  #16 0x00000000004315fd in backtrace (buffer=0x1ff9d80 <memory::alloc_tracker::remember(void*, int)::bt>, size=20)
      at ../../libc/misc/backtrace.cc:16
  #17 0x00000000003b8d99 in memory::alloc_tracker::remember (this=0x1777ae0 <memory::tracker>, addr=0xffffc0004508de00, size=54)
      at ../../core/alloctracker.cc:59
  #18 0x00000000003b0504 in memory::tracker_remember (addr=0xffffc0004508de00, size=54) at ../../core/mempool.cc:43
  #19 0x00000000003b2152 in std_malloc (size=54) at ../../core/mempool.cc:723
  #20 0x00000000003b259c in malloc (size=54) at ../../core/mempool.cc:856
  #21 0x0000100001615e4c in JNU_GetStringPlatformChars (env=env@entry=0xffffc0003a4dc1d8, jstr=jstr@entry=0xffffc0004591b800,
      isCopy=isCopy@entry=0x0) at ../../../src/share/native/common/jni_util.c:801
  #22 0x000010000161ada6 in Java_java_io_UnixFileSystem_getBooleanAttributes0 (env=0xffffc0003a4dc1d8, this=<optimized out>,
      file=<optimized out>) at ../../../src/solaris/native/java/io/UnixFileSystem_md.c:111
  #23 0x000020000021ed8e in ?? ()
  #24 0x00002000001faa58 in ?? ()
  #25 0x00002000001faac0 in ?? ()
  #26 0x00002000001faa50 in ?? ()
  #27 0x0000000000000000 in ?? ()

Spotted by Avi Kivity.
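
For context, a rough illustration of the shape of such a fix - not the actual commit. The my_mincore name and the msync()-based is_mapped() probe are stand-ins invented for this example; the point is simply to return ENOMEM for unmapped ranges, the way msync() does, instead of faulting:

  #include <cerrno>
  #include <cstring>
  #include <sys/mman.h>
  #include <unistd.h>

  // Probe whether [addr, addr+length) is fully mapped by asking msync(),
  // which fails with ENOMEM for unmapped ranges.
  static bool is_mapped(void* addr, size_t length)
  {
      return msync(addr, length, MS_ASYNC) == 0;
  }

  int my_mincore(void* addr, size_t length, unsigned char* vec)
  {
      if (!is_mapped(addr, length)) {
          errno = ENOMEM;    // unmapped addresses: report an error instead of faulting
          return -1;
      }
      long page = sysconf(_SC_PAGESIZE);
      std::memset(vec, 0x01, (length + page - 1) / page);   // report mapped pages as resident
      return 0;
  }
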
slivne (Contributor) commented Jul 31, 2014

@nyh can you please add the info we have collected on the list to this issue for future reference ...

Once we submit it (or decide not to), let's close this issue.
