Contention on vma_list_mutex during tomcat benchmark #310

Closed
tgrabiec opened this issue May 20, 2014 · 8 comments

@tgrabiec
Member

I can see some slight contention on vma_list_mutex during the tomcat benchmark:

trace perf-lock over a 3-second period:

698.74 ms (100.00%, #4307) All
 lockfree::mutex::lock():87
 |-- 636.25 ms (91.06%, #867) lock_guard:414
 |    |-- 618.06 ms (88.45%, #508) lock_guard_for_with_lock:75
 |    |    |-- 487.12 ms (69.71%, #101) mmu::vm_fault(unsigned long, exception_frame*):1159
 |    |    |    page_fault:36
 |    |    |    ex_pf:87
 |    |    |    |-- 233.90 ms (33.47%, #52) 0x10000103720c

Trapping with GDB shows it comes from places like:

#1  0x0000000000368f05 in page_fault (ef=0xffffc0000d741038) at /data/tgrabiec/osv/arch/x64/mmu.cc:36
#2  <signal handler called>
#3  0x0000100000f7579a in Monitor::wait(bool, long, bool) ()
#4  0x000010000112eebc in VMThread::execute(VM_Operation*) ()
#5  0x0000100000fcb2f1 in ParallelScavengeHeap::mem_allocate(unsigned long, bool*) ()
#6  0x00001000010f6ee4 in typeArrayKlass::allocate_common(int, bool, Thread*) ()
#7  0x000010000103732a in OptoRuntime::new_array_C(klassOopDesc*, int, JavaThread*) ()

This comes from code that triggers a minor collection.
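
For context, this is roughly the pattern the trace is pointing at: a single global mutex guards the VMA list, and the page fault path takes it on every fault, so faults from many vCPUs serialize on it. A minimal sketch, not OSv's actual code (the names mirror the trace; the std::map container and lookup are simplifications):

#include <cstdint>
#include <iterator>
#include <map>
#include <mutex>

struct vma { uintptr_t start, end; /* flags, backing object, ... */ };

std::mutex vma_list_mutex;           // one lock for the whole address space
std::map<uintptr_t, vma> vma_list;   // keyed by start address

void vm_fault(uintptr_t addr)
{
    std::lock_guard<std::mutex> guard(vma_list_mutex);  // every fault contends here
    auto it = vma_list.upper_bound(addr);
    if (it == vma_list.begin() || addr >= std::prev(it)->second.end) {
        return;  // no mapping covers addr: this is the SIGSEGV path
    }
    // Populate the faulting page. If that requires reading from disk, the
    // thread may sleep while still holding vma_list_mutex, stalling every
    // other faulting thread behind it.
}

int main()
{
    vma_list[0x1000] = {0x1000, 0x2000};
    vm_fault(0x1800);   // hits the mapping
    vm_fault(0x4000);   // would be a SIGSEGV in the real thing
}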

@gleb-cloudius
Contributor

Probably the same one that Pekka sees with cassandra. Try to add
-XX:+UseMembar and see if it helps.
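
For reference, the flag goes on the JVM command line like any other -XX option; a hypothetical invocation (the jar name is only an example):

java -XX:+UseMembar -jar tomcat-wrapper.jar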

        Gleb.

@tgrabiec
Member Author

That seems to reduce contention significantly. prof-lock over a 10-second period:

 |    |    |-- 1.80 ms (0.95%, #7) mmu::vm_fault(unsigned long, exception_frame*)
 |    |    |    page_fault
 |    |    |    ex_pf

From:

#0  mmu::vm_fault (addr=<optimized out>, addr@entry=32440322752, ef=ef@entry=0xffffc0003b2f4038)
    at /data/tgrabiec/osv/core/mmu.cc:1160
#1  0x000000000036926a in page_fault (ef=0xffffc0003b2f4038) at /data/tgrabiec/osv/arch/x64/mmu.cc:37
#2  <signal handler called>
#3  0x0000100001012de9 in oopDesc* PSPromotionManager::copy_to_survivor_space<false>(oopDesc*) ()
#4  0x0000100001012458 in PSPromotionManager::drain_stacks_depth(bool) ()
#5  0x0000100000baa2e4 in CardTableExtension::scavenge_contents_parallel(ObjectStartArray*, MutableSpace*, HeapWord*, PSPromotionManager*, unsigned int, unsigned int) ()
#6  0x0000100001015aeb in OldToYoungRootsTask::do_it(GCTaskManager*, unsigned int) ()
#7  0x0000100000d31c02 in GCTaskThread::run() ()
#8  0x0000100000fa9a02 in java_start(Thread*) ()

@penberg
Contributor

penberg commented May 20, 2014

IIRC, @elcallio mentioned at some point that UseMembar has its own set of problems. Calle?

@glommer
Contributor

glommer commented May 20, 2014

Since we control which options we pass to the JVM, I wonder what the effect of always passing this flag would be. It is certainly a lot easier than fighting our vma list lock.

@gleb-cloudius
Contributor

I have a patchset that drops vma_list_lock from the sigsegv path. I asked to
check UseMembar just to verify that this is the same problem.

        Gleb.

@elcallio
Contributor

So, the UseMembar flag affects how the HS JVM does thread state transitions. Normally, with the flag off, each transition does a "pseudo-membar" by writing to a dedicated memory page, which, when stopping threads, is temporarily protected to ensure that stopper and stoppee have the same (or close enough) idea of the thread's state.
UseMembar does exactly what it says: each transition instead issues a membar equivalent (on x86, a locked add), which normally should be far more expensive, simply because transitions that happen while threads are not being stopped are in the overwhelming majority. (Note that on a single-CPU system the fence is ignored, however, so there it might be a good option...)

If you are seeing a really significant number of page faults in these transitions, it would seem to indicate that you are either GCing too much, or have some very bad behaviour with biased locks (which need to stop other threads in various contended cases). That you get better results on a multi-core system with fences enabled instead is somewhat scary.
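
A rough sketch of the mechanism Calle describes, greatly simplified and not HotSpot's actual code: mutator threads do a plain store to a shared "serialization" page on every state transition; the thread stopping the world write-protects that page, so any thread still racing through a transition takes a page fault there, and handling that fault is what synchronizes the two sides. (In the real VM that fault is caught by the JVM's signal handler, i.e. exactly the SIGSEGV path that was hitting vma_list_mutex above.)

#include <sys/mman.h>
#include <atomic>
#include <cassert>
#include <cstddef>

static void* serialize_page;
static const std::size_t page_size = 4096;

void init()
{
    serialize_page = mmap(nullptr, page_size, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(serialize_page != MAP_FAILED);
}

// Fast path with -XX:-UseMembar: a plain store to the shared page, no fence.
void transition_store(int thread_index)
{
    auto* slot = static_cast<volatile int*>(serialize_page) + (thread_index % 64);
    *slot = 1;
}

// Stopper side: briefly write-protecting the page traps any concurrent
// writers and, together with the mprotect calls themselves, serializes
// memory between stopper and stoppee.
void serialize_thread_states()
{
    mprotect(serialize_page, page_size, PROT_READ);
    mprotect(serialize_page, page_size, PROT_READ | PROT_WRITE);
}

// Fast path with -XX:+UseMembar: a real fence on every transition
// (a locked add / mfence on x86).
void transition_with_membar()
{
    std::atomic_thread_fence(std::memory_order_seq_cst);
}

int main()
{
    init();
    transition_store(0);
    serialize_thread_states();
    transition_with_membar();
}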

@gleb-cloudius
Contributor

From what I see there are not a lot of sigsegvs, but because the sigsegv path
takes the same lock as #PF handling, and #PF handling may sleep while holding
the lock (yeah, we need to fix that too), the ones we do have are very slow.

        Gleb.

@gleb-cloudius
Contributor

Tomek, I think we can close this one.

        Gleb.
