Cassandra page faults under YCSB workloadc with extra JVM logging #490

tgrabiec · 2014-09-05T17:56:55Z

When these options are passed to the JVM:

-verbose:gc 
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime

The JVM page faults about a minute after the YCSB benchmark starts:

2014-09-05T17:53:59.527+0000: page fault outside application, addr: 0x000020000b2ae000
[registers]
RIP: 0x000000000045ba31 <???+4569649>
RFL: 0x0000000000010202  CS:  0x0000000000000008  SS:  0x0000000000000010
RAX: 0x8000000000000000  RBX: 0x000020000b2ae004  RCX: 0x0000000000000000  RDX: 0x0000000000000000
RSI: 0x0000000000000003  RDI: 0x000020000b2ab31c  RBP: 0x000020000b2ad010  R8:  0x0000000000000066
R9:  0x0000000000000000  R10: 0x00000000ffffffe2  R11: 0x0000000000000000  R12: 0x0000000000000007
R13: 0x0000000000000000  R14: 0x00000000ffffffe2  R15: 0x0000000000000000  RSP: 0x000020000b2ab2a0
Aborted

[backtrace]
0x00000000003291bf <???+3314111>
0x000000000032a2d3 <mmu::vm_fault(unsigned long, exception_frame*)+147>
0x0000000000389ff9 <page_fault+105>
0x0000000000388ee6 <???+3706598>

#0  0x00000000003fa912 in cli_hlt ()
    at /data/tgrabiec/src/osv/arch/x64/processor.hh:242
#1  halt_no_interrupts () at /data/tgrabiec/src/osv/arch/x64/arch.hh:49
#2  osv::halt () at /data/tgrabiec/src/osv/core/power.cc:36
#3  0x00000000002237a5 in abort (fmt=fmt@entry=0x6058ed "Aborted\n")
    at /data/tgrabiec/src/osv/runtime.cc:150
#4  0x00000000002237d0 in abort () at /data/tgrabiec/src/osv/runtime.cc:117
#5  0x00000000003291c0 in mmu::vm_sigsegv (addr=<optimized out>, 
    ef=0xffff8001099ce078) at /data/tgrabiec/src/osv/core/mmu.cc:1191
#6  0x000000000032a2d4 in mmu::vm_fault (addr=<optimized out>, 
    addr@entry=35184559448064, ef=ef@entry=0xffff8001099ce078)
    at /data/tgrabiec/src/osv/core/mmu.cc:1213
#7  0x0000000000389ffa in page_fault (ef=0xffff8001099ce078)
    at /data/tgrabiec/src/osv/arch/x64/mmu.cc:38
#8  <signal handler called>
#9  fmt_fp (f=0x20000b2ad2c0, y=0, w=3, p=7, fl=0, t=102)
    at /data/tgrabiec/src/osv/musl/src/stdio/vfprintf.c:291
#10 0x0000000000000000 in ?? ()
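
As a sanity check on the traces above, here is a small Python sketch (not part of the original report) that cross-checks the numbers. It assumes that the `w`, `p` and `t` arguments of `fmt_fp` in frame #9 are the printf field width, precision and conversion character, as their names suggest. `conversion_spec` is a hypothetical helper introduced only for this illustration.

```python
# Values copied from the register dump and the gdb frames in this report.
fault_addr = 0x000020000B2AE000   # "page fault outside application, addr:"
gdb_addr = 35184559448064         # addr@entry in frame #6 (decimal)
rsp = 0x000020000B2AB2A0          # RSP at the time of the fault


def conversion_spec(w, p, t):
    """Rebuild a printf conversion from width, precision and type character.

    Hypothetical helper: assumes w/p/t mean what their names suggest.
    """
    return "%{}.{}{}".format(w, p, chr(t))


same_address = (gdb_addr == fault_addr)   # gdb and the fault log agree?
page_aligned = (fault_addr % 4096 == 0)   # 4 KiB page boundary?
bytes_above_rsp = fault_addr - rsp        # distance from the stack pointer

spec = conversion_spec(w=3, p=7, t=102)   # frame #9: w=3, p=7, t=102
formatted = spec % 0.0                    # frame #9: y=0

print(hex(gdb_addr), same_address, page_aligned, bytes_above_rsp)
print(spec, repr(formatted))
```

Under those assumptions, the decimal `addr@entry` value is the same address as the one in the fault message, that address is page-aligned and lies 11616 bytes above RSP, and the call corresponds to formatting `0.0` with `%3.7f`, which is normally a trivial operation.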


gleb-cloudius · 2014-09-06T16:32:19Z

We saw the same crash with ifconfig. It looks like we corrupt floating
point state somehow.

        Gleb.

raphaelsc · 2014-09-08T19:33:36Z

@gleb-cloudius, is this issue fixed by 0e8d9b5?

gleb-cloudius · 2014-09-09T05:32:48Z

Yes, it should be.

        Gleb.

slivne · 2014-09-29T07:47:22Z

@tgrabiec @gleb-cloudius can we close this issue?

gleb-cloudius · 2014-09-29T07:53:19Z

yes

        Gleb.

slivne added the bug and high priority labels Sep 29, 2014

slivne added this to the release 0.14 milestone Sep 29, 2014

tzach closed this as completed Sep 29, 2014
