-
-
Notifications
You must be signed in to change notification settings - Fork 605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved poll and efficient epoll #17
Comments
penberg
pushed a commit
that referenced
this issue
Aug 16, 2013
Starting up Cassandra with debug memory allocator GPFs as follows: Breakpoint 1, abort () at ../../runtime.cc:85 85 { (gdb) bt #0 abort () at ../../runtime.cc:85 #1 0x0000000000375812 in osv::generate_signal (siginfo=..., ef=ef@entry=0xffffc0003ffe3008) at ../../libc/signal.cc:40 #2 0x000000000037587c in osv::handle_segmentation_fault (addr=addr@entry=18446708889768681440, ef=ef@entry=0xffffc0003ffe3008) at ../../libc/signal.cc:55 #3 0x00000000002fba02 in page_fault (ef=0xffffc0003ffe3008) at ../../core/mmu.cc:876 #4 <signal handler called> #5 dbg::realloc (v=v@entry=0xffffe00019b3e000, size=size@entry=16) at ../../core/mempool.cc:846 #6 0x000000000032654c in realloc (obj=0xffffe00019b3e000, size=16) at ../../core/mempool.cc:870 #7 0x0000100000627743 in ?? () #8 0x00002000001fe770 in ?? () #9 0x00002000001fe780 in ?? () #10 0x00002000001fe710 in ?? () #11 0x00002000001fe700 in ?? () #12 0xffffe000170e8000 in ?? () #13 0x0000000200000001 in ?? () #14 0x0000000000000020 in ?? () #15 0x00002000001ffe70 in ?? () #16 0xffffe000170e0004 in ?? () #17 0x000000000036f361 in strcpy (dest=0x100001087420 "", src=<optimized out>) at ../../libc/string/strcpy.c:8 #18 0x0000100000629b53 in ?? () #19 0xffffe00019b22000 in ?? () #20 0x0000000000000001 in ?? () #21 0x0000000000000000 in ?? () The problem was introduced in commit 1ea5672 ("memory: let the debug allocator mimic the standard allocator more closely") which forgot to convert realloc() to use 'pad_before'.
penberg
pushed a commit
that referenced
this issue
Aug 26, 2013
If a crashed OSv guest is restarted, ZFS mount causes a GPF in early startup: VFS: mounting zfs at /usr zfs: mounting osv/usr from device /dev/vblk1 Aborted GDB backtrace points finger at zfs_rmnode(): #0 processor::halt_no_interrupts () at ../../arch/x64/processor.hh:212 #1 0x00000000003e7f2a in osv::halt () at ../../core/power.cc:20 #2 0x000000000021cdd4 in abort (msg=0x636df0 "Aborted\n") at ../../runtime.cc:95 #3 0x000000000021cda2 in abort () at ../../runtime.cc:86 #4 0x000000000044c149 in osv::generate_signal (siginfo=..., ef=0xffffc0003ffe7008) at ../../libc/signal.cc:44 #5 0x000000000044c220 in osv::handle_segmentation_fault (addr=72, ef=0xffffc0003ffe7008) at ../../libc/signal.cc:55 #6 0x0000000000366df3 in page_fault (ef=0xffffc0003ffe7008) at ../../core/mmu.cc:876 #7 <signal handler called> #8 0x0000000000345eaa in zfs_rmnode (zp=0xffffc0003d1de400) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:611 #9 0x000000000035650c in zfs_zinactive (zp=0xffffc0003d1de400) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1355 #10 0x0000000000345be1 in zfs_unlinked_drain (zfsvfs=0xffffc0003ddfe000) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:523 #11 0x000000000034f45c in zfsvfs_setup (zfsvfs=0xffffc0003ddfe000, mounting=true) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:881 #12 0x000000000034f7a4 in zfs_domount (vfsp=0xffffc0003de02000, osname=0x6b14cb "osv/usr") at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1016 #13 0x000000000034f98c in zfs_mount (mp=0xffffc0003de02000, dev=0x6b14d7 "/dev/vblk1", flags=0, data=0x6b14cb) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1415 #14 0x0000000000406852 in sys_mount (dev=0x6b14d7 "/dev/vblk1", dir=0x6b14a3 "/usr", fsname=0x6b14d3 "zfs", flags=0, data=0x6b14cb) at ../../fs/vfs/vfs_mount.c:171 #15 0x00000000003eff97 in mount_usr () at ../../fs/vfs/main.cc:1415 #16 0x0000000000203a89 in do_main_thread (_args=0xffffc0003fe9ced0) at ../../loader.cc:215 #17 0x0000000000448575 in pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, pthread_private::thread_attr const*)::{lambda()#1}::operator()() const () at ../../libc/pthread.cc:59 #18 0x00000000004499d3 in std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::__lambda0>::_M_invoke(const std::_Any_data &) (__functor=...) at ../../external/gcc.bin/usr/include/c++/4.8.1/functional:2071 #19 0x000000000037e602 in std::function<void ()>::operator()() const (this=0xffffc0003e170038) at ../../external/gcc.bin/usr/include/c++/4.8.1/functional:2468 #20 0x00000000003bae3e in sched::thread::main (this=0xffffc0003e170010) at ../../core/sched.cc:581 #21 0x00000000003b8c92 in sched::thread_main_c (t=0xffffc0003e170010) at ../../arch/x64/arch-switch.hh:133 #22 0x0000000000399c8e in thread_main () at ../../arch/x64/entry.S:101 The problem is that ZFS tries to check if the znode is an attribute directory and trips over zp->z_vnode being NULL. However, as explained in commit b7ee91e ("zfs: port vop_lookup"), we don't even support extended attributes so drop the check completely for OSv.
penberg
pushed a commit
that referenced
this issue
Aug 26, 2013
If a crashed OSv guest is restarted, ZFS mount causes a GPF in early startup: VFS: mounting zfs at /usr zfs: mounting osv/usr from device /dev/vblk1 Aborted GDB backtrace points finger at zfs_rmnode(): #0 processor::halt_no_interrupts () at ../../arch/x64/processor.hh:212 #1 0x00000000003e7f2a in osv::halt () at ../../core/power.cc:20 #2 0x000000000021cdd4 in abort (msg=0x636df0 "Aborted\n") at ../../runtime.cc:95 #3 0x000000000021cda2 in abort () at ../../runtime.cc:86 #4 0x000000000044c149 in osv::generate_signal (siginfo=..., ef=0xffffc0003ffe7008) at ../../libc/signal.cc:44 #5 0x000000000044c220 in osv::handle_segmentation_fault (addr=72, ef=0xffffc0003ffe7008) at ../../libc/signal.cc:55 #6 0x0000000000366df3 in page_fault (ef=0xffffc0003ffe7008) at ../../core/mmu.cc:876 #7 <signal handler called> #8 0x0000000000345eaa in zfs_rmnode (zp=0xffffc0003d1de400) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:611 #9 0x000000000035650c in zfs_zinactive (zp=0xffffc0003d1de400) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1355 #10 0x0000000000345be1 in zfs_unlinked_drain (zfsvfs=0xffffc0003ddfe000) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:523 #11 0x000000000034f45c in zfsvfs_setup (zfsvfs=0xffffc0003ddfe000, mounting=true) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:881 #12 0x000000000034f7a4 in zfs_domount (vfsp=0xffffc0003de02000, osname=0x6b14cb "osv/usr") at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1016 #13 0x000000000034f98c in zfs_mount (mp=0xffffc0003de02000, dev=0x6b14d7 "/dev/vblk1", flags=0, data=0x6b14cb) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1415 #14 0x0000000000406852 in sys_mount (dev=0x6b14d7 "/dev/vblk1", dir=0x6b14a3 "/usr", fsname=0x6b14d3 "zfs", flags=0, data=0x6b14cb) at ../../fs/vfs/vfs_mount.c:171 #15 0x00000000003eff97 in mount_usr () at ../../fs/vfs/main.cc:1415 #16 0x0000000000203a89 in do_main_thread (_args=0xffffc0003fe9ced0) at ../../loader.cc:215 #17 0x0000000000448575 in pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, pthread_private::thread_attr const*)::{lambda()#1}::operator()() const () at ../../libc/pthread.cc:59 #18 0x00000000004499d3 in std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::__lambda0>::_M_invoke(const std::_Any_data &) (__functor=...) at ../../external/gcc.bin/usr/include/c++/4.8.1/functional:2071 #19 0x000000000037e602 in std::function<void ()>::operator()() const (this=0xffffc0003e170038) at ../../external/gcc.bin/usr/include/c++/4.8.1/functional:2468 #20 0x00000000003bae3e in sched::thread::main (this=0xffffc0003e170010) at ../../core/sched.cc:581 #21 0x00000000003b8c92 in sched::thread_main_c (t=0xffffc0003e170010) at ../../arch/x64/arch-switch.hh:133 #22 0x0000000000399c8e in thread_main () at ../../arch/x64/entry.S:101 The problem is that ZFS tries to check if the znode is an attribute directory and trips over zp->z_vnode being NULL. However, as explained in commit b7ee91e ("zfs: port vop_lookup"), we don't even support extended attributes so drop the check completely for OSv.
penberg
pushed a commit
that referenced
this issue
Aug 26, 2013
Fix mincore() to deal with unmapped addresses like msync() does. This fixes a SIGSEGV in libunwind's access_mem() when leak detector is enabled: (gdb) bt #0 page_fault (ef=0xffffc0003ffe7008) at ../../core/mmu.cc:871 #1 <signal handler called> #2 ContiguousSpace::block_start_const (this=<optimized out>, p=0x77d2f3968) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/oops/oop.inline.hpp:411 #3 0x00001000008ae16c in GenerationBlockStartClosure::do_space (this=0x2000001f9100, s=<optimized out>) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/memory/generation.cpp:242 #4 0x00001000007f097c in DefNewGeneration::space_iterate (this=0xffffc0003fb68c00, blk=0x2000001f9100, usedOnly=<optimized out>) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/memory/defNewGeneration.cpp:480 #5 0x00001000008aca0e in Generation::block_start (this=<optimized out>, p=<optimized out>) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/memory/generation.cpp:251 #6 0x0000100000b06d2f in os::print_location (st=st@entry=0x2000001f9560, x=32165017960, verbose=verbose@entry=false) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/runtime/os.cpp:868 #7 0x0000100000b11b5b in os::print_register_info (st=0x2000001f9560, context=0x2000001f9740) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:839 #8 0x0000100000c6cde8 in VMError::report (this=0x2000001f9610, st=st@entry=0x2000001f9560) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:551 #9 0x0000100000c6da3b in VMError::report_and_die (this=this@entry=0x2000001f9610) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:984 #10 0x0000100000b1109f in JVM_handle_linux_signal (sig=11, info=0x2000001f9bb8, ucVoid=0x2000001f9740, abort_if_unrecognized=<optimized out>) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:528 #11 0x000000000039f242 in call_signal_handler (frame=0x2000001f9b10) at ../../arch/x64/signal.cc:69 #12 <signal handler called> #13 0x000000000057d721 in access_mem () #14 0x000000000057cb1d in dwarf_get () #15 0x000000000057ce51 in _ULx86_64_step () #16 0x00000000004315fd in backtrace (buffer=0x1ff9d80 <memory::alloc_tracker::remember(void*, int)::bt>, size=20) at ../../libc/misc/backtrace.cc:16 #17 0x00000000003b8d99 in memory::alloc_tracker::remember (this=0x1777ae0 <memory::tracker>, addr=0xffffc0004508de00, size=54) at ../../core/alloctracker.cc:59 #18 0x00000000003b0504 in memory::tracker_remember (addr=0xffffc0004508de00, size=54) at ../../core/mempool.cc:43 #19 0x00000000003b2152 in std_malloc (size=54) at ../../core/mempool.cc:723 #20 0x00000000003b259c in malloc (size=54) at ../../core/mempool.cc:856 #21 0x0000100001615e4c in JNU_GetStringPlatformChars (env=env@entry=0xffffc0003a4dc1d8, jstr=jstr@entry=0xffffc0004591b800, isCopy=isCopy@entry=0x0) at ../../../src/share/native/common/jni_util.c:801 #22 0x000010000161ada6 in Java_java_io_UnixFileSystem_getBooleanAttributes0 (env=0xffffc0003a4dc1d8, this=<optimized out>, file=<optimized out>) at ../../../src/solaris/native/java/io/UnixFileSystem_md.c:111 #23 0x000020000021ed8e in ?? () #24 0x00002000001faa58 in ?? () #25 0x00002000001faac0 in ?? () #26 0x00002000001faa50 in ?? () #27 0x0000000000000000 in ?? () Spotted by Avi Kivity.
penberg
pushed a commit
that referenced
this issue
Sep 3, 2013
If "-javaagent" option is enabled, the JVM page faults: (gdb) bt #0 0x000000000033d432 in halt_no_interrupts () at ../../arch/x64/processor.hh:232 #1 osv::halt () at ../../core/power.cc:20 #2 0x0000000000214d82 in abort (msg=msg@entry=0x57ea90 "page fault outside application") at ../../runtime.cc:98 #3 0x00000000002fd4d8 in page_fault (ef=0xffffc0003ffe6008) at ../../core/mmu.cc:943 #4 <signal handler called> #5 0x0000000000373169 in __stpncpy (d=d@entry=0x2000001fe7f0 "dlsym: symbol JVM_begin_signal_setting not found", s=0x0, n=1023) at ../../libc/string/stpncpy.c:19 #6 0x0000000000373ad1 in strncpy (d=0x2000001fe7f0 "dlsym: symbol JVM_begin_signal_setting not found", s=<optimized out>, n=<optimized out>) at ../../libc/string/strncpy.c:7 #7 0x0000100000b0ab67 in os::dll_load (filename=filename@entry=0x2000001febf0 "libinstrument.so", ebuf=ebuf@entry=0x2000001fe7f0 "dlsym: symbol JVM_begin_signal_setting not found", ebuflen=ebuflen@entry=1024) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:1841 #8 0x0000100000c247eb in lookup_on_load (agent=agent@entry=0xffffc0003976ccc0, on_load_symbols=on_load_symbols@entry=0x2000001ffd40, num_symbol_entries=1) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/runtime/thread.cpp:3585 #9 0x0000100000c2a64f in lookup_agent_on_load (agent=0xffffc0003976ccc0) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/runtime/thread.cpp:3617 #10 create_vm_init_agents () at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/runtime/thread.cpp:3656 #11 Threads::create_vm (args=<optimized out>, canTryAgain=canTryAgain@entry=0x2000001ffdb0) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/runtime/thread.cpp:3177 #12 0x000010000094d7b0 in JNI_CreateJavaVM (vm=0x2000001ffe58, penv=0x2000001ffe60, args=<optimized out>) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/prims/jni.cpp:5127 #13 0x0000100000007b3b in main (argc=<optimized out>, argv=0xffffc0003fff2008) at ../../java/java.cc:73 #14 0x0000000000208ac8 in run_main (prog=<optimized out>, args=args@entry=0xffffc0003fe9bfa0) at ../../loader.cc:196 #15 0x0000000000208c68 in do_main_thread (_args=0xffffc0003fe9bfa0) at ../../loader.cc:217 #16 0x0000000000376d36 in operator() (__closure=0xffffc0003b5e2a00) at ../../libc/pthread.cc:59 #17 std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::__lambda0>::_M_invoke(const std::_Any_data &) (__functor=...) at ../../external/gcc.bin/usr/include/c++/4.8.1/functional:2071 #18 0x000000000032ed6b in main (this=0xffffc0003a807010) at ../../core/sched.cc:536 #19 sched::thread_main_c (t=0xffffc0003a807010) at ../../arch/x64/arch-switch.hh:133 #20 0x000000000031873e in thread_main () at ../../arch/x64/entry.S:101 Backtrace stopped: frame did not save the PC This is caused by the JVM expecting dlerror() to return an error string if dlopen() fails. Fix that up.
avikivity
pushed a commit
that referenced
this issue
Feb 11, 2014
This patch adds support for epoll()'s edge-triggered mode, EPOLLET. Fixes #188. As explained in issue #188, Boost's asio uses EPOLLET heavily, and we use that library in our management http server, and also in our image creation tool (cpiod.so). By ignoring EPOLLET, like we did until now, the code worked, but unnecessarily wasted CPU when epoll_wait() always returned immediately instead of waiting until a new event. This patch works within the confines of our existing poll mechanisms - where epoll() call poll(). We do not change this in this patch, and it should be changed in the future (see issue #17). In this patch we add to each struct file a field "poll_wake_count", which as its name suggests counts the number of poll_wake()s done on this file. Additionally, epoll remembers the last value it saw of this counter, so that in poll_scan(), if we see that an fp (polled with EPOLLET) has an unchanged counter from last time, we do not return readiness on this fp regardless on whether or not it has readable data. We have a complication with EPOLLET on sockets. These have an "SB_SEL" optimization, which avoids calling poll_wake() when it thinks the new data is not interesting because the old data was not yet consumed, and also avoids calling poll_wake() if fp->poll() was not previously done. This optimization is counter-productive for EPOLLET (and causes missed wakeups) so we need to work around it in the EPOLLET case. This patch also adds a test for the EPOLLET case in tst-epoll.cc. The test runs on both OSv and Linux, and can confirm that in the tested scenarios, Linux and OSv behave the same, including even one same false-positive: When epoll_wait() tells us there is data in a pipe, and we don't read it, but then more data comes on a pipe, epoll_wait() will again return a new event, despite this is not really being an edge event (the pipe didn't change from empty to not-empty, as it was previously not-empty as well). Concluding remarks: The primary goal of this implementation is to stop EPOLLET epoll_wait() from returning immediately despite nothing have happened on the file. That was what caused the 100% CPU use before this patch. That being said, the goal of this patch is NOT to avoid all false-positives or unnecessary wakeups; When events do occur on the file, we may be doing a bit more wakeups than strictly necessary. I think this is acceptable (our epoll() has worse problems) but for posterity, I want to explain: I already mentioned above one false-positive that also happens on Linux. Another false-positive wakeup that remains is in one of EPOLLET's classic use cases: Consider several threads sleeping on epoll() on the same socket (e.g., TCP listening socket, or UDP socket). When one packet arrives, normal level-triggered epoll() will wake all the threads, but only one will read the packet and the rest will find they have nothing to read. With edge- triggered epoll, only one thread should be woken and the rest would not. But in our implementation, poll_wake() wakes up *all* the pollers on this file, so we cannot currently support this optimization. Signed-off-by: Nadav Har'El <[email protected]> Signed-off-by: Avi Kivity <[email protected]>
asias
added a commit
to asias/osv
that referenced
this issue
Aug 28, 2014
$ sudo scripts/run.py -c1 -nv -m4g -e "/tests/tst-epoll.so" ... PASS: epoll_wait PASS: write PASS: epoll_wait PASS: epoll_ctl MOD PASS: epoll_wait PASS: epoll_ctl ADD PASS: errno == EEXIST Assertion failed: i != f_epolls->end() (/home/asias/src/cloudius-systems/osv/fs/vfs/kern_descrip.cc: epoll_del: 249) [backtrace] 0x0000000000223648 <__assert_fail+24> 0x000000000041743a <file::epoll_del(epoll_ptr)+186> 0x00000000003f7682 <epoll_ctl+786> 0x0000100000c032db <???+12595931> 0x000000000040510a <osv::application::run_main(std::string, int, char**)+266> 0x0000000000405847 <osv::application::run_main()+551> 0x000000000020bbd1 <osv::application::main()+97> 0x0000000000406698 <???+4220568> 0x0000000000433745 <???+4405061> 0x00000000003e38aa <thread_main_c+26> 0x0000000000389c75 <???+3710069> (gdb) bt #0 0x00000000003f8e92 in cli_hlt () at /home/asias/src/cloudius-systems/osv/arch/x64/processor.hh:242 cloudius-systems#1 halt_no_interrupts () at /home/asias/src/cloudius-systems/osv/arch/x64/arch.hh:49 cloudius-systems#2 osv::halt () at /home/asias/src/cloudius-systems/osv/core/power.cc:34 cloudius-systems#3 0x000000000022360f in abort (fmt=fmt@entry=0x603b60 "Assertion failed: %s (%s: %s: %d)\n") at /home/asias/src/cloudius-systems/osv/runtime.cc:150 cloudius-systems#4 0x0000000000223649 in __assert_fail (expr=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>) at /home/asias/src/cloudius-systems/osv/runtime.cc:156 cloudius-systems#5 0x000000000041743b in file::epoll_del (this=<optimized out>, ep=...) at /home/asias/src/cloudius-systems/osv/fs/vfs/kern_descrip.cc:249 cloudius-systems#6 0x00000000003f7683 in del (key=..., this=0xffff800001746040) at /home/asias/src/cloudius-systems/osv/core/epoll.cc:96 cloudius-systems#7 epoll_ctl (epfd=<optimized out>, op=<optimized out>, fd=<optimized out>, event=0x2000002f9d50) at /home/asias/src/cloudius-systems/osv/core/epoll.cc:283 cloudius-systems#8 0x0000100000c032dc in test_epolloneshot () at /home/asias/src/cloudius-systems/osv/tests/tst-epoll.cc:78 cloudius-systems#9 main (ac=<optimized out>, av=<optimized out>) at /home/asias/src/cloudius-systems/osv/tests/tst-epoll.cc:231 cloudius-systems#10 0x000000000040510b in osv::application::run_main (this=this@entry=0xffffa00001561f18, path="/tests/tst-epoll.so", argc=argc@entry=1, argv=argv@entry=0xffffa00002994760) at /home/asias/src/cloudius-systems/osv/core/app.cc:182 cloudius-systems#11 0x0000000000405848 in osv::application::run_main (this=this@entry=0xffffa00001561f18) at /home/asias/src/cloudius-systems/osv/core/app.cc:197 cloudius-systems#12 0x000000000020bbd2 in osv::application::main (this=0xffffa00001561f18) at /home/asias/src/cloudius-systems/osv/core/app.cc:144 cloudius-systems#13 0x0000000000406699 in operator() (__closure=0x0, app=<optimized out>) at /home/asias/src/cloudius-systems/osv/core/app.cc:104 cloudius-systems#14 osv::application::__lambda3::_FUN (app=<optimized out>) at /home/asias/src/cloudius-systems/osv/core/app.cc:106 cloudius-systems#15 0x0000000000433746 in operator() (__closure=0xffffa00003320600) at /home/asias/src/cloudius-systems/osv/libc/pthread.cc:97 cloudius-systems#16 std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::__lambda7>::_M_invoke(const std::_Any_data &) (__functor=...) at /home/asias/src/cloudius-systems/osv/external/x64/gcc.bin/usr/include/c++/4.8.2/functional:2071 cloudius-systems#17 0x00000000003e38ab in main (this=0xffff80000354d050) at /home/asias/src/cloudius-systems/osv/core/sched.cc:940 cloudius-systems#18 sched::thread_main_c (t=0xffff80000354d050) at /home/asias/src/cloudius-systems/osv/arch/x64/arch-switch.hh:137 cloudius-systems#19 0x0000000000389c76 in thread_main () at /home/asias/src/cloudius-systems/osv/arch/x64/entry.S:113
asias
added a commit
to asias/osv
that referenced
this issue
Aug 28, 2014
After: Revert "Revert "epoll: fix up lock ordering issues"" This reverts commit 408a982. I saw the following without this patch: $ sudo scripts/run.py -c1 -nv -m4g -e "/tests/tst-epoll.so" ... PASS: epoll_wait PASS: write PASS: epoll_wait PASS: epoll_ctl MOD PASS: epoll_wait PASS: epoll_ctl ADD PASS: errno == EEXIST Assertion failed: i != f_epolls->end() (/home/asias/src/cloudius-systems/osv/fs/vfs/kern_descrip.cc: epoll_del: 249) [backtrace] 0x0000000000223648 <__assert_fail+24> 0x000000000041743a <file::epoll_del(epoll_ptr)+186> 0x00000000003f7682 <epoll_ctl+786> 0x0000100000c032db <???+12595931> 0x000000000040510a <osv::application::run_main(std::string, int, char**)+266> 0x0000000000405847 <osv::application::run_main()+551> 0x000000000020bbd1 <osv::application::main()+97> 0x0000000000406698 <???+4220568> 0x0000000000433745 <???+4405061> 0x00000000003e38aa <thread_main_c+26> 0x0000000000389c75 <???+3710069> (gdb) bt #0 0x00000000003f8e92 in cli_hlt () at /home/asias/src/cloudius-systems/osv/arch/x64/processor.hh:242 cloudius-systems#1 halt_no_interrupts () at /home/asias/src/cloudius-systems/osv/arch/x64/arch.hh:49 cloudius-systems#2 osv::halt () at /home/asias/src/cloudius-systems/osv/core/power.cc:34 cloudius-systems#3 0x000000000022360f in abort (fmt=fmt@entry=0x603b60 "Assertion failed: %s (%s: %s: %d)\n") at /home/asias/src/cloudius-systems/osv/runtime.cc:150 cloudius-systems#4 0x0000000000223649 in __assert_fail (expr=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>) at /home/asias/src/cloudius-systems/osv/runtime.cc:156 cloudius-systems#5 0x000000000041743b in file::epoll_del (this=<optimized out>, ep=...) at /home/asias/src/cloudius-systems/osv/fs/vfs/kern_descrip.cc:249 cloudius-systems#6 0x00000000003f7683 in del (key=..., this=0xffff800001746040) at /home/asias/src/cloudius-systems/osv/core/epoll.cc:96 cloudius-systems#7 epoll_ctl (epfd=<optimized out>, op=<optimized out>, fd=<optimized out>, event=0x2000002f9d50) at /home/asias/src/cloudius-systems/osv/core/epoll.cc:283 cloudius-systems#8 0x0000100000c032dc in test_epolloneshot () at /home/asias/src/cloudius-systems/osv/tests/tst-epoll.cc:78 cloudius-systems#9 main (ac=<optimized out>, av=<optimized out>) at /home/asias/src/cloudius-systems/osv/tests/tst-epoll.cc:231 cloudius-systems#10 0x000000000040510b in osv::application::run_main (this=this@entry=0xffffa00001561f18, path="/tests/tst-epoll.so", argc=argc@entry=1, argv=argv@entry=0xffffa00002994760) at /home/asias/src/cloudius-systems/osv/core/app.cc:182 cloudius-systems#11 0x0000000000405848 in osv::application::run_main (this=this@entry=0xffffa00001561f18) at /home/asias/src/cloudius-systems/osv/core/app.cc:197 cloudius-systems#12 0x000000000020bbd2 in osv::application::main (this=0xffffa00001561f18) at /home/asias/src/cloudius-systems/osv/core/app.cc:144 cloudius-systems#13 0x0000000000406699 in operator() (__closure=0x0, app=<optimized out>) at /home/asias/src/cloudius-systems/osv/core/app.cc:104 cloudius-systems#14 osv::application::__lambda3::_FUN (app=<optimized out>) at /home/asias/src/cloudius-systems/osv/core/app.cc:106 cloudius-systems#15 0x0000000000433746 in operator() (__closure=0xffffa00003320600) at /home/asias/src/cloudius-systems/osv/libc/pthread.cc:97 cloudius-systems#16 std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::__lambda7>::_M_invoke(const std::_Any_data &) (__functor=...) at /home/asias/src/cloudius-systems/osv/external/x64/gcc.bin/usr/include/c++/4.8.2/functional:2071 cloudius-systems#17 0x00000000003e38ab in main (this=0xffff80000354d050) at /home/asias/src/cloudius-systems/osv/core/sched.cc:940 cloudius-systems#18 sched::thread_main_c (t=0xffff80000354d050) at /home/asias/src/cloudius-systems/osv/arch/x64/arch-switch.hh:137 cloudius-systems#19 0x0000000000389c76 in thread_main () at /home/asias/src/cloudius-systems/osv/arch/x64/entry.S:113
asias
added a commit
to asias/osv
that referenced
this issue
Aug 28, 2014
The problem is that when epoll_wait removes the key from _file map, it does not remove the key from _epoll map, so when epoll_ctrl(DEL) tries to remove _file_map again, we hit the assert. This fixes the following. After: Revert "Revert "epoll: fix up lock ordering issues"" I saw: $ sudo scripts/run.py -c1 -nv -m4g -e "/tests/tst-epoll.so" ... PASS: epoll_wait PASS: write PASS: epoll_wait PASS: epoll_ctl MOD PASS: epoll_wait PASS: epoll_ctl ADD PASS: errno == EEXIST Assertion failed: i != f_epolls->end() (/home/asias/src/cloudius-systems/osv/fs/vfs/kern_descrip.cc: epoll_del: 249) [backtrace] 0x0000000000223648 <__assert_fail+24> 0x000000000041743a <file::epoll_del(epoll_ptr)+186> 0x00000000003f7682 <epoll_ctl+786> 0x0000100000c032db <???+12595931> 0x000000000040510a <osv::application::run_main(std::string, int, char**)+266> 0x0000000000405847 <osv::application::run_main()+551> 0x000000000020bbd1 <osv::application::main()+97> 0x0000000000406698 <???+4220568> 0x0000000000433745 <???+4405061> 0x00000000003e38aa <thread_main_c+26> 0x0000000000389c75 <???+3710069> (gdb) bt #0 0x00000000003f8e92 in cli_hlt () at /home/asias/src/cloudius-systems/osv/arch/x64/processor.hh:242 cloudius-systems#1 halt_no_interrupts () at /home/asias/src/cloudius-systems/osv/arch/x64/arch.hh:49 cloudius-systems#2 osv::halt () at /home/asias/src/cloudius-systems/osv/core/power.cc:34 cloudius-systems#3 0x000000000022360f in abort (fmt=fmt@entry=0x603b60 "Assertion failed: %s (%s: %s: %d)\n") at /home/asias/src/cloudius-systems/osv/runtime.cc:150 cloudius-systems#4 0x0000000000223649 in __assert_fail (expr=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>) at /home/asias/src/cloudius-systems/osv/runtime.cc:156 cloudius-systems#5 0x000000000041743b in file::epoll_del (this=<optimized out>, ep=...) at /home/asias/src/cloudius-systems/osv/fs/vfs/kern_descrip.cc:249 cloudius-systems#6 0x00000000003f7683 in del (key=..., this=0xffff800001746040) at /home/asias/src/cloudius-systems/osv/core/epoll.cc:96 cloudius-systems#7 epoll_ctl (epfd=<optimized out>, op=<optimized out>, fd=<optimized out>, event=0x2000002f9d50) at /home/asias/src/cloudius-systems/osv/core/epoll.cc:283 cloudius-systems#8 0x0000100000c032dc in test_epolloneshot () at /home/asias/src/cloudius-systems/osv/tests/tst-epoll.cc:78 cloudius-systems#9 main (ac=<optimized out>, av=<optimized out>) at /home/asias/src/cloudius-systems/osv/tests/tst-epoll.cc:231 cloudius-systems#10 0x000000000040510b in osv::application::run_main (this=this@entry=0xffffa00001561f18, path="/tests/tst-epoll.so", argc=argc@entry=1, argv=argv@entry=0xffffa00002994760) at /home/asias/src/cloudius-systems/osv/core/app.cc:182 cloudius-systems#11 0x0000000000405848 in osv::application::run_main (this=this@entry=0xffffa00001561f18) at /home/asias/src/cloudius-systems/osv/core/app.cc:197 cloudius-systems#12 0x000000000020bbd2 in osv::application::main (this=0xffffa00001561f18) at /home/asias/src/cloudius-systems/osv/core/app.cc:144 cloudius-systems#13 0x0000000000406699 in operator() (__closure=0x0, app=<optimized out>) at /home/asias/src/cloudius-systems/osv/core/app.cc:104 cloudius-systems#14 osv::application::__lambda3::_FUN (app=<optimized out>) at /home/asias/src/cloudius-systems/osv/core/app.cc:106 cloudius-systems#15 0x0000000000433746 in operator() (__closure=0xffffa00003320600) at /home/asias/src/cloudius-systems/osv/libc/pthread.cc:97 cloudius-systems#16 std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::__lambda7>::_M_invoke(const std::_Any_data &) (__functor=...) at /home/asias/src/cloudius-systems/osv/external/x64/gcc.bin/usr/include/c++/4.8.2/functional:2071 cloudius-systems#17 0x00000000003e38ab in main (this=0xffff80000354d050) at /home/asias/src/cloudius-systems/osv/core/sched.cc:940 cloudius-systems#18 sched::thread_main_c (t=0xffff80000354d050) at /home/asias/src/cloudius-systems/osv/arch/x64/arch-switch.hh:137 cloudius-systems#19 0x0000000000389c76 in thread_main () at /home/asias/src/cloudius-systems/osv/arch/x64/entry.S:113 Signed-off-by: Asias He <[email protected]>
avikivity
pushed a commit
that referenced
this issue
Aug 28, 2014
The problem is that when epoll_wait removes the key from _file map, it does not remove the key from _epoll map, so when epoll_ctrl(DEL) tries to remove _file_map again, we hit the assert. This fixes the following. After: Revert "Revert "epoll: fix up lock ordering issues"" I saw: $ sudo scripts/run.py -c1 -nv -m4g -e "/tests/tst-epoll.so" ... PASS: epoll_wait PASS: write PASS: epoll_wait PASS: epoll_ctl MOD PASS: epoll_wait PASS: epoll_ctl ADD PASS: errno == EEXIST Assertion failed: i != f_epolls->end() (/home/asias/src/cloudius-systems/osv/fs/vfs/kern_descrip.cc: epoll_del: 249) [backtrace] 0x0000000000223648 <__assert_fail+24> 0x000000000041743a <file::epoll_del(epoll_ptr)+186> 0x00000000003f7682 <epoll_ctl+786> 0x0000100000c032db <???+12595931> 0x000000000040510a <osv::application::run_main(std::string, int, char**)+266> 0x0000000000405847 <osv::application::run_main()+551> 0x000000000020bbd1 <osv::application::main()+97> 0x0000000000406698 <???+4220568> 0x0000000000433745 <???+4405061> 0x00000000003e38aa <thread_main_c+26> 0x0000000000389c75 <???+3710069> (gdb) bt #0 0x00000000003f8e92 in cli_hlt () at /home/asias/src/cloudius-systems/osv/arch/x64/processor.hh:242 #1 halt_no_interrupts () at /home/asias/src/cloudius-systems/osv/arch/x64/arch.hh:49 #2 osv::halt () at /home/asias/src/cloudius-systems/osv/core/power.cc:34 #3 0x000000000022360f in abort (fmt=fmt@entry=0x603b60 "Assertion failed: %s (%s: %s: %d)\n") at /home/asias/src/cloudius-systems/osv/runtime.cc:150 #4 0x0000000000223649 in __assert_fail (expr=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>) at /home/asias/src/cloudius-systems/osv/runtime.cc:156 #5 0x000000000041743b in file::epoll_del (this=<optimized out>, ep=...) at /home/asias/src/cloudius-systems/osv/fs/vfs/kern_descrip.cc:249 #6 0x00000000003f7683 in del (key=..., this=0xffff800001746040) at /home/asias/src/cloudius-systems/osv/core/epoll.cc:96 #7 epoll_ctl (epfd=<optimized out>, op=<optimized out>, fd=<optimized out>, event=0x2000002f9d50) at /home/asias/src/cloudius-systems/osv/core/epoll.cc:283 #8 0x0000100000c032dc in test_epolloneshot () at /home/asias/src/cloudius-systems/osv/tests/tst-epoll.cc:78 #9 main (ac=<optimized out>, av=<optimized out>) at /home/asias/src/cloudius-systems/osv/tests/tst-epoll.cc:231 #10 0x000000000040510b in osv::application::run_main (this=this@entry=0xffffa00001561f18, path="/tests/tst-epoll.so", argc=argc@entry=1, argv=argv@entry=0xffffa00002994760) at /home/asias/src/cloudius-systems/osv/core/app.cc:182 #11 0x0000000000405848 in osv::application::run_main (this=this@entry=0xffffa00001561f18) at /home/asias/src/cloudius-systems/osv/core/app.cc:197 #12 0x000000000020bbd2 in osv::application::main (this=0xffffa00001561f18) at /home/asias/src/cloudius-systems/osv/core/app.cc:144 #13 0x0000000000406699 in operator() (__closure=0x0, app=<optimized out>) at /home/asias/src/cloudius-systems/osv/core/app.cc:104 #14 osv::application::__lambda3::_FUN (app=<optimized out>) at /home/asias/src/cloudius-systems/osv/core/app.cc:106 #15 0x0000000000433746 in operator() (__closure=0xffffa00003320600) at /home/asias/src/cloudius-systems/osv/libc/pthread.cc:97 #16 std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::__lambda7>::_M_invoke(const std::_Any_data &) (__functor=...) at /home/asias/src/cloudius-systems/osv/external/x64/gcc.bin/usr/include/c++/4.8.2/functional:2071 #17 0x00000000003e38ab in main (this=0xffff80000354d050) at /home/asias/src/cloudius-systems/osv/core/sched.cc:940 #18 sched::thread_main_c (t=0xffff80000354d050) at /home/asias/src/cloudius-systems/osv/arch/x64/arch-switch.hh:137 #19 0x0000000000389c76 in thread_main () at /home/asias/src/cloudius-systems/osv/arch/x64/entry.S:113 Signed-off-by: Asias He <[email protected]> Signed-off-by: Avi Kivity <[email protected]>
I'm closing this issue - we have a completely rewritten epoll now, and it's no longer a wrapper around poll, solving the original problem (although with a different design than what I suggested above). |
nyh
pushed a commit
that referenced
this issue
Mar 29, 2021
…ssembly This patch adds necessary CFI (Call Frame Information) directives to the assembly code in entry.S and macros.S to make stack traces for page faults and interrupt handler look correct. Otherwise debugging scenarios that involve those becomes very painful. Before the patch: ----------------- #0 mmu::file_vma::fault (this=0xffffa00040c9b280, addr=17592186171392, ef=0x200000100110) at core/mmu.cc:1708 #1 0x00000000401da05c in mmu::vm_fault (addr=17592186171392, ef=0x200000100110) at core/mmu.cc:1354 #2 0x000000004020ba9c in page_fault (ef=0x200000100110) at arch/aarch64/mmu.cc:50 #3 0x000000004020b82c in handle_mem_abort () at arch/aarch64/entry.S:161 Backtrace stopped: previous frame identical to this frame (corrupt stack?) After the patch: ---------------- #0 mmu::file_vma::fault (this=0xffffa00040c9b280, addr=17592186171392, ef=0x200000100110) at core/mmu.cc:1708 #1 0x00000000401da05c in mmu::vm_fault (addr=17592186171392, ef=0x200000100110) at core/mmu.cc:1354 #2 0x000000004020ba9c in page_fault (ef=0x200000100110) at arch/aarch64/mmu.cc:50 #3 <signal handler called> #4 elf::object::_dynamic_tag (tag=29, this=0xffffa00040dcf800) at core/elf.cc:649 #5 elf::object::dynamic_exists (tag=29, this=0xffffa00040dcf800) at core/elf.cc:641 #6 elf::object::load_needed (this=0xffffa00040dcf800, loaded_objects=std::vector of length 1, capacity 1 = {...}) at core/elf.cc:1068 #7 0x00000000401f1bd4 in elf::program::load_object (this=this@entry=0xffffa00040897ba0, name="/libvdso.so", extra_path=std::vector of length 0, capacity 0, loaded_objects=std::vector of length 1, capacity 1 = {...}) at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/shared_ptr_base.h:1321 #8 0x00000000401f24fc in elf::program::get_library (this=0xffffa00040897ba0, name="libvdso.so", extra_path=std::vector of length 0, capacity 0, delay_init=false) at core/elf.cc:1459 #9 0x0000000040314ea0 in osv::application::prepare_argv (this=this@entry=0xffffa00040dcf610, program=program@entry=0xffffa00040897ba0) at core/app.cc:361 #10 0x0000000040315694 in osv::application::application(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>) (this=0xffffa00040dcf610, command="/tests/tst-hello.so", args=..., new_program=false, env=0x0, main_function_name="main", post_main=...) at core/app.cc:182 ... Before the patch: ----------------- #0 virtio::blk::ack_irq (this=0xffffa00040a91c00) at drivers/virtio-blk.cc:103 #1 operator() (__closure=0xffffa00040950e40) at drivers/virtio-blk.cc:143 #2 std::__invoke_impl<bool, virtio::blk::blk(virtio::virtio_device&)::<lambda(pci::device&)>::<lambda()>&> (__f=...) at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/invoke.h:60 #3 std::__invoke_r<bool, virtio::blk::blk(virtio::virtio_device&)::<lambda(pci::device&)>::<lambda()>&> (__fn=...) at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/invoke.h:141 #4 std::_Function_handler<bool(), virtio::blk::blk(virtio::virtio_device&)::<lambda(pci::device&)>::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/std_function.h:291 #5 0x000000004020c784 in std::function<bool ()>::operator()() const (this=<optimized out>) at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/std_function.h:622 #6 interrupt_table::invoke_interrupt (this=this@entry=0x40780968 <idt>, id=id@entry=36) at arch/aarch64/exceptions.cc:135 #7 0x000000004020c95c in interrupt (frame=<optimized out>) at arch/aarch64/exceptions.cc:167 #8 0x000000004020b930 in entry_irq () at arch/aarch64/entry.S:175 #9 0x000000004020b930 in entry_irq () at arch/aarch64/entry.S:175 #10 0x000000004020b930 in entry_irq () at arch/aarch64/entry.S:175 #11 0x000000004020b930 in entry_irq () at arch/aarch64/entry.S:175 #12 0x000000004020b930 in entry_irq () at arch/aarch64/entry.S:175 #13 0x000000004020b930 in entry_irq () at arch/aarch64/entry.S:175 ... (infinite number of frames) After the patch: ---------------- #0 virtio::blk::ack_irq (this=0xffffa00040a91c00) at drivers/virtio-blk.cc:103 #1 operator() (__closure=0xffffa00040950e40) at drivers/virtio-blk.cc:143 #2 std::__invoke_impl<bool, virtio::blk::blk(virtio::virtio_device&)::<lambda(pci::device&)>::<lambda()>&> (__f=...) at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/invoke.h:60 #3 std::__invoke_r<bool, virtio::blk::blk(virtio::virtio_device&)::<lambda(pci::device&)>::<lambda()>&> (__fn=...) at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/invoke.h:141 #4 std::_Function_handler<bool(), virtio::blk::blk(virtio::virtio_device&)::<lambda(pci::device&)>::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/std_function.h:291 #5 0x000000004020c784 in std::function<bool ()>::operator()() const (this=<optimized out>) at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/std_function.h:622 #6 interrupt_table::invoke_interrupt (this=this@entry=0x40780968 <idt>, id=id@entry=36) at arch/aarch64/exceptions.cc:135 #7 0x000000004020c95c in interrupt (frame=<optimized out>) at arch/aarch64/exceptions.cc:167 #8 <signal handler called> #9 sched::cpu::do_idle (this=this@entry=0xffff8000407fe040) at core/sched.cc:454 #10 0x00000000402e6b78 in sched::cpu::idle (this=0xffff8000407fe040) at core/sched.cc:472 #11 0x00000000402e6ba0 in operator() (__closure=<optimized out>, __closure=<optimized out>) at core/sched.cc:165 #12 std::__invoke_impl<void, sched::cpu::init_idle_thread()::<lambda()>&> (__f=...) at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/invoke.h:60 #13 std::__invoke_r<void, sched::cpu::init_idle_thread()::<lambda()>&> (__fn=...) at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/invoke.h:153 #14 std::_Function_handler<void(), sched::cpu::init_idle_thread()::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/std_function.h:291 #15 0x00000000402e72b4 in sched::thread::main (this=0xffffa000408eea48) at core/sched.cc:1267 #16 sched::thread_main_c (t=0xffffa000408eea48) at arch/aarch64/arch-switch.hh:162 #17 0x000000004020b788 in thread_main () at arch/aarch64/entry.S:114 Refs #1128 Signed-off-by: Waldemar Kozaczuk <[email protected]> Message-Id: <[email protected]>
nyh
pushed a commit
that referenced
this issue
May 11, 2022
The manifest skeleton files like usr.manifest.skel have an entry for libgcc_s_dir looking like this: /usr/lib/libgcc_s.so.1: %(libgcc_s_dir)s/libgcc_s.so.1 This actually gets expanded quite late during the build process by upload_manifest.py. The unfortunate consequence of this is that loader.py used during debugging which reads usr.manifest does not have any logic to load libgcc_s.so.1. And this makes stack traces look useless in those cases that involve libgcc_s.so.1. So this patch slightly changes the scripts/build and scripts/module.py to expand %(libgcc_s_dir) when writing to build/release/usr.manifest. As a result of this the stack trace of the crash I have been working on looks much more reasonable: (gdb) bt '#0 0x00000000403047c2 in processor::cli_hlt () at arch/x64/processor.hh:247 #1 arch::halt_no_interrupts () at arch/x64/arch.hh:48 #2 osv::halt () at arch/x64/power.cc:29 #3 0x0000000040239504 in abort (fmt=fmt@entry=0x405b1e93 "Aborted\n") at runtime.cc:142 #4 0x0000000040202e80 in abort () at runtime.cc:106 #5 0x000010000002b6b6 in ?? () #6 0x000010000003f5cb in _Unwind_Resume () #7 0x0000100000062daa in ?? () #8 0x0000100000075b5c in boost::execution_monitor::vexecute(boost::function<void ()> const&) () #9 0x000010000007f0a9 in boost::unit_test::framework::init(bool (*)(), int, char**) () #10 0x000010000009254d in boost::unit_test::unit_test_main(bool (*)(), int, char**) () #11 0x000000004039d021 in osv::application::run_main (this=0xffffa00000bd8c10) at core/app.cc:416 #12 0x000000004039d22d in operator() (app=<optimized out>, __closure=0x0) at core/app.cc:236 #13 _FUN () at core/app.cc:238 #14 0x00000000403d089a in operator() (__closure=0xffffa00000d57800) at libc/pthread.cc:116 #15 std::__invoke_impl<void, pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()>&> (__f=...) at /usr/include/c++/11/bits/invoke.h:61 #16 std::__invoke_r<void, pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()>&> (__fn=...) at /usr/include/c++/11/bits/invoke.h:154 #17 std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/11/bits/std_function.h:290 #18 0x000000004036b5ae in sched::thread::main (this=0xffff800000f6a040) at core/sched.cc:1267 #19 sched::thread_main_c (t=0xffff800000f6a040) at arch/x64/arch-switch.hh:325 #20 0x00000000402fda43 in thread_main () at arch/x64/entry.S:116 Signed-off-by: Waldemar Kozaczuk <[email protected]> Message-Id: <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Right now poll() has a rather unusual design favoring the performance of immediately-returning poll (we poll_scan() before trying to install the poll request on the descriptors). We also do not efficiently support epoll, and epoll_wait() just calls poll() which is terrible for performance.
Instead of the current poll.c code, let's design our own osv::poll object, inspired by epoll. It will have an add() method to install itself on one file structure, remove() to uninstall, and wait() to wait on any of these files (or return immediately). We will wait using a "waker" (our familar wait_until+threadid idiom) - I believe we don't need condvar (we only ever have one waiter in a poll request) and certainly not msleep, and the "waker" idiom also takes care of wake-before-wait races without needing another variable or a mutex to protect it.
osv::poll will also have a static function osv::poll::wake(file *fp) to wake any polls sleeping on this fp (and also set their revents and add them to a list of ready files). The "extern C" function poll_wake() will just call osv::poll::wake.
The epoll code will wrap osv::poll. The poll() code will temporarily create an osv::poll (just like we have a poll_request in the current code), add() to it the given fds, call wait(), and finally prepare the poll results.
The text was updated successfully, but these errors were encountered: