Fix: Java agent: update ref count in enabledLoggers #3
Closed
Conversation
Integer objects are immutable in Java, so `Integer refcount = enabledLoggers.get(name); refcount--;` does not update the value in `enabledLoggers`.
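For illustration, here is a minimal sketch of the buggy pattern and one possible corrective pattern, assuming `enabledLoggers` is a `Map<String, Integer>` keyed by logger name; the class and method names below are hypothetical and not the agent's actual code:

```java
import java.util.HashMap;
import java.util.Map;

class LoggerRefCountSketch {
    private final Map<String, Integer> enabledLoggers = new HashMap<>();

    // Buggy pattern: Integer is immutable, so "refcount--" unboxes the value,
    // decrements it, and re-boxes it into a *new* Integer bound only to the
    // local variable. The entry stored in enabledLoggers is never modified
    // (and this also throws NullPointerException if the logger is unknown).
    void decrementBuggy(String name) {
        Integer refcount = enabledLoggers.get(name);
        refcount--;
    }

    // One possible fix: write the decremented value back into the map,
    // removing the entry once the count drops to zero.
    void decrementFixed(String name) {
        Integer refcount = enabledLoggers.get(name);
        if (refcount == null) {
            return;
        }
        if (refcount <= 1) {
            enabledLoggers.remove(name);
        } else {
            enabledLoggers.put(name, refcount - 1);
        }
    }
}
```

An equivalent approach would be `enabledLoggers.computeIfPresent(name, (k, v) -> v > 1 ? v - 1 : null)`; either way, the essential point is that the new count must be stored back into the map.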
Closing since another fix is expected for 2.6.
compudj pushed a commit that referenced this pull request on May 5, 2020:
Background
==========
When userspace events with exclusions are enabled by the CLI user, the session daemon enables the events in a 3-step process using the following ustctl commands:

1. `LTTNG_UST_EVENT` to create an event and get a handle on it,
2. `LTTNG_UST_EXCLUSION` to attach exclusions, and
3. `LTTNG_UST_ENABLE` to activate the tracing of this event.

Also, the session daemon uses `LTTNG_UST_SESSION_START` to start the tracing of events. In various use cases, this can happen before OR after the 3-step process above.

Scenario
========
If the `LTTNG_UST_SESSION_START` is done before the 3-step process, the tracer will end up not considering the exclusions and trace all events matching the event name glob pattern provided at step #1. This is due to the fact that when we do step #1, we end up calling `lttng_session_lazy_sync_enablers()`. This function will sync the event enablers if the session is active (which it is in our scenario because of the _SESSION_START). Syncing the enablers will then match the event name glob patterns provided by the user against the event descriptors of the tracepoints and attach probes to their corresponding callsites on matches. All of this is done before we have even received the exclusions in step #2.

Problem
=======
The problem is that we attach probes to tracepoints before being sure we have received the entire description of the events. Step #2 may reduce the set of tracepoints to trace, so we must wait until all exclusions have been received before attaching probes to tracepoints.

Solution
========
Only match event names and exclusions (and, ultimately, register probes) on enabled enablers to ensure that no other modifications will be applied to the event. Event enablers are enabled at step #3.

Note
====
Filters do not cause problems here because adding a filter won't change which tracepoints are to be traced.

Signed-off-by: Francis Deslauriers <[email protected]>
Signed-off-by: Mathieu Desnoyers <[email protected]>
Change-Id: Id984f266d976f346b001db81cd8a2b74965b5ef2
compudj pushed a commit that referenced this pull request on Dec 9, 2021:
… teardown

Observed issue
==============
The following backtrace has been reported:

#0  __GI_raise (sig=sig@entry=6) at /usr/src/debug/glibc/2.31/git/sysdeps/unix/sysv/linux/raise.c:50
#1  0x0000007f90b3fdd4 in __GI_abort () at /usr/src/debug/glibc/2.31/git/stdlib/abort.c:79
#2  0x0000007f90b4bf50 in __assert_fail_base (fmt=0x7f90c3da98 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7f9112cb90 "uatomic_read(&sem_count) >= count", file=file@entry=0x7f9112cb30 "/usr/src/debug/lttng-ust/2_2.13.0-r0/lttng-ust-2.13.0/src/lib/lttng-ust/lttng-ust-comm.c", line=line@entry=664, function=function@entry=0x7f911317e8 <__PRETTY_FUNCTION__.10404> "decrement_sem_count") at /usr/src/debug/glibc/2.31/git/assert/assert.c:92
#3  0x0000007f90b4bfb4 in __GI___assert_fail (assertion=assertion@entry=0x7f9112cb90 "uatomic_read(&sem_count) >= count", file=file@entry=0x7f9112cb30 "/usr/src/debug/lttng-ust/2_2.13.0-r0/lttng-ust-2.13.0/src/lib/lttng-ust/lttng-ust-comm.c", line=line@entry=664, function=function@entry=0x7f911317e8 <__PRETTY_FUNCTION__.10404> "decrement_sem_count") at /usr/src/debug/glibc/2.31/git/assert/assert.c:101
#4  0x0000007f910e3830 in decrement_sem_count (count=<optimized out>) at /usr/src/debug/lttng-ust/2_2.13.0-r0/lttng-ust-2.13.0/src/lib/lttng-ust/lttng-ust-comm.c:664
#5  0x0000007f910e5d28 in handle_pending_statedump (sock_info=0x7f9115c608 <global_apps>) at /usr/src/debug/lttng-ust/2_2.13.0-r0/lttng-ust-2.13.0/src/lib/lttng-ust/lttng-ust-comm.c:737
#6  handle_message (lum=0x7f8dde46d8, sock=3, sock_info=0x7f9115c608 <global_apps>) at /usr/src/debug/lttng-ust/2_2.13.0-r0/lttng-ust-2.13.0/src/lib/lttng-ust/lttng-ust-comm.c:1410
#7  ust_listener_thread (arg=0x7f9115c608 <global_apps>) at /usr/src/debug/lttng-ust/2_2.13.0-r0/lttng-ust-2.13.0/src/lib/lttng-ust/lttng-ust-comm.c:2055
#8  0x0000007f90af73e0 in start_thread (arg=0x7fc27a82f6) at /usr/src/debug/glibc/2.31/git/nptl/pthread_create.c:477
#9  0x0000007f90bead5c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

It turns out that the main thread is, at that point, iterating over the library destructors:

Thread 3 (LWP 1983):
#0  0x0000007f92a68a0c in _dl_fixup (l=0x7f9054e510, reloc_arg=432) at /usr/src/debug/glibc/2.31/git/elf/dl-runtime.c:69
#1  0x0000007f92a6ea3c in _dl_runtime_resolve () at ../sysdeps/aarch64/dl-trampoline.S:100
#2  0x0000007f905170f8 in __do_global_dtors_aux () from <....>/crash/work/rootfs/usr/lib/libbsd.so.0
#3  0x0000007f92a697f8 in _dl_fini () at /usr/src/debug/glibc/2.31/git/elf/dl-fini.c:138
#4  0x0000007f90b54864 in __run_exit_handlers (status=0, listp=0x7f90c65648 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at /usr/src/debug/glibc/2.31/git/stdlib/exit.c:108
#5  0x0000007f90b549f4 in __GI_exit (status=<optimized out>) at /usr/src/debug/glibc/2.31/git/stdlib/exit.c:139
#6  0x0000000000404c98 in a_function_name (....) at main.c:152
#7  0x0000000000404a98 in main (argc=3, argv=0x7fc27a8858, env=0x7fc27a8878) at main.c:97

Cause
=====
An enable command is processed at the same time as the lttng-ust destructor is run. At the end of the command handling, `handle_pending_statedump` is called. Multiple variables from the `sock_info` struct are checked outside the UST lock at that point.

lttng-ust-comm.c +1406:

	/*
	 * Performed delayed statedump operations outside of the UST
	 * lock. We need to take the dynamic loader lock before we take
	 * the UST lock internally within handle_pending_statedump().
	 */
	handle_pending_statedump(sock_info);

Namely:

  registration_done
  statedump_pending
  initial_statedump_done

`statedump_pending` is set during the enable command (`lttng_session_statedump`, lttng-events.c +631) in the same thread. As for `registration_done` and `initial_statedump_done`, they are invariant from the registration of the app until `lttng_ust_cleanup` is called.

`cleanup_sock_info`, called by `lttng_ust_cleanup`, itself called by `lttng_ust_exit`, resets the `registration_done` and `initial_statedump_done` fields. Note that these operations are done outside of the listener thread.

Note that by that point `lttng_ust_exit` expects all "getters" on `sock_info` to fail while trying to acquire the UST lock, due to `lttng_ust_comm_should_quit` being set to 1.

Note that the listener threads can still exist because we do not join them; we only execute pthread_cancel, which is async.

Clearly we are missing the mutual exclusion provided by locking when accessing `registration_done` and `initial_statedump_done`.

Solution
========
Here we can do better and simply not require any mutual exclusion based on locking. `registration_done` and `initial_statedump_done` only need to be reset to zero when we are not actually exiting (`lttng_ust_after_fork_child`). In this case, no concurrent listener thread exists at that point that could access those fields during the reset. Hence we can move the reset to only the non-exiting code path and alleviate the current situation.

Known drawbacks
===============
None.

Signed-off-by: Jonathan Rajotte <[email protected]>
Signed-off-by: Mathieu Desnoyers <[email protected]>
Change-Id: I45ba3eaee20c49a3988837a87fa680ce0a6ed953
compudj pushed a commit that referenced this pull request on Dec 9, 2021:
Observed issue
==============
Applications which transitively dlopen() a library which, in turn, dlopen()s providers crash when they are compiled with clang or if LTTNG_UST_ALLOCATE_COMPOUND_LITERAL_ON_HEAP is defined.

Core was generated by `././myapp.exe'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fa94f860bc2 in check_event_provider (probe_desc=<optimized out>) at lttng-probes.c:153
153             if (!check_type_provider(field->type)) {
[Current thread is 1 (Thread 0x7fa94fcbc740 (LWP 511754))]
(gdb) bt
#0  0x00007fa94f860bc2 in check_event_provider (probe_desc=<optimized out>) at lttng-probes.c:153
#1  lttng_ust_probe_register (desc=0x7fa94fe9dc80 <lttng_ust__probe_desc___embedded_sys>) at lttng-probes.c:242
#2  0x00007fa94fe9ba3c in lttng_ust__tracepoints__ptrs_destroy () at /usr/include/lttng/tracepoint.h:590
#3  0x00007fa94fedfe2e in call_init () from /lib64/ld-linux-x86-64.so.2
#4  0x00007fa94fedff1c in _dl_init () from /lib64/ld-linux-x86-64.so.2
#5  0x00007fa94fdf7d45 in _dl_catch_exception () from /usr/lib/libc.so.6
#6  0x00007fa94fee420a in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#7  0x00007fa94fdf7ce8 in _dl_catch_exception () from /usr/lib/libc.so.6
#8  0x00007fa94fee39bb in _dl_open () from /lib64/ld-linux-x86-64.so.2
#9  0x00007fa94fe8d36c in ?? () from /usr/lib/libdl.so.2
#10 0x00007fa94fdf7ce8 in _dl_catch_exception () from /usr/lib/libc.so.6
#11 0x00007fa94fdf7db3 in _dl_catch_error () from /usr/lib/libc.so.6
#12 0x00007fa94fe8db99 in ?? () from /usr/lib/libdl.so.2
#13 0x00007fa94fe8d3f8 in dlopen () from /usr/lib/libdl.so.2
#14 0x00007fa94fecc647 in mon_constructeur () at mylib.cpp:20
#15 0x00007fa94fedfe2e in call_init () from /lib64/ld-linux-x86-64.so.2
#16 0x00007fa94fedff1c in _dl_init () from /lib64/ld-linux-x86-64.so.2
#17 0x00007fa94fdf7d45 in _dl_catch_exception () from /usr/lib/libc.so.6
#18 0x00007fa94fee420a in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#19 0x00007fa94fdf7ce8 in _dl_catch_exception () from /usr/lib/libc.so.6
#20 0x00007fa94fee39bb in _dl_open () from /lib64/ld-linux-x86-64.so.2
#21 0x00007fa94fe8d36c in ?? () from /usr/lib/libdl.so.2
#22 0x00007fa94fdf7ce8 in _dl_catch_exception () from /usr/lib/libc.so.6
#23 0x00007fa94fdf7db3 in _dl_catch_error () from /usr/lib/libc.so.6
#24 0x00007fa94fe8db99 in ?? () from /usr/lib/libdl.so.2
#25 0x00007fa94fe8d3f8 in dlopen () from /usr/lib/libdl.so.2
#26 0x00005594f478c18c in main ()

Cause
=====
Building tracepoint instrumentation as C++ using clang causes LTTNG_UST_ALLOCATE_COMPOUND_LITERAL_ON_HEAP to be defined due to a compiler version detection problem addressed by another patch. However, building with LTTNG_UST_ALLOCATE_COMPOUND_LITERAL_ON_HEAP defined still results in the crash.

When LTTNG_UST_ALLOCATE_COMPOUND_LITERAL_ON_HEAP is defined, the lttng_ust_event_field lttng_ust__event_fields__[...] structure is initialized by dynamically allocating field structures for the various fields. As the initialization can't be performed statically, it is performed at run-time _after_ the execution of the library constructors has completed.

Moreover, the generated initialization function of the provider (lttng_ust__events_init__[...]) is declared as being a library constructor. Hence, this causes it to run before the tracepoint field structures have had a chance to be initialized. This all results in a NULL pointer dereference during the validation of the fields.

Solution
========
When building providers as C++, the initialization function is defined as the constructor of a class. This class is, in turn, instantiated in an anonymous namespace. For the purposes of this patch, the use of an anonymous namespace is equivalent to declaring the instance as 'static', but it is preferred in C++11.

Known drawbacks
===============
None.

References
==========
A reproducer is available: https://github.com/jgalar/ust-clang-reproducer
Problem initially reported on dotnet/runtime's issue tracker: dotnet/runtime#62398
Relevant LTTng-UST issue: https://bugs.lttng.org/issues/1339

Fixes: #1339
Change-Id: I51cfbe74729bd45e2613a30bc8de17e08ea8233d
Signed-off-by: Jérémie Galarneau <[email protected]>
Signed-off-by: Mathieu Desnoyers <[email protected]>
mjeanson pushed a commit to mjeanson/lttng-ust that referenced this pull request on Oct 5, 2022:
Eliminate the following deadlock by calling libc close directly when called from within the 2.12 fd tracker locking. The 2.13 locks always need to be taken outside of the 2.12 locks. In the scenario below, Thread 3 (lttng-ust 2.12 listener thread) takes the locks in the wrong order.

Thread 3 (Thread 0x7f7679247700 (LWP 621683) "hello-ust"):
#0  __lll_lock_wait (futex=futex@entry=0x7f767a392b60 <ust_safe_guard_fd_mutex>, private=0) at lowlevellock.c:52
#1  0x00007f767a39d843 in __GI___pthread_mutex_lock (mutex=0x7f767a392b60 <ust_safe_guard_fd_mutex>) at ../nptl/pthread_mutex_lock.c:80
        -> 2.13 locks fd tracker
2.13 #2  0x00007f767a37ff82 in lttng_ust_lock_fd_tracker_orig () at fd-tracker.c:163
2.13 #3  0x00007f767a380b66 in lttng_ust_safe_close_fd_orig (fd=3, close_cb=0x7f767a56b070 <__GI___close>) at fd-tracker.c:385
2.13 #4  0x00007f767a6e5557 in close (fd=3) at lttng-ust-fd.c:101
2.12 #5  0x00007f767a297e7d in ustcomm_connect_unix_sock (pathname=0x7f767a32bc64 <local_apps+36> "/home/compudj/.lttng/lttng-ust-sock-8", timeout=3000) at lttng-ust-comm.c:157
        -> 2.12 listener thread locks fd tracker
        -> 2.12 listener thread locks ust_lock
2.12 #6  0x00007f767a2a1f77 in ust_listener_thread (arg=0x7f767a32bc40 <local_apps>) at lttng-ust-comm.c:1591
#7  0x00007f767a39aea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8  0x00007f767a57aaef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f7678a46700 (LWP 621682) "hello-ust"):
#0  __lll_lock_wait (futex=futex@entry=0x7f767a32f780 <ust_mutex>, private=0) at lowlevellock.c:52
#1  0x00007f767a39d843 in __GI___pthread_mutex_lock (mutex=0x7f767a32f780 <ust_mutex>) at ../nptl/pthread_mutex_lock.c:80
2.12 #2  0x00007f767a29da59 in ust_lock () at lttng-ust-comm.c:167
2.12 #3  0x00007f767a2a1d95 in ust_listener_thread (arg=0x7f767a329be0 <global_apps>) at lttng-ust-comm.c:1558
#4  0x00007f767a39aea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#5  0x00007f767a57aaef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f767a33a040 (LWP 621681) "hello"):
#0  __lll_lock_wait (futex=futex@entry=0x7f767a32f720 <ust_safe_guard_fd_mutex>, private=0) at lowlevellock.c:52
#1  0x00007f767a39d843 in __GI___pthread_mutex_lock (mutex=0x7f767a32f720 <ust_safe_guard_fd_mutex>) at ../nptl/pthread_mutex_lock.c:80
        -> 2.12 lock fd tracker
2.12 #2  0x00007f767a29c37c in lttng_ust_lock_fd_tracker () at lttng-ust-fd-tracker.c:153
2.12 #3  0x00007f767a29ceba in lttng_ust_safe_close_fd (fd=4, close_cb=0x7f767a56b070 <__GI___close>) at lttng-ust-fd-tracker.c:341
2.13 #4  0x00007f767a380b06 in lttng_ust_safe_close_fd_chain (fd=4, close_cb=0x7f767a56b070 <__GI___close>) at fd-tracker.c:348
2.13 #5  0x00007f767a380b5c in lttng_ust_safe_close_fd_orig (fd=4, close_cb=0x7f767a56b070 <__GI___close>) at fd-tracker.c:379
2.13 #6  0x00007f767a6e5557 in close (fd=4) at lttng-ust-fd.c:101
        -> 2.13 lock fd tracker
2.13 #7  0x00007f767a43b5e6 in lttng_ust_elf_destroy (elf=0x55b870f044c0) at elf.c:352
2.13 #8  0x00007f767a3f2797 in get_elf_info (bin_data=0x7ffd5481ef70) at lttng-ust-statedump.c:296
2.13 #9  0x00007f767a3f283f in extract_baddr (bin_data=0x7ffd5481ef70) at lttng-ust-statedump.c:319
2.13 #10 0x00007f767a3f3173 in extract_bin_info_events (info=0x7ffd5481fff0, size=64, _data=0x7ffd54820098) at lttng-ust-statedump.c:518
#11 0x00007f767a5b3ad5 in __GI___dl_iterate_phdr (callback=0x7f767a3f2f5a <extract_bin_info_events>, data=0x7ffd54820098) at dl-iteratephdr.c:75
2.13 #12 0x00007f767a3f32c5 in lttng_ust_dl_update_orig (ip=0x7f767a3cb599 <lttng_ust_ctor+683>) at lttng-ust-statedump.c:574
2.13 #13 0x00007f767a3f33f7 in lttng_ust_statedump_init () at lttng-ust-statedump.c:638
2.13 #14 0x00007f767a3cb599 in lttng_ust_ctor () at lttng-ust-comm.c:2246
2.13 #15 0x00007f767a3ccaad in lttng_ust_after_fork_child (restore_sigset=0x7ffd548207e0) at lttng-ust-comm.c:2577
2.13 #16 0x00007f767a6f687f in fork () at ustfork.c:271
#17 0x000055b8705f1034 in make_child ()
#18 0x000055b8705daacc in execute_command_internal ()
#19 0x000055b8705ddcf5 in execute_command ()
#20 0x000055b8705df951 in ?? ()
#21 0x000055b8705daf79 in execute_command_internal ()
#22 0x000055b8705ddcf5 in execute_command ()
#23 0x000055b8705c49db in reader_loop ()
#24 0x000055b8705c3668 in main ()

Getting the state of the lttng-ust 2.12 ust_fd_mutex_nest TLS variable is not straightforward, because it is a static variable (internal state). The trick used here is to rely on the "lttng_ust_safe_close_fd" symbol exported by lttng-ust 2.12. When passed fd=-1, with a close_cb which simply returns 0, we can use the resulting return values to determine the current state (zero vs nonzero) of the ust_fd_mutex_nest TLS variable without any side-effect other than temporarily taking/releasing the fd tracker lock when ust_fd_mutex_nest==0.

Signed-off-by: Mathieu Desnoyers <[email protected]>
Change-Id: I6b95e0fd607abccd67aadf35ec96c19877394b16
compudj pushed a commit that referenced this pull request on Mar 30, 2023:
When building the interpreter with `INTERPRETER_USE_SWITCH`, I get the following crash when interpreting a bytecode:

Program terminated with signal SIGSEGV, Segmentation fault.
(gdb) bt
#0  0x00007f5789aee443 in lttng_bytecode_interpret (ust_bytecode=0x555dfe90a650, interpreter_stack_data=0x7ffd12615500 "", probe_ctx=0x7ffd12615620, caller_ctx=0x7ffd126154bc) at lttng-bytecode-interpreter.c:885
#1  0x00007f5789af4da2 in lttng_ust_interpret_event_filter (event=0x555dfe90a580, interpreter_stack_data=0x7ffd12615500 "", probe_ctx=0x7ffd12615620, event_filter_ctx=0x0) at lttng-bytecode-interpreter.c:2548
#2  0x0000555dfe02d2d4 in lttng_ust__event_probe__tp___the_string (__tp_data=0x555dfe90a580, i=0, arg_i=2, str=0x7ffd12617cfa "hypothec") at ././tp.h:16
#3  0x0000555dfe02cac0 in lttng_ust_tracepoint_cb_tp___the_string (str=0x7ffd12617cfa "hypothec", arg_i=2, i=0) at /tmp/lttng-master/src/lttng-tools/tests/utils/testapp/gen-ust-nevents-str/tp.h:16
#4  main (argc=39, argv=0x7ffd12615818) at gen-ust-nevents-str.cpp:38

This appears to be caused by `bytecode->data` being used to determine the `start_pc` address. In my case, `data` is NULL. A quick look around the code seems to show that this member is not used except during the transmission of the bytecode.

I am basing the fix on the implementation of START_OP in the default case, which uses `code` in lieu of `data`, and can confirm that it fixes the crash on my end.

Signed-off-by: Jérémie Galarneau <[email protected]>
Signed-off-by: Mathieu Desnoyers <[email protected]>
Change-Id: I0773df385b8e90728b60503016dec4b46d902234