Support stack traces of all threads #244

Open
pmalhaire opened this issue Nov 17, 2021 · 8 comments

Comments

@pmalhaire

Hello,

Your tool is the best we could find for getting the stack trace of one thread.
Is there a way, even a partial one, to get the stack traces of all threads?

For us, this would be the feature that makes this project the best tool of all.
I am willing to help with the Linux part.
It's lightweight, easy to implement, and the code is readable. Thank you for this awesome project.

@pmalhaire pmalhaire changed the title Support stack trace of all threads Support stack traces of all threads Nov 17, 2021
@bkietz

bkietz commented Aug 8, 2022

@bombela I wrote https://gist.github.com/bkietz/9e72ff72d58c6f9d977845c39fd63a21 as an example of how one could accomplish this under pthreads. I'm not sure if there's a more simultaneous/less hacky way to get all threads to stop; gdb seems to use signals to the non-segfaulted threads in a similar fashion. Is this an implementation you'd like a PR for? If so, any guidance on how you would like it structured in backward::?
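For readers who don't want to open the gist, here is a minimal sketch of the general idea (signal another thread and let its handler record its own stack). This is not the code from the gist: every name below (`ThreadTrace`, `install_trace_handler`, `capture_thread`) is hypothetical, it assumes the application leaves `SIGUSR2` free, and it uses glibc's `backtrace()` as a stand-in for backward's own unwinding.

```cpp
#include <execinfo.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <cerrno>

// Hypothetical names throughout; none of this is part of backward-cpp.
constexpr int kTraceSignal = SIGUSR2;  // assumption: the application does not use SIGUSR2
constexpr int kMaxFrames = 64;

struct ThreadTrace {
  void* frames[kMaxFrames];
  int depth = 0;
};

static ThreadTrace* g_slot = nullptr;  // where the signalled thread writes its frames
static sem_t g_done;                   // posted by the handler once the frames are stored

// Runs on the target thread, on top of whatever that thread was doing.
extern "C" void trace_handler(int) {
  g_slot->depth = backtrace(g_slot->frames, kMaxFrames);
  sem_post(&g_done);  // sem_post is async-signal-safe
}

bool install_trace_handler() {
  // Warm-up call: the first backtrace() may load the unwinder library,
  // which is not something to do inside a signal handler.
  void* warm[1];
  backtrace(warm, 1);

  if (sem_init(&g_done, 0, 0) != 0) return false;
  struct sigaction sa = {};
  sa.sa_handler = trace_handler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  return sigaction(kTraceSignal, &sa, nullptr) == 0;
}

// Asks one thread for its stack and blocks until its handler has run.
// Not reentrant: only one capture may be in flight at a time.
bool capture_thread(pthread_t target, ThreadTrace& out) {
  out.depth = 0;
  g_slot = &out;
  if (pthread_kill(target, kTraceSignal) != 0) return false;
  while (sem_wait(&g_done) != 0) {
    if (errno != EINTR) return false;  // retry only when interrupted
  }
  return out.depth > 0;
}
```

Calling `capture_thread` once per known `pthread_t` gives one trace per thread, but the threads are stopped one after another, so the traces are not a simultaneous snapshot.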

@bkietz

bkietz commented Aug 9, 2022

OTOH, having written that, I think the correct solution to the all-threads-trace problem is probably to let the process dump core and then read the stacks out of the core file. This has two advantages over in-process tracing:

  • When a signal handler exists, the non-signaled threads continue execution until they receive signals of their own. However, if a signal is known to be fatal, the OS can shut threads down more aggressively. This means it is possible to get less out-of-date traces from the threads which didn't segfault than would be possible with inter-thread signals
  • We'd probably be reading the core dump with gdb or another debugger and we'd have access to the process' full memory, so we could print not just snippets of the source files but values of local variables as well
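
As a rough sketch of the in-process half of the core dump approach: the process can raise its own soft core-size limit so that a fatal signal is allowed to produce a core file. `enable_core_dumps` is a hypothetical helper name; `getrlimit`/`setrlimit` with `RLIMIT_CORE` are the standard POSIX calls. Whether and where a core file is actually written still depends on system configuration (on Linux, for example, `/proc/sys/kernel/core_pattern`).

```cpp
#include <sys/resource.h>

// Hypothetical helper: raise the soft core-file size limit to the hard limit
// so that a fatal signal is permitted to write a core dump.
bool enable_core_dumps() {
  struct rlimit lim;
  if (getrlimit(RLIMIT_CORE, &lim) != 0) return false;
  lim.rlim_cur = lim.rlim_max;  // an unprivileged process may go up to the hard limit
  return setrlimit(RLIMIT_CORE, &lim) == 0;
}
```

With a core file in hand, gdb's `thread apply all bt` prints a backtrace for every thread.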

So maybe this should be closed as out of scope for backward.

@pmalhaire
Author

@bkietz sure, it's close to being an out-of-scope feature, but if it could be done in a clean manner, perhaps even in a complementary repo, it would be a killer feature.

@bombela
Owner

bombela commented Aug 10, 2022

I don't mind the addition of a "pthread.hpp" file or something like that. And over time it could even morph into a cross-platform solution, "threads.hpp".

You would include this extra file in your project only if you want it.

@bombela
Owner

bombela commented Aug 10, 2022

As for the proposed implementation @bkietz, it must be possible to enumerate the threads via the OS. After all, ps and top can do it!

@bkietz

bkietz commented Aug 15, 2022

It's definitely possible; GDB does it by reading procfs. I intentionally avoided this because doing so adds more syscall delay between the first signal and signalling the other threads, which degrades the quality of the traces from other threads. A manual table of pthread_t is more work but lets you get straight to signaling. If the boilerplate is intolerable we could read procfs instead (unless you know of a faster way to enumerate threads?).
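
For reference, a minimal sketch of what the procfs route could look like on Linux, assuming C++17. `list_thread_ids` is a hypothetical name. Each entry under `/proc/self/task` is named after a kernel thread ID (TID), which is a different thing from a `pthread_t`. The function allocates, so as written it is not suitable for calling from inside a signal handler.

```cpp
#include <sys/types.h>
#include <cstdlib>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical helper: list the kernel thread IDs of the calling process
// by reading the directory names under /proc/self/task (Linux only).
std::vector<pid_t> list_thread_ids() {
  std::vector<pid_t> tids;
  std::error_code ec;
  std::filesystem::directory_iterator it("/proc/self/task", ec), end;
  for (; !ec && it != end; it.increment(ec)) {
    const std::string name = it->path().filename().string();
    char* tail = nullptr;
    const long tid = std::strtol(name.c_str(), &tail, 10);
    // Keep only entries whose whole name parsed as a number.
    if (tail != name.c_str() && *tail == '\0') {
      tids.push_back(static_cast<pid_t>(tid));
    }
  }
  return tids;  // threads may start or exit while we iterate: not atomic
}
```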

@bombela
Owner

bombela commented Aug 15, 2022

The manual table also limits you to the threads that you control directly. So threads created by a library won't be visible unless the library happens to also use the same registry (including the same ABI).

Short-lived threads cannot simply be registered and left in the table, because after a thread terminates its thread ID could be reused, as the pthread documentation describes.

So threads that terminate should be unregistered.

I don't think there is any other way than procfs for listing them all. And that wouldn't be atomic, and it would be too slow, as you said.

So... a registry with some way to register on thread start and deregister on thread termination seems to be the way to go.

I guess it's always possible to override pthread_create and wrap the function to execute with a push/pop cleanup to deregister on termination. See https://linux.die.net/man/3/pthread_cleanup_push
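
A rough sketch of that wrapping idea, with hypothetical names (`ThreadRegistry`, `create_registered_thread`). It is an explicit wrapper that callers use instead of `pthread_create`, not a true override of the symbol, and the mutex-protected vector is only for illustration: a table that has to be read from inside a signal handler would need an async-signal-safe design instead.

```cpp
#include <pthread.h>
#include <algorithm>
#include <mutex>
#include <vector>

// Hypothetical registry of live threads; not part of backward-cpp.
class ThreadRegistry {
 public:
  static ThreadRegistry& instance() {
    static ThreadRegistry registry;
    return registry;
  }
  void add(pthread_t t) {
    std::lock_guard<std::mutex> lock(mu_);
    threads_.push_back(t);
  }
  void remove(pthread_t t) {
    std::lock_guard<std::mutex> lock(mu_);
    threads_.erase(std::remove_if(threads_.begin(), threads_.end(),
                                  [t](pthread_t o) { return pthread_equal(o, t) != 0; }),
                   threads_.end());
  }
  std::vector<pthread_t> snapshot() {
    std::lock_guard<std::mutex> lock(mu_);
    return threads_;
  }

 private:
  std::mutex mu_;
  std::vector<pthread_t> threads_;
};

struct StartArgs {
  void* (*fn)(void*);
  void* arg;
};

static void deregister_self(void*) {
  ThreadRegistry::instance().remove(pthread_self());
}

// Runs on the new thread: register, call the user function, always deregister.
static void* trampoline(void* raw) {
  StartArgs* heap_args = static_cast<StartArgs*>(raw);
  const StartArgs args = *heap_args;
  delete heap_args;

  ThreadRegistry::instance().add(pthread_self());
  void* result = nullptr;
  pthread_cleanup_push(deregister_self, nullptr);
  result = args.fn(args.arg);
  pthread_cleanup_pop(1);  // non-zero: run deregister_self on normal return too
  return result;
}

// Drop-in replacement for pthread_create that keeps the registry up to date.
int create_registered_thread(pthread_t* out, const pthread_attr_t* attr,
                             void* (*fn)(void*), void* arg) {
  StartArgs* args = new StartArgs{fn, arg};
  const int rc = pthread_create(out, attr, trampoline, args);
  if (rc != 0) delete args;  // the thread never started, so we still own args
  return rc;
}
```

The cleanup handler also runs if the thread calls `pthread_exit` or is cancelled, which covers the cases a plain "deregister after the function returns" would miss. Threads created by a library that calls `pthread_create` directly still stay invisible to this registry.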

@bkietz

bkietz commented Aug 19, 2022

Yet another option would be to look even more like gdb: provide a helper to be executed (very) early in main() which calls vfork and ptrace. The tracing process watches for new/exiting threads (potentially child processes too, needs more thought) and maintains the listing of what needs a tgkill. This introduces some overhead due to context switching into the tracing process when the initial signal is received and couldn't be used at the same time as a debugger (since only one process may ptrace another). Also would require some doc for use in containers since some (docker at least) forbid ptrace calls by default. Still, seemed worth mentioning on the strength of "one line addition for consumers"
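
The ptrace-based tracer itself is too involved for a short example, but the final step it mentions, sending a signal to one specific thread by kernel thread ID, can be sketched as below. `current_tid` and `signal_thread` are hypothetical helper names; the raw `syscall()` form is used because older glibc versions do not provide `gettid`/`tgkill` wrappers. Linux only.

```cpp
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: kernel thread ID of the calling thread.
pid_t current_tid() {
  return static_cast<pid_t>(syscall(SYS_gettid));
}

// Hypothetical helper: send `sig` to thread `tid` of the calling process.
// Passing sig == 0 only checks that the thread exists. The getpid() argument
// is the thread group ID, which guards against a recycled TID that now
// belongs to another process.
bool signal_thread(pid_t tid, int sig) {
  return syscall(SYS_tgkill, getpid(), tid, sig) == 0;
}
```

This version signals threads of the calling process only; a separate tracing process would pass the traced process's ID as the thread group instead of `getpid()`.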

3 participants