Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include tracing headers in hot_threads output? #74580

Open
jtibshirani opened this issue Jun 24, 2021 · 7 comments
Open

Include tracing headers in hot_threads output? #74580

jtibshirani opened this issue Jun 24, 2021 · 7 comments
Labels
:Core/Infra/Core Core issues without another label >enhancement Team:Core/Infra Meta label for core/infra team

Comments

@jtibshirani
Copy link
Contributor

jtibshirani commented Jun 24, 2021

When searches are causing high CPU load, it's common to consult hot_threads to find the source of expensive searches. Working backwards to the source can be difficult, as it requires detailed knowledge of how searches are executed, often at the Lucene level.

We support a special header X-Opaque-Id that allows clients to tag a request with context information. Kibana is currently working on passing X-Opaque-Id on all search requests, to surface the application and component that initiated the request (elastic/kibana#101587). The header is already included in search slow logs and tasks output. Could we also add it to hot_threads to help debug the source of expensive queries?

Note: I'm not familiar with the hot_threads implementation and am not sure how feasible this actually is.

@jtibshirani jtibshirani added >enhancement :Core/Infra/Core Core issues without another label labels Jun 24, 2021
@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Jun 24, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@rdnm
Copy link

rdnm commented Jul 1, 2021

Thanks @jtibshirani for the write up. @pgomulka is going to spend a bit of time investigating how we would implement this, and we'll discuss at the next core/infra team meeting.

@pgomulka
Copy link
Contributor

pgomulka commented Jul 8, 2021

I am not sure we can achieve this with current infrastructure. Accessing X-Opaque-ID from other threads would require a public access to ThreadLocal.getMap(Thread t) or Thread.threadLocals - which are package private to java.lang package. I am also not sure that would be a great design anyway to access foreign threads private parts..

@original-brownbear and @ywelsch Do you think this is achievable with ThreadContext? Or maybe you think there might be a totally different approach to this?

@ywelsch
Copy link
Contributor

ywelsch commented Jul 8, 2021

You would also first have to get ahold of the ThreadContext associated with the thread. And you would have to get ahold of the Thread object, as HotThreads is only operating on ThreadInfo, not Thread. From a quick look, the simplest way to implement something along the lines of what @jtibshirani asked would be to create a static concurrent map in ThreadContext, keyed by thread id (long Thread.getId()), update it whenever ThreadContext.threadLocal is updated, and then get the relevant information in HotThreads by doing a lookup on this map using ThreadInfo.getThreadId.

@grcevski
Copy link
Contributor

grcevski commented Sep 7, 2021

Hi @ywelsch, I had a chat with @pgomulka on this today and I'm wondering if it would be possible to avoid creating a shared concurrent map by modifying the Thread names when we stash and restore the ThreadContext? Essentially, all we really need to do is show the X-Opaque-ID in the thread dump and we can use the Thread name field for it. Right now a thread name would be something like: elasticsearch[elastic][generic][T#2], which comes from the name of the thread pool I think, so the idea is to:

  • Save the original thread name on context stash into a thread local
  • Call Thread.currentThread().setName() and rename the current worker thread name to elasticsearch[elastic][generic][T#2]: X-Opaque-ID: 123
  • Put back the original name on context restore

I'm new here so I may not understand the full scope of this kind of a change, but I can see some advantages if it was possible. We wouldn't have to run additional atomic operations of every request for an infrequent API call, HotThreads doesn't need to be modified and having the X-Opaque-ID in the thread name may come in handy during debugging, or while using other external tools like JFR/MissionControl or VisualVM.

@ywelsch
Copy link
Contributor

ywelsch commented Sep 8, 2021

Note that hot_threads is more than just a thread dump. It takes multiple samples for each thread in order to identify common parts in the stack traces over a period of time for a given thread. HotThreads currently assumes that the thread name is stable, just picking the name from one sample. If X-Opaque-ID is encoded into the thread name, and it changes from sample to sample, couldn't this now mislead users into thinking that the thread had spent all this effort in the context of a given opaque id whereas this was only applicable to a (small) subset of the samples?

Also, wouldn't the rename require you to parse the current thread name first, e.g. in order to find out what thread pool it belonged to (write/generic/...) so that you can emit the correct updated name?

I agree that having the X-Opaque-ID in the thread name for debugging / external tools could be handy as well (unless these tools operate on multiple samples as well and assume a stable thread name to identify identical threads). I'm not sure what a good practice here is. I don't hold a strong opinion on this either way.

@grcevski
Copy link
Contributor

grcevski commented Sep 8, 2021

Note that hot_threads is more than just a thread dump. It takes multiple samples for each thread in order to identify common parts in the stack traces over a period of time for a given thread. HotThreads currently assumes that the thread name is stable, just picking the name from one sample. If X-Opaque-ID is encoded into the thread name, and it changes from sample to sample, couldn't this now mislead users into thinking that the thread had spent all this effort in the context of a given opaque id whereas this was only applicable to a (small) subset of the samples?

This is a very valid point, thanks for bringing it up. We would have to extract all X-Opaque-Ids as we capture the thread stacks from the thread name and clean-up the opaque id from the thread name when we report it. I think it's similar to what we would need to do if we used a map, but with less String manipulation and clean-up.

In a sense, if we implemented this feature it would have to be taken with a grain of salt anyway. If there were 10, 50ms API calls on the same executor thread, for the default duration that the HotThreads sampler thread sleeps (500ms), we'd miss at least 9 of those Opaque-Ids with both solutions we have here. The only way to get 100% accurate would be via sliding time window id tracking in a list (per thread), with 1 second window, assuming nobody uses more than 500ms HotThreads interval. I think that kind of tracking would be very expensive inside ThreadContext.

In short, I think with short running requests we'll always be misleading the users on the X-Opaque-Id. Perhaps the solution is to word the messaging around the Id in the report so that it speaks to this potential uncertainty?

Also, wouldn't the rename require you to parse the current thread name first, e.g. in order to find out what thread pool it belonged to (write/generic/...) so that you can emit the correct updated name?

My thinking here was to simply append the X-Opaque-Id to whatever the thread name already is. By default it's set by the pool, so we'd add to it. However, now that you mention this, I see a way to get around having to store/restore the original thread name in a ThreadLocal, by doing String.split(threadName, 'X-Opaque-Id') on the restore path and simply reverting the name to the first String in the resulting array.

grcevski pushed a commit to grcevski/elasticsearch that referenced this issue Sep 15, 2021
Elasticsearch clients can supply X-Opaque-Id or trace.id tracing headers
in requests to help with debugging issues. This change adds any tracing
headers into the thread name for the duration of the request, such
that this information can be visible in HotThreads or while using other
profiling tools (e.g. JFR/Mission Control, VisualVM, etc. ).

Closes elastic#74580
@rdnm rdnm removed the team-discuss label Jun 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Core Core issues without another label >enhancement Team:Core/Infra Meta label for core/infra team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants