Add profiler startup command to the diagnostics command set #52175
Conversation
Tagging subscribers to this area: @tommcdon

Issue Details

This PR adds the ability for a profiler to register itself as a startup profiler over the diagnostics pipe. The only place where this is legal is in the "reverse server" startup mode; otherwise profilers will have to use the existing attach mechanism. It doesn't change anything else about profiler startup. When you issue the startup profiler command it stashes the path and GUID, then uses them if a profiler isn't registered via the existing startup configuration.

@josalem @noahfalk while testing this I ran into some issues that were fairly unintuitive (or maybe even bugs). When first writing the test I expected to set up a reverse server, have the runtime connect to it, and then be able to issue a startup profiler command followed by a resume runtime command on the same session, but the runtime forcibly disconnects the stream after each command. I did not expect that; I assumed I could issue as many commands as I wanted and then close the connection myself when I was through.

Next I changed the test to start a reverse server, and then, while the reverse server stream is still open, I have to spin up another IPC connection and do the profiler attach there. But then the runtime wouldn't actually resume after I sent the resume runtime command, either from the original reverse server stream or from the new command stream. After a little debugging I found out that it's because of this check, which will not resume the runtime while there are any idle connections around:

runtime/src/native/eventpipe/ds-server.c
Lines 272 to 277 in d2fba8f

In order to resume the runtime I have to issue the resume runtime command via both streams.

It seems like a better user experience, if we're going to offer all sorts of functionality over this IPC mechanism, to let anyone who opens a stream issue multiple commands. And if we're stuck having to open multiple streams, it would be great if I didn't have to track them and issue resume runtime for every one of them.

CC @shirhatti
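Roughly, the flow this PR enables looks like the sketch below, written against the test-harness types referenced later in this thread (ReverseServer, IpcAdvertise, IpcMessage, IpcClient, Logger) and assuming their usual usings are in scope. The Profiler command-set and command id values (0x03/0x02 here) are assumptions for illustration, and the payload carrying the profiler GUID and path is omitted rather than guessed:

static async Task RegisterStartupProfilerAsync(ReverseServer server)
{
    // The suspended runtime dials out to the reverse server and advertises itself.
    Stream stream = await server.AcceptAsync();
    IpcAdvertise advertise = IpcAdvertise.Parse(stream);
    Logger.logger.Log(advertise.ToString());

    // 0x03 = Profiler command set and 0x02 = startup profiler command are assumed
    // values for illustration; the real payload (profiler GUID + path) is omitted here.
    IpcMessage response = IpcClient.SendMessage(stream, new IpcMessage(0x03, 0x02));
    Logger.logger.Log($"received: {response}");

    // The runtime stashes the path and GUID and loads the profiler during startup
    // once ResumeRuntime has been sent (over a separate connection; see below).
}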
@jander-msft FYI
The IPC protocol is designed for every command to be a new connection. This is because any of those commands could have an Optional Continuation that would consume the connection. As such, you can't reuse a connection stream once you've exchanged a command to completion. If you are connecting to a Listen mode DiagnosticPort (tool->runtime), then you need to open a new connection to send another command. If you are listening for a Connect mode DiagnosticPort (runtime->tool), then you need to wait for the runtime to connect back to the tool to issue the next command (this should happen as soon as you issue the first command).

As for the resume command requiring every DiagnosticPort that is configured for suspension: that's by design. The expectation is that if user A configures a DiagnosticPort to suspend and user B configures another DiagnosticPort to suspend, then each user presumably has some command they want issued before the runtime resumes. Therefore, we don't resume until all DiagnosticPorts that are configured to suspend have had a resume command sent.
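Concretely, for a Connect mode (reverse server) port the flow looks roughly like the sketch below, again in terms of the harness types used elsewhere in this thread. The ProcessInfo command set/id comes from the test shown later; the ResumeRuntime command id is my assumption:

static async Task SendTwoCommandsAsync(ReverseServer server)
{
    // First connection: the runtime dials out and advertises itself.
    Stream first = await server.AcceptAsync();
    IpcAdvertise.Parse(first);
    IpcClient.SendMessage(first, new IpcMessage(0x04, 0x00)); // ProcessInfo
    // The stream cannot be reused once the command (and any continuation) completes.

    // The runtime connects back again, so the next command goes over a new stream.
    Stream second = await server.AcceptAsync();
    IpcAdvertise.Parse(second);
    IpcClient.SendMessage(second, new IpcMessage(0x04, 0x01)); // ResumeRuntime (id assumed)

    // If more than one DiagnosticPort is configured to suspend, each of them
    // needs its own ResumeRuntime before the runtime actually resumes.
}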
// This is to work around having to wait for an update to the DiagnosticsClient nuget before adding
// a test. I really hope this isn't permanent
// TODO: remove all this code when DiagnosticsClient version is updated...
I've been maintaining a test harness version of the IPC client code that is simpler than the library and allows you to inspect and assert on partial responses: https://github.com/dotnet/runtime/blob/main/src/tests/tracing/eventpipe/common/IpcUtils.cs.
Example use in situ:
runtime/src/tests/tracing/eventpipe/diagnosticport/diagnosticport.cs
Lines 283 to 316 in 16db832
public static async Task<bool> TEST_AdvertiseAndProcessInfoCookiesMatch()
{
    bool fSuccess = true;
    string serverName = ReverseServer.MakeServerAddress();
    Logger.logger.Log($"Server name is '{serverName}'");
    var server = new ReverseServer(serverName);
    using var memoryStream = new MemoryStream();
    Task<bool> subprocessTask = Utils.RunSubprocess(
        currentAssembly: Assembly.GetExecutingAssembly(),
        environment: new Dictionary<string,string> { { Utils.DiagnosticPortsEnvKey, $"{serverName},nosuspend" } },
        duringExecution: async (pid) =>
        {
            Stream stream = await server.AcceptAsync();
            IpcAdvertise advertise = IpcAdvertise.Parse(stream);
            Logger.logger.Log(advertise.ToString());
            Logger.logger.Log($"Send ProcessInfo Diagnostics IPC Command");
            // send ProcessInfo command (0x04=ProcessCommandSet, 0x00=ProcessInfo commandid)
            var message = new IpcMessage(0x04,0x00);
            Logger.logger.Log($"Sent: {message.ToString()}");
            IpcMessage response = IpcClient.SendMessage(stream, message);
            Logger.logger.Log($"received: {response.ToString()}");
            ProcessInfo info = ProcessInfo.TryParse(response.Payload);
            Logger.logger.Log($"ProcessInfo: {{ id={info.ProcessId}, cookie={info.RuntimeCookie}, cmdline={info.Commandline}, OS={info.OS}, arch={info.Arch} }}");
            Utils.Assert(info.RuntimeCookie.Equals(advertise.RuntimeInstanceCookie), $"The runtime cookie reported by ProcessInfo and Advertise must match. ProcessInfo: {info.RuntimeCookie.ToString()}, Advertise: {advertise.RuntimeInstanceCookie.ToString()}");
            Logger.logger.Log($"ProcessInfo and Advertise Cookies are equal");
        }
    );
    fSuccess &= await subprocessTask;
    return fSuccess;
}
I'm not sure we want to tie runtime testing to the client library. We would run into issues like this where new commands don't exist in the library and won't exist until the version is updated. Up till now, I've been using the simplified harness to test commands, and only use the client library for event streams. I'm open to changing that arrangement, though.
Yeah, I'm actually using the ReverseServer code from that test harness in my test. I would prefer to test our shipping code if possible, but the NuGet dependency adds a real wrinkle. Maybe I should just accept that it's too burdensome to wait for a NuGet update and use the mock IPC tools for everything.
It looks like I introduced a double free that makes other eventing tests fail; I will look at it tonight or tomorrow.
If this new way of enabling a profiler on the process only allows for specifying the profiler path and GUID, then I feel like it's a bit limiting with regard to additional profiler configuration.

I understand that specifying the profiler while the runtime is suspended due to the diagnostic port configuration is intended to act as profiling at startup (which is great, because that allows for the full range of profiling capabilities rather than only what's available at attach). However, I think there needs to be a way of providing the additional configuration information, given that the process is already running (and can be running in a completely different process namespace from that of the tool hosting the diagnostic port listener). I don't know what the correct approach is, but either having the "client data" notion from ICorProfilerCallback3::InitializeForAttach would work (although I don't think that would really fit into the existing callback interfaces without invoking the InitializeForAttach method even though it's not attach), or maybe adding another command to the diagnostic pipe that allows for setting environment variables from outside the process. An alternative approach could be to rewrite profilers to load information from the file system, but that requires (1) modifying profilers for this new attach-at-startup hybrid, and (2) the tool hosting the diagnostic port listener might not have file system access in a way that allows clean separation between the path at which diagnostic ports are bound and the paths used for profiler configuration.
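For reference, the existing attach path already has this "client data" notion: a tool passes opaque bytes with the attach request and the profiler receives them in ICorProfilerCallback3::InitializeForAttach. A rough sketch using the Microsoft.Diagnostics.NETCore.Client library follows; the AttachProfiler signature is written from memory and may differ slightly between library versions:

using System;
using System.Text;
using Microsoft.Diagnostics.NETCore.Client;

class AttachWithClientData
{
    static void Attach(int pid, Guid profilerGuid, string profilerPath)
    {
        var client = new DiagnosticsClient(pid);
        // The byte format is tool-defined; the profiler parses it in InitializeForAttach.
        byte[] clientData = Encoding.UTF8.GetBytes("key1=value1\nkey2=value2");
        client.AttachProfiler(TimeSpan.FromSeconds(10), profilerGuid, profilerPath, clientData);
    }
}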
ds_ipc_stream_free (stream);
ep_raise_error ();
I'm not sure if this is the double free you are hitting, but this will lead to a double free: ep_raise_error () does a goto ep_on_error, which ends with ep_exit_error_handler () (a goto ep_on_exit), and ep_on_exit also calls ds_ipc_stream_free (stream).
I think this was it; the failing test was using the profiler attach command in reverse startup mode and verifying that it failed, which would have hit this code path.
Thanks for that feedback; I hadn't really thought about the configuration aspect.

I also agree that the file system is probably not the way to go: it is more complicated than it needs to be and has access issues in certain environments. Two approaches seem reasonable to me:

1. Pass a blob of configuration data along with the startup profiler command, like the "client data" that attach provides.
2. Add a command to the diagnostic pipe that sets environment variables in the target process.

I like item 2 better. We could add a command on the "process" command set. I would lean towards having the command set a single environment variable, and then you can call it as many times as you need to. Unless you need to set hundreds of environment variables, I don't think the perf of multiple calls will matter, and I want to avoid having to pass strings like "env1=value1;env2=value2" because then you have to deal with quoting any special characters.
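To make that concrete, here is a purely hypothetical sketch of what such a per-variable command might look like on the wire, again using the harness types from this thread. The command set/id pair, the payload layout, and the payload-carrying IpcMessage constructor are all assumptions for illustration, not a shipped protocol definition:

static void SendSetEnvVar(Stream stream, string name, string value)
{
    using var payload = new MemoryStream();
    using var writer = new BinaryWriter(payload);
    WriteIpcString(writer, name);   // one name/value pair per message keeps quoting simple
    WriteIpcString(writer, value);
    // 0x04 = Process command set; 0x03 is a made-up command id for "set environment variable".
    IpcClient.SendMessage(stream, new IpcMessage(0x04, 0x03, payload.ToArray()));
}

// Assumed string encoding: char-count-prefixed, null-terminated UTF-16,
// in the style of other diagnostics IPC strings.
static void WriteIpcString(BinaryWriter writer, string s)
{
    string terminated = s + '\0';
    writer.Write(terminated.Length);
    writer.Write(Encoding.Unicode.GetBytes(terminated));
}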
The one thing worth calling out with option 2 is that the runtime caches environment variables at startup on non-Windows platforms due to thread safety issues, so if we went with option 2 then you wouldn't be able to retrieve values with
An alternative to modifying the environment might be to stash configuration with the IPC command payload that registers the startup profiler. There'd need to be some extra bookkeeping to make sure the correct config was routed to the profiler when it loads, though. When the profiler loads, you would just pass the attached config to the ICorProfilerCallback3::InitializeForAttach method.
I wouldn't want to confuse attach and startup by reusing the InitializeForAttach callback. If we go the "pass a configuration like attach" route, then we should introduce a new callback that is Initialize2 (or InitializeForStartup) and passes in the configuration the same way attach does. My vote is still the environment variable command. @jander-msft can you let me know if my proposal works for your needs before I spend time implementing it?
@josalem already mentioned that the behavior is by design, but I do think it's worth addressing from the standpoint that if a given design leads to a bad experience, it calls into question whether the design itself is good. I think you were using the protocol at a lower layer of abstraction and other users aren't going to see the same behavior from their POV, but I want to make sure everyone agrees and we aren't just sweeping it under the rug. To address the two specific concerns, I'd expect most tools to work against a higher-level client API that handles the reconnections behind the scenes, something like:
var listener = new DiagnosticPortListener(portName);
listener.StartListening(runtimeEndpoint =>
{
    DiagnosticClient client = new DiagnosticClient(runtimeEndpoint);
    client.SetEnvVar(...); // behind the scenes these calls are re-establishing a new connection each time
    client.SetEnvVar(...);
    client.StartupProfiler(...);
    client.ResumeRuntime();
});
I apologize for not getting back to you sooner. This proposal seems good to me; it should be roughly equivalent to setting up the profiler and its specific settings via environment variables so that it can load from startup. I've also notified the CLR Instrumentation Engine owners in case they want to chime in or ask questions.
Not sure if you were planning to add the environment stuff here or separately, but either way looks good to me : )
I was originally planning on doing the environment variable stuff in this PR, but it's probably better to create a new PR, so I'll follow up with that one.
@noahfalk @josalem how likely do we think it is that people will implement the IPC protocol without using our DiagnosticClient library? We already know that the OpenTelemetry guys implemented their own, and I suspect other profiler vendors might be inclined to write their own library given how we keep adding profiler stuff to it. It may be worth thinking about users of the lower-level pipe too. That being said, it works as it is and I don't think it's a terrible user experience. I would just like it better if I could open one connection. I don't think it's bad enough to try and make us change it when we've already shipped it.
Like you mentioned, we already have individuals and teams implementing their own Diagnostics IPC clients and EventPipe parsers. That being said, I don't anticipate many people will build their own. At some point I would like to build a couple of sample/reference implementations in other languages, like Java, Go, and Python for instance. That would make it easier for external tools to interact with dotnet processes. For now, I think we are probably fine to bolster our documentation of the protocol and react to feedback as it comes in.