
Add profiler startup command to the diagnostics command set #52175

Merged
merged 12 commits into main on May 7, 2021

Conversation

davmason
Member

@davmason davmason commented May 2, 2021

This PR adds the ability for a profiler to register itself as a startup profiler over the diagnostics pipe. The only place where this is legal is in the "reverse server" startup mode; otherwise profilers will have to use the existing attach mechanism.

It doesn't change anything else about profiler startup. When you issue the startup profiler command it stashes the path and guid, then uses them if a profiler isn't registered via the existing startup configuration.
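For context, this request rides on the standard Diagnostics IPC framing: a 20-byte header carrying the `DOTNET_IPC_V1` magic, the total message size, a command set, and a command id, followed by a payload. A rough Python sketch of what a client would put on the wire; the header layout and the Profiler command set value (0x03) match the documented protocol, but the StartupProfiler command id and the payload layout (CLSID then path) used here are assumptions for illustration, not necessarily the PR's exact encoding:

```python
import struct
import uuid

MAGIC = b"DOTNET_IPC_V1\0"      # 14-byte magic from the documented IPC header
PROFILER_COMMAND_SET = 0x03      # Profiler command set
STARTUP_PROFILER_CMD = 0x02      # assumed command id for the new StartupProfiler command

def ipc_string(s: str) -> bytes:
    """Length-prefixed, null-terminated UTF-16 string, as the IPC protocol encodes strings."""
    chars = s + "\0"
    return struct.pack("<I", len(chars)) + chars.encode("utf-16-le")

def startup_profiler_message(clsid: uuid.UUID, profiler_path: str) -> bytes:
    # Assumed payload: the profiler CLSID followed by its path; the runtime
    # stashes both and loads the profiler as if it were a startup profiler.
    payload = clsid.bytes_le + ipc_string(profiler_path)
    total = 20 + len(payload)    # 14 (magic) + 2 (size) + 1 + 1 + 2 (reserved) = 20-byte header
    header = MAGIC + struct.pack("<HBBH", total, PROFILER_COMMAND_SET, STARTUP_PROFILER_CMD, 0)
    return header + payload

msg = startup_profiler_message(uuid.uuid4(), "/path/to/profiler.so")
assert msg[:14] == MAGIC
assert len(msg) == struct.unpack_from("<H", msg, 14)[0]
```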

@josalem @noahfalk while testing this I ran into some issues that were fairly unintuitive (or maybe even bugs). When first writing the test I expected to set up a reverse server, have the runtime connect to it, and then be able to issue a startup profiler command followed by a resume runtime command on the same session, but the runtime forcibly disconnects the stream after each command. I did not expect that; I would have expected to be able to issue as many commands as I wanted and then close the connection myself when I was through.

Next I changed the test to start a reverse server and then, while the reverse server stream was open, spin up another IPC connection to do the profiler attach. But then the runtime wouldn't actually resume after I sent the resume runtime command, either from the original reverse server stream or from the new command stream. After a little debugging I found out that it's because of this check, which will not resume the runtime while any suspended ports remain:

static
void
ds_server_resume_runtime_startup (void)
{
    ds_ipc_stream_factory_resume_current_port ();
    if (!ds_ipc_stream_factory_any_suspended_ports () && ep_rt_wait_event_is_valid (&_server_resume_runtime_startup_event))
        ep_rt_wait_event_set (&_server_resume_runtime_startup_event);
}

In order to resume the runtime I have to issue the resume runtime command via both streams.

If we're going to offer all sorts of functionality over this IPC mechanism, it seems like a better user experience to let anyone who opens a stream issue multiple commands. And if we're stuck having to open multiple streams, it would be great if I didn't have to track them and issue resume runtime for every one of them.

CC @shirhatti

@davmason davmason added this to the 6.0.0 milestone May 2, 2021
@davmason davmason requested review from lambdageek and a team May 2, 2021 19:28
@davmason davmason requested a review from marek-safar as a code owner May 2, 2021 19:28
@davmason davmason self-assigned this May 2, 2021
@ghost

ghost commented May 2, 2021

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

Author: davmason
Assignees: davmason
Labels:

area-Diagnostics-coreclr

Milestone: 6.0.0

@shirhatti
Contributor

@jander-msft FYI

@josalem
Contributor

josalem commented May 3, 2021

expected I could issue as many commands as I wanted and then close the connection myself when I was through

The IPC protocol is designed for every command to be a new connection. This is because any of those commands could have an Optional Continuation that would consume the connection. As such, you can't reuse a connection stream once you've exchanged a command to completion. If you are connecting to a Listen mode DiagnosticPort (tool->runtime), then you need to open a new connection to send another command. If you are listening for a Connect mode DiagnosticPort (runtime->tool), then you need to wait for the runtime to connect back to the tool to issue the next command (this should happen as soon as you issue the first command).
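To make that connection lifecycle concrete, here is a minimal sketch of a listen-mode client (tool->runtime) that opens a fresh connection per command, which is what the protocol expects. Python and Unix domain sockets are used for illustration; the port path is a placeholder and no framing details are shown:

```python
import socket

def send_command(port_path: str, message: bytes) -> bytes:
    """Listen-mode DiagnosticPort: every command gets a fresh connection,
    since an Optional Continuation may consume the stream afterwards."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(port_path)     # hypothetical Unix-domain DiagnosticPort path
        sock.sendall(message)
        return sock.recv(4096)      # read the response (header + payload)

# One connection per command; never reuse a stream after a completed exchange:
#   send_command(path, startup_profiler_msg)
#   send_command(path, resume_runtime_msg)
```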

As for the resume command requiring every DiagnosticPort that is configured for suspension: that's by design. The expectation is that if user A configures a diagnostic port to suspend and user B configures another DiagnosticPort for suspension, then each user presumably had some command they wanted issued before the runtime resumes. Therefore, we don't resume until all DiagnosticPorts that are configured to suspend have had a resume command sent.
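The gating described above can be modeled in a few lines. This is a toy sketch of the resume logic, not the runtime's implementation; the class and names are invented for illustration:

```python
# Toy model of ds_server_resume_runtime_startup: the runtime only resumes once
# every DiagnosticPort configured to suspend has seen a ResumeRuntime command.
class SuspendedPorts:
    def __init__(self, port_names):
        self.suspended = set(port_names)
        self.resumed = False

    def resume_command(self, port_name):
        self.suspended.discard(port_name)   # this port has now received resume
        if not self.suspended:              # mirrors !ds_ipc_stream_factory_any_suspended_ports ()
            self.resumed = True             # mirrors ep_rt_wait_event_set (...)

ports = SuspendedPorts(["reverse-server", "second-stream"])
ports.resume_command("reverse-server")
assert not ports.resumed    # one suspended port left, runtime still paused
ports.resume_command("second-stream")
assert ports.resumed        # all suspended ports resumed, runtime continues
```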

Comment on lines +10 to +12
// This is to work around having to wait for an update to the DiagnosticsClient nuget before adding
// a test. I really hope this isn't permanent
// TODO: remove all this code when DiagnosticsClient version is updated...
Contributor

I've been maintaining a test harness version of the IPC client code that is simpler than the library and allows you to inspect and assert on partial responses: https://github.com/dotnet/runtime/blob/main/src/tests/tracing/eventpipe/common/IpcUtils.cs.

Example use in situ:

public static async Task<bool> TEST_AdvertiseAndProcessInfoCookiesMatch()
{
    bool fSuccess = true;
    string serverName = ReverseServer.MakeServerAddress();
    Logger.logger.Log($"Server name is '{serverName}'");
    var server = new ReverseServer(serverName);
    using var memoryStream = new MemoryStream();
    Task<bool> subprocessTask = Utils.RunSubprocess(
        currentAssembly: Assembly.GetExecutingAssembly(),
        environment: new Dictionary<string,string> { { Utils.DiagnosticPortsEnvKey, $"{serverName},nosuspend" } },
        duringExecution: async (pid) =>
        {
            Stream stream = await server.AcceptAsync();
            IpcAdvertise advertise = IpcAdvertise.Parse(stream);
            Logger.logger.Log(advertise.ToString());
            Logger.logger.Log($"Send ProcessInfo Diagnostics IPC Command");
            // send ProcessInfo command (0x04=ProcessCommandSet, 0x00=ProcessInfo commandid)
            var message = new IpcMessage(0x04, 0x00);
            Logger.logger.Log($"Sent: {message.ToString()}");
            IpcMessage response = IpcClient.SendMessage(stream, message);
            Logger.logger.Log($"received: {response.ToString()}");
            ProcessInfo info = ProcessInfo.TryParse(response.Payload);
            Logger.logger.Log($"ProcessInfo: {{ id={info.ProcessId}, cookie={info.RuntimeCookie}, cmdline={info.Commandline}, OS={info.OS}, arch={info.Arch} }}");
            Utils.Assert(info.RuntimeCookie.Equals(advertise.RuntimeInstanceCookie), $"The runtime cookie reported by ProcessInfo and Advertise must match. ProcessInfo: {info.RuntimeCookie.ToString()}, Advertise: {advertise.RuntimeInstanceCookie.ToString()}");
            Logger.logger.Log($"ProcessInfo and Advertise Cookies are equal");
        }
    );
    fSuccess &= await subprocessTask;
    return fSuccess;
}

I'm not sure we want to tie runtime testing to the client library. We would run into issues like this where new commands don't exist in the library and won't exist until the version is updated. Up till now, I've been using the simplified harness to test commands, and only use the client library for event streams. I'm open to changing that arrangement, though.

Member Author

Yeah, I'm actually using the ReverseServer code from that test harness in my test. I would prefer to test our shipping code if possible, but the nuget dependency adds a real wrinkle. Maybe I should just accept that it's too burdensome to wait for a nuget update and use the mock IPC tools for everything.

src/tests/profiler/common/ProfilerTestRunner.cs (outdated review thread, resolved)
@davmason
Member Author

davmason commented May 3, 2021

It looks like I introduced a double free that makes other eventing tests fail, I will look at it tonight or tomorrow.

@jander-msft
Member

It doesn't change anything else about profiler startup. When you issue the startup profiler command it stashes the path and guid, then uses them if a profiler isn't registered via the existing startup configuration.

If this new way of enabling a profiler on the process only allows for specifying the profiler path and guid, then I feel like it's a bit limiting with regard to additional profiler configuration:

  • For traditional profiling at startup, the profilers that I have worked on typically use environment variables for additional configuration. The profiler understands that it is starting at process startup and thus searches the environment variables for its additional configuration.
  • For traditional profiler attach, the profilers that I have worked on typically get additional configuration via ICorProfilerCallback3::InitializeForAttach. The profiler has some way of deserializing that information, and the client has to serialize the information into the format the profiler understands.

I understand that specifying the profiler while the runtime is suspended due to the diagnostic port configuration is intended to act as profiling at startup (which is great, because that allows for the full range of profiling capabilities rather than only what's available at attach). However, I think there needs to be a way of providing the additional configuration information given that the process is already running (and can be running in a completely different process namespace from that of the tool that is hosting the diagnostic port listener).

I don't know what the correct approach to this is, but either a "client data" notion like ICorProfilerCallback3::InitializeForAttach would work (although I don't think that would fit into the existing callback interfaces correctly without invoking the InitializeForAttach method even though it's not attach), or maybe adding another command to the diagnostic pipe that allows for setting environment variables from outside the process.

An alternative approach could be to rewrite the profilers to load information from the file system, but that requires (1) modifying profilers for this new attach-at-startup hybrid and (2) the tool hosting the diagnostic port listener might not have file system access in a way that allows clean separation of the path at which diagnostic ports are bound vs. paths for profiler configuration.

Comment on lines 117 to 118
ds_ipc_stream_free (stream);
ep_raise_error ();
Contributor

I'm not sure if this is the double free you are hitting, but this will lead to a double free.

ep_raise_error () => goto ep_on_error which ends with ep_exit_error_handler () => goto ep_on_exit which also calls ds_ipc_stream_free (stream).

Member Author

I think this was it; the failing test was using the profiler attach command in reverse startup mode and verifying it failed, which would have hit this code path.

@davmason
Member Author

davmason commented May 4, 2021

If this new way to enabling a profiler on the process only allows for specifying the profiler path and guid, then I feel like its a bit limiting regarding adding additional profiler configuration:

Thanks for that feedback, I hadn't really thought about the configuration aspect.

I don't know what the correct approach to this is, but either having "client data" notion from ICorProfilerCallback3:InitializeForAttach would work (although, I don't think that would really fit into the existing callback interfaces correctly without invoking the InitializeForAttach method even though its not attach) or maybe adding another command the diagnostic pipe that allows for setting environment variables from outside the process.

An alternative approach could be to rewrite the profilers to load information from file system, but that requires (1) modifying profilers for this new attach-at-startup hybrid and (2) the tool hosting the diagnostic port listener might not have file system access in a way that allows clean separation of the path at which diagnostic ports are bound vs paths for profiler configuration.

I also agree that the filesystem is probably not the way to go; it is more complicated than it needs to be and has access issues in certain environments. Two approaches seem reasonable to me:

  1. Add an Initialize2 method that passes client data like InitializeForAttach, and add a client data param to the startup profiler command. If the profiler doesn't implement the client data callback interface we just don't pass them the data.
  2. (Same idea as yours) add an IPC command that adds environment variable(s) to the process

I like item 2 better. We could add a command on the "process" command set. I would lean towards having the command set a single environment variable, so you can call it as many times as you need to. Unless you need to set hundreds of environment variables I don't think the perf of multiple calls will matter, and I want to avoid having to pass strings like "env1=value1;env2=value2" because then you have to deal with quoting any special characters.
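A sketch of what such a one-variable-per-call command could look like on the wire, reusing the standard IPC framing. The Process command set value (0x04) appears elsewhere in this thread, but the SetEnvironmentVariable command id and the two-string payload are assumptions here, since the command didn't exist yet:

```python
import struct

PROCESS_COMMAND_SET = 0x04   # Process command set (same set as ProcessInfo above)
SET_ENV_CMD = 0x03           # hypothetical command id for SetEnvironmentVariable

def ipc_string(s: str) -> bytes:
    """Length-prefixed, null-terminated UTF-16 string per the IPC protocol."""
    chars = s + "\0"
    return struct.pack("<I", len(chars)) + chars.encode("utf-16-le")

def set_env_message(name: str, value: str) -> bytes:
    # One variable per message: two length-prefixed strings, so there is no
    # "a=b;c=d" packing and no escaping/quoting rules to get wrong.
    payload = ipc_string(name) + ipc_string(value)
    total = 20 + len(payload)
    header = b"DOTNET_IPC_V1\0" + struct.pack("<HBBH", total, PROCESS_COMMAND_SET, SET_ENV_CMD, 0)
    return header + payload

# Setting several variables is just several calls:
msgs = [set_env_message(k, v) for k, v in {"CORECLR_FOO": "1", "CORECLR_BAR": "2"}.items()]
```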

@davmason
Member Author

davmason commented May 4, 2021

The one thing worth calling out with option 2 is that the runtime caches environment variables at startup on non-Windows platforms due to thread safety issues, so if we went with option 2 you wouldn't be able to retrieve the values with getenv(); you'd have to use ICorProfilerInfo*::GetEnvironmentVariable instead.

@josalem
Contributor

josalem commented May 4, 2021

An alternative to modifying the environment might be to stash configuration with the IPC command payload that registers the startup profiler. There'd need to be some extra bookkeeping to make sure the correct config was routed to the profiler when it loads, though. When the profiler loads, you just pass the attached config to the ICorProfilerCallback3::InitializeForAttach method.

@davmason
Member Author

davmason commented May 4, 2021

I wouldn't want to confuse attach and startup by reusing the InitializeForAttach callback. If we go the "pass a configuration like attach" route, then we should introduce a new callback, Initialize2 (or InitializeForStartup), that passes in the configuration the way attach does.

My vote is still the environment variable command. @jander-msft can you let me know if my proposal works for your needs before I spend time implementing it?

@noahfalk
Member

noahfalk commented May 6, 2021

It seems like a better user experience if we're going to offer all sorts of functionality over this IPC mechanism to let anyone who opens a stream issue multiple commands. And if we're stuck having to open multiple streams, it would be great if I didn't have to track them and issue resume runtime for every one of them.

@josalem already mentioned that the behavior is by design, but I do think it's worth addressing from the standpoint that if a given design leads to a bad experience, it calls into question whether the design itself is good. I think you were using the protocol at a lower layer of abstraction, and other users aren't going to see the same behavior from their POV, but I want to make sure everyone agrees and we aren't just sweeping it under the rug. To address the two specific concerns:

  1. Needing to create a new connection per command was counter-intuitive. For users using the DiagnosticClient API I expect they will be unaware that these reconnections are necessary because the client implementation is doing all the work for them. They would write some code like this (handwaving a bit) and it would 'just work':

     var listener = new DiagnosticPortListener(portName);
     listener.StartListening( runtimeEndpoint =>
     {
        DiagnosticClient client = new DiagnosticClient(runtimeEndpoint);
        client.SetEnvVar(...); // behind the scenes these calls re-establish a new connection each time
        client.SetEnvVar(...);
        client.StartupProfiler(...);
        client.ResumeRuntime();
     });

  2. Needing to call resume once for each port is onerous. I'd expect the standard use case is that users only create one port, so they won't see this challenge. I suspect you created multiple ports because it reduced the complexity of re-establishing connections, but it should be unnecessary (we should confirm that it is). The example above shows a bunch of sequential commands on the same connection, but it should also be possible to have multiple long-running commands in parallel. The runtime should automatically initiate a new connection as soon as it receives a command from the tool, without waiting for it to complete.

@jander-msft
Member

My vote is still the environment variable command. @jander-msft can you let me know if my proposal works for your needs before I spend time implementing it?

I apologize for not getting back to you sooner. This proposal seems good to me; it should be roughly equivalent to setting up the profiler and its specific settings via environment variables so that it can load from startup. I've also notified the CLR Instrumentation Engine owners in case they wanted to chime in or ask questions.

Member

@noahfalk noahfalk left a comment


Not sure if you were planning to add the environment stuff here or separately, but either way looks good to me : )

@davmason davmason merged commit f511a4a into dotnet:main May 7, 2021
@davmason
Member Author

davmason commented May 7, 2021

I was originally planning on doing the environment variable stuff on this PR, but it's probably better to create a new PR so I'll follow up with that one.

@davmason
Member Author

davmason commented May 7, 2021

@noahfalk @josalem how likely do we think it is that people will implement the IPC protocol without using our DiagnosticClient library? We already know that the OpenTelemetry folks implemented their own, and I suspect other profiler vendors might be inclined to write their own library given how we keep adding profiler stuff to it. It may be worth thinking about users of the lower level pipe too.

That being said, it works as it is and I don't think it's a terrible user experience. I would just like it better if I could open one connection. I don't think it's bad enough to try to make us change it when we've already shipped it.

@josalem
Contributor

josalem commented May 7, 2021

how likely do we think it is that people will implement the IPC protocol without using our DiagnosticClient library?

Like you mentioned, we already have individuals and teams implementing their own Diagnostics IPC clients and EventPipe parsers. That being said, I don't anticipate many people will build their own. At some point I would like to build a couple of sample/reference implementations in other languages, such as Java, Go, and Python. That would make it easier for external tools to interact with dotnet processes. For now, I think we are probably fine to bolster our documentation of the protocol and react to feedback as it comes in.
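As a data point for how small such a reference client can start out, here is a minimal Python parse of the runtime's advertise packet on a connect-mode port. The `ADVR_V1` magic and the 34-byte layout (magic, cookie GUID, pid, reserved) follow the documented format, though the GUID byte order used here is an assumption; the round trip below is self-consistent either way:

```python
import struct
import uuid

ADVR_MAGIC = b"ADVR_V1\0"   # 8-byte advertise magic; whole message is 34 bytes

def parse_advertise(data: bytes):
    """Minimal parse of the runtime's advertise packet (connect-mode port)."""
    assert data[:8] == ADVR_MAGIC, "not an advertise message"
    cookie = uuid.UUID(bytes=data[8:24])        # runtime instance cookie (byte order assumed)
    pid, = struct.unpack_from("<Q", data, 24)   # process id, little-endian uint64
    return cookie, pid

# Round-trip a synthetic advertise packet:
cookie = uuid.uuid4()
packet = ADVR_MAGIC + cookie.bytes + struct.pack("<Q", 1234) + b"\0\0"
parsed_cookie, parsed_pid = parse_advertise(packet)
assert parsed_cookie == cookie and parsed_pid == 1234
```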
