
Add profiler startup command to the diagnostics command set #52175

Merged
merged 12 commits into main on May 7, 2021

Conversation

davmason
Member

@davmason davmason commented May 2, 2021

This PR adds the ability for a profiler to register itself as a startup profiler over the diagnostics pipe. The only place where this is legal is in the "reverse server" startup mode; otherwise profilers will have to use the existing attach mechanism.

It doesn't change anything else about profiler startup. When you issue the startup profiler command it stashes the path and guid, then uses them if a profiler isn't registered via the existing startup configuration.
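For context, this request rides on the standard Diagnostics IPC framing: a 20-byte header carrying the `DOTNET_IPC_V1` magic, the total message size, a command set, and a command id, followed by a payload. A rough Python sketch of what a client would put on the wire; the header layout and the Profiler command set value (0x03) match the documented protocol, but the StartupProfiler command id and the payload layout (CLSID then path) used here are assumptions for illustration, not necessarily the PR's exact encoding:

```python
import struct
import uuid

MAGIC = b"DOTNET_IPC_V1\0"      # 14-byte magic from the documented IPC header
PROFILER_COMMAND_SET = 0x03      # Profiler command set
STARTUP_PROFILER_CMD = 0x02      # assumed command id for the new StartupProfiler command

def ipc_string(s: str) -> bytes:
    """Length-prefixed, null-terminated UTF-16 string, as the IPC protocol encodes strings."""
    chars = s + "\0"
    return struct.pack("<I", len(chars)) + chars.encode("utf-16-le")

def startup_profiler_message(clsid: uuid.UUID, profiler_path: str) -> bytes:
    # Assumed payload: the profiler CLSID followed by its path; the runtime
    # stashes both and loads the profiler as if it were a startup profiler.
    payload = clsid.bytes_le + ipc_string(profiler_path)
    total = 20 + len(payload)    # 14 (magic) + 2 (size) + 1 + 1 + 2 (reserved) = 20-byte header
    header = MAGIC + struct.pack("<HBBH", total, PROFILER_COMMAND_SET, STARTUP_PROFILER_CMD, 0)
    return header + payload

msg = startup_profiler_message(uuid.uuid4(), "/path/to/profiler.so")
assert msg[:14] == MAGIC
assert len(msg) == struct.unpack_from("<H", msg, 14)[0]
```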

@josalem @noahfalk while testing this I ran into some issues that were fairly unintuitive (or maybe even bugs). When first writing the test I expected to set up a reverse server, have the runtime connect to it, and then be able to issue a startup profiler command followed by a resume runtime command on the same session, but the runtime forcibly disconnects the stream after each command. I did not expect that; I would have expected to be able to issue as many commands as I wanted and then close the connection myself when I was through.

Next I changed the test to start a reverse server and then, while the reverse server stream was open, spin up another IPC connection to do the profiler attach. But then the runtime wouldn't actually resume after I sent the resume runtime command, either from the original reverse server stream or from the new command stream. After a little debugging I found out that it's because of this check, which will not resume the runtime while any suspended ports remain:

static
void
ds_server_resume_runtime_startup (void)
{
    ds_ipc_stream_factory_resume_current_port ();
    if (!ds_ipc_stream_factory_any_suspended_ports () && ep_rt_wait_event_is_valid (&_server_resume_runtime_startup_event))
        ep_rt_wait_event_set (&_server_resume_runtime_startup_event);
}

In order to resume the runtime I have to issue the resume runtime command via both streams.

If we're going to offer all sorts of functionality over this IPC mechanism, it seems like a better user experience to let anyone who opens a stream issue multiple commands. And if we're stuck having to open multiple streams, it would be great if I didn't have to track them and issue resume runtime for every one of them.

CC @shirhatti

@davmason davmason added this to the 6.0.0 milestone May 2, 2021
@davmason davmason requested review from lambdageek and a team May 2, 2021 19:28
@davmason davmason requested a review from marek-safar as a code owner May 2, 2021 19:28
@davmason davmason self-assigned this May 2, 2021
@ghost

ghost commented May 2, 2021

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

Author: davmason
Assignees: davmason
Labels:

area-Diagnostics-coreclr

Milestone: 6.0.0

@shirhatti
Contributor

@jander-msft FYI

@josalem
Contributor

josalem commented May 3, 2021

expected I could issue as many commands as I wanted and then close the connection myself when I was through

The IPC protocol is designed for every command to be a new connection. This is because any of those commands could have an Optional Continuation that would consume the connection. As such, you can't reuse a connection stream once you've exchanged a command to completion. If you are connecting to a Listen mode DiagnosticPort (tool->runtime), then you need to open a new connection to send another command. If you are listening for a Connect mode DiagnosticPort (runtime->tool), then you need to wait for the runtime to connect back to the tool to issue the next command (this should happen as soon as you issue the first command).
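To make that connection lifecycle concrete, here is a minimal sketch of a listen-mode client (tool->runtime) that opens a fresh connection per command, which is what the protocol expects. Python and Unix domain sockets are used for illustration; the port path is a placeholder and no framing details are shown:

```python
import socket

def send_command(port_path: str, message: bytes) -> bytes:
    """Listen-mode DiagnosticPort: every command gets a fresh connection,
    since an Optional Continuation may consume the stream afterwards."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(port_path)     # hypothetical Unix-domain DiagnosticPort path
        sock.sendall(message)
        return sock.recv(4096)      # read the response (header + payload)

# One connection per command; never reuse a stream after a completed exchange:
#   send_command(path, startup_profiler_msg)
#   send_command(path, resume_runtime_msg)
```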

As for the resume command requiring every DiagnosticPort that is configured for suspension: that's by design. The expectation is that if user A configures a diagnostic port to suspend and user B configures another DiagnosticPort for suspension, then each user presumably had some command they wanted issued before the runtime resumes. Therefore, we don't resume until all DiagnosticPorts that are configured to suspend have had a resume command sent.
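The gating described above can be modeled in a few lines. This is a toy sketch of the resume logic, not the runtime's implementation; the class and names are invented for illustration:

```python
# Toy model of ds_server_resume_runtime_startup: the runtime only resumes once
# every DiagnosticPort configured to suspend has seen a ResumeRuntime command.
class SuspendedPorts:
    def __init__(self, port_names):
        self.suspended = set(port_names)
        self.resumed = False

    def resume_command(self, port_name):
        self.suspended.discard(port_name)   # this port has now received resume
        if not self.suspended:              # mirrors !ds_ipc_stream_factory_any_suspended_ports ()
            self.resumed = True             # mirrors ep_rt_wait_event_set (...)

ports = SuspendedPorts(["reverse-server", "second-stream"])
ports.resume_command("reverse-server")
assert not ports.resumed    # one suspended port left, runtime still paused
ports.resume_command("second-stream")
assert ports.resumed        # all suspended ports resumed, runtime continues
```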

Comment on lines +10 to +12
// This is to work around having to wait for an update to the DiagnosticsClient nuget before adding
// a test. I really hope this isn't permanent
// TODO: remove all this code when DiagnosticsClient version is updated...
Contributor

I've been maintaining a test harness version of the IPC client code that is simpler than the library and allows you to inspect and assert on partial responses: https://github.com/dotnet/runtime/blob/main/src/tests/tracing/eventpipe/common/IpcUtils.cs.

Example use in situ:

public static async Task<bool> TEST_AdvertiseAndProcessInfoCookiesMatch()
{
    bool fSuccess = true;
    string serverName = ReverseServer.MakeServerAddress();
    Logger.logger.Log($"Server name is '{serverName}'");
    var server = new ReverseServer(serverName);
    using var memoryStream = new MemoryStream();
    Task<bool> subprocessTask = Utils.RunSubprocess(
        currentAssembly: Assembly.GetExecutingAssembly(),
        environment: new Dictionary<string,string> { { Utils.DiagnosticPortsEnvKey, $"{serverName},nosuspend" } },
        duringExecution: async (pid) =>
        {
            Stream stream = await server.AcceptAsync();
            IpcAdvertise advertise = IpcAdvertise.Parse(stream);
            Logger.logger.Log(advertise.ToString());
            Logger.logger.Log($"Send ProcessInfo Diagnostics IPC Command");
            // send ProcessInfo command (0x04=ProcessCommandSet, 0x00=ProcessInfo commandid)
            var message = new IpcMessage(0x04, 0x00);
            Logger.logger.Log($"Sent: {message.ToString()}");
            IpcMessage response = IpcClient.SendMessage(stream, message);
            Logger.logger.Log($"received: {response.ToString()}");
            ProcessInfo info = ProcessInfo.TryParse(response.Payload);
            Logger.logger.Log($"ProcessInfo: {{ id={info.ProcessId}, cookie={info.RuntimeCookie}, cmdline={info.Commandline}, OS={info.OS}, arch={info.Arch} }}");
            Utils.Assert(info.RuntimeCookie.Equals(advertise.RuntimeInstanceCookie), $"The runtime cookie reported by ProcessInfo and Advertise must match. ProcessInfo: {info.RuntimeCookie.ToString()}, Advertise: {advertise.RuntimeInstanceCookie.ToString()}");
            Logger.logger.Log($"ProcessInfo and Advertise Cookies are equal");
        }
    );
    fSuccess &= await subprocessTask;
    return fSuccess;
}

I'm not sure we want to tie runtime testing to the client library. We would run into issues like this where new commands don't exist in the library and won't exist until the version is updated. Up till now, I've been using the simplified harness to test commands, and only use the client library for event streams. I'm open to changing that arrangement, though.

Member Author

Yeah, I'm actually using the ReverseServer code from that test harness in my test. I would prefer to test our shipping code if possible, but the nuget dependency adds a real wrinkle. Maybe I should just accept that it's too burdensome to wait for a nuget update and use the mock IPC tools for everything.

src/tests/profiler/common/ProfilerTestRunner.cs (outdated review thread, resolved)
@davmason
Member Author

davmason commented May 3, 2021

It looks like I introduced a double free that makes other eventing tests fail, I will look at it tonight or tomorrow.

@jander-msft
Member

It doesn't change anything else about profiler startup. When you issue the startup profiler command it stashes the path and guid, then uses them if a profiler isn't registered via the existing startup configuration.

If this new way of enabling a profiler on the process only allows for specifying the profiler path and guid, then I feel like it's a bit limiting with regard to additional profiler configuration:

  • For traditional profiling at startup, the profilers that I have worked on typically use environment variables for additional configuration. The profiler understands that it is starting at process startup and thus searches the environment variables for its additional configuration.
  • For traditional profiler attach, the profilers that I have worked on typically get additional configuration via ICorProfilerCallback3::InitializeForAttach. The profiler has some way of deserializing that information, and the client has to serialize the information into the format the profiler understands.

I understand that specifying the profiler while the runtime is suspended due to the diagnostic port configuration is intended to act as profiling at startup (which is great, because that allows for the full range of profiling capabilities rather than only what's available at attach). However, I think there needs to be a way of providing the additional configuration information given that the process is already running (and can be running in a completely different process namespace from that of the tool that is hosting the diagnostic port listener).

I don't know what the correct approach to this is, but either a "client data" notion like ICorProfilerCallback3::InitializeForAttach would work (although I don't think that would fit into the existing callback interfaces correctly without invoking the InitializeForAttach method even though it's not attach), or maybe adding another command to the diagnostic pipe that allows for setting environment variables from outside the process.

An alternative approach could be to rewrite the profilers to load information from the file system, but that requires (1) modifying profilers for this new attach-at-startup hybrid and (2) the tool hosting the diagnostic port listener might not have file system access in a way that allows clean separation of the path at which diagnostic ports are bound vs. paths for profiler configuration.

Comment on lines 117 to 118
ds_ipc_stream_free (stream);
ep_raise_error ();
Contributor

I'm not sure if this is the double free you are hitting, but this will lead to a double free.

ep_raise_error () => goto ep_on_error which ends with ep_exit_error_handler () => goto ep_on_exit which also calls ds_ipc_stream_free (stream).

Member Author

I think this was it; the failing test was using the profiler attach command in reverse startup mode and verifying it failed, which would have hit this code path.

@davmason
Member Author

davmason commented May 4, 2021

If this new way to enabling a profiler on the process only allows for specifying the profiler path and guid, then I feel like its a bit limiting regarding adding additional profiler configuration:

Thanks for that feedback, I hadn't really thought about the configuration aspect.

I don't know what the correct approach to this is, but either having "client data" notion from ICorProfilerCallback3:InitializeForAttach would work (although, I don't think that would really fit into the existing callback interfaces correctly without invoking the InitializeForAttach method even though its not attach) or maybe adding another command the diagnostic pipe that allows for setting environment variables from outside the process.

An alternative approach could be to rewrite the profilers to load information from file system, but that requires (1) modifying profilers for this new attach-at-startup hybrid and (2) the tool hosting the diagnostic port listener might not have file system access in a way that allows clean separation of the path at which diagnostic ports are bound vs paths for profiler configuration.

I also agree that the filesystem is probably not the way to go; it is more complicated than it needs to be and has access issues in certain environments. Two approaches seem reasonable to me:

  1. Add an Initialize2 method that passes client data like InitializeForAttach, and add a client data param to the startup profiler command. If the profiler doesn't implement the client data callback interface we just don't pass them the data.
  2. (Same idea as yours) add an IPC command that adds environment variable(s) to the process

I like item 2 better. We could add a command on the "process" command set. I would lean towards having the command set a single environment variable, so you can call it as many times as you need to. Unless you need to set hundreds of environment variables I don't think the perf of multiple calls will matter, and I want to avoid having to pass strings like "env1=value1;env2=value2" because then you have to deal with quoting any special characters.
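A sketch of what such a one-variable-per-call command could look like on the wire, reusing the standard IPC framing. The Process command set value (0x04) appears elsewhere in this thread, but the SetEnvironmentVariable command id and the two-string payload are assumptions here, since the command didn't exist yet:

```python
import struct

PROCESS_COMMAND_SET = 0x04   # Process command set (same set as ProcessInfo above)
SET_ENV_CMD = 0x03           # hypothetical command id for SetEnvironmentVariable

def ipc_string(s: str) -> bytes:
    """Length-prefixed, null-terminated UTF-16 string per the IPC protocol."""
    chars = s + "\0"
    return struct.pack("<I", len(chars)) + chars.encode("utf-16-le")

def set_env_message(name: str, value: str) -> bytes:
    # One variable per message: two length-prefixed strings, so there is no
    # "a=b;c=d" packing and no escaping/quoting rules to get wrong.
    payload = ipc_string(name) + ipc_string(value)
    total = 20 + len(payload)
    header = b"DOTNET_IPC_V1\0" + struct.pack("<HBBH", total, PROCESS_COMMAND_SET, SET_ENV_CMD, 0)
    return header + payload

# Setting several variables is just several calls:
msgs = [set_env_message(k, v) for k, v in {"CORECLR_FOO": "1", "CORECLR_BAR": "2"}.items()]
```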

@davmason
Member Author

davmason commented May 4, 2021

The one thing worth calling out with option 2 is that the runtime caches environment variables at startup on non-Windows platforms due to thread safety issues, so if we went with option 2 you wouldn't be able to retrieve the values with getenv(); you'd have to use ICorProfilerInfo*::GetEnvironmentVariable instead.

@josalem
Contributor

josalem commented May 4, 2021

An alternative to modifying the environment might be to stash configuration with the IPC command payload that registers the startup profiler. There'd need to be some extra bookkeeping to make sure the correct config was routed to the profiler when it loads, though. When the profiler loads, you just pass the attached config to the ICorProfilerCallback3::InitializeForAttach method.

@davmason
Member Author

davmason commented May 4, 2021

I wouldn't want to confuse attach and startup by reusing the InitializeForAttach callback. If we go the "pass a configuration like attach" route, then we should introduce a new callback, Initialize2 (or InitializeForStartup), that passes in the configuration the way attach does.

My vote is still the environment variable command. @jander-msft can you let me know if my proposal works for your needs before I spend time implementing it?

@noahfalk
Member

noahfalk commented May 6, 2021

It seems like a better user experience if we're going to offer all sorts of functionality over this IPC mechanism to let anyone who opens a stream issue multiple commands. And if we're stuck having to open multiple streams, it would be great if I didn't have to track them and issue resume runtime for every one of them.

@josalem already mentioned that the behavior is by design, but I do think it's worth addressing from the standpoint that if a given design leads to a bad experience, it calls into question whether the design itself is good. I think you were using the protocol at a lower layer of abstraction, and other users aren't going to see the same behavior from their POV, but I want to make sure everyone agrees and we aren't just sweeping it under the rug. To address the two specific concerns:

  1. Needing to create a new connection per command was counter-intuitive. For users using the DiagnosticClient API I expect they will be unaware that these reconnections are necessary because the client implementation is doing all the work for them. They would write some code like this (handwaving a bit) and it would 'just work':

     var listener = new DiagnosticPortListener(portName);
     listener.StartListening( runtimeEndpoint =>
     {
        DiagnosticClient client = new DiagnosticClient(runtimeEndpoint);
        client.SetEnvVar(...); // behind the scenes these calls re-establish a new connection each time
        client.SetEnvVar(...);
        client.StartupProfiler(...);
        client.ResumeRuntime();
     });

  2. Needing to call resume once for each port is onerous. I'd expect the standard use case is that users only create one port, so they won't see this challenge. I suspect you created multiple ports because it reduced the complexity of re-establishing connections, but it should be unnecessary (we should confirm that it is). The example above shows a bunch of sequential commands on the same connection, but it should also be possible to have multiple long-running commands in parallel. The runtime should automatically initiate a new connection as soon as it receives a command from the tool, without waiting for it to complete.

@jander-msft
Member

My vote is still the environment variable command. @jander-msft can you let me know if my proposal works for your needs before I spend time implementing it?

I apologize for not getting back to you sooner. This proposal seems good to me; it should be roughly equivalent to setting up the profiler and its specific settings via environment variables so that it can load from startup. I've also notified the CLR Instrumentation Engine owners in case they wanted to chime in or ask questions.

Member

@noahfalk noahfalk left a comment


Not sure if you were planning to add the environment stuff here or separately, but either way looks good to me : )

@davmason davmason merged commit f511a4a into dotnet:main May 7, 2021
@davmason
Member Author

davmason commented May 7, 2021

I was originally planning on doing the environment variable stuff on this PR, but it's probably better to create a new PR so I'll follow up with that one.

@davmason
Member Author

davmason commented May 7, 2021

@noahfalk @josalem how likely do we think it is that people will implement the IPC protocol without using our DiagnosticClient library? We already know that the OpenTelemetry folks implemented their own, and I suspect other profiler vendors might be inclined to write their own library given how we keep adding profiler stuff to it. It may be worth thinking about users of the lower level pipe too.

That being said, it works as it is and I don't think it's a terrible user experience. I would just like it better if I could open one connection. I don't think it's bad enough to try to make us change it when we've already shipped it.

@josalem
Contributor

josalem commented May 7, 2021

how likely do we think it is that people will implement the IPC protocol without using our DiagnosticClient library?

Like you mentioned, we already have individuals and teams implementing their own Diagnostics IPC clients and EventPipe parsers. That being said, I don't anticipate many people will build their own. At some point I would like to build a couple of sample/reference implementations in other languages, such as Java, Go, and Python. That would make it easier for external tools to interact with dotnet processes. For now, I think we are probably fine to bolster our documentation of the protocol and react to feedback as it comes in.
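As a data point for how small such a reference client can start out, here is a minimal Python parse of the runtime's advertise packet on a connect-mode port. The `ADVR_V1` magic and the 34-byte layout (magic, cookie GUID, pid, reserved) follow the documented format, though the GUID byte order used here is an assumption; the round trip below is self-consistent either way:

```python
import struct
import uuid

ADVR_MAGIC = b"ADVR_V1\0"   # 8-byte advertise magic; whole message is 34 bytes

def parse_advertise(data: bytes):
    """Minimal parse of the runtime's advertise packet (connect-mode port)."""
    assert data[:8] == ADVR_MAGIC, "not an advertise message"
    cookie = uuid.UUID(bytes=data[8:24])        # runtime instance cookie (byte order assumed)
    pid, = struct.unpack_from("<Q", data, 24)   # process id, little-endian uint64
    return cookie, pid

# Round-trip a synthetic advertise packet:
cookie = uuid.uuid4()
packet = ADVR_MAGIC + cookie.bytes + struct.pack("<Q", 1234) + b"\0\0"
parsed_cookie, parsed_pid = parse_advertise(packet)
assert parsed_cookie == cookie and parsed_pid == 1234
```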
