Collect pystats in subprocesses #263

mdboom · 2024-09-05T15:43:29Z

There are two ways in which "real work" that we'd want to measure ends up in a subprocess. Neither of these cases presently collects pystats in the subprocess.

Using the pyperf.Runner.bench_command API (eg. python_startup). These should be handled automatically by this update to pyperf, we just need a new pyperf release and then update pyperformance and bench_runner to use it.
The benchmark itself fires off a subprocess to run something like a webserver (eg. djangocms). These benchmarks will need to be individually updated to enable pystats in the subprocess, under the right circumstances. It should basically amount to checking for --hook pystats on the commandline and then setting the envvar PYTHONSTATS=1 in the subprocess. I can't think of a way to do that automatically for all of these, but there aren't that many, we should just update them individually. (This also requires the pyperf update above, first).

Cc: @brandtbucher

The text was updated successfully, but these errors were encountered:

brandtbucher · 2024-09-05T18:19:56Z

The tricky thing for case two is that we don't really have a way of knowing if we've "caught" all of the Python subprocesses (especially the ones that use library code). We just need to either notice that the benchmark stats seem off, or know exactly what the benchmark is doing.

I think the current default for stats builds is to not collect stats by default, and require an environment variable to enable them at startup.

Could we flip that? So collect stats by default, but turn them off with an env var? That way all new subprocesses will collect stats by default, without us needing to do anything. And pyperf/pyperformance can explicitly disable stats collection at startup for its subprocesses and only turn them on around the invocation of the benchmark itself.

The one catch is that the main pyperformance invocation needs to be called with the env var set (to avoid collecting stats from pyperformance itself). But that seems straightforward enough to deal with - our benchmarking infrastructure can be updated to set everything correctly, and we could update pyperformance/pyperf to exit with a helpful error message if it's called on a stats build without the env var.

mdboom · 2024-09-06T13:19:18Z

we could update pyperformance/pyperf to exit with a helpful error message if it's called on a stats build without the env var

I'm a little worried this isn't great UX, but it's a balance, right?

Just to throw an alternative out there, importing pyperf could monkeypatch subprocess.Popen and inject the environment variable turn stats on (if --hook pystats is passed in). That's maybe a bit too magical.

brandtbucher · 2024-09-06T13:48:52Z

We’re probably some of the only people who actually use the stats builds, so I doubt many people would be frustrated by the error message (or be surprised by the subprocess magic).

I still prefer the error message, I think. Not only a bit simpler and less error-prone, but I can’t think of a way to do the subprocess solution without “leaking” an additional stats file for the parent process.

I actually just thought of a third option... could the initial process call sys._stats_off(); sys._stats_clear()? As long as it’s not reenabled at some point, that should just write an empty file at process exit, I think.

mdboom · 2024-09-06T14:04:36Z

I actually just thought of a third option... could the initial process call sys._stats_off(); sys._stats_clear()? As long as it’s not reenabled at some point, that should just write an empty file at process exit, I think.

Yeah, that's a good solution.

mdboom self-assigned this Sep 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Collect pystats in subprocesses #263

Collect pystats in subprocesses #263

mdboom commented Sep 5, 2024

brandtbucher commented Sep 5, 2024

mdboom commented Sep 6, 2024

brandtbucher commented Sep 6, 2024

mdboom commented Sep 6, 2024

Collect pystats in subprocesses #263

Collect pystats in subprocesses #263

Comments

mdboom commented Sep 5, 2024

brandtbucher commented Sep 5, 2024

mdboom commented Sep 6, 2024

brandtbucher commented Sep 6, 2024

mdboom commented Sep 6, 2024