Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collect pystats in subprocesses #263

Open
mdboom opened this issue Sep 5, 2024 · 4 comments
Open

Collect pystats in subprocesses #263

mdboom opened this issue Sep 5, 2024 · 4 comments
Assignees

Comments

@mdboom
Copy link
Contributor

mdboom commented Sep 5, 2024

There are two ways in which "real work" that we'd want to measure ends up in a subprocess. Neither of these cases presently collects pystats in the subprocess.

  • Using the pyperf.Runner.bench_command API (eg. python_startup). These should be handled automatically by this update to pyperf, we just need a new pyperf release and then update pyperformance and bench_runner to use it.

  • The benchmark itself fires off a subprocess to run something like a webserver (eg. djangocms). These benchmarks will need to be individually updated to enable pystats in the subprocess, under the right circumstances. It should basically amount to checking for --hook pystats on the commandline and then setting the envvar PYTHONSTATS=1 in the subprocess. I can't think of a way to do that automatically for all of these, but there aren't that many, we should just update them individually. (This also requires the pyperf update above, first).

Cc: @brandtbucher

@mdboom mdboom self-assigned this Sep 5, 2024
@brandtbucher
Copy link
Member

The tricky thing for case two is that we don't really have a way of knowing if we've "caught" all of the Python subprocesses (especially the ones that use library code). We just need to either notice that the benchmark stats seem off, or know exactly what the benchmark is doing.

I think the current default for stats builds is to not collect stats by default, and require an environment variable to enable them at startup.

Could we flip that? So collect stats by default, but turn them off with an env var? That way all new subprocesses will collect stats by default, without us needing to do anything. And pyperf/pyperformance can explicitly disable stats collection at startup for its subprocesses and only turn them on around the invocation of the benchmark itself.

The one catch is that the main pyperformance invocation needs to be called with the env var set (to avoid collecting stats from pyperformance itself). But that seems straightforward enough to deal with - our benchmarking infrastructure can be updated to set everything correctly, and we could update pyperformance/pyperf to exit with a helpful error message if it's called on a stats build without the env var.

@mdboom
Copy link
Contributor Author

mdboom commented Sep 6, 2024

we could update pyperformance/pyperf to exit with a helpful error message if it's called on a stats build without the env var

I'm a little worried this isn't great UX, but it's a balance, right?

Just to throw an alternative out there, importing pyperf could monkeypatch subprocess.Popen and inject the environment variable turn stats on (if --hook pystats is passed in). That's maybe a bit too magical.

@brandtbucher
Copy link
Member

We’re probably some of the only people who actually use the stats builds, so I doubt many people would be frustrated by the error message (or be surprised by the subprocess magic).

I still prefer the error message, I think. Not only a bit simpler and less error-prone, but I can’t think of a way to do the subprocess solution without “leaking” an additional stats file for the parent process.

I actually just thought of a third option... could the initial process call sys._stats_off(); sys._stats_clear()? As long as it’s not reenabled at some point, that should just write an empty file at process exit, I think.

@mdboom
Copy link
Contributor Author

mdboom commented Sep 6, 2024

I actually just thought of a third option... could the initial process call sys._stats_off(); sys._stats_clear()? As long as it’s not reenabled at some point, that should just write an empty file at process exit, I think.

Yeah, that's a good solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants