
Support profiling from inside the interpreter, or even from code #71

Closed
itamarst opened this issue Aug 21, 2020 · 9 comments · Fixed by #129
Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request), NEXT, ux

Comments

@itamarst
Collaborator

itamarst commented Aug 21, 2020

The same infrastructure that would allow profiling with Jupyter (#12) would also allow limited-scope profiling from inside a running Python interpreter (as opposed to the current "full interpreter run" option). The main benefits:

  1. In the interpreter: interactive profiling sessions.
  2. Added to code: the ability to profile specific areas (without performance overhead elsewhere).

This is mostly documentation, since the heavy lifting will be done in #12.

Going to restrict this to "an API for profiling particular bits of code"; IPython CLI support can be a different issue.

@itamarst added the enhancement, ux, and documentation labels on Aug 21, 2020
@corleyma

corleyma commented Nov 4, 2020

To what extent is calling filprofiler from code now supported, given the Jupyter magic work completed in #12?

@itamarst
Collaborator Author

itamarst commented Nov 4, 2020

It works, it's just... not documented. And it probably needs a bit more glue to make it easier. I'll try to do this this week and then do a release; it's probably time for one.

@itamarst
Collaborator Author

itamarst commented Nov 6, 2020

@corleyma what's your use case, BTW?

There are different ways I could approach this, from making the IPython support DTRT (do the right thing) to more general Python interpreter support.

@corleyma

corleyma commented Nov 9, 2020

Though this is likely very specific to problems I am currently solving for my teams, here's a detailed statement of what I'm trying to achieve:

I am interested in integrating filprofiler with our Python workflow orchestration layer such that engineers and scientists can specify dynamically that they'd like to profile a given workflow step. In that case, the workflow orchestration layer would invoke user code with profiling, saving the profiling outputs as workflow artifacts for later consumption.

Another, related use case of interest: updating our workflow orchestration layer to re-try steps that fail with OOM with profiling enabled, saving the profiling outputs created by filprofiler as workflow artifacts for later consumption.

Translating this into an ask for filprofiler:

  • the ability to use filprofiler from a "normal" Python interpreter, where normal could mean:
    • a Python interpreter not originally started by the filprofiler launcher (ideal)
    • or at least, a Python interpreter started by the filprofiler launcher that nonetheless does not affect code that is not explicitly profiled (less ideal but workable)
  • the ability to profile a specific call (or block of code?) and manipulate the resultant artifacts
    • one possible example: fil_profile(artifacts_callback, fn, fn_arguments)
    • another example as a context manager:
      with filprofiler as profiler:  # profiler is a reference that can be used to access profiler state/profiling results
        peak_memory_bytes, flamegraph_svg = profiler.profile(do_something_expensive())
        print(peak_memory_bytes)
        open('results.svg', 'wb').write(flamegraph_svg)

Take the above suggestions with a grain of salt, because I am less familiar with the constraints of filprofiler and what makes for a reasonable API.
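To make the proposed context-manager shape concrete, here is a minimal runnable sketch of what such an API *could* look like. Everything here is hypothetical: `fil_profile` is not filprofiler's real API, and `tracemalloc` merely stands in for Fil's actual allocation tracking.

```python
import tracemalloc
from contextlib import contextmanager

@contextmanager
def fil_profile():
    """Hypothetical profiling context manager. tracemalloc stands in
    for Fil's real allocation tracking; results are exposed via a dict."""
    tracemalloc.start()
    results = {}
    try:
        yield results
    finally:
        # Record peak traced memory before stopping the tracer.
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results["peak_memory_bytes"] = peak

def do_something_expensive():
    # Allocate a list of ~1M elements (several MB of pointer storage).
    return [0] * 1_000_000

with fil_profile() as profiler:
    data = do_something_expensive()

print(profiler["peak_memory_bytes"])
```

The artifacts-callback variant (`fil_profile(artifacts_callback, fn, fn_arguments)`) would be a thin wrapper over the same mechanism, invoking the callback with the collected results instead of returning them.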

@itamarst
Collaborator Author

itamarst commented Nov 9, 2020

As far as those requirements:

  1. Using a regular Python interpreter is theoretically possible, maybe, on Linux (via BPF), but that's a research project I haven't had the time for yet.
  2. Currently you can have a Python interpreter that doesn't affect things much until actual profiling happens; the next release will reduce the impact even further. That said, I haven't extensively tested this mode, so there could conceivably be bugs, but in theory it should add only a tiny amount of overhead (a 1% performance hit or so).
  3. An API to get artifacts is a possibility, yeah.
  4. In theory Fil can do reporting on OOM. The way it does it now is... probably broken, but I suspect there are ways to make it more likely to succeed, especially in controlled environments (i.e. single-task machines/VMs/containers). They range from fairly simple to much more significant architectural changes.

One thing I've been thinking about is how to make Fil development sustainable; there are projects like the normal Python interpreter, multiprocessing, etc. that are pretty big, and this is development I'm doing on my own time.

Since it sounds like Fil would be helping you quite a bit, would you be interested in funding some of the work? For example, I could imagine a supporter-contract structure where organizations who buy it get their feature requests prioritized. Effectively it's no different than buying proprietary off-the-shelf software or services; what you're getting is just slightly different. If that's something you'd be interested in, I'd love to hear what would be the easiest thing for you to do within your corporate structure, e.g. whether monthly payments are easier than a one-off large payment.

@corleyma

> Currently you can have a Python interpreter that doesn't affect things much until actual profiling happens; the next release will reduce the impact even further. That said, I haven't extensively tested this mode, so there could conceivably be bugs, but in theory it should add only a tiny amount of overhead (a 1% performance hit or so).

Being able to call filprofiler from a modified interpreter that doesn't add too much overhead until profiling happens is totally sufficient for our use case. It seems like what's missing for us at this stage then is just some documentation around how to use filprofiler from code (with a suitably altered interpreter), and API access to the profiling artifacts.

> In theory Fil can do reporting on OOM. The way it does it now is... probably broken, but I suspect there are ways to make it more likely to succeed, especially in controlled environments (i.e. single-task machines/VMs/containers). They range from fairly simple to much more significant architectural changes.

Re: reporting on OOM, I am curious to know more about what you suspect the limitations of the current implementation are. I envisioned this feature as a best-effort attempt to provide additional telemetry for later debugging, so robustness isn't a hard requirement, though if you think it's a fundamentally flawed approach I'd probably steer clear for now.

I think the idea of supporting open source work is very interesting but I can't speak for my company. As a matter of personal interest, however, I would be willing to collaborate on feature development for the right features.

@itamarst
Collaborator Author

  1. I opened #96 ("The current out-of-memory support is unreliable") for the OOM issue. My guess is the rusage approach would work for you (and it doesn't have to be done within Fil itself, though that would be even nicer).
  2. It would be nice to have some help working on this, certainly.
  3. I'm not asking you to promise your company will spend money; more like, what are the things you feel you can just put on a company credit card and not worry about?
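The rusage approach mentioned above can be sketched outside Fil itself with only the standard library. This is a best-effort telemetry sketch, not Fil code, and it assumes a Unix-like platform (the `resource` module is not available on Windows; note that `ru_maxrss` is reported in kilobytes on Linux but bytes on macOS):

```python
import resource
import sys

def peak_rss_bytes():
    """Best-effort peak resident set size of this process via getrusage.
    ru_maxrss is kilobytes on Linux, bytes on macOS."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss if sys.platform == "darwin" else rss * 1024

# Example: run a workload, then record peak memory as a workflow artifact.
workload = bytearray(10_000_000)  # allocate and touch ~10 MB
peak = peak_rss_bytes()
print(f"peak RSS: {peak} bytes")
```

An orchestration layer could call `peak_rss_bytes()` after each step (or in a subprocess wrapper) and attach the number to the step's artifacts, even when full Fil profiling is disabled.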

@itamarst
Collaborator Author

itamarst commented Dec 9, 2020

Status report:

  1. As a preliminary to making automated reporting useful, I'm working on making OOM reporting reliable, because it's not automated if you have to do a bunch of debugging ("is this an OOM? a bug? something else?") plus input-data massaging ("how do I make this use less memory... without a profiler?!") to get profiling results.
  2. While thinking about that, I discovered that generating the SVG used a huge amount of memory, which is bad (you're already low on memory when you hit OOM, and you might be low on memory when the program is finishing up, etc.), so I fixed that.
  3. I now have a design for reliable OOM handling, I think, so I'm going to work on that next.

@itamarst
Collaborator Author

Another status report:

  • #111 (feature request: support a decorator) rather makes me think that an API for profiling should be part of the open source project; there's a reason memory_profiler has that.
  • I'm finally done with better out-of-memory detection for the profiler, so I will probably work on the API next.
  • The OOM detection does not work in non-profiling mode; it had a performance impact and doesn't match the code design. I expect to create low-overhead OOM detection with retries for production use separately.
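The retry-with-profiling idea discussed earlier in the thread could live in the orchestration layer rather than in Fil. A minimal sketch, where `profile_runner` is a hypothetical stand-in (not Fil's API) for "re-run this step with profiling enabled and collect artifacts", and `MemoryError` stands in for whatever OOM signal the orchestrator actually observes:

```python
def run_step_with_oom_retry(step, *, profile_runner):
    """Run a workflow step; on an OOM-like failure, retry once under a
    profiling runner so the failure produces artifacts for debugging."""
    try:
        return step(), None  # happy path: no profiling overhead
    except MemoryError:
        # profile_runner re-runs the step with profiling enabled and
        # returns (result_or_None, artifacts).
        return profile_runner(step)

# Usage with stand-in fakes: the step fails once, then succeeds on retry.
calls = []

def flaky_step():
    calls.append("run")
    if len(calls) == 1:
        raise MemoryError("simulated OOM")
    return "ok"

def fake_profile_runner(step):
    try:
        return step(), {"flamegraph_svg": b"<svg/>"}
    except MemoryError:
        return None, {"flamegraph_svg": b"<svg/>"}

result, artifacts = run_step_with_oom_retry(flaky_step, profile_runner=fake_profile_runner)
print(result)
```

In a real orchestrator the step would likely run in a subprocess (so a hard OOM kill can be detected from the exit status), and the artifacts dict would be saved as workflow outputs.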
