NameError when running Apache Beam pipelines under Fil #202

Closed
psobot opened this issue Jul 8, 2021 · 22 comments · Fixed by #205 or #216
Labels
bug (Something isn't working), NEXT

Comments

@psobot

psobot commented Jul 8, 2021

Hi @itamarst! Fil looks like a great project, and I've been eagerly giving it a try - but have run across a small issue when trying to use it to debug memory issues in Apache Beam pipelines.

Version information

Fil: 2021.5.0
Python: 3.8.9 (default, Apr 3 2021, 01:50:09)
[Clang 12.0.0 (clang-1200.0.32.29)]

Here's a minimal reproducible example that breaks when running under fil-profile run, but otherwise works just fine:

import math
import apache_beam as beam


class MyProcessor(beam.DoFn):
    def process(self, element):
        return math.exp(element)


def main():
    with beam.Pipeline() as pipeline:
        pipeline | beam.Create([1, 2, 3]) | beam.ParDo(MyProcessor())


if __name__ == "__main__":
    main()

The traceback shows that the code is invoked directly through Fil all the way from main() (i.e., no subprocesses are involved), but at runtime the globals in scope of the function can't be resolved:

< many frames omitted >
  File "apache_beam/runners/common.py", line 1315, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "/Users/psobot/Library/Python/3.8/lib/python/site-packages/future/utils/__init__.py", line 446, in raise_with_traceback
    raise exc.with_traceback(traceback)
  File "apache_beam/runners/common.py", line 1233, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 582, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "example.py", line 7, in process
    return math.exp(element)
NameError: name 'math' is not defined [while running 'ParDo(MyProcessor)']

It does seem like Fil and Apache Beam might both be doing something with globals that causes them not to play nicely together.

@itamarst
Collaborator

Thanks for the bug report, that sounds annoying. And strange! Fil doesn't do anything with globals.

I'll try to reproduce and see what I can figure out.

itamarst added the bug and NEXT labels on Jul 10, 2021
@itamarst
Collaborator

I was able to reproduce, after tweaking the above example to use yield instead of return in process().
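
For context on that tweak: Beam expects process() to produce an iterable of outputs, so a bare float returned from it can't be iterated over. The following is a minimal stdlib-only sketch of the difference; collect() is a hypothetical stand-in for a runner consuming the outputs, not Beam's actual code.

```python
import math


def process_return(element):
    # Returns a bare float, which is not iterable.
    return math.exp(element)


def process_yield(element):
    # Yields the value, so calling this produces a generator (an iterable).
    yield math.exp(element)


def collect(fn, element):
    """Hypothetical stand-in for a runner iterating over process() output."""
    return list(fn(element))


yielded = collect(process_yield, 0)

try:
    collect(process_return, 0)
    return_error = None
except TypeError as exc:
    # Iterating over a float raises TypeError.
    return_error = exc
```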

@itamarst
Collaborator

Looking through the code again, there are some ways in which Fil interacts with globals. It seems like Apache Beam uses some form of pickling (via dill?) of DoFn, which is presumably interacting badly with the way Fil does things.
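
One way to picture this kind of failure, as a stdlib-only sketch and not Fil's or Beam's actual code: a Python function resolves global names through the globals dict it is bound to. The sketch assumes (and this is only an assumption about the mechanism) that a by-value unpickler rebuilds the function against a different namespace than the one the user's script ran in. The names script_globals and wrong_globals are made up for illustration.

```python
import types

source = """
import math

def process(element):
    return math.exp(element)
"""

# Namespace standing in for the user's script, as a runner might exec it.
script_globals = {"__name__": "__main__"}
exec(compile(source, "example.py", "exec"), script_globals)
process = script_globals["process"]

# Works: `math` is looked up in the globals the function was defined in.
original_result = process(0)

# Rebind the same code object to a different globals dict that never
# imported math (assumed analogue of unpickling against the wrong module).
wrong_globals = {"__name__": "__main__"}
rebuilt = types.FunctionType(process.__code__, wrong_globals, process.__name__)

try:
    rebuilt(0)
    error = None
except NameError as exc:
    # The lookup of `math` fails because wrong_globals has no such name.
    error = exc
```

The function body is unchanged in both cases; only the namespace it is attached to differs, which matches the symptom in the traceback above.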

@itamarst
Collaborator

OK, I think I have a fix. Will work on PR next.

@itamarst
Collaborator

I will do a release in the near future.

@psobot
Author

psobot commented Jul 11, 2021

Huge thank you, @itamarst! Absolutely amazing turnaround time on this - I'll give this a try shortly and see if it helps debug the OOMs that my team has seen in our pipelines.

@itamarst
Collaborator

Note that you'll need to use the DirectRunner (perhaps in multi-threaded mode) since multiple processes are not yet supported.

@itamarst
Collaborator

The release isn't out yet; I'm still finishing up another branch, and that won't happen today. I'll update when it's available. A couple of other notes:

  1. If it works, I'd love to hear about it; if it doesn't, I'd also love to hear about it so I can fix any other issues.
  2. I am also working on a production-grade memory profiler, so you can get memory profiling for all production jobs, with negligible performance impact. If you'd like to try it when I have the first release available, hopefully soon, let me know.

@itamarst
Collaborator

Release 2021.7.0 is now out with this change; it should be up on PyPI within a few minutes, and on Conda-Forge by tomorrow, hopefully.

@psobot
Author

psobot commented Jul 13, 2021

Thanks @itamarst! I've had a chance to test release 2021.7.0 and can confirm that it now works with DirectRunner if I run a Python file directly. I did encounter one small additional bug, though: I still encounter the same NameError if I launch my code with fil-profile run -m package_name.module_name, whereas fil-profile run package_name/module_name.py works just fine.

itamarst reopened this on Jul 13, 2021
@itamarst
Collaborator

Oops. At this point I'm using Python's runpy module for both cases, and in theory it's supposed to match Python's normal behavior, but perhaps not in practice. I'll take a look, but it'll probably be a week or two before I have time to look at this (and it sounds like you have a workaround, so it shouldn't be a blocker).
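
For reference, here is a small stdlib-only sketch of the two runpy entry points, run_path (the file-path case) and run_module (the -m case). It is not Fil's code and does not claim to show the cause of the bug; it only illustrates that the two calls populate the module metadata differently. The package and module names (demo_pkg, demo_mod) are made up for the example.

```python
import importlib
import os
import runpy
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a throwaway package: demo_pkg/__init__.py and demo_pkg/demo_mod.py
    pkg_dir = os.path.join(root, "demo_pkg")
    os.mkdir(pkg_dir)
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    mod_path = os.path.join(pkg_dir, "demo_mod.py")
    with open(mod_path, "w") as f:
        f.write("SEEN_NAME = __name__\n")

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        # Roughly analogous to `python demo_pkg/demo_mod.py`
        by_path = runpy.run_path(mod_path, run_name="__main__")
        # Roughly analogous to `python -m demo_pkg.demo_mod`
        by_module = runpy.run_module(
            "demo_pkg.demo_mod", run_name="__main__", alter_sys=True
        )
    finally:
        # Undo the sys.path and sys.modules changes made for the demo.
        sys.path.remove(root)
        sys.modules.pop("demo_pkg", None)

# Both runs see __name__ == "__main__", but only run_module supplies an
# import spec and a package name for the executed code.
path_spec = by_path["__spec__"]
module_spec = by_module["__spec__"]
module_package = by_module["__package__"]
```

Both calls return a copy of the executed module's globals, so the values can be compared after the temporary module is gone.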

@itamarst
Collaborator

What version of Python are you using, BTW?

@psobot
Author

psobot commented Jul 13, 2021

I'm seeing the above issues on Python 3.8.9, but can try to reproduce with other versions if that'd be helpful.

@psobot
Author

psobot commented Jul 13, 2021

And re: a workaround - absolutely! Even with this tiny issue, fil-profiler has just pointed straight at the cause of a memory issue that's been costing my team tons of time and compute power. Thanks again for such an excellent tool!

@itamarst
Collaborator

> fil-profiler has just pointed straight at the cause of a memory issue that's been costing my team tons of time and compute power. Thanks again for such an excellent tool!

Excellent! Can I quote that on the project site, with your name and ideally the name of the organization you work for?

@itamarst
Collaborator

> I'm seeing the above issues on Python 3.8.9, but can try to reproduce with other versions if that'd be helpful.

My impression was that at least on Python 3.9 runpy is how python -m works, and even if not it presumably has some bug fixes, so it'd be good to test 3.9, but if you don't have time I will get to it eventually.

Depending on whether it's fixed in 3.9, I can imagine different approaches:

  1. Figure out what is happening differently, fix it.
  2. Document it as a known problem (especially if it's fixed in newer Python).
  3. In the commercial production version I've figured out a mechanism for injecting profiling that doesn't involve runpy, so I could probably use that mechanism and then behavior would exactly match Python.

@psobot
Author

psobot commented Jul 13, 2021

> Excellent! Can I quote that on the project site, with your name and ideally the name of the organization you work for?

Please do, although the organization I work for has a PR department, so please just use my name for now. 🙂

@itamarst
Collaborator

Thank you!

I was forced to yank the latest release due to #208 (the wheels are broken on some OSes in a way that wasn't caught by CI), so if you do new installs you will get an older version for now, until I get more tests and a bugfix out. If that's a problem, let me know and I will post the wheels here for manual download.

@itamarst
Collaborator

Another potential fix: if the 3.9 or 3.10 runpy fixes the problem, it should be easy to include a backported version in Fil, since it's a single file.

@itamarst
Collaborator

Release 2021.7.1 should be up on PyPI shortly.

@itamarst
Collaborator

The fix will be included in the next release, will post here when it's available.

@itamarst
Collaborator

This is now available as part of release 2021.8.0.

By the way, I also now have a first pass of a profiler that can run on production jobs, if you'd be interested in playing with it.
