NameError when running Apache Beam pipelines under Fil #202
Comments
Thanks for the bug report, that sounds annoying. And strange! Fil doesn't do anything with globals. I'll try to reproduce and see what I can figure out.
I was able to reproduce, after tweaking the above example to do
Looking through the code again, there are some ways in which Fil interacts with globals. It seems like Apache Beam uses some form of pickling (via
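As a rough illustration of why pickling and a launcher's handling of globals can interact, here is a stdlib-only sketch. It is not Fil's or Apache Beam's code: the module name `launched_script`, the `SCRIPT` source, and the helpers `run_detached` and `run_as_module` are all hypothetical. It uses Python's built-in `pickle`, which stores functions by reference (module name plus qualified name), as a stand-in for whatever pickling library the pipeline actually uses.

```python
import pickle
import sys
import types

# Hypothetical user script; GREETING is a module-level global used by greet().
SCRIPT = '''
GREETING = "hello"

def greet(name):
    return f"{GREETING}, {name}"
'''

MODULE_NAME = "launched_script"  # made-up module name for this demo


def run_detached(source):
    """Run source in a plain dict that no importable module owns."""
    namespace = {"__name__": MODULE_NAME}
    exec(compile(source, "<script>", "exec"), namespace)
    return namespace


def run_as_module(source):
    """Run source inside a real module object registered in sys.modules."""
    module = types.ModuleType(MODULE_NAME)
    sys.modules[MODULE_NAME] = module
    exec(compile(source, "<script>", "exec"), module.__dict__)
    return module


# Case 1: pickle cannot locate launched_script.greet, so dumping fails.
detached = run_detached(SCRIPT)
try:
    pickle.dumps(detached["greet"])
    detached_picklable = True
except pickle.PicklingError:
    detached_picklable = False

# Case 2: the module is registered, so the by-reference pickle round-trips.
module = run_as_module(SCRIPT)
restored = pickle.loads(pickle.dumps(module.greet))
print(detached_picklable, restored("Beam"))
```

The point of the sketch is only that a function pickled by reference is tied to the module namespace it was defined in, so a launcher that runs a script in a namespace other tools cannot find may break code that works when run directly.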
OK, I think I have a fix. Will work on PR next.
I will do a release in the near future.
Huge thank you, @itamarst! Absolutely amazing turnaround time on this - I'll give this a try shortly and see if it helps debug the OOMs that my team has seen in our pipelines.
Note that you'll need to use the DirectRunner (perhaps in multi-threaded mode) since multiple processes are not yet supported.
Release still isn't out yet; I'm still finishing up another branch, so that won't happen today. Will update when it's available. Couple other notes:
Release 2021.7.0 is now out with this change; it should be up on PyPI within a few minutes, and on Conda-Forge by tomorrow, hopefully.
Thanks @itamarst! I've had a chance to test release 2021.7.0 and can confirm that it now works with
Oops. At this point I'm using Python's
What version of Python are you using, BTW?
I'm seeing the above issues on Python 3.8.9, but can try to reproduce with other versions if that'd be helpful.
And re: a workaround - absolutely! Even with this tiny issue,
Excellent! Can I quote that on the project site, with your name and ideally the name of the organization you work for?
My impression was that at least on Python 3.9 Depending on whether it's fixed in 3.9, I can imagine different approaches:
Please do, although the organization I work for has a PR department, so please just use my name for now. 🙂
Thank you! I was forced to yank the latest release due to #208 (the wheels are broken on some OSes in a way that wasn't caught by CI), so if you do new installs you will get an older version for now, until I get more tests and a bugfix out. If that's a problem, let me know and I will post the wheels here for manual download.
Another potential fix: if 3.9 or 3.10
Release 2021.7.1 should be up on PyPI shortly.
The fix will be included in the next release; I will post here when it's available.
This is now available as part of release 2021.8.0. I also, by the way, now have a first pass of a profiler that can run on production jobs, if you'd be interested in playing with it.
Hi @itamarst! Fil looks like a great project, and I've been eagerly giving it a try - but have run across a small issue when trying to use it to debug memory issues in Apache Beam pipelines.
Here's a minimal reproducible example that breaks when running under `fil-profiler run`, but otherwise works just fine:
The traceback printed shows that the code is being invoked directly through Fil all the way from `main()` (i.e., no subprocesses involved), but at runtime, the globals in scope of the function can't be resolved:
This does seem like Fil and Apache Beam might both be doing something with globals that causes them not to play nicely together.
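To make the symptom concrete, here is a small stdlib-only sketch of one way a NameError like this can arise without any subprocess. It is an illustration under assumptions, not the code from this report and not what Fil or Apache Beam actually do: `SCRIPT`, `launch`, and `rebuild_against` are hypothetical names, and the "rebuild" step simply stands in for any tool that re-creates a function against a different globals dictionary than the one it was defined in.

```python
import types

# Hypothetical user script: describe() depends on the global PREFIX.
SCRIPT = '''
PREFIX = "element"

def describe(value):
    return f"{PREFIX}: {value}"
'''


def launch(source):
    """Run source in a fresh namespace, roughly as a launcher tool might."""
    namespace = {"__name__": "__main__"}
    exec(compile(source, "<script>", "exec"), namespace)
    return namespace


def rebuild_against(func, globals_dict):
    """Re-create func so its global lookups go to globals_dict instead."""
    return types.FunctionType(func.__code__, globals_dict, func.__name__)


script_globals = launch(SCRIPT)
describe = script_globals["describe"]

# The original function sees PREFIX through its own __globals__.
direct_result = describe(1)

# A separate namespace that never ran the script, so it has no PREFIX.
other_namespace = types.ModuleType("__main__").__dict__
rebuilt = rebuild_against(describe, other_namespace)

try:
    rebuilt(1)
    error_name = None
except NameError as exc:
    error_name = exc.name if hasattr(exc, "name") else "PREFIX"

print(direct_result, error_name)
```

The same function body works or fails depending only on which globals dictionary it is attached to, which is consistent with a script that runs fine on its own but fails when two tools disagree about where its globals live.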