feat: prepare and pool processes #87
Conversation
🦙 MegaLinter status: ✅ SUCCESS
See the detailed report in the MegaLinter reports.
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@ Coverage Diff @@
##              main      #87   +/-   ##
=========================================
  Coverage   100.00%  100.00%
=========================================
  Files           14       14
  Lines          733      750   +17
=========================================
+ Hits           733      750   +17
=========================================
```

☔ View the full report in Codecov by Sentry.
We probably don't need that many: most of the time, the processes will idle. As long as the runner is local, 2 workers should be enough. I don't think a user would fire off pipeline runs without waiting for the previous one to complete.
I see, that's a lot of wasted memory. But two processes are easily saturated if two tables are open in the EDA view (both waiting for stats). A normal execution would then only be queued and would have to wait for any running pipelines to complete first.
Also, coverage seems broken again. It doesn't seem to like multiprocessing pools very much.
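A likely cause, assuming the project measures coverage with coverage.py: by default only the main process is traced, so code executed inside pool workers shows up as uncovered. A hedged sketch of `.coveragerc` settings that enable subprocess measurement (option names come from the coverage.py documentation, not from this PR):

```ini
; .coveragerc -- illustrative fragment, not the project's actual config
[run]
; Trace code running in multiprocessing workers as well
concurrency = multiprocessing
; Write one data file per process; merge afterwards with `coverage combine`
parallel = true
```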
Sure, seems good to me. Can easily be tweaked later.
But then the original worker is already free again, no?
I would not consider this a blocker for this PR. I'll investigate whether that can be fixed later. Edit: Maybe this already fixes the issue.
test: verify that memoized types are correctly sent
misc: change max amount of pipeline processes to 4
When viewing one table, this is true. But everything would break as soon as two tables are waiting for stats. This might be too cautious, but since gathering the stats for the EDA view takes a bit of time, this could be irritating.
Break in what way?
Maybe break is not the right description.
I see, that's an issue for another day (and someone else 😉).
This PR should lead to a modest improvement in startup time, and a small improvement in runtime (mostly depending on the content). I don't know what the target is, or how much potential for startup optimization is left, but I was able to get small and medium tables within a few hundred milliseconds after pressing the "Explore" lens.
It already feels a lot better. The last idea I'd have is to add an initializer to the process pool doing something like […]
Then all processes can immediately display a […]
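The elided suggestion might look like the following minimal sketch of a pool initializer; `warmup` and the pre-imported module are hypothetical stand-ins, not code from this PR:

```python
import multiprocessing


def warmup():
    # Hypothetical initializer: each worker runs this once at startup,
    # so heavy imports are paid before the first real task arrives.
    import json  # stand-in for heavier libraries


def run_task(x):
    # Stand-in for a real pipeline task.
    return x * x


if __name__ == "__main__":
    # initializer= is invoked in every worker process as it starts.
    with multiprocessing.Pool(processes=2, initializer=warmup) as pool:
        print(pool.map(run_task, [1, 2, 3]))
```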
There seems to be an issue with the memoization of block & expression lambdas now:
If you change the […]
I can't reproduce this right now (but I'm seeing another somewhat related error, […]).
I've started looking into this a little: it seems like the program message and the memoization key disagree. In the program message, the lambda code is

```python
def __gen_lambda_0(param1):
    return (param1) < (2)
```

but in the key it's

```python
def __gen_lambda_0(param1):
    return (param1) < (4)
```

I've also had it throw an exception after adding more lines to a pipeline that contain lambdas: […]
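As a hedged illustration of the failure mode described above (this is not the runner's actual memoization code): if the cache key is derived from the generated function's name alone, editing the lambda body produces a stale hit, because both versions are compiled under the same name `__gen_lambda_0`.

```python
# Hypothetical sketch of the stale-key failure mode; all names illustrative.
cache = {}


def memoize(name, source, compute):
    # Buggy variant: keying only on the generated name means a lambda whose
    # body changed (e.g. `< (2)` edited to `< (4)`) still hits the old entry.
    key = name  # a robust key would include `source` too
    if key not in cache:
        cache[key] = compute()
    return cache[key]


first = memoize("__gen_lambda_0", "return (param1) < (2)", lambda: "result for < 2")
second = memoize("__gen_lambda_0", "return (param1) < (4)", lambda: "result for < 4")
print(second)  # stale: still "result for < 2"
```

Keying on the pair `(name, source)` instead would keep the two versions apart.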
Looks like this Python bug is back. Got it working now 🎉.
Execution time feels great now, awesome stuff!
Thanks for investigating further, and great that it is now working 🎉
🎉 This PR is included in version 0.12.0 🎉 The release is available on:
Your semantic-release bot 📦🚀
Closes #85

Summary of Changes

Pipeline processes are now prepared in the background and pooled, up to a maximum of `4`. When a pipeline process is reused, the `.metapath` is reset to remove the custom generated Safe-DS pipeline code; only global library imports (and settings) should remain. If this is a concern, `maxtasksperchild` can be set to `1`, in which case pipeline processes are not reused.
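The no-reuse setting mentioned above maps to `multiprocessing.Pool`'s `maxtasksperchild` parameter; a minimal sketch (the task function is a hypothetical stand-in, not the runner's code):

```python
import multiprocessing
import os


def worker_pid(_):
    # Report which process handled the task, so worker reuse is observable.
    return os.getpid()


if __name__ == "__main__":
    # maxtasksperchild=1: each worker exits after a single task, so no state
    # (such as generated pipeline code) can leak into the next run.
    with multiprocessing.Pool(processes=2, maxtasksperchild=1) as pool:
        pids = pool.map(worker_pid, range(4))
        print(len(set(pids)))  # each task ran in a fresh process
```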