-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add worker config check_unfulfilled_deps. #2189
Conversation
@riga, thanks for your PR! By analyzing the history of the files in this pull request, we identified @ChrisBeaumont, @steenzout and @daveFNbuck to be potential reviewers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to approve. Just curious, when do you need this? Is your complete checks really slow?
luigi/worker.py
Outdated
@@ -910,7 +914,8 @@ def _create_task_process(self, task): | |||
return TaskProcess( | |||
task, self._id, self._task_result_queue, reporter, | |||
use_multiprocessing=bool(self.worker_processes > 1), | |||
worker_timeout=self._config.timeout | |||
worker_timeout=self._config.timeout, | |||
check_unfulfilled_deps=self._config.check_unfulfilled_deps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a trailing comma here? Next line addition will look nicer in the diffs. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, done.
Kind of. Depending on the current load at some computing and storage facilities, the completeness checks can be slow. But usually, the number of requests is the bottle neck. A typical use case for this feature is a tree of M tasks which all have the same N dependencies. When the completeness check of those N deps is load-intensive (say 1k network requests per dep), this N x 1k requests are executed once for tree building and repeated M times which we'd like to skip by disabling the check. Of course, we have to make sure that the targets of the deps don't change during the processing of the tree. btw: Do you know what's wrong with travis? |
I think this patch looks reasonable. Two thoughts:
I think they changed Travis, and we having magic caused failures. |
Yes, that makes sense. The more docs the better ;) I'll add one or two sentences.
In our use case, we would use the worker config, but other users might prefer a per-task setting. What about using both? Of course, the task setting would have precedence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. This all looks good to me. There's no need to worry about the per-task use case.
However, I do kindly request to add at least one test case before we merge this. :)
By the way, since I merged #2190, maybe the build issues are resolved. :) |
@Tarrasch even after manually restarting Travis, build continues to exit |
@dlstadther, that's expected if you didn't rebase upon master. As the Travis reads the config from the checked in |
Done :) There's still one failing travis job, but this is unrelated to my code, right? |
Thanks @riga :) |
Description
When a worker is about to run a task, it checks whether its dependencies are complete. This PR adds a configuration parameter which makes that behavior optional.
Motivation and Context
In our workflow, we use remote targets that are stored in various locations on the WLCG. Some of our tasks have O(1000) dependencies with O(10) output targets each. When running these tasks, the dependency tree is built by checking if the targets exist, which results in network load. But before a worker actually runs a task, it checks its dependencies again. By skipping this, we could significantly reduce the network load. Therefore, I added a
[worker]
config parametercheck_unfulfilled_deps
to alter this behavior.