-
-
Notifications
You must be signed in to change notification settings - Fork 719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow actor exceptions to propagate #4232
Conversation
This seems to take more code than I would have thought, to wrap and unwrap the results dict in various places. |
e2483ad
to
a3d0cff
Compare
distributed/worker.py
Outdated
@@ -785,6 +785,7 @@ async def get_metrics(self): | |||
executing={ | |||
key: now - self.tasks[key].start_time | |||
for key in self.active_threads.values() | |||
if key in self.tasks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is a little worrysome. Without it, you get KeyError here during heartbeat after a failed actor action, but I don't really know why. Should the "active thread" be cleaned up somewhere?
CI fails on apparently unrelated timeout in test_broken_worker_during_computation on py36 only |
If this still passes, I would really like to see it merged! I may offer a "please object" timeframe. |
@jrbourbeau would you be able to take a look or do you know who would be able to? |
OK, all passed except some unrelated and presumably flaky steal test in one run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @martindurant!
def prop(self): | ||
raise MyException | ||
|
||
with cluster(nworkers=2) as (cl, w): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to use cluster
here instead of @gen_cluster
like most of the other test in this module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was to test the sync API, which is more typical for actors. I added an async version immediately below (could remove this one, if you like).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was to test the sync API, which is more typical for actors. I added an async version immediately below (could remove this one, if you like).
To be clear here, the sync api is more common for everything. We prefer the async tests because they are faster/easier on CI and allow for greater debuggability. Adding sync tests too is fine if we want to be extra careful, but in general we prefer async tests. In general if async tests I usually have confidence that sync works just as well, unless I'm explicitly writing code to handle synchronization.
What's here is great though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's here is great though.
I don't mind keeping it or not. Writing the sync version first probably reflects how I initially tested by hand. It was some time ago, so I can't remember if there was any other reason, given that the async version is effectively identical and works just fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates @martindurant! Just a few small comments on test_actor.py
, otherwise I think this is good to merge
distributed/tests/test_actor.py
Outdated
from distributed.utils_test import ( # noqa: F401 | ||
async_wait_for, | ||
cluster, | ||
cluster_fixture, | ||
gen_cluster, | ||
loop, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following #4888, all pytest fixtures in distributed.utils_test
are globally available. So there's no need for the cluster_fixture
or loop
imports (and hopefully we can drop # noqa: F401
too).
Do we have a tracking issue for ongoing CI failures? The two here are not actor related, as far as I can tell. |
There are a variety of issues, generally one per failure
…On Mon, Jun 14, 2021 at 8:05 AM Martin Durant ***@***.***> wrote:
Do we have a tracking issue for ongoing CI failures? The two here are not
actor related, as far as I can tell.
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub
<#4232 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACKZTBDC2GSCM3V3PIUFRTTSX5AFANCNFSM4TQXARUA>
.
|
Yeah, we keep track of these issues with a |
Co-authored-by: James Bourbeau <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @martindurant! Will merge once CI finishes
Follows on from #4225
Closes dask/dask#7626