-
-
Notifications
You must be signed in to change notification settings - Fork 718
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Log event print warn #5220
Log event print warn #5220
Conversation
Also add log_event special cases for these routes
c34c738
to
4f4112d
Compare
So, I added a print function In [1]: from dask.distributed import Client, print
In [2]: client = Client("localhost:8786")
In [3]: def f(n):
...: for i in range(n):
...: print(i)
...:
In [4]: client.map(f, range(5))
Out[4]:
[<Future: pending, key: f-38370eeb37b51b7ae1f3469a7be69312>,
<Future: pending, key: f-96cd58dc4894528e301fcca6fbe4b11a>,
<Future: pending, key: f-755e3694acfeec50217bd54c9065a639>,
<Future: pending, key: f-97133af7e5f1d7262b38da436e9026aa>,
<Future: pending, key: f-294a1b6eb61b4d1b2636303bd0067d6a>]
0
0
1
0
0
1
2
3
1
2 |
I love the idea of making print statements easy to use remotely. https://buttondown.email/geoffreylitt/archive/starting-this-newsletter-print-debugging-byoc/ was high up on HN recently for good reason. How crazy would it to be to replace |
There are two questions here:
Skipping over 1 for a moment, an alternative would be to replace sys.stdout with something that tee's off. I think I've done something similar in the past with fairly positive effects. I'm going to suggest that we pause on the broader question for now though. |
if name == "print": | ||
for comm in self.client_comms.values(): | ||
comm.send({"op": "print", "message": msg, **kwargs}) | ||
if name == "warn": | ||
for comm in self.client_comms.values(): | ||
comm.send({"op": "warn", "warning": msg, **kwargs}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case that we're logging the same message under multiple topics (e.g. name=["model_score", "print"]
) we won't send print
or warn
ops to the client. I'd suggest modifying the logic above to be something along the lines of
if not isinstance(name, list):
name = [name]
for n in name:
... # logic in the current `else:` branch here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went the other way, and called self.log_event(...)
a few times, but I agree that not duplicating the logic twice is a good idea.
|
||
|
||
def test_print_simple(capsys): | ||
from dask.distributed import print |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me if this import
will impact other tests which are run later. I'm wondering if we should use
from dask.distributed import print as dask_print
to not override the builtin print
function for later tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we're ok here. I think that imports are properly scoped
In [1]: def f():
...: from dask.distributed import print
...:
In [2]: print
Out[2]: <function print>
In [3]: print?
Docstring:
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
Prints the values to a stream, or to sys.stdout by default.
Optional keyword arguments:
file: a file-like object (stream); defaults to the current sys.stdout.
sep: string inserted between values, default a space.
end: string appended after the last value, default a newline.
flush: whether to forcibly flush the stream.
Type: builtin_function_or_method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I forgot to call `f() there, but I just did it again and it's fine.
The likely better reference is #5217 where I'm also using plain stream handlers. the most relevant distinction to this PR is that
With #5217 we can achieve the exact same behaviour as proposed by this PR (excluding the distributed.print for a second) with a single line of code client.subscribe_topic("print", lambda: msg: print(msg)) Considering the initial problem we wanted to solve, namely to somehow forward server side exceptions to clients this PR would only offer us the possibility to force these prints/warnings onto the user but I am strongly against forcing verbose print statements on users. Especially considering that every worker would be eligible to issuing such a print/warning and we have exceptions which might actually be OK (Most of the network problems are / should be recoverable) but large deployments had the potential to become overly verbose. |
I'm happy to go with #5217 . I agree that it is more general. I do think that setting a couple of behaviors like print/warn by default will be well appreciated. I think that this comes to whether or not we're thinking about interactive/novice or regular/experienced users. Remote printing has been a common feature request over the years. Folks are opting into it with the import, so it seems safe to do to me. |
I wasn't aware of that feature request but I'm not opposed. I was working on this topic in the context of exception forwarding, not remote printing. There are subtle differences between the two, e.g. print is triggered by the user while an exception is issued by us. that changes the "opt-in semantics" a bit.
If we go with the more general approach in #5217 we can also settle on a few "default subscriptions" which include |
Sounds good to me |
The changes here have been included in #5217, so I'm going to close this PR out |
Alternative to #5218
cc @fjetter
This is just a proof of concept. But it's also pretty simple and close to being comprehensive?