Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log event print warn #5220

Closed
wants to merge 5 commits into from
Closed

Conversation

mrocklin
Copy link
Member

Alternative to #5218

cc @fjetter

This is just a proof of concept. But it's also pretty simple and close to being comprehensive?

Also add log_event special cases for these routes
@mrocklin
Copy link
Member Author

So, I added a print function

In [1]: from dask.distributed import Client, print

In [2]: client = Client("localhost:8786")

In [3]: def f(n):
   ...:     for i in range(n):
   ...:         print(i)
   ...: 

In [4]: client.map(f, range(5))
Out[4]: 
[<Future: pending, key: f-38370eeb37b51b7ae1f3469a7be69312>,
 <Future: pending, key: f-96cd58dc4894528e301fcca6fbe4b11a>,
 <Future: pending, key: f-755e3694acfeec50217bd54c9065a639>,
 <Future: pending, key: f-97133af7e5f1d7262b38da436e9026aa>,
 <Future: pending, key: f-294a1b6eb61b4d1b2636303bd0067d6a>]

0
0
1
0
0
1
2
3
1
2

@gjoseph92
Copy link
Collaborator

I love the idea of making print statements easy to use remotely. https://buttondown.email/geoffreylitt/archive/starting-this-newsletter-print-debugging-byoc/ was high up on HN recently for good reason.

How crazy would it to be to replace builtins.print with distributed.print on workers while deserializing user code? So existing code with plain print in it would print in the console with no changes, and when new users added print in their functions, it just worked, without having to learn about our special print function?

@mrocklin
Copy link
Member Author

How crazy would it to be to replace builtins.print with distributed.print on workers while deserializing user code? So existing code with plain print in it would print in the console with no changes, and when new users added print in their functions, it just worked, without having to learn about our special print function?

There are two questions here:

  1. Should we?
  2. How could we?

Skipping over 1 for a moment, an alternative would be to replace sys.stdout with something that tee's off. I think I've done something similar in the past with fairly positive effects. I'm going to suggest that we pause on the broader question for now though.

@jrbourbeau
Copy link
Member

jrbourbeau commented Aug 16, 2021

Cross-referencing #4502 which was an earlier attempt by @jsignell to forward stdout to the distributed.worker Python logger

EDIT: This isn't exactly the same as what's going on in this PR, but there is useful user feedback in the other PR which might inform the changes here

Comment on lines +7422 to +7427
if name == "print":
for comm in self.client_comms.values():
comm.send({"op": "print", "message": msg, **kwargs})
if name == "warn":
for comm in self.client_comms.values():
comm.send({"op": "warn", "warning": msg, **kwargs})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case that we're logging the same message under multiple topics (e.g. name=["model_score", "print"]) we won't send print or warn ops to the client. I'd suggest modifying the logic above to be something along the lines of

if not isinstance(name, list):
    name = [name]
for n in name:
    ... # logic in the current `else:` branch here 

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went the other way, and called self.log_event(...) a few times, but I agree that not duplicating the logic twice is a good idea.



def test_print_simple(capsys):
from dask.distributed import print
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me if this import will impact other tests which are run later. I'm wondering if we should use

from dask.distributed import print as dask_print

to not override the builtin print function for later tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we're ok here. I think that imports are properly scoped

In [1]: def f():
   ...:     from dask.distributed import print
   ...: 

In [2]: print
Out[2]: <function print>

In [3]: print?
Docstring:
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)

Prints the values to a stream, or to sys.stdout by default.
Optional keyword arguments:
file:  a file-like object (stream); defaults to the current sys.stdout.
sep:   string inserted between values, default a space.
end:   string appended after the last value, default a newline.
flush: whether to forcibly flush the stream.
Type:      builtin_function_or_method

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I forgot to call `f() there, but I just did it again and it's fine.

@fjetter
Copy link
Member

fjetter commented Aug 17, 2021

The likely better reference is #5217 where I'm also using plain stream handlers. the most relevant distinction to this PR is that

  1. I do not hard code specific events/topics for forwarding but let the user opt-in by subscribing. this is a very, very simple pubsub schema which only includes scheduler and client and doesn't involve much state. This has the benefit that it is not restricted to specific magic topics but one can also listen to all kinds of events.
  2. I do not hard code printing

With #5217 we can achieve the exact same behaviour as proposed by this PR (excluding the distributed.print for a second) with a single line of code

client.subscribe_topic("print", lambda: msg: print(msg))

Considering the initial problem we wanted to solve, namely to somehow forward server side exceptions to clients this PR would only offer us the possibility to force these prints/warnings onto the user but I am strongly against forcing verbose print statements on users. Especially considering that every worker would be eligible to issuing such a print/warning and we have exceptions which might actually be OK (Most of the network problems are / should be recoverable) but large deployments had the potential to become overly verbose.
Thinking further about production level environment instead of debugging environments I would actually never want to have any print functions since they cannot be filtered, configured, typically are not persisted or forwarded like the stdlib logging allows.

@mrocklin
Copy link
Member Author

I'm happy to go with #5217 . I agree that it is more general. I do think that setting a couple of behaviors like print/warn by default will be well appreciated. I think that this comes to whether or not we're thinking about interactive/novice or regular/experienced users.

Remote printing has been a common feature request over the years. Folks are opting into it with the import, so it seems safe to do to me.

@fjetter
Copy link
Member

fjetter commented Aug 17, 2021

Remote printing has been a common feature request over the years. Folks are opting into it with the import, so it seems safe to do to me.

I wasn't aware of that feature request but I'm not opposed. I was working on this topic in the context of exception forwarding, not remote printing. There are subtle differences between the two, e.g. print is triggered by the user while an exception is issued by us. that changes the "opt-in semantics" a bit.

I do think that setting a couple of behaviors like print/warn by default will be well appreciated. I think that this comes to whether or not we're thinking about interactive/novice or regular/experienced users.

If we go with the more general approach in #5217 we can also settle on a few "default subscriptions" which include print and warning together with the distributed.print and take the best of the two worlds.

@mrocklin
Copy link
Member Author

Sounds good to me

@jrbourbeau
Copy link
Member

The changes here have been included in #5217, so I'm going to close this PR out

@jrbourbeau jrbourbeau closed this Aug 17, 2021
@mrocklin mrocklin deleted the log-event-print-warn branch August 17, 2021 15:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants