RFC-0026-logging-system #43
Conversation
Thanks for writing this down!
For the warning/error case, I'm curious what is the discrepancy / missing part in the current system?
Seems like a good proposal in general, @kurtamohler. I think it'd be helpful to expand this to the next level of detail and show how some of these scenarios would work, and how errors and warnings can originate in both Python and C++.
Would it be at all helpful if I include a somewhat full description (or links to documentation when possible) of the current message APIs in PyTorch? I am going to write notes on it regardless for my own reference, and I can include it here if it's a good idea.

EDIT: My notes on the current messaging APIs are here: https://github.com/kurtamohler/notes/blob/main/pytorch-messaging-api/current_messaging_api.md
Yep, that's a great idea.
I've added a lot more detail, including a description of how message logging is currently done in PyTorch, and fixed some of the things brought up in the discussion so far. I haven't described all of the APIs for creating messages in the new system yet; I will do that next.
Hey @kurtamohler! FYI I'll be on PTO for the next several weeks, and I think @albanD is on PTO currently. So this may take some time to respond to.
That looks great! Thanks for taking the time to write all of this down
* **`c10::BetaWarning`** - Python `torch.BetaWarning`
  - Emitted when a beta feature is called. See
    [PyTorch feature classifications](https://pytorch.org/blog/pytorch-feature-classification-changes/).
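For illustration (not part of the RFC text above): if a dedicated `torch.BetaWarning` category existed, users could silence beta-feature warnings with the standard `warnings` machinery. Since `torch.BetaWarning` is only proposed here, the snippet defines a local stand-in class.

```python
import warnings

# Local stand-in for the proposed torch.BetaWarning (not an existing class).
class BetaWarning(UserWarning):
    pass

# A dedicated category lets users silence beta-feature warnings selectively:
warnings.filterwarnings("ignore", category=BetaWarning)
warnings.warn("foo() is a beta feature", BetaWarning)  # silenced
warnings.warn("something unrelated", UserWarning)      # still shown
```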
I'm not sure about this one, tbh. The page is 2+ years old and doesn't mention warnings (it is also inaccurate in saying that prototype features are not in releases).
cc @gchanan do you have a strong opinion on this?
While I could see this being useful for some features for which we do expect to break BC, I think many beta/prototype features are just "not mature enough" and don't need such warnings, to avoid emitting too many of them?
Yeah, I can see your point; maybe beta/prototype warning classes aren't a good idea. There are some warnings currently being emitted in these cases, so we will have to choose which warning type they use. For instance: https://github.com/pytorch/pytorch/blob/56dea92d973ff16b41cb812b69c5a7ab8c9d1109/torch/csrc/cuda/Module.cpp#L566-L568
Should we just use the default `UserWarning` class for beta/prototype features that currently emit a warning, or provide something that's just a little more specific, like maybe `UnstableFeatureWarning`?
### C++ to Python Warning Translation

The conversion of warnings from C++ to Python is described [here](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/torch/csrc/Exceptions.h#L25-L48).
Is there anything notable about the current `Info` system? It's the one I'm least familiar with and I am a bit curious :D
#### Info message classes:

* **`c10::Info`** - Python `torch.Info`
Thinking more about what the use case for this would be, I don't think this should be used within core in general.
The only use case that I could see is a particular submodule that is expected to do some logging (distributed runner, benchmark tools, etc.).
What do you think about forbidding use of the base `Info` class, so that only subclasses (`DistributedInfo`, `BenchmarkInfo`, etc.) are allowed? That would reduce the chances of unrelated logs being interleaved that the user cannot easily silence.
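A minimal, self-contained sketch of this "forbid the base class, allow only subclasses" idea (class names mirror the discussion; this is illustrative, not the proposed implementation):

```python
# Base Info class that refuses direct use; only submodule-specific
# subclasses may be instantiated/emitted.
class Info:
    def __new__(cls, *args, **kwargs):
        if cls is Info:
            raise TypeError("use a submodule-specific subclass of Info, not Info itself")
        return super().__new__(cls)

    def __init__(self, msg):
        self.msg = msg

class DistributedInfo(Info):
    pass

class BenchmarkInfo(Info):
    pass

DistributedInfo("rank 0 joined")  # ok
# Info("something")               # raises TypeError: the base class is forbidden
```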
I like that idea
We have torch.monitor (#30), which hasn't seen too much usage; perhaps we can use some of it for the new logging system.
How is one supposed to selectively change the log verbosity of a specific subsystem? While having functions like:

```python
# distributed.py
logger = torch.Logger("torch.distributed")

def broadcast():
    logger.log_info("broadcasting stuff")
```

Finally, is there a reason why we can't integrate with Python's `logging` package so there's very little users need to do to leverage this?
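A minimal sketch of what integrating with Python's `logging` package could look like (an assumption for discussion, not a decided design; `torch.Logger` and `log_info` follow the snippet above):

```python
import logging

# If torch.Logger were a thin wrapper over the standard logging package,
# per-subsystem verbosity would come for free via hierarchical logger names.
class Logger:
    def __init__(self, name):
        self._logger = logging.getLogger(name)

    def log_info(self, msg):
        self._logger.info(msg)

# Users would then control a single subsystem with ordinary logging calls:
logging.basicConfig()
logging.getLogger("torch.distributed").setLevel(logging.INFO)

Logger("torch.distributed").log_info("broadcasting stuff")  # shown
Logger("torch.autograd").log_info("autograd detail")        # suppressed (default WARNING)
```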
@kumpera, good questions.
At the moment, the general idea is that users will use a filter to silence messages. The specific API for that hasn't been written down yet. Although, in the case of warnings, the
Authors would use the appropriate message class. If an applicable message class doesn't exist yet, they should add it.
That might be a good idea. What do you think @albanD, @mruberry ?
I don't know much about it, I'll read about it and get back to you.
RFC-0026-logging-system.md (Outdated)
The APIs for raising errors all check a boolean condition, the `cond` argument
in the following signatures, and throw an error if that condition is false.

The error APIs are listed below, with the C++ signature on the left and the
corresponding Python signature on the right.

**`TORCH_CHECK(cond, ...)`** - `torch.check(cond, *args)`
- C++ error: `c10::Error`
- Python error: `RuntimeError`

**`TORCH_CHECK_INDEX(cond, ...)`** - `torch.check_index(cond, *args)`
- C++ error: `c10::IndexError`
- Python error: `IndexError`

**`TORCH_CHECK_VALUE(cond, ...)`** - `torch.check_value(cond, *args)`
- C++ error: `c10::ValueError`
- Python error: `ValueError`

**`TORCH_CHECK_TYPE(cond, ...)`** - `torch.check_type(cond, *args)`
- C++ error: `c10::TypeError`
- Python error: `TypeError`

**`TORCH_CHECK_NOT_IMPLEMENTED(cond, ...)`** - `torch.check_not_implemented(cond, *args)`
- C++ error: `c10::NotImplementedError`
- Python error: `NotImplementedError`

**`TORCH_CHECK_WITH(error_t, cond, ...)`** - `torch.check_with(error_type, cond, *args)`
- C++ error: Specified by `error_t` argument
- Python error: Specified by `error_type` argument
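For illustration, a sketch of how the proposed Python-side wrappers from the table above might read in practice. Since `torch.check_*` is only proposed here, the snippet defines plain-Python stand-ins with the same contract (raise the mapped exception when the condition is false):

```python
# Stand-ins for the proposed torch.check_index / torch.check_type APIs.
def check_index(cond, *args):
    if not cond:
        raise IndexError("".join(str(a) for a in args))

def check_type(cond, *args):
    if not cond:
        raise TypeError("".join(str(a) for a in args))

def gather_row(rows, idx):
    check_type(isinstance(idx, int), "idx must be an int, got ", type(idx))
    check_index(0 <= idx < len(rows), "idx out of range: ", idx)
    return rows[idx]

gather_row([[1, 2], [3, 4]], 1)    # ok
# gather_row([[1, 2], [3, 4]], 5)  # raises IndexError: idx out of range: 5
```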
We want to change how asserts are written to support vmap (and/or Tensor Subclasses) over functions that call asserts (e.g. pytorch/pytorch#81767, cc @kshitij12345). In order to support those, the assert should go through the PyTorch dispatcher. Since we are re-thinking how asserts in PyTorch work, this is a good opportunity to figure out how we can have asserts as PyTorch operators.
**Some background**

Folks write functions in C++ like the following:

```cpp
Tensor safe_log(const Tensor& x) {
  TORCH_CHECK((x > 0).all(), "requires x > 0");
  return x.log();
}
```
If someone is doing `vmap(safe_log)(torch.tensor([-1, 1, 2]))`, then `(x > 0).all()` is a Tensor that has multiple values. A Tensor that has multiple values cannot be coerced to a single boolean value, and the above errors out.
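As a concrete illustration (added here, not part of the original comment), the Python analogue of that coercion failure:

```python
import torch

x = torch.tensor([-1.0, 1.0, 2.0])
bool((x > 0).all())  # fine: .all() reduces to a single-element Tensor

try:
    bool(x > 0)      # multi-element Tensor cannot be coerced to one bool
except RuntimeError as e:
    print(e)         # "Boolean value of Tensor with more than one element is ambiguous"
```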
**A sketch of a solution**

For synchronous asserts, we would want two extension points: one to actually check the condition, and the other to raise an error message. So `TORCH_CHECK(cond, ...)` would be implemented like:

```cpp
if (!at::is_true(cond)) {
  msg = ...
  at::fail(msg);
}
```
@zou3519, I started looking into adding vmap support to `TORCH_CHECK`, and I realized that it doesn't actually support tensor inputs at all, not even non-batched tensors. For example, if I do this:

```cpp
TORCH_CHECK((at::zeros(10).eq(0)).all());
```
I get this build error:

```
/work2/kurtamohler/development/pytorch-1/c10/util/Exception.h:506:29: error: no match for 'operator!' (operand type is 'at::Tensor')
  506 | if (C10_UNLIKELY_OR_CONST(!(cond))) { \
```
Without `at::Tensor::operator!`, we currently have to do something like this instead:

```cpp
TORCH_CHECK((at::zeros(10).eq(0)).all().item<bool>());
```
So I like the idea of adding support for tensor inputs to `TORCH_CHECK` to allow us to avoid the explicit `.item<bool>()`. This would also be a plus for Python/C++ consistency, since in Python `torch.tensor(True)` and `torch.tensor(False)` are valid conditionals without needing to call `.item()` on them.
One way could be to use an `at::is_true` function, like you mentioned. However, I'm not sure exactly how to do that. Since `TORCH_CHECK` is defined in `c10`, it doesn't have access to the `at` namespace. But of course, since it's a macro, it's just doing substitution, so maybe we could have an `#ifdef` that checks whether `at::is_true` is available before using it, like this:
```cpp
#ifdef AT_IS_TRUE_DEFINED
#define CHECK_COND(cond) at::is_true(cond)
#else
#define CHECK_COND(cond) (cond)
#endif
```
The `#define AT_IS_TRUE_DEFINED` would sit next to wherever the declaration of `at::is_true` is, and then we can use `CHECK_COND` in the definition of `TORCH_CHECK`.
But when I tried this, I couldn't figure out which file to define `at::is_true` in such that it's available for any `TORCH_CHECK` call that might use tensors. It also just seems like kind of a sketchy solution.
Another option is to just define `bool at::Tensor::operator!`. But then we would have to make it throw an error if the tensor has more than one logical element. Also, in order to make `TORCH_CHECK` work correctly when `cond` is a batched tensor, the actual meaning of `at::Tensor::operator!` would have to be something like `!physical_tensor.any()`. Those seem like somewhat confusing inconsistencies to me. (Although, in the non-batched tensor case, it is actually consistent with Python, since `not x` will error if `x` has more than one element.) But maybe it's actually fine, I'm not sure. I think it's at least less sketchy than my `AT_IS_TRUE_DEFINED` idea.
Maybe there's an overall better way to do this, though.
cc @albanD
Is there any reason you have to use `TORCH_CHECK` for this?
As I understood it, the suggestion was to make `TORCH_CHECK((x > 0).all(), "requires x > 0");`, for example, work for both batched and non-batched tensors.
I guess, for something like this where we're syncing, I think it is fine if we force the call sites to change what they look like. There aren't many of this kind of check.
I'm fine with having something like a `TORCH_CHECK_TENSOR` that accepts a Tensor to sidestep the issues above. We control what gets written in our C++ internals, so we can always refactor if we're not satisfied.
In Python, is it correct that we want `torch.check` to be a public API? If so, it needs more scrutiny. My initial thought is that `torch.check` should do both `TORCH_CHECK` and `TORCH_CHECK_TENSOR` (there aren't technical reasons in Python for us to split those up).
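A sketch of that "one Python-side check handles both" idea (the `_check` name and behavior are assumptions for discussion, not a decided API):

```python
import torch

def _check(cond, msg=""):
    # Accept either a plain bool or a Tensor condition; reducing a Tensor
    # condition to a bool is a synchronization point.
    if isinstance(cond, torch.Tensor):
        cond = bool(cond.all())
    if not cond:
        raise RuntimeError(msg)

_check((torch.zeros(10) == 0).all(), "expected all zeros")  # Tensor condition
_check(2 + 2 == 4, "arithmetic is broken")                  # plain bool condition
```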
I don't think the Python API needs to be public. I can make it private to start with, and we can make it public if we ever find a reason to.
Sorry for not being specific here; what I meant by scoping is which subsystem is producing a given log message. For example, if one is troubleshooting an issue in the backward pass of a distributed model, they would enable logging for the "autograd" and "distributed" subsystems. I guess this boils down to whether log messages would carry a subsystem tag that can be used as part of filtering.
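To make the subsystem-tag idea concrete, here is a sketch using the standard `logging` package's filters (the tag mechanism is an assumption, not a decided design):

```python
import logging

# Only records tagged with an allowed subsystem pass through this handler.
class SubsystemFilter(logging.Filter):
    def __init__(self, allowed):
        super().__init__()
        self.allowed = allowed

    def filter(self, record):
        return getattr(record, "subsystem", None) in self.allowed

handler = logging.StreamHandler()
handler.addFilter(SubsystemFilter({"autograd", "distributed"}))

logger = logging.getLogger("torch")
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

logger.debug("grad step info", extra={"subsystem": "autograd"})  # shown
logger.debug("jit detail", extra={"subsystem": "jit"})           # filtered out
```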
RFC for a consistent C++/Python message logging system in PyTorch
Feature was requested in pytorch/pytorch#72948