-
Notifications
You must be signed in to change notification settings - Fork 428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Catch log formatter errors to avoid death spiral of logging (issue #2967) #2968
Conversation
Hi, thanks for the PR! Not merging it yet though. We considering to fix the issue the following way:
Apparently, we would also have to investigate the running with debug log-level to get the original log event that caused crash, you've reported. |
Thanks for taking a look @arcusfelis
I think it's reasonable to do the first two - perhaps I could raise a separate issue to cover them? I would say that from a robustness perspective, there really needs to be a backstop here, like the try/catch I've introduced. As you say, there are many possible kinds of errors and sources of errors, and the consequences of one of them slipping through the net are so severe (loss of service - the worst possible failure) that a catch-all is justified in my opinion. So, my preference would be to add the backstop first, to gain robustness and stability, and then raise a separate issue for the underlying formatter errors.
This could be raised as a further issue - errors while formatting specific log statements from the big tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, makes point. If we want to merge this, we have to do at least two things in the comments :)
catch | ||
%% Exceptions during log formatting can lead to a death spiral of recursive error logging, so | ||
%% format the formatter error in a safe way and don't allow the exception to propagate. | ||
Class:Reason -> format_log_formatter_error(Class, Reason, Event, FConfig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We prefer to catch with catch error:Class:Stacktrace
.
- Don't ignore stacktrace when catching - it would pretty hard to debug without line information.
- ignore throw and exits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion - added
end. | ||
|
||
|
||
format_log_formatter_error(Class, Reason, #{meta := #{time := Time, mfa := Mfa, file := File, line := Line}}, FConfig) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to include the complete Event, encoded as a term.
original_event => unicode:characters_to_binary(io_lib:format("~0p", Event))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added this and pushed new changes
We also planning to make a separate PR from our side with a fix of one of OTP errors, that would include tests - that PR could be used as a template for the rest of errors. I am pretty fine with merging |
I've updated the fallback error handlers in response to @arcusfelis's comments. I think I've resolved the underlying formatter crash cause as well. |
#{unstructured_log => | ||
unicode:characters_to_binary(io_lib:format(Format, Terms))}}}, | ||
FConfig). | ||
|
||
|
||
-spec append_lf(iodata()) -> iodata(). | ||
append_lf(IoData) when is_list(IoData) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just this is enough:
append_lf(IoData) ->
[IoData, <<"\n">>].
I am dialyzer:
-spec format(logger:log_event(), logger:formatter_config()) -> unicode:chardata().
-spec append_lf(iodata()) -> iodata().
These two return different types.
Log formatters return unicode:chardata()
.
Should be:
-spec append_lf(unicode:chardata()) -> unicode:chardata().
(tldr: iodata - integers are bytes, unicode - untegers are code points).
There is no problem with the code, there is a problem with a spec :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made the change - got rid of the function because it didn't seem worth having it. That means the spec has gone as well.
The old code always returned the iodata()
from jiffy:encode
as the unicode:chardata()
return value of format
. If I understand the typing correctly, iodata()
is the same as unicode:chardata()
except when it's an (improper) list containing integers >=128, however there won't be any of those because the list returned from jiffy:encode
consists of UTF-8 binaries. So the semantic difference between iodata()
and unicode:chardata()
isn't a problem in this particular case (I've resisted the urge to write a code comment to this effect).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, the only difference between unicode and iodata is when lists of integers are used.
Still, it's a good idea to avoid iodata()
type in the unicode code, because currently it's fine, but after 3-4 generations of code modifications, there could be a point, when someone would add iolist_to_binary(X)
and another person would add io_lib:format("~ts", [Info]) (which returns code points higher 255)
in two parts of the code :D
Anyway, awesome contribution :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good
This PR addresses #2967
Instead of allowing an error during log line formatting to propagate into
formatter_crashed
, which can then lead to a death spiral of formatters crashing as they try to format theformatter_crashed
error, safely format a log message about the formatter crash.This avoids the big tests failing at debug log level, and will also be safer for production use - any undiscovered bugs in log formatting won't be able to bring down the MongooseIM process.
Example of formatted log line produced by the modified JSON formatter in case of an error during formatting:
Closes #2967