
Monkey patch log_error - format errors on one line #147

Merged
richardTowers merged 7 commits into master from fix-rails-logging on May 15, 2020

Conversation

richardTowers (Contributor)

It's annoying having Rails log errors across multiple lines, as it makes
indexing them in logstash very difficult.

⚠️ 🐒 ⚠️ 🐒 ⚠️ 🐒 ⚠️ 🐒 ⚠️ 🐒 ⚠️ 🐒 ⚠️ 🐒 ⚠️ 🐒
This PR involves monkey patching a small bit of rails
⚠️ 🐒 ⚠️ 🐒 ⚠️ 🐒 ⚠️ 🐒 ⚠️ 🐒 ⚠️ 🐒 ⚠️ 🐒 ⚠️ 🐒

The ActionDispatch::DebugExceptions middleware is the culprit that's logging
errors across multiple lines. Since it would be a bit of a nuisance to have to
replace this middleware entirely (it does other things too - like rendering
error pages in dev), it seems more "surgical" just to replace the log_error
method with one that does what we want.
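
Roughly, the patch looks like this - a sketch, not the merged code: the
module name and method body here are illustrative assumptions, based on
the log_error(request, wrapper) signature in Rails 5/6:

    # Sketch: log the whole error as a single JSON line, with keys like
    # exception_class, exception_message and stacktrace.
    module OneLineLogErrorPatch
      private

      def log_error(request, wrapper)
        exception = wrapper.exception
        logger(request).fatal({
          exception_class: exception.class.name,
          exception_message: exception.message,
          stacktrace: wrapper.application_trace,
        }.to_json)
      end
    end

    ActionDispatch::DebugExceptions.prepend(OneLineLogErrorPatch)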

We've tried to be very careful to check that doing this doesn't break anything,
and we think that the test coverage we're adding should catch any regressions.

At the end of the day, if Matz had not wanted us to do bad things, he would not
have given us free will.

https://trello.com/c/fX8U5tly/1911-5-show-stacktraces-as-one-log-entry-in-kibana-%F0%9F%8D%90

callumknights and others added 3 commits May 14, 2020 19:18
This test shows that errors raised by the application are logged by this
middleware, but it doesn't make any assertions about their format
(because we haven't fixed that yet).

Co-authored-by: Richard Towers <[email protected]>
... instead of just the middleware
@thomasleese (Contributor) left a comment

I have one thing which is that it’s missing an entry in the changelog, but apart from that I think it’s looking good.

@ChrisBAshton (Contributor) left a comment

Well done for getting to a solution 👏 this had me stumped!

Just a couple of comments below.

@ChrisBAshton (Contributor) left a comment

Thanks for addressing my comments 👍 looks good to me!

It's annoying having Rails log errors across multiple lines, as it makes
indexing them in logstash very difficult.

Instead, we can 🐒 patch the private log_error method in
ActionDispatch::DebugExceptions so that the log format is a bit nicer.
We're logging errors as JSON because it's easy to do, and should be easy
to parse later in the logstash filters.

It's worth us being very careful not to monkey patch this method if it
doesn't look the way we expect it to - it's not part of the public
interface of Rails, so it could change in a future version without
warning. To help avoid this, `should_monkey_patch_log_error` checks
that the method we're patching has the expected set of parameters, and
we won't replace the method unless it does.
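
As an illustration, that check could look something like this - a sketch
only, since the expected parameter list is an assumption based on the
`log_error(request, wrapper)` signature in Rails 5/6, and the real
implementation may differ:

    # Sketch: Method#parameters lets us confirm the private method still
    # has the shape we expect before we patch it.
    def should_monkey_patch_log_error
      parameters = ActionDispatch::DebugExceptions
        .instance_method(:log_error)
        .parameters
      parameters == [%i[req request], %i[req wrapper]]
    end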

There's also a test in this gem that now checks that errors raised by
the application are logged as JSON, so that should catch any regressions
in future versions of ActionDispatch.
@richardTowers (Contributor, Author)

Just going to test that this actually works with one of the apps before merging it.

@richardTowers (Contributor, Author)

whitehall-app_1                         | [d0048485-bd5e-444b-a350-5d9dfc31f216] {"exception_class":"Exception","exception_message":"Feck","stacktrace":["app/controllers/admin/editions_controller.rb:89:in `new'"]}

🎉 :shipit:

@richardTowers richardTowers merged commit 56ae9cf into master May 15, 2020
@richardTowers richardTowers deleted the fix-rails-logging branch May 15, 2020 09:59
@kevindew (Member)

@richardTowers sorry to be that person who comments once this is merged and deployed. I just saw this when in the repo for a different reason, and a couple of things came to mind:

  1. I think this is going to cause some rather ugly things for logs in dev/test environments where we've not got JSON formatted log files
  2. I think the information we get from this logging is actually superfluous

On the 2nd point, we already get a stack trace from logstasher inside the logged request, and we do also send the trace to Sentry. So I'm wondering whether what we actually want to do is just skip the DebugExceptions middleware when we're in a production environment?
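
(If we did that, I'd expect it to be roughly a one-liner - a sketch, with the config placement assumed:)

    # config/environments/production.rb (hypothetical placement)
    config.middleware.delete(ActionDispatch::DebugExceptions)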

@richardTowers
Copy link
Contributor Author

Hi @kevindew - no worries, thanks for your comment:

  1. GovukLogging.configure is only run if Rails.env.production?, so dev / test environments should be unaffected. I guess it could be worth being more explicit about only monkey patching things in production, but I'm reluctant to make it even more difficult to test production-like logging locally.

  2. If we're confident that this is the case, then I agree with you - it would be better just to remove the middleware to reduce the log noise. I'm not sure logstasher is getting all the errors that this is logging though (perhaps Sentry is, but that's beside the point as they should be logged to ELK at least once). For example, this search for ActionNotFound in whitehall only shows messages from this middleware: https://kibana.logit.io/s/2dd89c13-a0ed-4743-9440-825e2e52329e/goto/dfc466778ff0e9a1f51a16b1fc594957
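
(For reference, the guard mentioned in point 1 is essentially the following - a sketch of the call site, with placement in the gem's setup assumed:)

    # Sketch: the whole logging configuration, monkey patch included,
    # sits behind this production check.
    GovukLogging.configure if Rails.env.production?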

@kevindew (Member)

> Hi @kevindew - no worries, thanks for your comment:
>
>   1. GovukLogging.configure is only run if Rails.env.production?, so dev / test environments should be unaffected. I guess it could be worth being more explicit about only monkey patching things in production, but I'm reluctant to make it even more difficult to test production-like logging locally.

Argh sorry I missed that nuance. That's a relief 👍

>   2. If we're confident that this is the case, then I agree with you - it would be better just to remove the middleware to reduce the log noise. I'm not sure logstasher is getting all the errors that this is logging though (perhaps Sentry is, but that's beside the point as they should be logged to ELK at least once). For example, this search for ActionNotFound in whitehall only shows messages from this middleware: https://kibana.logit.io/s/2dd89c13-a0ed-4743-9440-825e2e52329e/goto/dfc466778ff0e9a1f51a16b1fc594957

Yeah, that ties into an annoying quirk of Rails/Logstasher unfortunately. A request is only logged once it reaches ActionController (there's an issue based around a similar pain point), which means this request isn't logged at all for the app (you'd expect it to be logged as a 404, but you can only find out that it was served from the nginx log). It'd be a lot better if it logged everything Rack served.

This type of error is ignored by Sentry as it's just Rails behaving as it should. The routing errors are annoying: you'd never want a full stack trace, but you'd always want them logged as requests.

@richardTowers (Contributor, Author)

I'm still not sure it's logging everything though - what about these NoMethodErrors I can see in the DebugExceptions logs:

https://kibana.logit.io/s/2dd89c13-a0ed-4743-9440-825e2e52329e/goto/f9daba3684136ebae9654d1b54b7aba7

I can't see any equivalent errors elsewhere in ELK.

@kevindew (Member)

> I'm still not sure it's logging everything though - what about these NoMethodErrors I can see in the DebugExceptions logs:
>
> https://kibana.logit.io/s/2dd89c13-a0ed-4743-9440-825e2e52329e/goto/f9daba3684136ebae9654d1b54b7aba7
>
> I can't see any equivalent errors elsewhere in ELK.

I can see these when looking at requests with 500 responses: https://kibana.logit.io/s/2dd89c13-a0ed-4743-9440-825e2e52329e/goto/b21d2de51756addb58bef65ccf821500

Are you filtering on message? They appear in the error field.

Bit worrying how many frequent errors there are to compare against 😬

@richardTowers (Contributor, Author)

Ah, I was just searching for "NoMethodError", which for some reason brings up my results but not yours - perhaps Elasticsearch is splitting on words when indexing message (where my "NoMethodError" appears) but not error (where yours does). How does ELK even work 🤷.

@kevindew (Member)

Argh - why is everything broken? It doesn't even seem to know about the field. If I query for any records where error exists, it finds nothing. It seems to think the field is both searchable and a string 😖

@richardTowers (Contributor, Author)

Yeah, I did some more digging in Elasticsearch, and the difference above is because error has `{"type": "keyword", "ignore_above": 1024}` whereas message has `{"type": "text", "norms": false}`.

@richardTowers (Contributor, Author)

I don't actually know how to configure that though - it seems to only be possible through the Elasticsearch mappings API.

@kevindew (Member)

Oh cool, nice find 👍

Following that down a bit of a Google hole, it looks like this might be the sort of approach to change it: https://help.logit.io/en/articles/3478456-how-can-i-make-a-text-field-in-elasticsearch-and-kibana-aggregatable. From looking at our filebeat template:

        "dynamic_templates": [
          {
            "strings_as_keyword": {
              "mapping": {
                "ignore_above": 1024,
                "type": "keyword"
              },
              "match_mapping_type": "string"
            }
          }
        ]

Seems that's applied by default, whereas there's a specific section for message:

          "message": {
            "norms": false,
            "type": "text"
          },

I'm guessing we're just using the default templates though, rather than having code that configures them somewhere 🤔
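
If we did want error treated like message, I'd guess the change from that article boils down to giving error its own section in the template - something like this multi-field mapping (a sketch; the exact shape is an assumption):

          "error": {
            "norms": false,
            "type": "text",
            "fields": {
              "keyword": {
                "ignore_above": 1024,
                "type": "keyword"
              }
            }
          }

(The keyword sub-field would keep the old exact-match behaviour available as error.keyword.)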
