Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't Report RabbitMQ Close Errors on Staging #1325

Closed
wants to merge 2 commits into from

Conversation

rsandor
Copy link
Contributor

@rsandor rsandor commented Feb 2, 2016

This PR removes error reporting in the RabbitMQ.prototype.close when the server is running with NODE_ENV=staging.

The method was reporting quite a few (~6k / week) errors when called on staging. It appears that the server was never started and then a close was executed (via a SIGINT or SIGTERM, as far as I can tell).

While we will need to keep error reporting here so as to indicate a problem with our workers in production, it is a bit superfluous to have all this reporting for staging (where we just restart the container if it is mucking up).

Reviewers:

@@ -176,6 +176,10 @@ RabbitMQ.prototype.close = function (cb) {
}
this.hermesClient.close(function () {
log.trace('RabbitMQ.prototype.close complete')
if (process.env.NODE_ENV === 'staging') {
// Ignore errors on staging close (to avoid alerting)
return cb.call(this)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cb.call(this) -> cb()?

@podviaznikov
Copy link
Member

it seems that it tried to close rabbitmq connection in rabbitmq and failed because connection was already closed. So maybe a better fix would be to handle that in hermes? If .close was called and connection was already closed it shouldn't callback with an error

@rsandor
Copy link
Contributor Author

rsandor commented Feb 2, 2016

@podviaznikov - While that is true, we also never want any errors to be reported in rollbar for staging hermes close. This is, then, a blanket fix to ensure we don't have staging reporting for this particular use case.

@Myztiq
Copy link
Contributor

Myztiq commented Feb 2, 2016

Do we even want to report to rollbar on staging?

@podviaznikov
Copy link
Member

What about scenario when you testing something on staging and you want to see rollbar errors?

@rsandor
Copy link
Contributor Author

rsandor commented Feb 2, 2016

We don't, but that is a bit of a bigger PR. This fix will help us in the short term.

@Myztiq
Copy link
Contributor

Myztiq commented Feb 2, 2016

This LGTM, I'd love to see if we could nix reporting entirely on staging... As it's really not useful. I can't see a case where reporting to rollbar when testing something couldn't be done by reading the logs manually.

@rsandor
Copy link
Contributor Author

rsandor commented Feb 2, 2016

@podviaznikov - I am very skeptical that you would be using rollbar for error tracking in the staging environment. My guess is that you would be testing (hopefully soon in isolation) and would be tracking errors in the logs themselves.

Rollbar is a monitoring tool that allows you to understand the error behavior of a relatively stable system (such as a production server, etc.) over time. Thus it makes very little sense to use it for real-time testing in a staging environment.

@podviaznikov
Copy link
Member

what about testing branches over time. E.x. you have a branch and you run some automated tests over that branch during night. Or you do perf tests and want to see in rollbar if anything wrong happened

@rsandor
Copy link
Contributor Author

rsandor commented Feb 2, 2016

@podviaznikov - Do you currently perform this sort of testing?

@rsandor
Copy link
Contributor Author

rsandor commented Feb 2, 2016

@podviaznikov - Furthermore, if you needed to do that you should be able to change the reporting behavior in your branch.

@rsandor
Copy link
Contributor Author

rsandor commented Feb 2, 2016

@podviaznikov - Just a thought on that, I think we could have a env variable that you could override in your runnable config to tell it whether or not to report staging errors to rollbar. So the default would be to not do so, and then if you needed to keep something more long running you could always switch it back on for your branch. #compromise #politics #thegreatdebate

@podviaznikov
Copy link
Member

  1. how would I be able to change reporting behavior on a branch in this case?
  2. I don't do that testing now. But I do check rollbar errors for gamma when I'm deploying new PR (gamma/beta was equivalent for me to staging in the last couple of month). It's especially good because I can see errors caused by me in rollbar

I'm not against of this PR (especially because it's a small one) but I do have few concerns:

  1. adding ENV specific code is strange
  2. underlying problem seems to be either in hermes or elsewhere but not in this particular piece that was changed. I'd rather have proper fix
  3. if proper fix is to disable rollbar on staging - I'm also good with that if we have discussion

Because 2, 3 this PR seems like not here not there to me. Unless we are loosing a lot of money on rollbar I'd rather see full fix. But again, this is just and opinion and I'm not against merging it

@rsandor
Copy link
Contributor Author

rsandor commented Feb 2, 2016

Let me refer you to...

Exhibit A:
screen shot 2016-02-02 at 2 43 43 pm

Exhibit B:
screen shot 2016-02-02 at 2 43 58 pm

@rsandor
Copy link
Contributor Author

rsandor commented Feb 2, 2016

Note how we don't reset on usage until the end of the month 😢

@podviaznikov
Copy link
Member

That is why I said Unless we are loosing a lot of money on rollbar. But 6K from 200K doesn't look that-that much. Is it 6K per day or in total for that event?

My concern is that you are introducing temporary code. Who is going to remove that code? Do we have JIRA task to have proper implementation and remove this piece you are adding now

@rsandor
Copy link
Contributor Author

rsandor commented Feb 3, 2016

Okay, I agree. Let's work a plan about fixing this for all issues. Closing.

@rsandor rsandor closed this Feb 3, 2016
@rsandor rsandor deleted the staging-rabbit-close-error branch February 3, 2016 01:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants