Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix panic in elasticsearch input plugin #2954

Merged
merged 2 commits into from
Jun 26, 2017

Conversation

bobmshannon
Copy link
Contributor

@bobmshannon bobmshannon commented Jun 23, 2017

This introduces a fix for #2893. With this change, ES collection is skipped for the interval rather than telegraf crashing due to panic.

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

Fix lint

Re-trigger build

Tweak

Fix lint

Tweak

Tweak
Copy link
Contributor

@danielnelson danielnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the bugfix. Why would the /_cat/master not be available? I ask because maybe we can tell the user how they can fix this in the error.

e.setCatMaster(s + "/_cat/master")
if err := e.setCatMaster(s + "/_cat/master"); err != nil {
acc.AddError(fmt.Errorf("Unable to retrieve master node information."))
acc.AddError(fmt.Errorf(mask.ReplaceAllString(err.Error(), "http(s)://XXX:XXX@")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a single error, we would need to have a single AddError call, but since it is fatal just return the error.

@danielnelson danielnelson added this to the 1.3.3 milestone Jun 23, 2017
@bobmshannon
Copy link
Contributor Author

@danielnelson /_cat/master (and other endpoints, like /_cluster/health) will return a 503 in cases where the cluster is unable to elect a master node for some reason. Based on my experience this can be caused by a variety of different things, including but not limited to a misconfiguration usually when first setting up ES, or if the master node(s) have become unresponsive due to some other networking or systems related issue.

But with that being said, there could be other reasons as to why the endpoint would return a non-response that I may be unaware of. ES always returns an error message though, so maybe we could just log that? Something like this:

Error in plugin [inputs.elasticsearch]: Unable to retrieve master node information from http(s)://XXX:XXX@/_cat/master. Received status code 503 but expected 200. The error response from server is show below:
{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}

@bobmshannon
Copy link
Contributor Author

@danielnelson I've removed the extra log entry and replaced it with a single one of the following format:

elasticsearch: Unable to retrieve master node information. API responded with status-code %d, expected %d

which I believe is more in line with the style of the existing error messages.

@danielnelson danielnelson merged commit a7595c9 into influxdata:master Jun 26, 2017
danielnelson pushed a commit that referenced this pull request Jun 26, 2017
@danielnelson
Copy link
Contributor

Will be in 1.3.3 later this week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants