too many open files - and consul fills up log #688
Comments
In addition, consul does not (cannot?) pick a new leader if the leader is the one that is out of file descriptors. In our scenario we have some 50 agents, some of which are connected to the leader once, others 100 times. Haven't figured out why yet.
@balexx @dpkirchner There was a regression that caused a lot of connections to be maintained, which is now fixed in master for the 0.5 release. Sorry about that!
We just had the same issue. Great that it's fixed! @armon could you please link to the commit or PR that fixes it?
Ok, thanks!
We're running a master build now and the problem has gone away. Sweet.
I'm still experiencing this issue with 0.5.0, any other updates on a fix?
@cglewis Can you provide more information? This should be a fixed issue as far as we know.
@armon I'm running on Ubuntu 14.04.1 with consul 0.5.0 and seeing the same error message as at the top of this issue. What other information would be helpful to you? For now I just increased the allowed open file count and rebooted the box.
To make sure I understand this: the standing issue is that Consul floods the logs when file handles are exhausted, not that Consul itself is leaking resources, correct? It sounds like @armon has fixed @dpkirchner's resource-leaking problem, but if any program exhausts file handles, Consul fills the disk with logs. @cglewis are you just redirecting consul's log output to a file? If you use a syslog server, there is most often a coalescer built in (like rsyslog's message reduction).
@ryanuber so I only have two things running on that machine, and the former has been running fine for months without file handle issues; only after adding the consul agent did the box start to have this issue. I don't care as much about the logs filling up, as that can easily be filtered; I was more concerned that consul seems to be the culprit for the resource leak, though I can't verify that specifically, as I haven't isolated it yet.
@cglewis How large is your cluster? There are a number of fixes in master now for excessive socket usage on large clusters; 0.5.1 should improve that situation. There is definitely an issue of excessive logging as well if we exhaust file handles, but the answer there for now is to just crank ulimit way up.
I am still seeing this on
@ChrisMcKenzie are you still experiencing this? The issue of aggressive logging when file handles are exhausted still exists, and it's a difficult-to-detect situation from inside of the application. The solution for now is to set high limits with ulimit, or to use a syslog server and leverage its log throttling abilities, such as rsyslog's. The latest release should contain the fixes mentioned above by @armon, so this should be less of a problem now.
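For the log-flooding side, a minimal sketch of the rsyslog message reduction mentioned above, assuming consul's output is already routed through syslog (this is the legacy rsyslog directive syntax):

# /etc/rsyslog.conf (assumption: consul already logs via syslog)
$RepeatedMsgReduction on

With this enabled, identical consecutive messages are collapsed into a single "last message repeated N times" line instead of filling the disk.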
I am seeing this issue this morning. I have 3 consul servers running 0.6.0 and my agents are still 0.5.2. My consul logs (see below) are logging a million "too many open files" errors and my disk space is exhausted. The consul process is the one with all the files open (if I "ls" /proc/<pid>/fd, I see 1024 items, which is what my ulimit is). Any ideas? Is it the agent/server version mismatch? This happened on 2 servers today. Logs from consul.log:

==> Log data will now stream in as it occurs:
==> Newer Consul version available: 0.6.0
Hi @samprakos the mixed Consul versions should be fine. Can you run
Sure...here is the output from that command (attached):
Hmm - these are very weird:
Ah...great suggestion. I have tied the large number of open files to a particular process. It is a Go program that is monitoring the health of all services on the consul agent. It is using the consul Go API. I unfortunately can't see where in the API we would call Close() or similar. Here is our code:

...
client, err := getClient(consulAddress)
if err != nil {
    return checks, err
}
agentChecks, err := client.Agent().Checks()
if err != nil {
    return checks, err
}
...

// getClient builds a brand-new API client (with its own HTTP transport) on every call
func getClient(url string) (*api.Client, error) {
    config := api.DefaultConfig()
    config.Address = url
    return api.NewClient(config)
}
Ideally, I think you'd make the client once and reuse it for all of your calls. Another solution would be to modify the HTTP client the API client uses so it doesn't hold idle connections open. There's a DisableKeepAlives option on the transport that should do the trick.
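For reference, a minimal sketch of both approaches, assuming the github.com/hashicorp/consul/api package of that era; getClient mirrors the snippet above, while newOneShotClient and the package-level variables are made-up names for illustration:

package main

import (
    "net/http"
    "sync"

    "github.com/hashicorp/consul/api"
)

var (
    clientOnce sync.Once
    client     *api.Client
    clientErr  error
)

// getClient builds the Consul API client once and reuses it on every call,
// so keep-alive connections are pooled instead of piling up.
// (Assumes the address never changes between calls.)
func getClient(addr string) (*api.Client, error) {
    clientOnce.Do(func() {
        config := api.DefaultConfig()
        config.Address = addr
        client, clientErr = api.NewClient(config)
    })
    return client, clientErr
}

// newOneShotClient is the alternative: a fresh client per call, with
// keep-alives disabled so idle connections aren't held open afterwards.
func newOneShotClient(addr string) (*api.Client, error) {
    config := api.DefaultConfig()
    config.Address = addr
    config.HttpClient = &http.Client{
        Transport: &http.Transport{DisableKeepAlives: true},
    }
    return api.NewClient(config)
}

Reusing one client is the lower-risk option, since it sidesteps per-call transport setup entirely.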
Thanks, that gives us something to go on. Setting "DisableKeepAlives: true" didn't solve the issue for some reason, so still working through it.
@samprakos didn't realize this was already open but this is probably a good fix - #1499. I'll do a little more digging and probably merge this tonight - this would let your existing code work as-is.
Good news...I'll follow #1499
@samprakos even better, this is being fixed in the upstream library in hashicorp/go-cleanhttp#2 - once that's done you should be able to just update go-cleanhttp and be good to go with no changes to your code.
Whenever a system reaches a situation with "too many open files" (which can happen very fast in some scenarios), consul starts throwing these errors into the logfile:
The errors repeat tens (hundreds?) of times a second, filling up the filesystem before anyone can react. It would really be useful if repeated messages were aggregated, which would mitigate this issue.
Alternatively, logging this at a much lower frequency would also help.
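To make the aggregation idea concrete, here is a rough sketch (not Consul's actual logging code; throttledLogger is a hypothetical wrapper) that suppresses identical consecutive messages and reports a repeat count instead:

package main

import (
    "log"
    "time"
)

// throttledLogger suppresses identical consecutive messages and instead
// reports how many times they repeated, emitting at most once per interval.
type throttledLogger struct {
    interval time.Duration
    last     string
    repeats  int
    lastEmit time.Time
}

func (t *throttledLogger) Printf(format string, args ...interface{}) {
    now := time.Now()
    if format == t.last && now.Sub(t.lastEmit) < t.interval {
        t.repeats++ // same message inside the window: count it, don't log it
        return
    }
    if t.repeats > 0 {
        log.Printf("last message repeated %d times", t.repeats)
        t.repeats = 0
    }
    log.Printf(format, args...)
    t.last = format
    t.lastEmit = now
}

func main() {
    tl := &throttledLogger{interval: time.Second}
    for i := 0; i < 1000; i++ {
        tl.Printf("[ERR] consul: error accepting connection: too many open files")
    }
}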