2.6+ Round-Robin DNS Entry forward any_of causes hung connections #262
Comments
For 1, I'm still searching; so far no clue why you see that behaviour.
As for 1, I think it's an inverted-logic bug.
…lved This may be causing what is observed in issue #262. Due to inverted logic, connections bound to be re-resolved were actually re-resolved, but the result was never used, and the addrinfo was leaked. On resolve failure this could even result in errors or other undefined behaviour.
Due to a bug, connections would never switch to another IP, though memory would be leaked.
Validated that the inverted-logic fix now rotates IPs. As for 2, my thought was: if a round-robin DNS name is used in an any_of entry, could we pull all the IPs from the DNS entry and add them as individual members? That way it would split the traffic between them, but still tolerate a node being down. This would allow centralized management of the endpoints versus having all the IPs in the remote relays.
Ok, so there's something going on there. What you describe in 2 seems exactly like the useall option.
From what you tell me, in this mode the relay should not build up a large pile of connections. If this is indeed the case, it should narrow the search somewhat, but I'm curious...
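For reference, a sketch of what that would look like (cluster and server names taken from the config later in this thread; the `any_of useall` placement follows my reading of the carbon-c-relay documentation, so verify against your version):

```
cluster local_carbon
    any_of useall
        multiipdnsentry:2003
    ;
```

With `useall`, the relay is meant to expand the DNS name into one cluster member per resolved address, instead of picking a single address.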
Agreed, it looks like useall is what I was looking for there. Thank you so much. As for the other issue, that only made it much worse: after 1 minute I had 240 established connections.
The idea is that a connection is made, and reused when there are metrics to write within a certain timeout (something like 10s off the top of my head). It should absolutely NOT open a new connection each time it tries to write.
I just did a simple test to verify the disconnect behaviour, and it does trigger (it's 3 seconds). Can you tell me a bit about how many addresses your multiipdnsentry resolves to, and how much data is flowing towards the relay? If you use the stats, how many connections are made to the relay (nonNegativeDerivative(carbon.relays.host.connections))? And what about the other relays: are they also c-relays, or different software?
Sure, this is a per-host, relay-to-relay setup. Interesting side note: if I set the DNS entry with useall, it expands to the IPs as members in the log when it outputs the config. That config has the connection issues, though. If I copy/paste that expanded config and use it directly, there are no problems. Is there a way to adjust the disconnect timeout?
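For comparison, a hand-expanded equivalent of the kind that copy/pasting the logged expansion produces (the IP addresses here are hypothetical placeholders for whatever multiipdnsentry resolves to):

```
cluster local_carbon
    any_of
        10.0.0.1:2003
        10.0.0.2:2003
        10.0.0.3:2003
    ;
```

The reported behaviour, then, is that this literal form works, while the equivalent useall form leaks connections.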
I think I found the problem.
…solve Part of what's reported in issue #262: a server that is the result of use_all expansion should be treated as if it were given as an IP address.
If you could try the latest master, that would be awesome. If it solves the problem for you, I'll release v3.1 shortly to fix this screwup.
CentOS 6 fails to make from master; CentOS 7 completes. Example:
You can touch conffile.tab.* and conffile.yy.*, since git doesn't store mtimes :( I haven't found a way to work around this yet. I'll look into why useall doesn't connect to the others.
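The touch workaround above can be sketched in a scratch directory (the file names mirror the generated parser files; this is an illustration of the mtime effect, not the project's actual Makefile logic):

```shell
# git doesn't preserve mtimes, so after a clone the generated parser files
# (conffile.tab.*, conffile.yy.*) can appear older than the grammar source,
# which makes "make" try to re-run bison/flex. Touching the generated files
# marks them newer, so make skips regeneration.
dir=$(mktemp -d) && cd "$dir"
touch conffile.y                       # stand-in for the grammar source
sleep 1
touch conffile.tab.c conffile.tab.h    # refresh the "generated" files
[ conffile.tab.c -nt conffile.y ] && echo "generated files look up to date"
```

In the real source tree the workaround is simply `touch conffile.tab.* conffile.yy.*` followed by `make`.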
Hah: use_all never updates the configuration, so the router thinks there's only one entry.
Hmmm, test mode shows all entries would get used...
I've not been able to reproduce the behaviour where it always picks the first node. That actually is the behaviour of a
I think I found a reason/cause for the behaviour you see.
This is likely the problem observed in issue #262, where the first address is taken all the time because the full stack of addresses was assigned to every server.
I think I've fixed this; if not, please reopen.
On 2.5 the behaviour was that it would pick one IP from the DNS entry to forward to and stick with it.
In 2.6 and 3.0 it connects to one of the IPs from the DNS entry, sends metrics, and then leaves the connection open.
This leads to an ever-increasing connection count until the relay runs out of file descriptors.
So my two-prong request would be: (1) fix the connection build-up, and (2) expand a round-robin DNS entry into individual cluster members.
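On Linux, one quick way to watch the descriptor build-up described above is to count the entries under /proc/<pid>/fd for the relay process. The snippet below inspects the current shell, $$, purely as a runnable stand-in; for the real relay you would substitute its pid (e.g. via pidof, assumed available on the system):

```shell
# Each open socket shows up as one entry in /proc/<pid>/fd, so a count
# that climbs steadily for the carbon-c-relay pid indicates leaked
# connections. For the real relay, use: pid=$(pidof carbon-c-relay)
pid=$$                             # stand-in: inspect this shell instead
fds=$(ls /proc/"$pid"/fd | wc -l)
echo "open descriptors for pid $pid: $fds"
```

Sampling this periodically (or graphing carbon.relays.*.connections, as suggested earlier in the thread) makes the leak visible over time.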
[2017-04-21 17:26:56] (MSG) starting carbon-c-relay v3.0 (98424a-dirty), pid=39656
configuration:
relay hostname = server_name
listen port = 2003
listen interface = 127.0.0.1
workers = 4
send batch size = 2500
server queue size = 25000
server max stalls = 4
listen backlog = 32
server connection IO timeout = 600ms
debug = true
configuration = /etc/carbon-c-relay.conf
parsed configuration follows:
statistics
submit every 60 seconds
prefix with carbon.relays.server_name
;
cluster local_carbon
any_of
multiipdnsentry:2003
;
match *
send to local_carbon
stop
;
[2017-04-21 17:26:56] (MSG) listening on tcp4 127.0.0.1 port 2003
[2017-04-21 17:26:56] (MSG) listening on udp4 127.0.0.1 port 2003
[2017-04-21 17:26:56] (MSG) listening on UNIX socket /tmp/.s.carbon-c-relay.2003
[2017-04-21 17:26:56] (MSG) starting 4 workers
[2017-04-21 17:26:56] (MSG) starting statistics collector
[2017-04-21 17:26:56] (MSG) starting servers
/usr/bin/carbon-c-relay -P /var/run/carbon-c-relay/carbon-c-relay.pid -D -p 2003 -i 127.0.0.1 -w 4 -b 2500 -q 25000 -l /var/log/carbon-c-relay/carbon-c-relay.log -s -f /etc/carbon-c-relay.conf