Bug with multiple servers in group with mixed compress/uncompressed settings #1493
This issue is interesting. Based on my assumption that the client is not using compression, ProxySQL's code for handling compression is not relevant at all (so forget about #1410). Your error log indeed gives us some leads for further investigation. Is it possible to have the whole error log, or at least a few hundred lines?
At this point, I would investigate what causes the apparent lock. Action items I would suggest right now: Thanks
---
Hi,
(EDITED because the image links were not working)
---
Hi. You are welcome! :) About compression on clients: About Finally, the links you posted do seem to work.
---
Hi,
---
First the easy parts / problems found - it might be better to split these into separate issues, but I didn't know what should go into which one, so feel free to reuse this in whatever way makes sense:
On other instances with Debian Jessie I got the same rejection (1.4.6 package), and there the package default of 20 is set:
It is also not possible to lower the value from 100 down to 20:
Even after a ProxySQL restart, the "saved to disk" value is (20 or) 100 again ...
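For reference, this is the minimal admin sequence I would expect to work here (a sketch, assuming the admin interface with default credentials, and assuming that mysql-threads, as I understand it, only takes effect at startup, so LOAD MYSQL VARIABLES TO RUNTIME alone cannot change it):

```sql
-- Sketch against the ProxySQL admin interface; mysql-threads is
-- (as far as I know) only applied at startup, so a restart is required.
UPDATE global_variables SET variable_value='20' WHERE variable_name='mysql-threads';
SAVE MYSQL VARIABLES TO DISK;
-- check the current value in the admin layer before restarting:
SELECT variable_name, variable_value FROM global_variables WHERE variable_name='mysql-threads';
-- then: service proxysql restart
```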
In this case it is clear, but if many threads fail in parallel, more than one backend server could be affected, like here where no server is mentioned explicitly in the other log lines:
The question is: are these others all related only to the client <-> ProxySQL side?
=> which server is gone (if the message came from ProxySQL)?
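To see which backend ProxySQL itself considers affected, this is the kind of stats query I would run against the admin interface (a sketch; the exact column set may differ between versions):

```sql
-- Sketch: per-backend connection pool state as ProxySQL sees it.
SELECT hostgroup, srv_host, srv_port, status,
       ConnUsed, ConnFree, ConnERR, Queries
FROM stats_mysql_connection_pool
ORDER BY hostgroup, srv_host;
```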
Since logging is not easy to expand even with the debug version, due to the missing wiki documentation, I have had 2 more crashes since then (and the only running instance is a staging one, which doesn't get enough traffic to reproduce the problem). Judging from the latest issues this could be similar to #1522, but I get no real crashes. Here is the log from around the last "attempted" production failure after 8:00 UTC:
As we can see, the ProxySQL service "actively" disconnects from all MySQL backend servers - even the localhost one - which are then left without clients, and the client only gets exceptions like:
My idea is that perhaps the deactivated multiplexing could cause such problems (amplified by the mixed compressed/uncompressed backend connections)? How is debugging done? ... that is the big question which is not explained in the wiki and which I cannot work out from the code:
is not working as expected and there are no log entries for these attempts...
---
New version, new chance ... but sadly no luck:
threads set to 20 in /etc/proxysql.cnf
threads checked: now only 20 instead of 100. The first hour without clients it was looking fine so far. But after touching all the data to get it processed around 18:30 UTC/20:30 MEST, it took only 7 minutes to bring the new ProxySQL 1.4.9 down 😭
=> Could this be a problem because of the deactivated multiplexing? The PHP errors are all like this exception:
An interesting bug I found while writing this - ProxySQL is shunning (flapping between ONLINE/SHUNNED) only the master, which is the fallback in the read hostgroup:
Ah, and compression is on again ... (=> next test without compression on all backends). Here is the full startup log with the first lines after the incident - since I still haven't found any info on how to increase debugging, I cannot offer more yet:
---
Sadly, the cause has still not been found. I tried many possible changes, each time testing the behavior 3x with 30-45 minutes of runtime ... I guess you can imagine how much fun I had in the last weeks, because there is no documentation on how to debug ProxySQL correctly. Because many tests were failing, I took the packaged config (with threads=4) and added only the urgently needed parts - servers, users, rules, and the maxsize/timeout values from our config - to the default one. In my tests (with multiplexing deactivated on our side because of the INDEX_OFFSET=... initialization) it was working last Thursday/Friday like this:
From these tests it looks like the compression is what commonly breaks our pools. But today the production run failed again, even when using the previously tested, successful settings, and so did staging, which means that no "real" reproducible "experimental" set-up is possible. 😭 We can modify the setup/routine for this special case, but we have to keep in mind that connections running over ProxySQL could fail without a recognizable cause in many other high-throughput situations, so ProxySQL would somehow need to be replaced, which I tried to avoid; it is also nicely included in solutions like the one from Severalnines, which could be a later enhancement. Here is the failed staging run from today with the complete ProxySQL stats page: I tried it also with free_connections_pct=0 instead of the default 10, but it failed as well. If interested I could also mail/share all the other test images, but since it still seems to be non-reproducible over several days, I guess that doesn't make much sense.
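One thing I still want to try regarding the deactivated multiplexing: as far as I understand, statements touching user variables (like our INDEX_OFFSET initialization) make ProxySQL disable multiplexing on that connection, and a query rule with multiplex=2 is supposed to suppress that for a specific, known-safe statement. A sketch (rule_id and the match pattern are placeholders for our real initialization statement):

```sql
-- Sketch: keep multiplexing enabled despite the INDEX_OFFSET initialization.
-- rule_id and match_digest are placeholders; multiplex=2 means
-- "do not disable multiplexing because of this query", as I read the docs.
INSERT INTO mysql_query_rules (rule_id, active, match_digest, multiplex, apply)
VALUES (50, 1, '^SET @INDEX_OFFSET', 2, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```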
---
I have also run across an issue with mixed compressed/uncompressed clients, with an easy way to reproduce it. If an uncompressed client's backend response gets cached, then the cached copy can be used correctly by either a future uncompressed or compressed client. However, if a compressed client's request to the backend gets cached, then when a future uncompressed client sends the same query, it seems the compressed cached data gets sent to it, and it drops the connection quickly.
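In case it helps to reproduce this, the caching I mean comes from a rule along these lines (rule_id, digest pattern and TTL are arbitrary examples); with such a rule in place, running the matching query first from a client connected with --compress and then from one without should show the behaviour:

```sql
-- Sketch: cache matching SELECTs for 60s in ProxySQL's query cache
-- (rule_id, match_digest and cache_ttl are arbitrary example values).
INSERT INTO mysql_query_rules (rule_id, active, match_digest, cache_ttl, apply)
VALUES (10, 1, '^SELECT .* FROM test_table', 60000, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
-- warm the cache via `mysql --compress -P6033 ...`, then repeat the query
-- from a plain `mysql -P6033 ...` client to see the disconnect.
```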
---
Thank you @joelhock, perhaps you provided the missing information to get this worked on!
---
One more piece of information: in addition to my prior comment about compressed cached results not being usable for uncompressed clients, it also seems that if the client that triggers a new connection to a database backend is a compressed client, then that backend connection stays compressed forever, and future requests from uncompressed clients that are fulfilled by that connection will cause the uncompressed client to disconnect.
---
Reopening, this isn't fixed in 2.1.0
---
My test cases now work with 2.0.13. Thank you!!
---
Hi,
I re-set up a new PHP + DB slave server with Debian Stretch.
Before, everything was running fine with weighted settings and a local ProxySQL:
external master + external slave + local slave - all uncompressed connections.
Since the compress flag was working nicely for another database, as mentioned in another issue, I also added the compress setting to the new ProxySQL configuration here to save traffic - but only for the external master + external slave. For the local connections it would be neither helpful nor performant to also compress the data.
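For clarity, the relevant part of the server configuration looked roughly like this (hostnames, hostgroups and weights are placeholders; compression is the mysql_servers column I set):

```sql
-- Sketch of the mixed setup (hostnames/hostgroups/weights are placeholders):
-- compression=1 for the external servers, the local slave stays uncompressed.
INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight, compression) VALUES
  (500, 'ext-master.example.com', 3306, 1,   1),
  (501, 'ext-slave.example.com',  3306, 10,  1),
  (501, '127.0.0.1',              3306, 100, 0);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```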
For the first days everything worked nicely, but from time to time, when the backend server has to do many PHP tasks, exceptions were thrown like:
SQLSTATE[HY000]: General error: 9001 Max connect timeout reached while reaching hostgroup 501 after 60238ms
without any apparent reason - all servers are up with not many connections, and especially while these errors occurred, the connections on the localhost slave went down to zero. Nothing in the logs gave a good hint - they only complained about the master (with weight=1) whenever a server was mentioned at all:
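The 60238ms in that error looks like ProxySQL's own connect-timeout ceiling being reached rather than a timeout on a single backend; this is how I checked the related limits (a sketch, variable names taken from the documentation as I understand it):

```sql
-- Sketch: inspect the connect-timeout limits behind the
-- "Max connect timeout reached" error (variable names assumed).
SELECT variable_name, variable_value
FROM global_variables
WHERE variable_name IN ('mysql-connect_timeout_server',
                        'mysql-connect_timeout_server_max');
-- Raising the ceiling only hides the symptom, but confirms where the ~60s comes from:
UPDATE global_variables SET variable_value='120000'
WHERE variable_name='mysql-connect_timeout_server_max';
LOAD MYSQL VARIABLES TO RUNTIME;
```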
It would be great to get some more hints in the wiki on how the process works and how debugging can be done (=> #969), because I was lost searching for reasons/solutions until I found both of these issues via the error message... The second one was the issue also mentioned in the changelog,
which reminded me to check my previous compress settings on the connections (which were off at the time); I had added them manually in the ProxySQL configuration some months ago, before the ProxySQL reload function was available.
=> After setting ALL connections to compressed, or all to uncompressed, the errors are gone (a sketch of this workaround follows below).
=> After setting mixed compressed/uncompressed values on the connections, the errors are back again.
==> This must be an "enhanced" version of the problem from bug #1410 above?
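The workaround from above as a sketch (uniform compression on every backend, whichever value, just not mixed within a hostgroup):

```sql
-- Sketch of the workaround: the same compression value on all backends
-- (0 everywhere or 1 everywhere), then reload.
UPDATE mysql_servers SET compression=0;
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```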
Best would be to fix it for the upcoming 2.0.0 release... 😊