Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ProxySQL not sending kill query to MySQL backend during long running query #1790

Closed
angrygreenfrogs opened this issue Nov 19, 2018 · 17 comments

Comments

@angrygreenfrogs
Copy link

Related to the new kill query feature as in these PRs:
#1779
#1781

  1. Create test data table, just for the purposes of trying to run a long running query that we can then kill before it finishes.

CREATE TABLE slow (
a blob
);

  1. Inserted about 4K rows of random data into the a column

  2. First, a test to show this working to MySQL directly:

a. Using a direct connection to MySQL, run a slow test query from MySQL Workbench
select count(*) from slow as t1
join slow as t2 on t1.a = t2.a
join slow as t3 on t1.a = t3.a
join slow as t4 on t1.a = t4.a

b. Hit the stop button

c. Workbench error result is:
Error Code: 1317. Query execution was interrupted 2.031 sec

d. MySQL query log contains the query and the kill query call:

2018-11-16T14:47:05.632746Z 34769 Query select count(*) from slow as t1
join slow as t2 on t1.a = t2.a
join slow as t3 on t1.a = t3.a
join slow as t4 on t1.a = t4.a
2018-11-16T14:47:06.841817Z 34684 Query KILL QUERY 34769

  1. Now repeat the same steps using a connection through ProxySQL

a. The query appears in the MySQL log:

2018-11-16T14:48:21.576535Z 7 Query select count(*) from slow as t1
join slow as t2 on t1.a = t2.a
join slow as t3 on t1.a = t3.a
join slow as t4 on t1.a = t4.a

b. However, the KILL QUERY request isn't sent and doesn't appear in the log.

c. Then eventually after 30 seconds it looks like something gives up and the connection is then killed/re-connected.

2018-11-16T16:18:01.028110Z 36082 Query KILL CONNECTION 35879
2018-11-16T16:18:01.028486Z 36082 Quit
2018-11-16T16:18:01.044543Z 36083 Connect root@localhost on information_schema using TCP/IP
2018-11-16T16:18:01.044720Z 36083 Query SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
2018-11-16T16:18:01.045056Z 36083 Query SHOW SESSION VARIABLES LIKE 'lower_case_table_names'
2018-11-16T16:18:01.047571Z 36083 Query SELECT current_user()
2018-11-16T16:18:01.048073Z 36083 Query SET CHARACTER SET utf8
2018-11-16T16:18:01.049734Z 36083 Query SET SQL_SAFE_UPDATES=1
2018-11-16T16:18:01.050229Z 36083 Query SELECT CONNECTION_ID()
2018-11-16T16:18:01.051610Z 36083 Query SHOW SESSION STATUS LIKE 'Ssl_cipher'

d. Workbench error result is:
Error Code: 2013. Lost connection to MySQL server during query 30.000 sec

I've also re-run these same experiments and attached the tcpdump outputs in case they may help.

After looking through the tcdump analysis in the failing case (#4 above)

capture.zip

, what sticks out to me is that I can see a KILL QUERY being sent to proxysql from workbench (4th request in proxysql.cap), but that kill query is not being forwarded by ProxySQL back to MySQL for some reason (nothing in mysql log). Then after 30s KILL CONNECTION appears to be generated by ProxySQL, because the tcpdump from Workbench doesn't show this call, but looking at the logs, this occurs at about the same time +30s in the tcpdump log where I believe Workbench is basically just terminating the TCP connection, so it's probably just a normal cleanup by ProxySQL.

Conversely, with workbench talking directly to MySQL, the 30th request in mysql.cap is a KILL QUERY which is received immediately by MySQL and seems to work correctly.

@anantsaraswat
Copy link

Any update on this ?

@pondix
Copy link
Contributor

pondix commented Mar 9, 2020

The connection is lost after 30 seconds (client side) so ProxySQL will KILL CONNECTION in order to cleanup. This definitely looks like a timeout on Workbench.

Please check your client side settings in Workbench under Edit → Preferences → SQL Editor → DBMS connection read time out (in seconds).

Also, when you connect with Workbench to the MySQL traffic port, please check your timeout settings via the MySQL command:

show variables like '%timeout%';

Also check these in the ProxySQL Admin port with the same command.

@anantsaraswat
Copy link

@pondix I guess it's not the issue with the long query running really, it's more about we can't kill the running query from any MySQL client using ProxySQL.

So let's say I connect to my DB server using ProxySQL and run 'select sleep(100)' and try to kill it from running queries list or using kill $PROCESS_ID, didn't work.

@renecannao
Copy link
Contributor

@anantsaraswat : is there an issue at all?
This issue should have been closed long time ago, as this was fixed in 2.0.0 and 1.4.13

@anantsaraswat
Copy link

@renecannao Yes, the issue is still there, we are unable to kill query using ProxySQL

@renecannao
Copy link
Contributor

Please provide more information, including a reproducible test case.
Please note that right now I don't even know which version you are running.
Thanks

@anantsaraswat
Copy link

anantsaraswat commented Mar 11, 2020

Thanks @renecannao . Please check the following.

ProxySQL version 2.0.5-37-gc8e32ee, codename Truls

Mysql Server added in ProxySQL:
+--------------+----------------+-----------------+--------+
| hostgroup_id | hostname | max_connections | status |
+--------------+----------------+-----------------+--------+
| 1 | 10.169.XXX.XX | 1000 | ONLINE |
| 2 | 81.168.XXX.XX | 1000 | ONLINE |
| 3 | 81.187.XXX.XX | 1000 | ONLINE |

Mysql_query_rules:

MySQL [(none)]> select username,client_addr,proxy_addr,proxy_port,negate_match_pattern,destination_hostgroup,log,apply from mysql_query_rules where username='anant' and proxy_port=3314;
+----------+----------------+------------+------------+----------------------+-----------------------+-----+-------+
| username | client_addr | proxy_addr | proxy_port | negate_match_pattern | destination_hostgroup | log | apply |
+----------+----------------+------------+------------+----------------------+-----------------------+-----+-------+
| anant | 192.168.100.71 | NULL | 3314 | 0 | 3 | 1 | 1 |
+----------+----------------+------------+------------+----------------------+-----------------------+-----+-------+
1 row in set (0.00 sec)

Step 1 - Open 2 instance of Mysql workbench or SQLyog and connect to ProxySQL on port 3314

Step 2 - On first instance run 'select sleep(100)'

Step 3 - Open process list / Client connection list on the second instance and try to kill the 'select sleep(100)' query running from 1st instance, it may go from process list but on refresh, it appears. Basically, the second instance is unable to kill that query.

@renecannao
Copy link
Contributor

This seems the opposite issue of this issue :)
You are running a KILL getting the id from MySQL's processlist, not from the original client itself.
You should try to kill the connection/query from the same instance of Workbench/SQLyog

@anantsaraswat
Copy link

@renecannao I faced this issue during running a query, MySQL client hanged and I wanted to kill that query So that it will not impact the backend server, Can you please suggest what should I do in this scenario. Thanks again.

@angrygreenfrogs
Copy link
Author

I'll add a bit to this issue as I reported it initially and worked with Rene on a related fix in #860

This issue was reported after the fix in #860 was implemented in 2.0.0 and 1.4.13.

@pondix's comment above "This definitely looks like a timeout on Workbench." isn't quite what I had a problem with ...it's not the timeout that's the problem, that's fine, it's that I can't cancel the query before the timeout occurs.

However, I have to say that I can't reproduce this problem any more as I had originally reported it, so I can't see that it's actually an issue anymore.

I tried again from a scratch setup:

  1. ProxySQL v2.0.10 freshly installed from https://github.com/sysown/proxysql/releases/download/v2.0.10/proxysql_2.0.10-ubuntu14_amd64.deb

  2. Created a simple test DB, test user and tested through proxysql and through a direct connection to mysql

  3. Testing by running "select sleep(100)" in Workbench and then cancelling the query works fine for me, so that part of what @anantsaraswat reported, I don't have a problem with.. From the proxysql log I can see the query being cancelled:

2020-03-12 03:42:28 [INFO] Killing query 17
2020-03-12 03:42:28 MySQL_Session.cpp:128:kill_query_thread(): [WARNING] KILL QUERY 66 on 127.0.0.1:3306

  1. My original issue was slightly different where I ran an intentionally slow query with a ton of JOINS on a junk test table and I was unable to cancel it through proxysql, but was able to do so when connecting directly.

I tried this same thing again, and it does work now. I'm slightly unsure what changed, because as I said, I did report this after the #860 fix, but I also can't reproduce it again now.

@renecannao
Copy link
Contributor

@angrygreenfrogs : you created this issue before #1781 was applied to 1.4 , so I think this is why you opened this issue?

Testing by running "select sleep(100)" in Workbench and then cancelling the query works fine for me, so that part of what @anantsaraswat reported, I don't have a problem with..

This is what I am suggesting to @anantsaraswat too: cancel the query directly from the client (Workbench).

@anantsaraswat : you can't kill a query using the ID you read when running show processlist against mysqld passing through proxysql. This is by design.
Think to a setup where there are multiple servers in the same hostgroup (a common setup for proxysql use): it is very likely that show processlist will run on a different server than where is running the query you want to kill.

For this reason, you must cancel the query from the client directly.

@angrygreenfrogs
Copy link
Author

angrygreenfrogs commented Mar 12, 2020

@renecannao Ah, no, that was a separate issue... we ended up having to merge the #860 change to 1.4 because we had that whacky periodic authentication error in 2.0 that you might recall so we wanted to use 1.4 at the time since it was stable for us and 2.0 I think was still almost pre-release back then... I can't even remember where the issue is for it, but it was unrelated and I recall you having fixed it since.

I think you've identified the core cause of @anantsaraswat's trouble... and if he still has a problem, likely should be opened separately from this issue.

This issue as originally stated can't be reproduced, so I'd gladly call it closed.

Edit: Oh, minor point, I found the auth issue we had at the time, it was #1757, but that's not really important

@anantsaraswat
Copy link

Thanks @renecannao for explaining the scenario, it makes sense to me now, But I would open another issue, which should allow admin to kill the stuck query from proxysql admin at least. As this looks like a blackhole spot to me in the ProxySQL.

Thanks @angrygreenfrogs Please keep this task closed.

@anantsaraswat
Copy link

@svaroqui
Copy link

svaroqui commented Mar 27, 2020

Hi Rene, we also hit that issue with a single backend , proxysql writer only setup, and we conclude that's an issue to ask the proxysql admin to kill a query, is there any feature request to rewrite show processlist by select * from stats_mysql_processlist where user= at the protocol layer and kill inplace on the proxysql side ;

@renecannao
Copy link
Contributor

@svaroqui , please open a new issue flagging it as Feature Request.
This may need significant changes, but it seems an interesting feature

@svaroqui
Copy link

Thanks # #2621

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants