Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Galera read-write split - SET instructions should execute in both backend connections #653

Closed
fmbiete opened this issue Aug 23, 2016 · 9 comments
Assignees

Comments

@fmbiete
Copy link

fmbiete commented Aug 23, 2016

Maybe this is not a bug, but I don't see how it can be achieved, a documentation request?

For valid read-write split "SET" session instructions should be executed in the read and write backend connections for the same session.
So the autocommit, charset, time_zone, or custom variables have the value expected.

In my configuration all non-SELECT instructions are being executed in hostgroup_id 0, and SELECT instructions in hostgroup_id 1. hostgroup_id 0 won't accept reads.

MySQL [test]> set autocommit = 0;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> select @@autocommit;
+--------------+
| @@autocommit |
+--------------+
|            1 |
+--------------+
1 row in set (0.00 sec)

MySQL [test]> show variables like 'autocommit';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | OFF   |
+---------------+-------+
1 row in set (0.00 sec)
MySQL [test]> create table t2 (c1 integer, primary key (c1));
Query OK, 0 rows affected (0.04 sec)

MySQL [test]> set @variable1 = 67;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> insert into t2 values (@variable1);
Query OK, 1 row affected (0.00 sec)

MySQL [test]> select * from t2;
+----+
| c1 |
+----+
| 67 |
+----+
1 row in set (0.00 sec)

MySQL [test]> select @variable1;
+------------+
| @variable1 |
+------------+
| NULL       |
+------------+
1 row in set (0.00 sec)

Configuration

MySQL [(none)]> select * from mysql_query_rules \G
*************************** 1. row ***************************
              rule_id: 10
               active: 1
             username: test
           schemaname: NULL
               flagIN: 0
          client_addr: NULL
           proxy_addr: NULL
           proxy_port: NULL
               digest: NULL
         match_digest: ^SELECT .* FOR UPDATE$
        match_pattern: NULL
 negate_match_pattern: 0
              flagOUT: NULL
      replace_pattern: NULL
destination_hostgroup: 0
            cache_ttl: NULL
            reconnect: NULL
              timeout: NULL
              retries: NULL
                delay: NULL
       mirror_flagOUT: NULL
     mirror_hostgroup: NULL
            error_msg: NULL
                  log: NULL
                apply: 1
*************************** 2. row ***************************
              rule_id: 20
               active: 1
             username: test
           schemaname: NULL
               flagIN: 0
          client_addr: NULL
           proxy_addr: NULL
           proxy_port: NULL
               digest: NULL
         match_digest: ^SELECT
        match_pattern: NULL
 negate_match_pattern: 0
              flagOUT: NULL
      replace_pattern: NULL
destination_hostgroup: 1
            cache_ttl: NULL
            reconnect: NULL
              timeout: NULL
              retries: NULL
                delay: NULL
       mirror_flagOUT: NULL
     mirror_hostgroup: NULL
            error_msg: NULL
                  log: NULL
                apply: 1
2 rows in set (0.00 sec)

MySQL [(none)]> select * from mysql_servers \G
*************************** 1. row ***************************
       hostgroup_id: 0
           hostname: db01
               port: 3306
             status: ONLINE
             weight: 1
        compression: 0
    max_connections: 1000
max_replication_lag: 0
            use_ssl: 0
     max_latency_ms: 0
*************************** 2. row ***************************
       hostgroup_id: 0
           hostname: db02
               port: 3306
             status: OFFLINE_SOFT
             weight: 1
        compression: 0
    max_connections: 1000
max_replication_lag: 0
            use_ssl: 0
     max_latency_ms: 0
*************************** 3. row ***************************
       hostgroup_id: 0
           hostname: db03
               port: 3306
             status: OFFLINE_SOFT
             weight: 1
        compression: 0
    max_connections: 1000
max_replication_lag: 0
            use_ssl: 0
     max_latency_ms: 0
*************************** 4. row ***************************
       hostgroup_id: 1
           hostname: db01
               port: 3306
             status: OFFLINE_SOFT
             weight: 1
        compression: 0
    max_connections: 1000
max_replication_lag: 0
            use_ssl: 0
     max_latency_ms: 0
*************************** 5. row ***************************
       hostgroup_id: 1
           hostname: db02
               port: 3306
             status: ONLINE
             weight: 1
        compression: 0
    max_connections: 1000
max_replication_lag: 0
            use_ssl: 0
     max_latency_ms: 0
*************************** 6. row ***************************
       hostgroup_id: 1
           hostname: db03
               port: 3306
             status: ONLINE
             weight: 1
        compression: 0
    max_connections: 1000
max_replication_lag: 0
            use_ssl: 0
     max_latency_ms: 0
6 rows in set (0.00 sec)
@renecannao
Copy link
Contributor

I assume this is not documented properly, so surely it is documentation request.
Let's start from the simplest point to reply, and it will be useful to understand certain behavior.

ProxySQL doesn't track user variable .
Because of this, when it recognize a session variable, it disables multiplexing for that connection.
In other words, as soon as you run set @variable1 = 67 , that specific query is executed to the writer (according to your rules) and that connection cannot be used anymore for multiplexing.
This is perhaps an easy to solve issue, because you can create a rule specifying that every query with @ should go to the writer.

Why ProxySQL can't send the SET command everywhere? Well, you can have 1000 hostgroups configured, and maybe a single client connection will communicate to only 1 or 2 HGs. ProxySQL cannot send the SET commands to all of them not knowing if you are going to use them.
Furthermore, ProxySQL should handle cases in which a backend dies and a connection somewhere else need to be established: this is another reason why it can't send the SET command everywhere synchronously.
It is a bit complex, but ProxySQL could track all user variables and store them internally, so if you run set @variable1 = 67 now, ProxySQL will execute the same when it will connect to a new HG.
Although, if you run a query like SELECT col1 INTO @a FROM tablename LIMIT 1 , ProxySQL will never be able to know what there is in @a .
To not make things too complicated, the principle it follows it is simple: when it sees a @ , it disable multiplexing for that connection.

ProxySQL tracks some session variables
There is still a lot that needs to be improved here, but from the list above only autocommit and charset is tracked, in the common form used by most application.
So, for example, set autocommit=1 works well.
When you issue set autocommit , ProxySQL doesn't always forward it to a backend. In fact, if your application issues set autocommit as the first statement, ProxySQL doesn't know where to send it (again, think to a setup with thousands of HGs).
For this reason, when an application issue set autocommit ProxySQL will reply with OK, meaning that it acknowledged that the applications wants autocommit set.
When the application sends real traffic and proxysql will determine where to send that traffic, it will check the status of autocommit in the backend connection: if it doesn't match what the application wants, it will change it in the backend connection.

So, in your case, when you run set autocommit=0, proxysql will reply OK without sending it anywhere.
Then you issue a SELECT statement, and the results says that it is ON.
But I just said that ProxySQL will match it...
In reality, ProxySQL will not always match autocommit. This depends from the variable mysql-enforce_autocommit_on_reads (that is false by default).
The reason is that enforcing autocommit on reads while there is a read/write split, will cause transactions to be opened on a reader, and this is particularly bad if this is a slave.

These behaviors were tailored over a the needs of a real application in python (that uses set autocommit extensively) and using 30 shards with read/write split.

If your application doesn't seem to work with this behavior please let me know, and I will see if it is something that can be tuned.

Question: does your application uses different timezone in different connections?
Thanks

@fmbiete
Copy link
Author

fmbiete commented Aug 24, 2016

Thanks for the detailed explanation, Rene.

Indeed, it makes sense and using a third rule to catch any variable in SELECT sentences will solve any problem I could have (hopefully).

No, the application doesn't use different timezone. But the reporting applications do to present the dates in the client timezone, even though in the same connection. I will take a look to see if we can workaround that.

@renecannao
Copy link
Contributor

You are welcome.
When the feature in #594 will be added there will be more control on multiplexing. For example, multiplexing can be disabled when a timezone is set (until the time ProxySQL will start tracking timezone).

As an alternative, you can add a rewrite rule in ProxySQL to rewrite ^SET time_zone= into SET @@timezone:= . In this way multiplexing should be disabled.

@fmbiete
Copy link
Author

fmbiete commented Aug 24, 2016

Is there any way to link a complete transaction to the same backend server?

Original MariaDB server

[root@bb71fdc83f85 /]# mysql -u test -p -h db01 test
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 784
Server version: 10.1.16-MariaDB MariaDB Server

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [test]> start transaction;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> select * from t2;
+----+
| c1 |
+----+
| 20 |
| 67 |
+----+
2 rows in set (0.00 sec)

MariaDB [test]> insert into t2 values (30);
Query OK, 1 row affected (0.00 sec)

MariaDB [test]> select * from t2;
+----+
| c1 |
+----+
| 20 |
| 30 |
| 67 |
+----+
3 rows in set (0.00 sec)

MariaDB [test]> rollback;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> select * from t2;
+----+
| c1 |
+----+
| 20 |
| 67 |
+----+
2 rows in set (0.00 sec)

ProxySQL
By my rules SELECT goes to the readers and the rest to the writer. Any way to differentiate there that we are inside a transaction so that SELECT goes to the writer?

[root@bb71fdc83f85 /]# mysql -u test -p --protocol=tcp test
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.5.30 (ProxySQL)

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [test]> start transaction;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> select * from t2;
+----+
| c1 |
+----+
| 20 |
| 67 |
+----+
2 rows in set (0.00 sec)

MySQL [test]> insert into t2 values (30);
Query OK, 1 row affected (0.00 sec)

MySQL [test]> select * from t2;
+----+
| c1 |
+----+
| 20 |  
| 67 |  <- We are missing the value inserted but not yet commited. This select has to go to the writer where the transaction is running
+----+
2 rows in set (0.00 sec)

MySQL [test]> rollback;
Query OK, 0 rows affected (0.01 sec)

MySQL [test]> select * from t2;
+----+
| c1 |
+----+
| 20 |
| 67 | <- OK, select will go to any reader, we are outside the transaction scope
+----+
2 rows in set (0.00 sec)

MaxScale using r-w split

[root@3a16eaeb6ed8 /]# mysql -u test -p --protocol=tcp test
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 20
Server version: 10.0.0 1.4.3-maxscale MariaDB Server

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [test]> start transaction;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> select * from t2;
+----+
| c1 |
+----+
| 20 |
| 67 |
+----+
2 rows in set (0.00 sec)

MySQL [test]> insert into t2 values (30);
Query OK, 1 row affected (0.00 sec)

MySQL [test]> select * from t2;
+----+
| c1 |
+----+
| 20 |
| 30 |
| 67 | <- The "start transaction" has disabled the r-w split for this transaction
+----+
3 rows in set (0.00 sec)

MySQL [test]> rollback;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> select * from t2;
+----+
| c1 |
+----+
| 20 |
| 67 | <- this select will go to the readers
+----+
2 rows in set (0.01 sec)

@renecannao
Copy link
Contributor

Francisco,

Yes, there is a way to link a complete transaction to the same backend server: this is done using transaction_persistent in mysql_users table.

Although, the default is 0 (disabled), and this is intentional. More details below.

MaxScale disables read/write split for a transaction, while ProxySQL doesn't (by default).
The reason why ProxySQL doesn't have this default is that it is meant to be extremely more granular.
To make an example, it is possible for ProxySQL to read data within "a transaction" (not really a transaction, but it is transparent to the client) from a server that is not even part of the cluster.
For example, you can have a master where you write data, a slave where you read most of the data (or another node in case of Galera), a third node not part of this cluster where you have static data (that doesn't have to be part of OLTP workload) and a forth node where you update counters.
All within a "transaction", without the client knowing it was interacting with 4 different nodes.

While MaxScale does r/w split outside a transaction, ProxySQL does query routing no matter if there is a transaction or not. Note that r/w split is a subset of query routing.
This means that you have to instruct ProxySQL with rules on what to do with traffic, and where to send it.
The common configuration procedure for a single cluster is the following:

  • configure ProxySQL to send all the traffic to only one node;
  • check in stats_mysql_query_digest which are the most expensive SELECT statements;
  • determine which expensive statements should be moved to reader nodes;
  • configure mysql_query_rules to send only expensive SELECT to the readers.
    The idea is therefore quite simple: send to slaves/readers only what you want to send to slaves, not just any SELECT.
    This allows to have r/w split even when there is a transaction, therefore a lot of flexibility.
    I hope this description helps in how to configure ProxySQL with regards to r/w split.

@fmbiete
Copy link
Author

fmbiete commented Aug 25, 2016

Thanks again, transaction_persistent seems to do what I was expecting in the first place. I will take a deep look into all the possibilities than ProxySQL offers.

@renecannao
Copy link
Contributor

SET time_zone is now tracked, see #816

@fmbiete
Copy link
Author

fmbiete commented Dec 8, 2016

Many thanks for the head ups, I will try it.

@renecannao
Copy link
Contributor

Closing.
Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants