
Consistent reads from db replicas using unit-of-work identifiers #2698

Closed
wants to merge 1 commit

Conversation

@saunderst (Contributor) commented Apr 24, 2020

This provides a way of ensuring that a series of database requests will be forwarded to one server as long as a user-defined unit-of-work identifier (or a series of such identifiers) remains the same across the series of requests. This is done by using the identifier(s), received via query comment fields, to generate a numeric hash which is ultimately used to index into the list of configured servers within a hostgroup.

The identifier names are defined by setting the unit_of_work_identifiers global variable to a comma-delimited string containing one or more identifier names, e.g. "request_id,job_id". Any of these names can then be used to declare unique identifiers to consistently route requests to a single server without any need for the application to be aware of the available servers.

Also, the delimiters between comment fields and within key-value pairs have been made configurable for all query comment fields using the global variables query_parser_token_delimiters and query_parser_key_value_delimiters. If not specified, these variables default to the standard ; and = delimiters.

@renecannao

@saunderst saunderst changed the title V2.0.11 shopify consistent reads Consistent reads from db replicas using unit-of-work identifiers Apr 24, 2020
@pondix (Contributor) commented Apr 24, 2020

Automated message: PR pending admin approval for build testing

@kirs (Contributor) commented May 21, 2020

@renecannao is this something you might take a look at? We've been working on this internally at Shopify and would appreciate your feedback on this patch. Thanks!

@renecannao (Contributor): add to whitelist

@renecannao (Contributor): ok to test

@renecannao renecannao changed the base branch from v2.0.11 to v2.0.13 May 23, 2020 20:32
renecannao added a commit that referenced this pull request May 23, 2020
renecannao added a commit that referenced this pull request May 24, 2020
Review comment on lib/MySQL_HostGroups_Manager.cpp (outdated, resolved)
@renecannao (Contributor) commented:

Because the cleanup of get_random_MySrvC() doesn't only remove blocks of dead code but also changes some spacing, reviewing its correctness didn't seem trivial.
For example, I see variables related to TEST_AURORA completely removed.
I wrote a new PR for just cleaning up get_random_MySrvC(): #2823.

Can you please make a new PR only related to the new feature?

Something incorrect I noticed in this pull request is that it doesn't make any change in MySQL_Thread::get_MyConn_local().
Maybe this is an easy fix though: if server_hash is not 0, calls to MySQL_Thread::get_MyConn_local() can be skipped.

renecannao added a commit that referenced this pull request May 24, 2020
renecannao added a commit that referenced this pull request May 24, 2020
@saunderst (Contributor, Author) commented:

> Because the cleanup of get_random_MySrvC() doesn't only remove blocks of dead code but also changes some spacing, reviewing its correctness didn't seem trivial.
> For example, I see variables related to TEST_AURORA completely removed.
> I wrote a new PR for just cleaning up get_random_MySrvC(): #2823.
>
> Can you please make a new PR only related to the new feature?
>
> Something incorrect I noticed in this pull request is that it doesn't make any change in MySQL_Thread::get_MyConn_local().
> Maybe this is an easy fix though: if server_hash is not 0, calls to MySQL_Thread::get_MyConn_local() can be skipped.

@renecannao
Thanks, René. I've rebased on your version, so now all the changes are related to the new feature.

It looks as though to change MySQL_Thread::get_MyConn_local() correctly, we'd have to instantiate the MySrvList class and reproduce the hash-based selection logic before checking against the cached connections. The easy fix you suggested was much easier! I'm curious, though: Why did my tests never fail? Is connection caching disabled by default?

@renecannao (Contributor) commented:

Hi @saunderst , thank you for rebasing!
I will review.
To answer your questions:

> It looks as though to change MySQL_Thread::get_MyConn_local() correctly, we'd have to instantiate the MySrvList class and reproduce the hash-based selection logic before checking against the cached connections.

To instantiate the MySrvList you need to access the global connection pool. Accessing the global connection pool (protected by a mutex) is exactly what per-thread connection caching tries to avoid, so accessing MySrvList would negate the benefit of per-thread connection caching.

> I'm curious, though: Why did my tests never fail? Is connection caching disabled by default?

Connection caching is enabled by default, and there is no way to disable it.
Connections stay in the per-thread connection cache for a very short period of time: every MySQL_Thread returns the connections in its cache to the global connection pool as soon as it finishes processing sessions, in every iteration of the main loop:
https://github.com/sysown/proxysql/blob/v2.0.13/lib/MySQL_Thread.cpp#L4021-L4023

In other words, the per-thread connection cache becomes useful only at very high concurrency with very fast requests: in that scenario it is likely that a freed backend connection is immediately reused by another session within the same iteration of the MySQL_Thread main loop.

You can have a look at these 2 variables:

SELECT * FROM stats_mysql_global WHERE variable_name IN ('ConnPool_get_conn_immediate','ConnPool_get_conn_success');

ConnPool_get_conn_immediate counts how many times a connection was retrieved from the per-thread connection cache.
Depending on your workload, it is possible that the per-thread connection cache is never or rarely used.

@saunderst saunderst requested a review from renecannao June 8, 2020 18:35
@saunderst (Contributor, Author) commented:

Hi @renecannao. When do you think you might have a chance to look at this? I'm especially interested to know whether the simple approach to skipping connection caching still seems to make sense, since I have not implemented it in our fork yet.

@saunderst saunderst closed this by deleting the head repository Apr 14, 2023