-
Notifications
You must be signed in to change notification settings - Fork 976
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for AWS RDS MySQL Multi-AZ Cluster auto-discovery #4406
Conversation
Automated message: PR pending admin approval for build testing |
add to whitelist |
…instead of originating server's values, add new field in mmsd for reader hostgroup, and query monitor db instead of admin db
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several points that requires addressing / discussing before merging.
lib/MySQL_Monitor.cpp
Outdated
|
||
// Add the new servers if any | ||
if (!new_servers.empty()) { | ||
int successfully_added_all_servers = MyHGM->add_discovered_servers_to_mysql_servers_and_replication_hostgroups(new_servers); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we could simplify this error handling, there is no relevant exceptions that can be thrown in the inner code. So, better handling and logging any potential errors inside, and remove the exception safeguards. Only standard containers are used, and in the event of an exception there, it's better to threat that exception as an assert
as it would most likely be a programmatic error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the outer exception handling
lib/MySQL_HostGroups_Manager.cpp
Outdated
uint16_t port = std::get<1>(s); | ||
long int hostgroup_id = std::get<2>(s); | ||
|
||
// Add the discovered server with default values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a newer abstraction that can be used for server placement MySQL_HostGroups_Manager::create_new_server_in_hg
. Is there any motivation not to use it? If not, would be preferred using it for this case. If there is a limitation in the interface it may be simpler to fix it that duplicating code. For reference, here it's an usage of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I will make use of the new abstraction. For the third field of srv_info_t
struct, the kind
field, what should I use? I will default to AWS RDS
string, but let me know if it should be something else.
lib/MySQL_HostGroups_Manager.cpp
Outdated
purge_mysql_servers_table(); | ||
mydb->execute("DELETE FROM mysql_servers"); | ||
proxy_debug(PROXY_DEBUG_MYSQL_CONNPOOL, 4, "DELETE FROM mysql_servers\n"); | ||
generate_mysql_servers_table(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block, starting at previous if
is missing the update on the global checksums
after this line. This section can be taken as an example update_group_replication_add_autodiscovered. Here we can see that the following part is missing:
// Update the global checksums after 'mysql_servers' regeneration
{
unique_ptr<SQLite3_result> resultset { get_admin_runtime_mysql_servers(mydb) };
string mysrvs_checksum { get_checksum_from_hash(resultset ? resultset->raw_checksum() : 0) };
save_runtime_mysql_servers(resultset.release());
proxy_info("Checksum for table %s is %s\n", "mysql_servers", mysrvs_checksum.c_str());
pthread_mutex_lock(&GloVars.checksum_mutex);
update_glovars_mysql_servers_checksum(mysrvs_checksum);
pthread_mutex_unlock(&GloVars.checksum_mutex);
}
Checksum re-computation is mandatory in this case after re-generating mysql_servers_table
. But in this case there are a couple of extra operations that are required:
// Update the global checksums after 'mysql_servers' regeneration
{
unique_ptr<SQLite3_result> resultset { get_admin_runtime_mysql_servers(mydb) };
string mysrvs_checksum { get_checksum_from_hash(resultset ? resultset->raw_checksum() : 0) };
save_runtime_mysql_servers(resultset.release());
// PROPOSAL: Required to update the runtime_mysql_servers checksum with the new checksum, in a
// similar way it's done during `commit` via `commit_update_checksum_from_mysql_servers`
uint64_t raw_checksum = this->runtime_mysql_servers ? this->runtime_mysql_servers->raw_checksum() : 0;
table_resultset_checksum[HGM_TABLES::MYSQL_SERVERS] = raw_checksum;
// PROPOSAL: Required to update with the new 'mysql_group_replication_hostgroups'; this is
// only required for preserving coherence in the checksums, otherwise they would be
// inconsistent with `commit` generated checksums. This should be reworked into a function.
{
SpookyHash rep_hgs_hash {};
bool init = false;
uint64_t servers_v2_hash = table_resultset_checksum[HGM_TABLES::MYSQL_SERVERS_V2];
if (servers_v2_hash) {
if (init == false) {
init = true;
rep_hgs_hash.Init(19, 3);
}
rep_hgs_hash.Update(&servers_v2_hash, sizeof(servers_v2_hash));
}
CUCFT1(
rep_hgs_hash, init, "mysql_replication_hostgroups", "writer_hostgroup",
table_resultset_checksum[HGM_TABLES::MYSQL_REPLICATION_HOSTGROUPS]
);
}
proxy_info("Checksum for table %s is %s\n", "mysql_servers", mysrvs_checksum.c_str());
pthread_mutex_lock(&GloVars.checksum_mutex);
update_glovars_mysql_servers_checksum(mysrvs_checksum);
pthread_mutex_unlock(&GloVars.checksum_mutex);
}
The comments with PROPOSAL
mark the proposed changes to the previously mentioned required block. All this code is a proposal, and has not being tested.
lib/MySQL_HostGroups_Manager.cpp
Outdated
proxy_debug(PROXY_DEBUG_MYSQL_CONNPOOL, 4, "DELETE FROM mysql_servers\n"); | ||
generate_mysql_servers_table(); | ||
update_table_mysql_servers_for_monitor(false); | ||
rebuild_hostname_hostgroup_mapping(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the previous suggested changes, this function should be replaced by update_hostgroup_manager_mappings
. The two functions are identical, only differing in the missing check on hgsm_mysql_servers_checksum
in the new rebuild_hostname_hostgroup_mapping
function. Updating this checksums is addressed in the previous comment, as it's better for keeping general consistency. If I have not missed anything, I suggest deleting this new function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleting the new rebuild_hostname_hostgroup_mapping
and using update_hostgroup_manager_mappings
instead
lib/MySQL_HostGroups_Manager.cpp
Outdated
add(mysrvc, hostgroup_id); | ||
|
||
proxy_info( | ||
"Adding new discovered server %s:%d with: hostgroup=%d, weight=%ld, max_connections=%ld, use_ssl=%d\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following hostgroup=%d
should be replaced with hostgroup=%ld
as hostgroup_id
is a long int
, or remove the unnecessary casting into long int
from the original int
that is received inside the tuple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this logging statement here as create_new_server_in_hg
already has the same logging statement
include/MySQL_HostGroups_Manager.h
Outdated
@@ -1092,6 +1092,9 @@ class MySQL_HostGroups_Manager { | |||
void set_server_current_latency_us(char *hostname, int port, unsigned int _current_latency_us); | |||
unsigned long long Get_Memory_Stats(); | |||
|
|||
int add_discovered_servers_to_mysql_servers_and_replication_hostgroups(vector<tuple<string, int, int>> new_servers); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No reason to pass the vector by value, also, because it's never modified, could be easily converted into const vector<tuple<string, int, int>>&
.
include/MySQL_Monitor.hpp
Outdated
@@ -442,6 +447,7 @@ class MySQL_Monitor { | |||
static bool update_dns_cache_from_mysql_conn(const MYSQL* mysql); | |||
static void trigger_dns_cache_update(); | |||
|
|||
void process_discovered_topology(const std::string& originating_server_hostname, vector<MYSQL_ROW> discovered_servers, int reader_hostgroup); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if MYSQL_ROW
are just pointers, it's a good practice to make the second argument const vector<MYSQL_ROW>&
.
lib/proxysql_utils.cpp
Outdated
long val = strtol(s, &temp, 0); | ||
|
||
if (temp == s || *temp != '\0' || ((val == LONG_MIN || val == LONG_MAX) && errno == ERANGE)) { | ||
throw std::runtime_error("Could not parse long."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to throw an exception, error should be handled immediately, since this function is only used when parsing data received by the backend server being monitored. See the comment where this function is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this method entirely and changed the caller to throw a proxy_error
message and return early if an exception is detected during string conversion to int
lib/MySQL_Monitor.cpp
Outdated
string current_discovered_port = s[3]; | ||
|
||
if (find(saved_hostnames.begin(), saved_hostnames.end(), current_discovered_hostname) == saved_hostnames.end()) { | ||
tuple<string, int, int> new_server(current_discovered_hostname, parseLong(current_discovered_port.c_str()), reader_hostgroup); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of parseLong
here together with the exception model doesn't improve the handling of a malformed response. There are other places in which the response from the backend is assumed not to be malformed, for example, in the number of rows returned above. If handling a invalid value for that response is desired, is much better to handle the case here, parsing the value into a local, and in case of failure, place a proxy_error
with the details of the invalid row received. For safety, would be best to either clear the new_servers
or return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the exception handling here as we can rely on the other places where backend is assumed not to be malformed.
lib/MySQL_Monitor.cpp
Outdated
|
||
// Process the discovered servers and add them to 'runtime_mysql_servers' | ||
if (!discovered_servers.empty()) { | ||
try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the above comments about exceptions generated inside the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the outer exception handling
…putation after adding a new server, and other small changes based on feedback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @anphucbui,
thanks for the PR and for addressing the previous feedback. There are a few extra changes required, but since they are minor in term of code changes, I will be merging the PR into a dedicated branch and addressing them. I'm leaving the feedback here only for documenting purposes.
Thanks, Javier.
* @param new_servers A vector of tuples where each tuple contains the values needed to add each new server. | ||
*/ | ||
void MySQL_HostGroups_Manager::add_discovered_servers_to_mysql_servers_and_replication_hostgroups(vector<tuple<string, int, int>>& new_servers) { | ||
int added_new_server; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uninitialized variable, fine right now if the function isn't called with an empty list, could misbehave otherwise, better to initialize with -1
.
@@ -1092,6 +1092,8 @@ class MySQL_HostGroups_Manager { | |||
void set_server_current_latency_us(char *hostname, int port, unsigned int _current_latency_us); | |||
unsigned long long Get_Memory_Stats(); | |||
|
|||
void add_discovered_servers_to_mysql_servers_and_replication_hostgroups(vector<tuple<string, int, int>>& new_servers); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previous comment not addressed, should be const
.
*/ | ||
void MySQL_HostGroups_Manager::add_discovered_servers_to_mysql_servers_and_replication_hostgroups(vector<tuple<string, int, int>>& new_servers) { | ||
int added_new_server; | ||
wrlock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing lock on Admin GloAdmin->mysql_servers_wrlock()
, since get_admin_runtime_mysql_servers
is used, Admin should be locked, in other places were both locks needs to be taken, we take them together for simplicity. See MySQL_HostGroups_Manager::update_aws_aurora_set_reader
.
try { | ||
current_discovered_port_int = stoi(s[3]); | ||
} catch (...) { | ||
proxy_error("Unable to parse the port value during topology discovery: [%s]. Terminating discovery early.", current_discovered_port_string); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing newline character \n
in error message. Also, C string not being passed (char*
) instead, a std::string
is feed, this is undefined behavior
at best, potential crash at worst.
Adding support for AWS RDS MySQL Multi-AZ Cluster auto-discovery