Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
WL#15280: HEATWAVE SUPPORT FOR MDS HA
# This is the 1st commit message:

WL#15280: HEATWAVE SUPPORT FOR MDS HA

Problem Statement
-----------------
Currently, customers cannot enable the heatwave analytics service on an HA DBSystem, nor enable HA on a heatwave-enabled DBSystem. In this change, we attempt to remove this limitation and provide failover support for heatwave in an HA-enabled DBSystem.

High Level Overview
-------------------
To support heatwave with HA, we extended the existing feature of auto-reloading tables to heatwave on MySQL server restart (WL-14396). To provide seamless failover for tables loaded to heatwave, each node in the HA cluster (group replication) must have the latest view of the tables currently loaded to the heatwave cluster attached to the primary, i.e., the secondary_load flag must always be in sync. To achieve this, we made the following changes:

1. Replicate secondary load/unload DDL statements to all active secondary nodes by writing the DDL into the binlog.
2. Control how secondary load/unload is executed when a heatwave cluster is not attached to the node executing the command.

Implementation Details
----------------------
The current implementation depends on two key assumptions:

1. All MDS DBSystems will have the RAPID plugin installed.
2. No non-MDS system will have the RAPID plugin installed.

Based on these assumptions, we changed how the server handles execution of secondary load/unload statements:

1. If a secondary load/unload command is executed from a mysql client session on a system without the RAPID plugin installed (i.e., non-MDS), a warning message is shown to the user instead of an error, and the DDL is allowed to commit.
2. If a secondary load/unload command is executed from a replication connection on an MDS system without a heatwave cluster attached, the DDL is allowed to commit instead of throwing an error.
3. If no error is thrown from the secondary engine, the DDL updates the secondary_load metadata and writes a binlog entry.

Writing to the binlog implies that all consumers of the binlog now need to handle this DDL gracefully. This has an adverse effect on Point-in-time Recovery (PITR). If a PITR backup is taken from a DBSystem with heatwave, its binlog may contain secondary load/unload statements. If such a backup is used to restore a new DBSystem, executing the statements from its binlog will fail because:

a) the DBSystem will not have a heatwave cluster attached at this time, and
b) statements from the binlog are executed over a standard mysql client connection, which makes them indistinguishable from user-executed commands.

Customers will be prevented (by the control plane) from using PITR functionality on a heatwave-enabled DBSystem until there is a solution for this.

Testing
-------
This commit changes the behavior of secondary load/unload statements, so it

- adjusts existing tests' expectations, and
- adds a new test validating the new DDL behavior under different scenarios.

Change-Id: Ief7e9b3d4878748b832c366da02892917dc47d83

# This is the commit message #2:

WL#15280: HEATWAVE SUPPORT FOR MDS HA (PITR SUPPORT)

Problem
-------
A PITR backup taken from a heatwave-enabled system could have secondary load or unload statements in its binlog. When such a backup is used to restore another system, the restore can fail for the following two reasons:

1. Currently, even if the target system is heatwave enabled, the heatwave cluster is attached only after the PITR restore phase completes.
2. When entries from the binlogs are applied, a standard mysql client connection is used. This makes it indistinguishable from any other user session.
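The handling rules from the first commit message, and the reason a PITR replay trips over them, can be sketched as a small decision function. This is only an illustrative model in Python, not the server code: `Context`, `Outcome`, `handle_secondary_ddl` and `writes_binlog` are hypothetical names, and treating every connection on a non-MDS system like the client-session case is an assumption (the commit message only describes the client session).

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    EXECUTED = "secondary engine action performed, DDL committed"
    WARNING = "warning shown, DDL committed"
    METADATA_ONLY = "no secondary engine action, DDL committed"
    ERROR = "error returned, DDL not committed"


@dataclass(frozen=True)
class Context:
    rapid_plugin_installed: bool   # True only on MDS (the two key assumptions)
    cluster_attached: bool         # heatwave cluster attached to this node
    replication_connection: bool   # statement arrives via replication


def handle_secondary_ddl(ctx: Context) -> Outcome:
    """Hypothetical model of how a secondary load/unload DDL is treated."""
    if not ctx.rapid_plugin_installed:
        # Rule 1: non-MDS system, warn instead of failing.
        # Assumption: replication connections are treated the same way here.
        return Outcome.WARNING
    if ctx.cluster_attached:
        # Normal case: the load/unload runs on the heatwave cluster.
        return Outcome.EXECUTED
    if ctx.replication_connection:
        # Rule 2: MDS secondary without a cluster, commit without engine action.
        return Outcome.METADATA_ONLY
    # User session on MDS without a healthy cluster: existing error behavior.
    return Outcome.ERROR


def writes_binlog(outcome: Outcome) -> bool:
    # Rule 3: metadata update and binlog entry happen whenever no error is thrown.
    return outcome is not Outcome.ERROR


# A PITR replay looks like an ordinary client session on an MDS system
# whose heatwave cluster is not attached yet.
pitr_replay = Context(rapid_plugin_installed=True,
                      cluster_attached=False,
                      replication_connection=False)
print(handle_secondary_ddl(pitr_replay).name)  # ERROR
```

Under this model the replayed statement falls into the same branch as a user command issued without a cluster, which is the failure described above.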

Since secondary load (or unload) statements are meant to throw an error when they are executed by a user in the absence of a healthy heatwave cluster, the PITR restore workflow will fail if the binlogs from the backup contain any secondary load (or unload) statements.

Solution
--------
To avoid PITR failure, we are introducing a new system variable, rapid_enable_delayed_secondary_ops. It controls how load or unload commands are processed by the rapid plugin.

- When turned ON, the plugin silently skips the secondary engine operation (load/unload) and returns success to the caller. This allows secondary load (or unload) statements to be executed by the server in the absence of any heatwave cluster.
- When turned OFF, it follows the existing behavior.
- The default value is OFF.
- The value can only be changed when rapid_bootstrap is IDLE or OFF.
- This variable cannot be persisted.

In the PITR workflow, the control plane sets the variable at the start of the PITR restore and resets it at the end of the workflow. This allows the workflow to complete without failure even when a heatwave cluster is not attached. Since metadata is always updated when secondary load/unload DDLs are executed, the respective tables get reloaded to heatwave automatically when a heatwave cluster is attached at a later point in time.

Change-Id: I42e984910da23a0e416edb09d3949989159ef707

# This is the commit message #3:

WL#15280: HEATWAVE SUPPORT FOR MDS HA (TEST CHANGES)

This commit adds new functional tests for the MDS HA + HW integration.

Change-Id: Ic818331a4ca04b16998155efd77ac95da08deaa1

# This is the commit message #4:

WL#15280: HEATWAVE SUPPORT FOR MDS HA
BUG#34776485: RESTRICT DEFAULT VALUE FOR rapid_enable_delayed_secondary_ops

This commit does two things:

1. Adds a basic test for the newly introduced system variable rapid_enable_delayed_secondary_ops, which controls the behavior of ALTER TABLE secondary load/unload DDL statements when the rapid cluster is not available.
2. Restricts setting the system variable to its DEFAULT value. So, the following is not allowed:

   SET GLOBAL rapid_enable_delayed_secondary_ops = default

This variable is to be used in restricted scenarios; the control plane only sets it to ON/OFF before and after PITR apply. Allowing it to be set to DEFAULT has no practical use.

Change-Id: I85c84dfaa0f868dbfc7b1a88792a89ffd2e81da2

# This is the commit message #5:

Bug#34726490: ADD DIAGNOSTICS FOR SECONDARY LOAD / UNLOAD DDL

Problem:
--------
If a secondary load or unload DDL gets rolled back due to some error after it has loaded / unloaded the table in the heatwave cluster, the secondary engine action is not undone. Only the secondary_load flag update is reverted, and the binlog is not written. From the user's perspective, the table is loaded and can be seen in performance_schema. No error message is printed to notify the user that the DDL did not commit. This makes issues in this area hard to debug.

Solution:
---------
The partial undo of secondary load/unload DDL will be handled in bug#34592922. In this commit, we add diagnostics to reveal whether the DDL failed to commit, and at what stage.

Change-Id: I46c04dd5dbc07fc17beb8aa2a8d0b15ddfa171af

# This is the commit message #6:

WL#15280: HEATWAVE SUPPORT FOR MDS HA (TEST FIX)

Since ALTER TABLE SECONDARY LOAD / UNLOAD DDL statements now write to the binlog, the SCN is bumped up from Heatwave's perspective. In this commit, we adjust the expected SCN values in certain tests that perform secondary load/unload and expect the SCN to match.

Change-Id: I9635b3cd588d01148d763d703c72cf50a0c0bb98

# This is the commit message #7:

Adding MTR tests for ML in rapid group_replication suite

Added MTR tests with Heatwave ML queries within an HA setup.

Change-Id: I386a3530b5bbe6aea551610b6e739ab1cf366439

# This is the commit message #8:

WL#15280: HEATWAVE SUPPORT FOR MDS HA (MTR TEST ADJUSTMENT)

In this commit we have adjusted the existing test to work with the new MTR test infrastructure, which extends its functionality to the HA landscape. With this change, many manual settings have become redundant and are removed in this commit.

Change-Id: Ie1f4fcfdf047bfe8638feaa9f54313d509cbad7e

# This is the commit message #9:

WL#15280: HEATWAVE SUPPORT FOR MDS HA (CLANG-TIDY FIX)

Fix clang-tidy warnings found in previous change#16530, patch#20

Change-Id: I15d25df135694c2f6a3a9146feebe2b981637662
Change-Id: I3f3223a85bb52343a4619b0c2387856b09438265
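
As a rough illustration of the rapid_enable_delayed_secondary_ops behavior described in commit messages #2 and #4, the following Python sketch models a PITR restore followed by a later cluster attach. It is a toy model under stated assumptions, not the plugin implementation: `RapidNode`, `pitr_restore` and all method names are hypothetical, and the only rules encoded are the ones listed in the messages (default OFF, changeable only while rapid_bootstrap is IDLE or OFF, DEFAULT rejected, metadata always updated).

```python
class RapidNode:
    """Hypothetical model of one DBSystem node with the rapid plugin."""

    def __init__(self):
        self.delayed_secondary_ops = False  # default value is OFF
        self.cluster_attached = False
        self.secondary_load_flag = {}       # table -> secondary_load metadata
        self.loaded = set()                 # tables present in heatwave

    def set_delayed_secondary_ops(self, value, rapid_bootstrap="IDLE"):
        # Only explicit ON/OFF is accepted; SET ... = DEFAULT is rejected.
        if value not in ("ON", "OFF"):
            raise ValueError("only ON or OFF is accepted")
        # The value may change only while rapid_bootstrap is IDLE or OFF.
        if rapid_bootstrap not in ("IDLE", "OFF"):
            raise RuntimeError("rapid_bootstrap must be IDLE or OFF")
        self.delayed_secondary_ops = (value == "ON")

    def alter_table_secondary(self, table, load):
        if self.delayed_secondary_ops:
            pass  # ON: skip the secondary engine operation, report success
        elif self.cluster_attached:
            if load:
                self.loaded.add(table)
            else:
                self.loaded.discard(table)
        else:
            # OFF without a cluster: existing behavior, the statement fails.
            raise RuntimeError("no healthy heatwave cluster attached")
        # Metadata is updated whenever the DDL succeeds.
        self.secondary_load_flag[table] = load

    def attach_cluster(self):
        # On attach, tables whose secondary_load flag is set get reloaded.
        self.cluster_attached = True
        self.loaded = {t for t, flag in self.secondary_load_flag.items() if flag}


def pitr_restore(node, binlog_ddls):
    """Control-plane side: set the variable, replay the binlog, reset it."""
    node.set_delayed_secondary_ops("ON")
    try:
        for table, load in binlog_ddls:
            node.alter_table_secondary(table, load)
    finally:
        node.set_delayed_secondary_ops("OFF")


node = RapidNode()
pitr_restore(node, [("t1", True), ("t2", True), ("t2", False)])
node.attach_cluster()
print(sorted(node.loaded))  # ['t1']
```

Resetting the variable in a `finally` block mirrors the intent that the control plane always restores OFF at the end of the workflow, even if the replay itself fails.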