[pull] 8.0 from mysql:8.0 #4

Merged: 1,426 commits merged into Mu-L:8.0 from mysql:8.0, Jan 17, 2023
Conversation


@pull pull bot commented Jan 17, 2023

See Commits and Changes for more details.


weigon and others added 30 commits November 3, 2022 19:24
… by server

When the server closes the router-server connection while the router is
waiting for the client to send the next command, the router does not
close its side of the router-server connection.

The server side may be closed by:

- opening a 2nd connection and running KILL <first-connection-id>
- wait-timeout expiring and closing the connection

Change
------

- when idling, wait for a read-event on both the client and the server
  connection.

- If the server sends something first or closes the
  connection, forward the error to the client and close both the client
  and server connections.

- If the client sends first, stop waiting for input from the server and
  handle the client command.
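
A minimal sketch of the idle-wait described above, assuming plain POSIX
poll() rather than the router's actual I/O layer (the function and enum
names are illustrative):

```cpp
#include <poll.h>

enum class IdleEvent { ClientReadable, ServerClosedOrData, Error };

// Watch both connections for read events while the router is idle.
IdleEvent wait_while_idling(int client_fd, int server_fd) {
  pollfd fds[2] = {{client_fd, POLLIN, 0}, {server_fd, POLLIN, 0}};
  if (::poll(fds, 2, /*timeout_ms=*/-1) < 0) return IdleEvent::Error;
  // Server spoke first or closed: forward the error to the client and
  // close both connections.
  if (fds[1].revents & (POLLIN | POLLHUP | POLLERR))
    return IdleEvent::ServerClosedOrData;
  // Client spoke first: stop waiting on the server, handle the command.
  return IdleEvent::ClientReadable;
}
```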

Change-Id: Ie743d907ea45ae8fa501feca355a983fd685e1c1
…es [postfix]

After the rename of the replication terms for source and replica in the
error messages, the router tests started to fail because the expected
error messages no longer matched.

Change
------

In the binlog-related tests, changed the expected error-message texts to
accept the updated terms:

- allow 'master' and 'source'
- allow 'slave' and 'replica'

Change-Id: I6784fd8fccc287e5321330d5c6fa9611c6b336e8
…d on PB2 weekly 8.0

Test gr_acf_group_member_maintenance was failing due to an
error log message `failed registering replica on source`.
However, the message is expected, since the test triggers source
failures and the respective asynchronous replication channel
reconnection and failover.

To solve the above issue, we now suppress the error log message.

Change-Id: I44906c06a84c7a2ad313a0af015832a4f665b84c
… Signal (get_store_key at sql/sql_select.cc:2383)

These are two related but distinct problems manifested in the shrinkage
of key definitions for derived tables or common table expressions,
implemented in JOIN::finalize_derived_keys().

The problem in Bug#34572040 is that we have two references to one CTE,
each with a valid key definition. The function will first loop over
the first reference (cte_a) and move its used key from position 1 to
position 0. Next, it will attempt to move the key for the second
reference (cte_b) from position 4 to position 2.
However, the function calculates used key information on each
iteration. On the first iteration, the values are correct, but once
key #1 has been moved into position #0, the old information is stale
and gives wrong results. The problem is thus that subsequent iterations
read data that has been invalidated by earlier key moves.

The best solution to the problem is to move the keys for all references
to the CTE in one operation. This way, we can calculate used keys
information safely, before any move operation has been performed.

The problem in Bug#34634469 is also related to having more than one
reference to a CTE, but in this case the first reference (ref_3) has
a key in position 5 which is moved to position 0, and the second
reference (ref_4) has a key in position 3 that is moved to position 1.

However, the key parts of the first key will overlap with the key parts
of the second key after the first move, thus invalidating the key
structure during the copy. The actual problem is that we move
a higher-numbered key (5) before a lower-numbered key (3), which in
this case makes it impossible to find an empty space for the moved key.

The solution to this problem is to ensure that keys are moved in
increasing key order.

The patch changes the algorithm as follows:

- When identifying a derived table/common table expression, ensure
  to move all its keys in one operation (at least those references
  from the same query block).

- First, collect information about all key uses: hash key,
  unique index keys and actual key references.
  For the key references, also populate a mapping array that enumerates
  table references with key references in order of increasing
  key number.
  Also clear used key information for references that do not use keys.

- For each table reference with a key reference in increasing key order,
  move the used key into the lowest available position. This will ensure
  that used entries are never overwritten.

- When all table references have been processed, remove unused key
  definitions.
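
A simplified sketch of the move step (not the actual
JOIN::finalize_derived_keys() code; hash keys, unique index keys and
references sharing a key are ignored here), showing why visiting
references in increasing key order lets each used key drop into the
lowest free position without overwriting a live entry:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct RefKey {
  std::size_t ref_id;  // hypothetical id of the table reference
  std::size_t key_no;  // key position currently used by this reference
};

void compact_used_keys(std::vector<RefKey> &refs) {
  // Visit references in increasing key order (the Bug#34634469 fix).
  std::sort(refs.begin(), refs.end(),
            [](const RefKey &a, const RefKey &b) { return a.key_no < b.key_no; });
  std::size_t next_free = 0;  // lowest available position
  for (RefKey &rk : refs) {
    // With distinct, ascending key numbers, next_free <= rk.key_no holds
    // here, so the destination slot is the key's own or one already vacated.
    rk.key_no = next_free++;
  }
}
```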

Change-Id: I938099284e34a81886621f6a389f34abc51e78ba
The GSS plugin for SASL appears to have memory leaks. These cause the
LDAP SASL client plugin to fail under ASAN and valgrind.
Fixed by:
1. making sure sasl_client_done is called by the client's deinit method.
2. adding valgrind and ASAN suppressions to cover the library leaks.

Change-Id: Iceb6fbb2d9483b2fcc51c2a0f004735b288bb4f0
…g on PB2 - Windows

Disable test gr_primary_mode_group_operations_net_partition_4
on Windows until the bug is fixed.

Change-Id: I32e247363eefab08372989c24670e5238c720f2d
The failing queries are 'semi-joins', which semantically are expected
to eliminate join duplicates after the first match has been found,
contrary to normal joins, where all matching rows should be returned
(in any order).

Thus the different semi-join iterators in the MySQL server do some kind
of skip-read after the first matching set of row(s) has been found for a
semi-join nest of tables. This may also skip over result rows from
other tables, depending on the table(s) being skip-read, i.e. tables
in the same query tree branch as the rows being skipped.

That is fine when these tables are part of the same semi-join being
skipped - then this is intended behavior.

However, we sometimes end up with query plans where the semi-joined
tables are evaluated first, and the inner-joined tables end up
depending on the semi-joined parts. Usually (only?) seen when the
'duplicate eliminate' iterators are used in the query plan.

Note that this effectively turns the table order in the originating
SQL query upside down. E.g. the pseudo SQL query:

   select ... from t1 where <column1> in (select <column2> from t2 where <pred>)

might get the query plan

   duplicate eliminate (select <column2> from t2) join t1 on <pred>

Thus, we have a plan where t1 depends on a semi-joined t2, without being
part of the semi-join itself. However, t1 will have t2 as an ancestor in
the SPJ query tree if the query is pushed -> t1 becomes a subject of
the t2 duplicate elimination, effectively a skip-read operation.

Due to the finite size of the batch row buffers when returning SPJ
results to the API, we might need to return t1 result rows over
multiple batches, with the t2 result rows being reused/repeated. Thus
they will appear as duplicates to the iterators and be skipped over,
together with the t1 rows, which should not have been skipped.

The patch identifies when we have such query plans, where
non-semi-joined tables depend on semi-joined tables, _and_ both tables
are scan operations subject to such batching mechanisms. We then reject
pushing of depending scan-tables that are not an intended part of the
semi-join itself.

Note that such query plans seem to be a rare corner case.

The patch also changes some test cases:
 - Added two variants of existing test cases where coverage of
   duplicate-eliminating iterators was not sufficient.
 - Added SEMIJOIN(LOOSESCAN) hints to ensure that the intended plans
   were produced.
 - Added two test cases for the bug itself.

That ^ smoked out a query plan which returned incorrect results after
modification. With the patch, pushability was reduced and the result
became correct.
Change-Id: Iae890ef702cac8a50564d5fb0e493a4715c4dafd
Windows specific: Replaced the use of jemalloc for memory management
within OpenSSL (on Windows), which was installed via the call to
CRYPTO_set_mem_functions in mysqld.cc. The OpenSSL memory management
functions used on Windows now use std::malloc, std::free and
std::realloc instead.
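
For illustration, a hedged sketch of what routing OpenSSL's allocations
through the C++ runtime allocator looks like with the OpenSSL 1.1 hook
signatures (the wrapper names are hypothetical, not the mysqld.cc ones;
the file/line arguments of the hooks are simply ignored):

```cpp
#include <openssl/crypto.h>

#include <cstdlib>

static void *ssl_malloc(size_t n, const char *, int) { return std::malloc(n); }
static void *ssl_realloc(void *p, size_t n, const char *, int) {
  return std::realloc(p, n);
}
static void ssl_free(void *p, const char *, int) { std::free(p); }

// Must be called before OpenSSL allocates anything; returns 1 on success.
static bool init_openssl_allocator() {
  return CRYPTO_set_mem_functions(ssl_malloc, ssl_realloc, ssl_free) == 1;
}
```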

The memory management code in my_malloc.cc is refactored using function
templates to avoid duplicating the performance schema instrumentation
and debugging code.

Change-Id: I4df2d3974f215f3a8a9a7bd0fd82dd54c96fecb7
…n PB2

Test gr_member_actions_error_on_read_on_mpm_to_spm tests how
a group mode switch handles a failure during the update of the
member actions table, causing the member to leave the group.
That is achieved by enabling a debug flag that returns an error
when we close the member actions table. However, that flag, which is
set on a common code path, can affect other steps of the group mode
switch, which will then also fail and cause the member to leave the
group.
The test was failing because the expected error message was not
logged to the error log, which means that the group mode switch
erred out before reaching the member actions table error.

Given that the point at which the group mode switch fails is not
deterministic, we remove the error log message assert from the test.

Change-Id: I42c9e3564f79c15b80ae99a1c2edee634be0f524
…d on weekly-trunk

Test gr_acf_start_failover_channels_error_on_bootstrap was failing
due to an error log message
```
  [ERROR] [MY-013211] [Repl] Plugin
  group_replication reported: 'Error while sending message. Context: primary
  election process.'
```
However, the message is expected, since the test triggers group
bootstrap errors, which include a primary election.

To solve the above issue, we now suppress the error log message.

Change-Id: I0eb504fec68189191dc0591effd56ba26f8b3283
gr_parallel_start_uninstall forces a race condition between
  `UNINSTALL PLUGIN group_replication;`
and
  `START GROUP_REPLICATION;`
Although the test first asynchronously executes the `UNINSTALL`,
there is the possibility that the `START` is executed first.
`START` enables `super_read_only`, disabling it after the
member joins the group and is a primary.
When the `UNINSTALL` is allowed to execute once the `START`
is complete, that may happen before `super_read_only` is
disabled.
If that happens the `UNINSTALL` will hit the error:
  ERROR 1290 (HY000): The MySQL server is running with the
  --super-read-only option so it cannot execute this statement

Since the above error is possible, we added it as one of the possible
error statuses of `UNINSTALL PLUGIN group_replication;`.

Change-Id: I9847def076ec1236a2e273befbef52d3fcdf1376
gr_parallel_stop_dml forces the execution of
  `INSERT INTO t1 VALUES(1)`
while
  `STOP GROUP_REPLICATION`
is ongoing.
The `INSERT` must fail, throwing one of the errors:
 1) `Error on observer while running replication hook
    'before_commit'`
     when the plugin is stopping.
 2) `The MySQL server is running with the
     --super-read-only option so it cannot execute this statement`
     when the plugin already stopped and enabled `super_read_only`.

The test was not considering the second error, hence we added it.

Change-Id: I1d4e539cea1a37c11c9e133f92add3615f7aabf0
               corruption if both are set.

Issue:
  CHECK TABLE shall check whether both the version and instant bits are
  set for a compact/dynamic row. This is a corruption scenario and
  CHECK TABLE shall report it.

Fix:
  CHECK TABLE checks the INSTANT/VERSION bits in the records and
  reports corruption if both are set.
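
A hedged sketch of the check; the two flag values are placeholders for
InnoDB's actual record info bits, not the real constants:

```cpp
#include <cstdint>

constexpr std::uint8_t kInstantFlag = 0x80;  // stand-in for the instant bit
constexpr std::uint8_t kVersionFlag = 0x40;  // stand-in for the version bit

// A compact/dynamic record carrying both bits at once is corrupt, and
// CHECK TABLE reports it as such.
inline bool rec_info_bits_corrupt(std::uint8_t info_bits) {
  return (info_bits & kInstantFlag) && (info_bits & kVersionFlag);
}
```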

Change-Id: I551d6d6296d8df052bcca9450e7856a24a2c5416
When a table is first created with a reference to a non-existing
variable, the derived type is text. The second time an identical
table is created, the derived type is mediumblob.
This is due to an actual variable being created on the first table
creation, and this variable is then used on the second table creation.

The main problem with this is that the variable is created with
a binary character set, whereas the first table creation is given
the correct default character set. This problem is fixed by assigning
the correct default character set to the source item when creating
the user variable.

Even after this fix, there is still a minor difference between the
two table creations: the first table gets a column with maximum length
262140 bytes, whereas the second table gets a column with maximum length
4294967295 bytes. This is because the first creation utilizes a default
character type, whereas the second utilizes the created user variable,
and those instances use different maximum lengths. Fixing this would
require a large rewrite and is not deemed worthwhile for the time being.

Change-Id: I8cd1f946dbf87047c261bfeca9d8ba7d23a9629c
Post push fix: re-recorded spj_rqg_hyeprgraph.results

Change-Id: Ifcb0cfabef31004b5aa2af32f24736810cc2ffec
Post push fix: static inline functions std_realloc and redirecting_realloc are
only used when USE_MALLOC_WRAPPER is not defined, so make these functions
conditionally compiled to avoid build breakage when compiling in maintainer
mode (-Werror).

Change-Id: If98ef4bba95289fbdd92c9cf9808ab83e4fe1d42
               DEFAULTS During UPDATE

Issue:
 This is a followup of issue 34558510, which fixes the cases for which,
 during UPDATE, we shall not materialize INSTANT ADD columns added in
 the earlier implementation.

 If a table has row versions, it indicates it has INSTANT ADD/DROP
 columns in the new implementation. And in the new implementation it is
 made sure that the maximum possible row is within the permissible
 limit, otherwise INSTANT ADD is rejected.

Fix:
 While deciding to materialize, check if the table has INSTANT ADD
 columns added with row versions. If it does, then we can be assured
 that if INSTANT DEFAULTs are materialized, the row will be within the
 permissible limit.

Change-Id: Ia22ab7a5aa96966741ee1b95833a5eb6705448d7
…40243300361984

  Issue:
    When a user keeps adding and dropping columns instantly, n_def
    increases. When n_def increases beyond REC_MAX_N_FIELDS, it wraps
    back to 0, causing the assertion.

  Fix:
    The alter handler must know if INSTANT is possible. Hence we must
    check the value of n_def and the number of columns being added
    before proceeding with ALGORITHM=INSTANT. Further, we must ensure
    that if we cannot use INSTANT, we:
    1. Fall back to INPLACE if algorithm=DEFAULT or not specified.
    2. Error out with ER_TOO_MANY_FIELDS (Too many columns) if
       algorithm=INSTANT.

  Note:
    The current patch will not allow n_def to cross 1022. This is
    because when we add even 1 more column, n_def could become 1023
    (which is equal to REC_MAX_N_FIELDS). Furthermore, this patch will
    error with ER_TOO_MANY_FIELDS only when ADDing a new column with
    INSTANT. We can still drop any number of columns instantly.
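
A sketch of the resulting decision, assuming REC_MAX_N_FIELDS == 1023 as
stated in the note above (names are illustrative, not the actual
alter-handler code):

```cpp
#include <cstdint>

constexpr std::uint32_t kRecMaxNFields = 1023;  // REC_MAX_N_FIELDS

enum class AlterPath { Instant, FallBackToInplace, TooManyFields };

AlterPath choose_alter_path(std::uint32_t n_def, std::uint32_t n_added,
                            bool instant_requested_explicitly) {
  // Allow INSTANT only while the total stays below REC_MAX_N_FIELDS,
  // i.e. n_def never crosses 1022.
  if (n_def + n_added < kRecMaxNFields) return AlterPath::Instant;
  // INSTANT impossible: error out only if the user forced
  // ALGORITHM=INSTANT, otherwise fall back to INPLACE.
  return instant_requested_explicitly ? AlterPath::TooManyFields
                                      : AlterPath::FallBackToInplace;
}
```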

  Thanks to Marcelo Altmann ([email protected]) and Percona for the contribution

Change-Id: Iff5c7d6e45c294548d515458cddfb35c00aff43e
… ONE

Post push fix: Adding a wait to fsync.

Reviewed by: Mauritz Sundell <[email protected]>

Change-Id: I26a19b9c653fd9a46849a2a3af20b9d815fcccdc
Change-Id: I78a4c09a1790d8843b6ca14ba8856c88425966a4
Change-Id: I67c36b3afcc0c1fea40efbea8c8a0b283ccbabd1
- We make heavy use of sprintf, which is now flagged by clang as unsafe.
  Silence this, since we have too many uses to rewrite easily.

- This version of Xcode also flags loss of precision from 64-bit to
  32-bit integers; silence this as well. Typically seen when x of type
  size_t is assigned to an int.

Change-Id: I3e5f829c7fdb8ddb08c56149bc0db1a5dc277f34
bkandasa and others added 26 commits November 28, 2022 08:56
Approved-by: Bjorn Munch <[email protected]>
Unpack curl-7.86.0.tar.xz

  rm configure ltmain.sh config.guess config.sub Makefile
  rm -rf docs
  rm -rf m4
  rm -rf packages
  rm -rf plan9
  rm -rf projects
  rm -rf src
  rm -rf tests
  rm -rf winbuild

git add curl-7.86.0
Change-Id: I9e5a4d27a1d064a5870dcb8ba269bc59ed08a50e

Change-Id: I0e91991da09d6b3dee9653fce29f3c3b0ab08f78
(cherry picked from commit 9affac2f29690fc3af33cac78c0de7c0644eccba)
On Oracle Linux 7, we now support -DWITH_SSL=openssl11.
This option will automatically set WITH_CURL=bundled.

"bundled" curl is not supported for other platforms.

Disable in curl cmake files:
 - cmake_minimum_required(VERSION)
 - BUILD_SHARED_LIBS
 - CMAKE_DEBUG_POSTFIX
 - find_package(OpenSSL)
 - install(...)
 - set PICKY_COMPILER OFF by default

Change-Id: I3b9ec5048127589817e7917f564158364f0965f3
(cherry picked from commit 4c56fc4f53127cbc80bf3955b2f0747c59301c51)
Remove all old source files.

Change-Id: I82837b85aeafa1f80da66b5f34097be5648783be
(cherry picked from commit f8a70670e8b58a2054bd3c26777bae8c00953393)
We have new functionality, implemented by
    WL#15131, WL#15133: Innodb: Support Bulk Load,
so do not disable the FILE protocol in Curl.

Change-Id: Ib05f4656c2d13c620756518638ef73fa373cf63f
(cherry picked from commit e85db298f4ba0a2de53baa978f452d1107c48f7a)
Unpack source tarball, git add everything.

Change-Id: Ib6eb64f8e132ca59539208f7bf69245268804ee5
Remove things we do not need/want.

git rm -rf amiga/ contrib/ doc/ examples/ nintendods/ Makefile zconf.h

Change-Id: Ibd76884411c6596f2fcfcb6c3fe2f1f4aabadb73
Bump MIN_ZLIB_VERSION_REQUIRED to "1.2.13"
and adjust paths to bundled zlib sources.

In extra/zlib/zlib-1.2.13/CMakeLists.txt:
 - apply cumulative patches from previous zlib upgrade
 - apply fix to MacOS build (bug #34776172)

Change-Id: I1a0aeff115a96a0993f2f396c643eda1c1b4900b
Remove all old source files.

Change-Id: I456635823feb21faa42b683f0bfb62d353cb80d4
When a socket has been shutdown() on both sides but not closed, AND the
socket is still monitored via epoll_wait(), epoll_wait() will return
EPOLLHUP|EPOLLERR.

It will be logged as:

  after_event_fired(54, 00000000000000000000000000011000) not in
  11000000000000000000000000000000

As EPOLLHUP and EPOLLERR are always watched for, even if they aren't
explicitly requested, not handling them may lead to an infinite loop
and high CPU usage until the socket gets closed.

Additionally, events may be reported for fds which are already closed,
which may happen if:

1. io_context::poll_one() led to epoll_wait() fetching multiple events:
   [(1, IN|HUP), (2, IN)]
2. when the first event is processed, the event handler (for fd=1)
   closes fd=2 (which leads to epoll_ctl(DEL, fd=2) and close(2))
3. io_context::poll_one() processes the next event: (2, IN) ... but no
   handler for fd=2 exists.

This is more problematic if a new connection with fd=2 was opened in
the meantime:

1. io_context::poll_one() led to epoll_wait() fetching multiple events:
   [(1, IN|HUP), (2, HUP)]
2. when the first event is processed, the event handler (for fd=1)
   closes fd=2 (which leads to epoll_ctl(DEL, fd=2) and close(2))
3. a new connection with fd=2 gets accepted.
4. io_context::poll_one() processes the next event: (2, HUP) ... and
   sends the event to fd=2, which gets closed even though the HUP event
   was for the old fd=2, not the current one.

Change
======

- expose EPOLLHUP and EPOLLERR as their own, separate events.
- if none of EPOLLHUP|EPOLLERR|EPOLLIN|EPOLLOUT is requested,
  don't pass the fd to epoll_wait().
- remove polled events when the fd is removed from the io-context.
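
A minimal sketch of the two raw-epoll behaviors the change builds on
(the io-context wraps this differently): EPOLLHUP/EPOLLERR are reported
even when not requested, and an fd must be deregistered before close()
so stale events cannot be attributed to a reused fd number:

```cpp
#include <sys/epoll.h>
#include <unistd.h>

void watch_read(int epfd, int fd) {
  epoll_event ev{};
  ev.events = EPOLLIN;  // EPOLLHUP and EPOLLERR are always implied
  ev.data.fd = fd;
  epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

void close_watched_fd(int epfd, int fd) {
  // Deregister first, then close; any events already fetched for this
  // fd must also be dropped (the "remove polled events" step above).
  epoll_ctl(epfd, EPOLL_CTL_DEL, fd, nullptr);
  ::close(fd);
}
```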

Change-Id: I145cacd457fa9876112789eb4bfd06fce1722c45
Change
======

Repeat the changes done for linux_epoll in [1/3]

- expose POLLHUP and POLLERR as their own, separate events.
- if no interest in any of POLLHUP, POLLERR, POLLIN or POLLOUT
  is registered, don't pass that fd to poll().
- treat POLLHUP as POLLIN if only POLLIN is waited for, to handle the
  connection-close case nicely on Windows.
- remove queued events if an fd is removed from the registered set.
- added unittests for the poll io-service

Change-Id: I1311513492fe755d5f23432b34721e0ab1fc88a7
Change
======

Linux timestamping reports when a packet stepped through the layers of
the Linux network stack on the send and receive side:

- kernel -> driver
- driver -> cable

Linux timestamps are reported as EPOLLERR without EPOLLHUP and serve
as a test-bed for the EPOLLERR handling.
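
A hedged sketch of enabling software tx timestamping on Linux;
completions are queued on the socket error queue, which epoll reports
as EPOLLERR without EPOLLHUP:

```cpp
#include <linux/net_tstamp.h>
#include <sys/socket.h>

bool enable_tx_timestamping(int fd) {
  // Request software timestamps for sent packets, plus their reporting.
  unsigned int flags =
      SOF_TIMESTAMPING_TX_SOFTWARE | SOF_TIMESTAMPING_SOFTWARE;
  return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags,
                    sizeof(flags)) == 0;
}

// The timestamps are then read back via recvmsg(fd, &msg, MSG_ERRQUEUE).
```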

Change-Id: I083e304d23c72880b974863d29c29aa9d25b8694
Reverting the following WLs and bug fixes:

 - WL#14772 InnoDB: Parallel Index Build
 - WL#15131 Innodb: Support Bulk Load with Sorted data
 - WL#15133 Innodb: Support Bulk Load from OCI Object Store
 - Bug #34840684 Assertion failure: mtr0log.cc:175:!page || !page_zip || !fil_page_index_page_che
 - Bug #34819343 Assertion failure: btr0btr.cc:731:ib::fatal triggered thread 140005531973376
 - Bug #34646510 innodb.zlob_ddl_big failing on pb2 daily-trunk

Reverted Commit-Ids:

a8940134dd8d33e7fc25f641d627b640d56769b6
ae9fd03687486b5d01a7dbe766d73993d7c78efa
c4388545dc98e472b0f3d96db0e0d19d8231dc56
fd950026c1a4d11294b3448d8bfcd94631618611
ae9fd03687486b5d01a7dbe766d73993d7c78efa
226765401a5daa4a2443e1507343ed264f62f60f

Change-Id: I392bda99eeb825174d156fcd169caef7c4b712b0
… statements

                for connect_timeout seconds, causing pileups

Description:
------------
This is a regression caused due to the fix made for the Bug 34094706. When a
connection somehow stalls/blocks during the authentication phase, where a mutex
is held, the other connections that are executing queries on I_S and P_S are
blocked until the first connection release the mutex.

Fix:
----
Instead of using the mutex and checking thd->active_vio, we now check
the value of the net.vio type in the is_secure_transport() check.
Change-Id: I02f50f7e90c6e683a7bbe0b5f99b932e819f1f08
…read to stop

Problem
-------

In case a binary log dump thread waits for new events with a heartbeat
configured and a new event arrives, it is possible that the binary log
dump thread will send an EOF packet to the connected client
(replica/mysqlbinlog/custom client...) before sending all of the events.
Analysis / Root-cause analysis
------------------------------

It happens in case the binary log dump thread exits with a timeout on
the condition variable just before the position gets updated. Function
'wait_with_heartbeat' exits with code 1, which is treated later on as
the end of the execution.
Solution
--------

Ignore the code returned from the 'wait' function, since a timeout is
not important information for the binary log dump thread. In case a
timeout occurs, the binary log dump thread should continue execution,
or abort in case the thread was stopped. Return 0 from
wait_with_heartbeat, or 1 in case of a send/flush error.
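
A sketch of the corrected contract, with hypothetical stand-ins for the
dump thread's real primitives:

```cpp
#include <cerrno>

int timed_wait_for_new_events();  // hypothetical: 0 on event, ETIMEDOUT on timeout
int send_heartbeat_packet();      // hypothetical: 0 on success, else send error

// A timeout is not an error: only send/flush failures return 1.
int wait_with_heartbeat_sketch(const bool &thread_stopped) {
  for (;;) {
    if (timed_wait_for_new_events() != ETIMEDOUT) return 0;  // events ready
    if (send_heartbeat_packet() != 0) return 1;  // send/flush error
    if (thread_stopped) return 0;                // abort: thread was stopped
    // Otherwise keep waiting; the timeout itself is deliberately ignored.
  }
}
```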

Signed-off-by: Karolina Szczepankiewicz <[email protected]>
Change-Id: I027985aafc1234194f0798ba52b65cce36936f24
gcc12 reports:

harness/tests/linux_timestamping.cc:741:15: error: narrowing conversion
of ‘attr_type’ from ‘size_t’ {aka ‘long unsigned int’} to ‘short
unsigned int’ [-Werror=narrowing]
  741 |       return {attr_type, {payload, payload_len}};
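
One conventional fix for the narrowing error quoted above (illustrative,
not the actual linux_timestamping.cc change): convert explicitly once
the value is known to fit:

```cpp
#include <cstddef>
#include <cstdint>

struct AttrHeader {
  std::uint16_t type;
};

AttrHeader make_attr_header(std::size_t attr_type) {
  // The explicit cast documents the (checked) truncation and silences
  // -Werror=narrowing in the braced return.
  return {static_cast<std::uint16_t>(attr_type)};
}
```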

Change-Id: I28fb1a1ca32e6ffd1febe44c704a1ae438b414a2
PROBLEM:
- In the current version, the pattern for naming hidden dropped columns
  has changed.
- When a cfg file is taken from an older version, the hidden dropped
  column names follow the old pattern.
- When INSTANT operations are done in the current version in exactly
  the same order as done before creating the cfg file, the server
  crashes.

FIX:
- When searching for a dropped column with the older name pattern
  returns null, IMPORT fails with error SCHEMA_MISMATCH.
Change-Id: Ifd93adafb78f0aa7b5ae1980b64a3230f94deae9
@pull pull bot added the ⤵️ pull label Jan 17, 2023
@pull pull bot merged commit 1bfe02b into Mu-L:8.0 Jan 17, 2023
pull bot pushed a commit that referenced this pull request Apr 27, 2023
Potential read of uninitialized dataNodes[0] when no alive nodes are
available. Fix by checking for zero data nodes and failing the test.

trunk/storage/ndb/test/src/UtilTransactions.cpp:1710:9: warning: 3rd
function call argument is an uninitialized value [clang-analyzer-
core.CallAndMessage]
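
A hedged sketch of the guard, not the UtilTransactions.cpp code itself:
check for zero alive data nodes before reading dataNodes[0], and fail
the test instead of consuming an uninitialized value:

```cpp
#include <cstdio>

// Returns false (mapping to NDBT_FAILED in the real test) when the node
// list is empty; *out is written only when the array has an entry.
bool first_alive_node(const int *dataNodes, int numDataNodes, int *out) {
  if (numDataNodes <= 0) {
    std::fprintf(stderr, "no alive data nodes - failing test\n");
    return false;
  }
  *out = dataNodes[0];  // safe: initialized, since numDataNodes > 0
  return true;
}
```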

Change-Id: I7c1e362eb0d62bdb560967144ca39966aed8a3c1
pull bot pushed a commit that referenced this pull request Apr 27, 2023
  # This is the 1st commit message:

  WL#15280: HEATWAVE SUPPORT FOR MDS HA

  Problem Statement
  -----------------
  Currently customers cannot enable heatwave analytics service to their
  HA DBSystem or enable HA if they are using Heatwave enabled DBSystem.
  In this change, we attempt to remove this limitation and provide
  failover support of heatwave in an HA enabled DBSystem.

  High Level Overview
  -------------------
  To support heatwave with HA, we extended the existing feature of auto-
  reloading of tables to heatwave on MySQL server restart (WL-14396). To
  provide seamless failover functionality to tables loaded to heatwave,
  each node in the HA cluster (group replication) must have the latest
  view of the tables which are currently loaded to the heatwave cluster
  attached to the primary, i.e., the secondary_load flag should always
  be in sync.

  To achieve this, we made following changes -
    1. replicate secondary load/unload DDL statements to all the active
       secondary nodes by writing the DDL into the binlog, and
    2. Control how secondary load/unload is executed when the heatwave
       cluster is not attached to the node executing the command

  Implementation Details
  ----------------------
  Current implementation depends on two key assumptions -
   1. All MDS DBSystems will have RAPID plugin installed.
   2. No non-MDS system will have the RAPID plugin installed.

  Based on these assumptions, we made certain changes w.r.t. how server
  handles execution of secondary load/unload statements.
   1. If secondary load/unload command is executed from a mysql client
      session on a system without RAPID plugin installed (i.e., non-MDS),
      instead of an error, a warning message will be shown to the user,
      and the DDL is allowed to commit.
   2. If secondary load/unload command is executed from a replication
      connection on an MDS system without heatwave cluster attached,
      instead of throwing an error, the DDL is allowed to commit.
   3. If no error is thrown from secondary engine, then the DDL will
      update the secondary_load metadata and write a binlog entry.

  Writing to binlog implies that all the consumer of binlog now need to
  handle this DDL gracefully. This has an adverse effect on Point-in-time
  Recovery. If the PITR backup is taken from a DBSystem with heatwave, it
  may contain traces of secondary load/unload statements in its binlog.
  If such a backup is used to restore a new DBSystem, it will cause failure
  while trying to execute statements from its binlog because
   a) the DBSystem will not have a heatwave cluster attached at this time, and
   b) statements from binlog are executed from a standard mysql client
      connection, thus making them indistinguishable from user-executed
      commands.
  Customers will be prevented (by control plane) from using PITR functionality
  on a heatwave enabled DBSystem until there is a solution for this.

  Testing
  -------
  This commit changes the behavior of secondary load/unload statements, so it
   - adjusts existing tests' expectations, and
   - adds a new test validating new DDL behavior under different scenarios

  Change-Id: Ief7e9b3d4878748b832c366da02892917dc47d83

  # This is the commit message #2:

  WL#15280: HEATWAVE SUPPORT FOR MDS HA (PITR SUPPORT)

  Problem
  -------
  A PITR backup taken from a heatwave enabled system could have traces
  of secondary load or unload statements in binlog. When such a backup
  is used to restore another system, it can cause failure because of
  following two reasons:

  1. Currently, even if the target system is heatwave enabled, heatwave
  cluster is attached only after PITR restore phase completes.
  2. When entries from binlogs are applied, a standard mysql client
  connection is used. This makes it indistinguishable from any other
  user session.

  Since secondary load (or unload) statements are meant to throw error
  when they are executed by user in the absence of a healthy heatwave
  cluster, PITR restore workflow will fail if binlogs from the backup
  have any secondary load (or unload) statements in them.

  Solution
  --------
  To avoid PITR failure, we are introducing a new system variable
  rapid_enable_delayed_secondary_ops. It controls how load or unload
  commands are to be processed by rapid plugin.

    - When turned ON, the plugin silently skips the secondary engine
      operation (load/unload) and returns success to the caller. This
      allows secondary load (or unload) statements to be executed by the
      server in the absence of any heatwave cluster.
    - When turned OFF, it follows the existing behavior.
    - The default value is OFF.
    - The value can only be changed when rapid_bootstrap is IDLE or OFF.
    - This variable cannot be persisted.
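
  For illustration, a hedged sketch of how a plugin system variable with
  these properties is typically declared with the MySQL plugin API (this
  is not the rapid plugin's actual source):

  ```cpp
  #include <mysql/plugin.h>

  static bool srv_delayed_secondary_ops = false;  // default is OFF

  // PLUGIN_VAR_NOPERSIST matches "cannot be persisted"; the check hook
  // (nullptr here) is where "only while rapid_bootstrap is IDLE or OFF"
  // would be enforced.
  static MYSQL_SYSVAR_BOOL(enable_delayed_secondary_ops,
                           srv_delayed_secondary_ops, PLUGIN_VAR_NOPERSIST,
                           "Skip secondary engine load/unload when no "
                           "cluster is attached",
                           nullptr /* check */, nullptr /* update */,
                           false /* default */);
  ```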

  In PITR workflow, Control Plane would set the variable at the start of
  PITR restore and then reset it at the end of workflow. This allows the
  workflow to complete without failure even when heatwave cluster is not
  attached. Since metadata is always updated when secondary load/unload
  DDLs are executed, when heatwave cluster is attached at a later point
  in time, the respective tables get reloaded to heatwave automatically.

  Change-Id: I42e984910da23a0e416edb09d3949989159ef707

  # This is the commit message #3:

  WL#15280: HEATWAVE SUPPORT FOR MDS HA (TEST CHANGES)

  This commit adds new functional tests for the MDS HA + HW integration.

  Change-Id: Ic818331a4ca04b16998155efd77ac95da08deaa1

  # This is the commit message #4:

  WL#15280: HEATWAVE SUPPORT FOR MDS HA
  BUG#34776485: RESTRICT DEFAULT VALUE FOR rapid_enable_delayed_secondary_ops

  This commit does two things:
  1. Add a basic test for newly introduced system variable
  rapid_enable_delayed_secondary_ops, which controls the behavior of
  alter table secondary load/unload ddl statements when rapid cluster
  is not available.

  2. It also restricts the DEFAULT value setting for the system variable.
  So, the following is not allowed:
  SET GLOBAL rapid_enable_delayed_secondary_ops = default
  This variable is to be used in restricted scenarios, and the control
  plane only sets it to ON/OFF before and after the PITR apply. Allowing
  a set to default has no practical use.

  Change-Id: I85c84dfaa0f868dbfc7b1a88792a89ffd2e81da2

  # This is the commit message #5:

  Bug#34726490: ADD DIAGNOSTICS FOR SECONDARY LOAD / UNLOAD DDL

  Problem:
  --------
  If secondary load or unload DDL gets rolled back due to some error after
  it had loaded / unloaded the table in heatwave cluster, there is no undo
  of the secondary engine action. Only secondary_load flag update is
  reverted and binlog is not written. From User's perspective, the table
  is loaded and can be seen on performance_schema. There are also no
  error messages printed to notify that the ddl didn't commit. This
  creates a problem to debug any issue in this area.

  Solution:
  ---------
  The partial undo of secondary load/unload ddl will be handled in
  bug#34592922. In this commit, we add diagnostics to reveal if the ddl
  failed to commit, and from what stage.

  Change-Id: I46c04dd5dbc07fc17beb8aa2a8d0b15ddfa171af

  # This is the commit message #6:

  WL#15280: HEATWAVE SUPPORT FOR MDS HA (TEST FIX)

  Since ALTER TABLE SECONDARY LOAD / UNLOAD DDL statements now write
  to binlog, from Heatwave's perspective, SCN is bumped up.

  In this commit, we are adjusting the expected SCN values in certain
  tests which do secondary load/unload and expect the SCN to match.

  Change-Id: I9635b3cd588d01148d763d703c72cf50a0c0bb98

  # This is the commit message mysql#7:

  Adding MTR tests for ML in rapid group_replication suite

  Added MTR tests with Heatwave ML queries with in
  an HA setup.

  Change-Id: I386a3530b5bbe6aea551610b6e739ab1cf366439

  # This is the commit message mysql#8:

  WL#15280: HEATWAVE SUPPORT FOR MDS HA (MTR TEST ADJUSTMENT)

  In this commit we have adjusted the existing tests to work with the
  new MTR test infrastructure, which extends the functionalities to the
  HA landscape. With this change, a lot of manual settings have become
  redundant and are removed in this commit.

  Change-Id: Ie1f4fcfdf047bfe8638feaa9f54313d509cbad7e

  # This is the commit message mysql#9:

  WL#15280: HEATWAVE SUPPORT FOR MDS HA (CLANG-TIDY FIX)

  Fix clang-tidy warnings found in previous change#16530, patch#20

  Change-Id: I15d25df135694c2f6a3a9146feebe2b981637662

Change-Id: I3f3223a85bb52343a4619b0c2387856b09438265