Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support syntax SET TRANSACTION READ ONLY AS OF ... and SELECT ... AS OF for Stale read #22765

Closed
djshow832 opened this issue Feb 10, 2021 · 20 comments
Labels
component/parser sig/transaction SIG:Transaction type/enhancement The issue or PR belongs to an enhancement.

Comments

@djshow832
Copy link
Contributor

djshow832 commented Feb 10, 2021

Background

This is a subtask of #21094.

Starting staleness transactions with extending syntax START TRANSACTION makes integrating into Java applications much more difficult, but extending syntax SET TRANSACTION will be more acceptable. Besides, the original syntax is too complicated, see #22505 and #22506.

Now that TiDB has supported SET TRANSACTION READ ONLY (syntax only), we need to extend it to SET TRANSACTION READ ONLY WITH [EXACT] {TIMESTAMP | STALENESS} Expression.

Here are some examples:

SELECT statement:

SELECT … FROM … AS OF [EXACT] TIMESTAMP  '2021-01-19 00:00:00’
SELECT … FROM … AS OF [EXACT] TIMESTAMP  '-2h'

Read-Only transaction:

SET TRANSACTION READ ONLY  AS OF [EXACT] TIMESTAMP '2021-01-19 00:00:00'`
BEGIN
COMMIT

or

SET TRANSACTION READ ONLY  AS OF [EXACT] TIMESTAMP '-h'`
BEGIN
COMMIT

SET [GLOBAL|SESSION] TRANSACTION variables is equivalent to SET [GLOBAL|SESSION] variables in MySQL. However, setting global or session variables for staleness transactions may make junior users confused because there would be too many SQL interfaces to manipulate them. Now we only need to support transaction scope.

By default, the SET TRANSACTION statement only affects the next transaction and they must be set BEFORE a transaction. E.g.

SET TRANSACTION READ ONLY  AS OF [EXACT] TIMESTAMP '-10s'`
START TRANSACTION;
# Queries

Refer to the MySQL SET TRANSACTION document https://dev.mysql.com/doc/refman/8.0/en/set-transaction.html for more details.

Furthermore, we are going to support system variables as synonyms for SET TRANSACTION READ ONLY WITH [EXACT] {TIMESTAMP | STALENESS} Expression. This is conventional in MySQL. But we don't need to expose these variables to users for now.
The system variables are:

  • transaction_read_only_mode for WITH [EXACT]. Available choices are: "strong", "exact_stale", "noexact_stale".
  • transaction_read_only_timestamp for {TIMESTAMP | STALENESS} Expression.

On the other hand, keep the old syntax START TRANSACTION READ ONLY WITH ... for now, in case of requirement changes. But do not expose it to users in any documents.

Implementation

Support the new syntax in the parser

Some VariableAssignment will be appended in varAssigns:

  • WITH [EXACT] corresponds to a stale read flag. Available choices are: strong read, exact stale read, bounded stale read.
  • TIMESTAMP Expression corresponds to a timestamp or duration.

For each VariableAssignment, IsGlobal is false and IsSystem is true.

Set system variables for transactions

Define 2 system variables and add them into initSynonymsSysVariables.

Note that the other 2 synonyms transaction_isolation and transaction_read_only don't actually work.

For now, system variables only have 3 scopes: global, session, and none. We need to define a next-transaction scope.

Make system variables work

// TODO

@djshow832 djshow832 added type/enhancement The issue or PR belongs to an enhancement. component/parser sig/transaction SIG:Transaction labels Feb 10, 2021
@djshow832
Copy link
Contributor Author

@djshow832
Copy link
Contributor Author

djshow832 commented Feb 10, 2021

SET TRANSACTION READ ONLY WITH EXACT STALENESS '00:00:05';
SET TRANSACTION READ ONLY WITH EXACT TIMESTAMP '2021-01-19 00:00:00';
SET TRANSACTION READ ONLY WITH BOUNDED STALENESS '00:00:05';
SET TRANSACTION READ ONLY WITH BOUNDED TIMESTAMP '2021-01-19 00:00:00';

Would this be better?

@nolouch
Copy link
Member

nolouch commented Feb 10, 2021

@djshow832 Would like this more. I think it's more clear.
cc @morgo @scsldb

SET TRANSACTION READ ONLY WITH EXACT STALENESS '00:00:05';
SET TRANSACTION READ ONLY WITH EXACT TIMESTAMP '2021-01-19 00:00:00';
SET TRANSACTION READ ONLY WITH BOUNDED STALENESS '00:00:05';
SET TRANSACTION READ ONLY WITH BOUNDED TIMESTAMP '2021-01-19 00:00:00';

Would this be better?

@morgo
Copy link
Contributor

morgo commented Feb 10, 2021

@djshow832 yes, I really like this syntax:

SET TRANSACTION AS OF SYSTEM TIME follower_read_timestamp();

The function is nice because it means the syntax can be unified between staleness (offset) and timestamp (ts). My understanding is that it also implies exact staleness by referring to a point far enough in the past?

@djshow832
Copy link
Contributor Author

djshow832 commented Feb 19, 2021

@morgo I have some concerns:

  • AS OF SYSTEM TIME always implies exact semantics, no matter what timestamp / interval / function it refers to. But in our bounded semantics, the timestamp is not necessary the exact start ts which the transaction uses.
  • follower_read_timestamp() always returns now - 4.8s, where 4.8s is configured in the cluster. In their LAG_BY_CLUSTER_SETTING policy, they can always skip calculating and take now - 4.8s as the start ts of a staleness transaction. If a read-write transaction commits after the start ts, it aborts. But in TiDB, we take resolved ts as the start ts of staleness transactions, and no read-write transactions will be affected.
  • The syntax of CRDB is similar to PostgreSQL. I'm not sure if AS OF SYSTEM TIME is a PG-style syntax. There's no similar syntax in MySQL.
  • The statement doesn't imply READ ONLY semantics but the statement must be read-only. IMO, READ ONLY is better to be declared in the statement.

@morgo
Copy link
Contributor

morgo commented Feb 19, 2021

Thanks for the detailed reply :-) On this point:

The syntax of CRDB is similar to PostgreSQL. I'm not sure if AS OF SYSTEM TIME is a PG-style syntax. There's no similar syntax in MySQL.

It comes from System-versioned temporal tables (which MySQL does not support, but MariaDB does). There is an article here for SQL Server which says it is from ANSI SQL 2011.

@nolouch
Copy link
Member

nolouch commented Mar 4, 2021

relative #18672

@morgo
Copy link
Contributor

morgo commented Mar 4, 2021

See also: #22658

@nolouch
Copy link
Member

nolouch commented Mar 30, 2021

We need to consider the SELECT statement. In some scenarios, using select statements can reduce unnecessary interaction and improve performance.

@nolouch
Copy link
Member

nolouch commented Mar 31, 2021

Hi, @djshow832 @morgo
how about :
SELECT statement:

SELECT … FROM … AS OF [EXACT] TIMESTAMP  '2021-01-19 00:00:00’
SELECT … FROM … AS OF [EXACT] TIMESTAMP  '-2h'

Read-Only transaction:

SET TRANSACTION READ ONLY  AS OF [EXACT] TIMESTAMP '2021-01-19 00:00:00'`
BEGIN
COMMIT

or

SET TRANSACTION READ ONLY  AS OF [EXACT] TIMESTAMP '-h'`
BEGIN
COMMIT

NOTE:default use bounded semantics, like SELECT … FROM … AS OF TIMESTAMP '-2h' will read the newly data which can directly read without wait lock. the EXACT mean read the 2h ago data.

@morgo
Copy link
Contributor

morgo commented Mar 31, 2021

@nolouch I like it a lot. It makes sense to me.

Edit: One question. Should it be valid as SET TRANSACTION and START TRANSACTION? MySQL accepts both variants here:

@morgo
Copy link
Contributor

morgo commented Apr 1, 2021

There is an added MySQL-ism to be aware of (it is minor). You can start a read only transaction with a session var as well:

mysql [localhost:5731] {msandbox} ((none)) > SET tx_read_only=1;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql [localhost:5731] {msandbox} ((none)) > show warnings;
+---------+------+----------------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                              |
+---------+------+----------------------------------------------------------------------------------------------------------------------+
| Warning | 1287 | '@@tx_read_only' is deprecated and will be removed in a future release. Please use '@@transaction_read_only' instead |
+---------+------+----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

If you start a read-only transaction it won't explicitly update this variable though. JDBC drivers are noisy at checking the value of this.

@nolouch
Copy link
Member

nolouch commented Apr 12, 2021

Hi @morgo Can we only support SET,SELECT firstly? In JDBC or some ORMs, users can use the set statement to minimize changes in the application code.

@nolouch nolouch changed the title Support syntax SET TRANSACTION READ ONLY WITH ... Support syntax SET TRANSACTION READ ONLY AS OF ... and SELECT ... AS OF for Stale read Apr 15, 2021
@nolouch
Copy link
Member

nolouch commented Apr 16, 2021

I do some adjustments to make Bound and Exact for the stale read more clear, and it also can match the Flashback. such as:
SELECT statement:

// exact
SELECT … FROM … AS OF TIMESTAMP '2021-01-19 00:00:00’

// bound, will choose a suitable timestamp that can be as up-to-date as possible in the specified window.
SELECT … FROM … AS OF TIMESTAMP BETWEEN (NOW() - INTERVAL 3 SECOND, NOW())

I want to use AS OF TIMESTAMP xxx to unify all the historical reading scenes, but I have two questions:

  1. Which one to pick? AS OF XXXX or FOR SYSTEM TIME AS OF XXX

  2. Where is the position of this clause?

    • clause in TableRefs, mariadb, sql-server, oracle all do this way, But all of them work for System-Versioned (temporal) Tables, we do not have it. like:
    from_item: {
           table_name [ [ AS ] alias ] [ FOR SYSTEM_TIME AS OF timestamp_expression ]  |
           join |
           ( query_expr ) [ [ AS ] alias ] |
           field_path |
           { UNNEST( array_expression ) | UNNEST( array_path ) | array_path }
               [ [ AS ] alias ] [ WITH OFFSET [ [ AS ] alias ] ] |
           with_query_name [ [ AS ] alias ]
    }
    
    • clause after TableRefs, at the very end of the FROM sub-clause. Cockroach do that:
      image

cc @morgo @kennytm @zz-jason @scsldb

@xhebox
Copy link
Contributor

xhebox commented Apr 16, 2021

But all of them works for Versioned Tables, we not have it.

I am looking at how does CockroachDB deal with DDL/stale read compatibility. And from what I have seen, I think they may have versioned tables too...

@nolouch
Copy link
Member

nolouch commented Apr 16, 2021

@xhebox Does the versioned tables in CockroachDB like this? I think we are the same as CorckroachDB naturally has versions, and GC. but it is not the same as the System-Versioned (temporal) Tables, But it is true that we can support some behaviors similar to the System-Versioned Temporal Tables.

@xhebox
Copy link
Contributor

xhebox commented Apr 16, 2021

No. And you are correct. However, the metadata loading process is not versioned on TiDB :(

@kennytm
Copy link
Contributor

kennytm commented Apr 16, 2021

Note that the PostgreSQL one doesn't count, it is just a proposal from some random person.


The select … for system_time as of syntax is being compatible with the SQL standard "temporal table" feature. This is both an advantage and disadvantage — The pros being that the syntax is standardized so we have a clear example to follow. The cons being that "time travel query" is not exactly temporal table and thus may violate the expectation of "temporal table".

In particular, in MariaDB, the name system_time just refers to the "period" formed by the two columns start_timestamp and end_timestamp recorded directly in the table. Stale read / time travel query works nothing like a temporal table.

Therefore, as long as system_time is a non-reserved word, I suggest we avoid using the select … for system_time as of syntax directly.


That said, there is a potential syntax ambiguity requiring look-ahead if we decide to use just as of which may complicate the grammar a bit.

SELECT * FROM t AS u;
SELECT * FROM t AS OF ....
--                 ^ cannot distinguish between table-alias or as-of query until here

BTW, CrDB's as of system time clause is equivalent to our set tidb_snapshot = ... in terms of semantics, which is reflected on the syntax that the following is invalid:

SELECT * FROM t AS OF SYSTEM TIME (a), u AS OF SYSTEM TIME (b);

so let's not follow CrDB's syntax exactly either.

@nolouch
Copy link
Member

nolouch commented Apr 21, 2021

After discussed with @kennytm for this comment, the conclusion:
SELECT statement:

// exact
SELECT … FROM … AS OF TIMESTAMP '2021-01-19 00:00:00’

// bound will choose a suitable timestamp that can be as up-to-date as possible in the specified window.
SELECT … FROM … AS OF TIMESTAMP TIDB_BOUND_STALENESS(NOW() - INTERVAL 3 SECOND, NOW())

This means we only introduced AS OF TIMESTAMP syntax.

  1. Which one to pick? AS OF XXXX or FOR SYSTEM TIME AS OF XXX

    For this question, we avoid using the select … for system_time as of syntax directly according to @kennytm 's suggestion.

  2. Where is the position of AS OF clause?

    For forward compatibility, and the consider about the usages of the flashback query such as:

    INSERT INTO employees
    (SELECT * FROM employees
     AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '60' MINUTE)
    MINUS SELECT * FROM employees;
    

    We place the AS OF clause in TableFactor, that's mean we need to specify the timestamp for each table when use stale read or history read.
    and for bound stale read, such as:

    select * from 
             a as of timestamp tidb_bound_staleness(now() - interval 3 second, now()), 
             b as of timestamp tidb_bound_staleness(now() - interval 3 second, now());
    

    In order to eliminate ambiguity, we have to make tidb_bound_staleness() as deterministic function, which is the same as now().

@nolouch
Copy link
Member

nolouch commented Jun 15, 2021

done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/parser sig/transaction SIG:Transaction type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

No branches or pull requests

5 participants