Transaction timeout for longer queries #20

aakashnand · 2021-11-21T15:12:41Z

Background

I have been experiencing this issue for a long time. This issue happens when we try to execute the query which takes more than 5min. To reproduce this issue use dbt seed which can insert some rows which will take more than 5mins.

Reason

The reason for this issue is that, In Trino, the default value of transaction.idle-timeout is 5minutes and dbt executes commands in the following flow which causes this error

START TRANSACTION
Some queries which will take more than 5min
COMMIT <- This will throw an error

Error Log

trino.exceptions.DatabaseError: failed to commit transaction 73dc7443-ec53-4911-95be-a372792743b4: TrinoUserError(type=USER_ERROR, name=UNKNOWN_TRANSACTION, message="Unknown transaction ID: 73dc7443-ec53-4911-95be-a372792743b4. Possibly expired? Commands ignored until end of transaction block", query_id=20211121_145311_00013_7rttf)

Is there any specific reason why transactions are explicitly started in dbt-trino in spite of using auto-commit in python connection?

dbt-trino/dbt/adapters/trino/connections.py

Line 195 in 8088447

isolation_level=IsolationLevel.AUTOCOMMIT,

Other reference:

dbt-labs/dbt-presto#75

The text was updated successfully, but these errors were encountered:

findinpath · 2021-11-22T07:37:49Z

dbt is responsible for performing the changes in a transactional fashion.
If you think about it, this is meaningful in order to avoid having corrupt data after running dbt .

@aakashnand the PR trinodb/trino#9846 may be linked to your issue. It will be available only from Trino version 365.

aakashnand · 2021-11-22T09:02:50Z

this is meaningful in order to avoid having corrupt data after running dbt

@findinpath even when we are connecting using AUTOCOMMIT?

dbt-trino/dbt/adapters/trino/connections.py

Line 195 in 8088447

isolation_level=IsolationLevel.AUTOCOMMIT,

My understanding here is that when we use AUTOCOMMIT, every query is committed as soon as it is finished. In this sense, even if we execute 1000s of queries the end result will not lead to data corruption.

Also, what is the correct solution to solve this issue?

findinpath · 2021-11-22T10:06:39Z

dbt can execute more than one statement corresponding to a materialization.
Maybe I am wrong, please double check this on dbt slack, but this is the reason why dbt works with transactions.

bachng2017 · 2021-11-22T10:59:12Z

it might be a trino question than a dbt question, @findinpath, do you know if this kind of transaction works in trino ?

(for example seeding sequence looks like this):
transaction begin
insert 1
insert 2
insert 3
commit or rollback

as far as we've tested, if insert 3 fails, the result of insert 1 and 2 still in the seed table.

Maybe the transaction start/commit in dbt-presto/trino is just a place-holder or remains of a copy from other drivers. (btw, with snowflake, the rollback in previous sample seems to clear up the result of all inserts, as expected)

findinpath · 2021-11-22T12:34:36Z

@bachng2017

as far as we've tested, if insert 3 fails, the result of insert 1 and 2 still in the seed table.

Indeed, Trino doesn't support fully ACID semantics.
Trino makes sure that ONE INSERT statement doesn't create corrupt data by inserting data in one or more staging table(s) and then doing on the target database INSERT INTO target_table SELECT * FROM staging_table(s).

See https://www.trinoforum.org/t/insert-performance-on-trino-jdbc-connectors/99/3 for a more detailed explanation about the way that INSERT works on Trino.

aakashnand · 2021-11-22T12:42:53Z

@findinpath Do you have any solution in mind for this or should we discuss this on slack channel #dbt-presto-trino?
The temporary solution we have is to increase transaction.idle-timeout=8h in config.properties

findinpath · 2021-11-22T13:33:29Z

@aakashnand I have no better solution than what you just posted.

A slightly different approach is to set the sesssion properties on dbt and affect therefore only the dbt project and not all Trino clients with the transaction idle timeout setting.

my-trino-db:
  target: dev
  outputs:
    dev:
      type: trino
      user: commander
      host: 127.0.0.1
      port: 8080
      database: analytics
      schema: public
      threads: 8
      http_scheme: http
      session_properties:
        ....

hovaesco · 2021-11-22T13:57:35Z

I believe transaction.idle-timeout cannot be set as a session property.

aakashnand · 2021-11-22T14:13:12Z

As suggested by @hovaesco this property cannot be set using session property so we need to set it in the config.properties 🙁

bachng2017 · 2021-11-23T01:49:01Z

we might have understood what happens but want to find more evidences.

btw @findinpath , could you run this example ?
https://github.com/trinodb/trino-python-client#transactions

somehow it did not work in our env with TWO inserts. ONE insert is fine.
just want to make sure our env. is setup correctly.

bachng2017 · 2021-11-23T04:20:01Z

I think the reason the transaction does not work is on our side. Anyway, we think the reason the time-out occurred at the 1st place might be this:

dbt makes the connection with IsolationLevel.AUTOCOMMIT as default. So when the sql is executed, it creates a request (#a). And when the start_transaction is execute, #b create the transaction in another request instance.
The query and the transaction happens independently. This answers the question why we had the error at the COMMIT but the query has been succeed. And also a simple test with start_transaction -> wait long enough -> commit yields the same error.

We need more test to see how to improve this but there might be 2 options now:

use a different isolation level when dbt-trino/presto initializes the connection
change the way trino client implements #a to care about current transaction instance, not only isolation level

findinpath · 2021-11-23T08:58:56Z

@hashhar / @ebyhr can you please take a look at the comment #20 (comment) and help out with feedback?

hashhar · 2021-11-23T16:55:33Z

@bachng2017 Good catch. that indeed looks like a bug in how transactions are handled in the Trino python client.

Can you file the issue against the Python client with a simple reproduction code so that we can verify that the fix that gets implemented actually works?

If I understand correctly any query ran within a transaction that takes > transaction timeout will exhibit this issue?

bachng2017 · 2021-11-23T23:42:02Z

@hashhar , thanks for confirmation. Will file a issue for this.

And yes, that is what we are facing

If I understand correctly any query ran within a transaction that takes > transaction timeout will exhibit this issue?

bachng2017 · 2021-11-24T23:20:47Z

@hashhar created an issue for this at trino-python-client, pls check
trinodb/trino-python-client#129

hovaesco · 2021-12-30T12:41:44Z

PR to address the issue: #30

hovaesco · 2021-12-31T10:13:26Z

Merged #30

bachng2017 · 2022-08-18T10:42:41Z

hello @hovaesco
I would like to confirm about the #30 merge (sorry that I should contact earlier but did not have time)

the fix seems to eliminate (pass) all start_transaction/commit/rollback in the old codes with this reason

dbt-trino adapter uses connection in auto-commit mode, hence there is no need to separately start a new transaction in each query since it makes no effect. In auto-commit mode, the client will create a new transaction for each query.

with the old codes, the typical sequence happens like this:

start transaction -> create tmp table -> create real table ->  commit

in new codes, it just

create tmp table -> create real table

so, autocommit for each query might not be sufficient . What do you think ?

the affect is not too big but the concept is very different from the old codes.

hovaesco · 2022-08-18T13:58:56Z

Could you please elaborate more on autocommit for each query might not be sufficient ?
Each query is executed with autocommit so each operation is committed immediately after performing it.

hovaesco mentioned this issue Dec 30, 2021

Drop transaction queries #30

Merged

hovaesco closed this as completed Dec 31, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Transaction timeout for longer queries #20

Transaction timeout for longer queries #20

aakashnand commented Nov 21, 2021 •

edited

Loading

findinpath commented Nov 22, 2021

aakashnand commented Nov 22, 2021 •

edited

Loading

findinpath commented Nov 22, 2021

bachng2017 commented Nov 22, 2021 •

edited

Loading

findinpath commented Nov 22, 2021

aakashnand commented Nov 22, 2021

findinpath commented Nov 22, 2021

hovaesco commented Nov 22, 2021

aakashnand commented Nov 22, 2021

bachng2017 commented Nov 23, 2021 •

edited

Loading

bachng2017 commented Nov 23, 2021 •

edited

Loading

findinpath commented Nov 23, 2021

hashhar commented Nov 23, 2021 •

edited

Loading

bachng2017 commented Nov 23, 2021

bachng2017 commented Nov 24, 2021

hovaesco commented Dec 30, 2021

hovaesco commented Dec 31, 2021

bachng2017 commented Aug 18, 2022

hovaesco commented Aug 18, 2022

Transaction timeout for longer queries #20

Transaction timeout for longer queries #20

Comments

aakashnand commented Nov 21, 2021 • edited Loading

Background

Reason

Error Log

findinpath commented Nov 22, 2021

aakashnand commented Nov 22, 2021 • edited Loading

findinpath commented Nov 22, 2021

bachng2017 commented Nov 22, 2021 • edited Loading

findinpath commented Nov 22, 2021

aakashnand commented Nov 22, 2021

findinpath commented Nov 22, 2021

hovaesco commented Nov 22, 2021

aakashnand commented Nov 22, 2021

bachng2017 commented Nov 23, 2021 • edited Loading

bachng2017 commented Nov 23, 2021 • edited Loading

findinpath commented Nov 23, 2021

hashhar commented Nov 23, 2021 • edited Loading

bachng2017 commented Nov 23, 2021

bachng2017 commented Nov 24, 2021

hovaesco commented Dec 30, 2021

hovaesco commented Dec 31, 2021

bachng2017 commented Aug 18, 2022

hovaesco commented Aug 18, 2022

aakashnand commented Nov 21, 2021 •

edited

Loading

aakashnand commented Nov 22, 2021 •

edited

Loading

bachng2017 commented Nov 22, 2021 •

edited

Loading

bachng2017 commented Nov 23, 2021 •

edited

Loading

bachng2017 commented Nov 23, 2021 •

edited

Loading

hashhar commented Nov 23, 2021 •

edited

Loading