Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix: fix readonly branch commit errors in Oracle XA transactions. #6629

Closed
wants to merge 0 commits into from

Conversation

gongycn
Copy link
Contributor

@gongycn gongycn commented Jun 24, 2024

  • I have registered the PR changes.

Ⅰ. Describe what this PR did

fix the issue of readonly branch commit errors in Oracle XA transactions, which leads to long-running transactions being unable to commit and causing unreadable states.
Solution Logic:
When the database is Oracle and the prepare result is XA_RDONLY, notify TC that the current branch status is BranchStatus.PhaseOne_RDONLY (a newly added state to address this issue). During global commit, for branches with this status type, delete the branch log directly and ignore it.

Ⅱ. Does this pull request fix one issue?

fixes #6512

Ⅲ. Why don't you add test cases (unit test/integration test)?

Ⅳ. Describe how to verify it

Based on the problematic local project instance, the verification includes:

Whether the Seata TC frontend displays normally;
In Seata TM managing multiple RMs, whether it functions correctly when there is one or more readonly branches; whether it functions correctly when there are no readonly branches; and whether it functions correctly when there are only one or more readonly branches.

Ⅴ. Special notes for reviews

Copy link
Contributor

@funky-eyes funky-eyes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我们是否可以将sql进行解析,如果这个sql是select类型的sql,就不进行xa相关的事务动作?
Can we parse the SQL statement to skip XA-related transaction actions if the SQL is of type SELECT?

@gongycn
Copy link
Contributor Author

gongycn commented Jun 24, 2024

我们是否可以将sql进行解析,如果这个sql是select类型的sql,就不进行xa相关的事务动作? Can we parse the SQL statement to skip XA-related transaction actions if the SQL is of type SELECT?

Simple branch logic composed of basic SQL can be implemented this way, but consideration is needed if the transaction involves more complex scenarios such as calling stored procedures under XA branch transactions. The business logic within stored procedures may not be straightforward to determine.

Moreover, in Oracle, the prepare phase can explicitly indicate whether the branch is read-only, so it may feel unnecessary to handle it in this way.

@gongycn gongycn requested a review from funky-eyes June 24, 2024 05:35
@funky-eyes
Copy link
Contributor

我们是否可以将sql进行解析,如果这个sql是select类型的sql,就不进行xa相关的事务动作? Can we parse the SQL statement to skip XA-related transaction actions if the SQL is of type SELECT?

Simple branch logic composed of basic SQL can be implemented this way, but consideration is needed if the transaction involves more complex scenarios such as calling stored procedures under XA branch transactions. The business logic within stored procedures may not be straightforward to determine.

Moreover, in Oracle, the prepare phase can explicitly indicate whether the branch is read-only, so it may feel unnecessary to handle it in this way.

We currently refuse to use stored procedures directly, so I would prefer to place the XA START after the first DML action is recognized.

@gongycn
Copy link
Contributor Author

gongycn commented Jun 24, 2024

我们是否可以将sql进行解析,如果这个sql是select类型的sql,就不进行xa相关的事务动作? Can we parse the SQL statement to skip XA-related transaction actions if the SQL is of type SELECT?

Simple branch logic composed of basic SQL can be implemented this way, but consideration is needed if the transaction involves more complex scenarios such as calling stored procedures under XA branch transactions. The business logic within stored procedures may not be straightforward to determine.
Moreover, in Oracle, the prepare phase can explicitly indicate whether the branch is read-only, so it may feel unnecessary to handle it in this way.

We currently refuse to use stored procedures directly, so I would prefer to place the XA START after the first DML action is recognized.

  1. The official documentation mentions for XA use cases: Suitable for migrating old applications to the Seata platform based on the XA protocol.. Therefore, if stored procedures are not supported, it will not be possible to achieve this. Currently, some of our applications are using a mix of XA and TCC.
  2. If the XA transaction is started only after the first DML operation, in Oracle, this can be achieved by leveraging the Promotable XA feature to ensure that the repeatable read characteristic is maintained under the Repeatable Read isolation level. Otherwise, the repeatable read cannot be guaranteed. This feature is not necessarily supported in other databases.

@funky-eyes
Copy link
Contributor

  1. I don't understand the relevance between supporting stored procedures and migrating to Seata XA mode that you mentioned.
  2. I believe Oracle commonly uses the RC isolation level, so it seems there's no need to use the feature you mentioned.

@gongycn
Copy link
Contributor Author

gongycn commented Jun 24, 2024

  1. I don't understand the relevance between supporting stored procedures and migrating to Seata XA mode that you mentioned.
  2. I believe Oracle commonly uses the RC isolation level, so it seems there's no need to use the feature you mentioned.
  1. Currently, we have some legacy services that process business logic through stored procedures and we want to integrate them with Seata transactions.
  2. We are using the RR (Repeatable Read) isolation level, and since Seata is a general framework, XA as a two-phase commit protocol should support various scenarios.
  3. The basic logical template structure of XA transactions is consistent, with only minor implementation differences across different databases. However, all of them can handle the basic logical template well. For the current issue, the desired result can be achieved during the prepare phase, and appropriate processing can be performed. There is no need to introduce other processing logic outside of the XA process.

Thanks for your question, which has led me to further analyze this solution in depth.

@gongycn
Copy link
Contributor Author

gongycn commented Jun 25, 2024

  1. The current issue of read-only transaction branch commit errors can be efficiently and elegantly resolved within the internal handling logic of XA transactions.
  2. Since SQL parsing cannot adapt to all database syntaxes, introducing SQL parsing to solve this issue would undoubtedly reduce the generality of XA transactions and increase performance overhead due to SQL parsing.

Based on the above points, it is not recommended to introduce SQL parsing to address this issue.

@funky-eyes
Copy link
Contributor

I understand your point, but I believe that in a read-only transaction, why isn't it considered a local transaction instead of being included in a global transaction? If a global transaction includes a read-only local transaction, it will inevitably affect throughput due to this additional branch. However, if it is just a local transaction, there will be significantly fewer interactions with the DB and the TC. Therefore, I believe that the overhead caused by SQL parsing is much lower than the overhead of starting an XA transaction and registering an XA branch to the TC. Moreover, we only need to identify whether the SQL is a SELECT statement rather than other DML statements. This has little to do with the number or variety of SQL types supported by SQL parsing.

@gongycn
Copy link
Contributor Author

gongycn commented Jun 25, 2024

I understand your point, but I believe that in a read-only transaction, why isn't it considered a local transaction instead of being included in a global transaction? If a global transaction includes a read-only local transaction, it will inevitably affect throughput due to this additional branch. However, if it is just a local transaction, there will be significantly fewer interactions with the DB and the TC. Therefore, I believe that the overhead caused by SQL parsing is much lower than the overhead of starting an XA transaction and registering an XA branch to the TC. Moreover, we only need to identify whether the SQL is a SELECT statement rather than other DML statements. This has little to do with the number or variety of SQL types supported by SQL parsing.

The branch transactions in this scenario are typically executed based on the business state, which determines whether data insertions, deletions, or updates are necessary. Different business data states lead to inconsistencies in this processing logic—sometimes it may be read-only, sometimes not.

While it can be seen as an imperfect organization and implementation of business logic, due to resource constraints, it is often difficult to handle every detail thoroughly.

The solution I am currently proposing addresses this issue by adding an additional interaction with the Transaction Coordinator (TC) when such a situation arises, to avoid errors in the commit of read-only branch transactions. This can be understood as a fault tolerance mechanism. Of course, in a well-organized and clear business logic process, as you mentioned, such situations would not occur, and thus this extra interaction with the TC would not be necessary.

@gongycn
Copy link
Contributor Author

gongycn commented Jun 25, 2024

Of course, the proposed solution can later be combined with the SQL parsing logic you mentioned. If the SQL parsing perfectly handles all operations that are read-only and avoids initiating XA-related transaction actions, it would also prevent this extra interaction with the TC. However, if there are any omissions, this approach can improve robustness.

Furthermore, this proposed solution can effectively and immediately resolve the current issue with Oracle XA branch transactions. It can prevent the problem where, after the global transaction is committed, other XA branches' data remain unreadable for an extended period due to the failure of committing read-only transaction branches.

@funky-eyes
Copy link
Contributor

  1. Regarding read-only branches in relation to SQL parsing recognition, there are additional actions such as XA end and XA prepare, all of which involve I/O interactions with the database. Then there's the step of reporting the branch status to the transaction coordinator (TC), which adds some extra overhead.

  2. I'm not quite clear on what you're saying. Why would a failed commit of an XA read-only branch affect other transactions' ability to read data? It shouldn't have acquired any data locks; a snapshot read shouldn't hold any exclusive database locks.

  3. I understand the intention behind your pull request, which indeed optimizes behavior under Oracle's XA read-only transactions. However, I would prefer a more universal approach applicable to all compatible databases, aiming for standardized handling. Therefore, I propose that after SQL parsing, we should skip initiating XA transaction-related actions for non-DML operations. We might need input from other community members on this point. @slievrly @PeppaO

@gongycn
Copy link
Contributor Author

gongycn commented Jun 25, 2024

  1. Regarding read-only branches in relation to SQL parsing recognition, there are additional actions such as XA end and XA prepare, all of which involve I/O interactions with the database. Then there's the step of reporting the branch status to the transaction coordinator (TC), which adds some extra overhead.
  2. I'm not quite clear on what you're saying. Why would a failed commit of an XA read-only branch affect other transactions' ability to read data? It shouldn't have acquired any data locks; a snapshot read shouldn't hold any exclusive database locks.
  3. I understand the intention behind your pull request, which indeed optimizes behavior under Oracle's XA read-only transactions. However, I would prefer a more universal approach applicable to all compatible databases, aiming for standardized handling. Therefore, I propose that after SQL parsing, we should skip initiating XA transaction-related actions for non-DML operations. We might need input from other community members on this point. @slievrly @PeppaO

The issue in point 2 is that when an XA transaction commits globally, the multiple branch transactions in the TC are committed synchronously, one after the other. If any of these commits fail, the commit logic exits, causing the global commit to get stuck on that failed branch, continuously retrying. This prevents subsequent RM branches from actually receiving the TC's commit request, making the transactions of these RM branches invisible to other transactions.

@funky-eyes
Copy link
Contributor

  1. Regarding read-only branches in relation to SQL parsing recognition, there are additional actions such as XA end and XA prepare, all of which involve I/O interactions with the database. Then there's the step of reporting the branch status to the transaction coordinator (TC), which adds some extra overhead.
  2. I'm not quite clear on what you're saying. Why would a failed commit of an XA read-only branch affect other transactions' ability to read data? It shouldn't have acquired any data locks; a snapshot read shouldn't hold any exclusive database locks.
  3. I understand the intention behind your pull request, which indeed optimizes behavior under Oracle's XA read-only transactions. However, I would prefer a more universal approach applicable to all compatible databases, aiming for standardized handling. Therefore, I propose that after SQL parsing, we should skip initiating XA transaction-related actions for non-DML operations. We might need input from other community members on this point. @slievrly @PeppaO

The issue in point 2 is that when an XA transaction commits globally, the multiple branch transactions in the TC are committed synchronously, one after the other. If any of these commits fail, the commit logic exits, causing the global commit to get stuck on that failed branch, continuously retrying. This prevents subsequent RM branches from actually receiving the TC's commit request, making the transactions of these RM branches invisible to other transactions.

I understand. So that's why you've implemented the skip operation for this branch. I see this PR as a temporary solution, but ultimately, we should move towards a unified processing model

@gongycn
Copy link
Contributor Author

gongycn commented Jun 25, 2024

Of course, the proposed solution can later be combined with the SQL parsing logic you mentioned. If the SQL parsing perfectly handles all operations that are read-only and avoids initiating XA-related transaction actions, it would also prevent this extra interaction with the TC. However, if there are any omissions, this approach can improve robustness.

I understand. So that's why you've implemented the skip operation for this branch. I see this PR as a temporary solution, but ultimately, we should move towards a unified processing model

Yes, this PR can be seen as a temporary solution, or as mentioned above, it can serve as a fallback solution to ensure robustness.

@gongycn
Copy link
Contributor Author

gongycn commented Jun 26, 2024

One of the reasons for using XA transactions in Seata is that XA transactions allow developers to:

  1. Fully utilize the database's unique features: For example, special SQL syntax, stored procedures, functions, etc.
  2. Standardized interface: XA transactions are implemented based on a standard protocol with a first phase (start, execute, end, prepare) and a second phase (commit/rollback).

I believe that the implementation of XA transactions in Seata should maintain these basic characteristics of XA transactions. From this perspective, the approach in this PR:

  1. Does not compromise the support for the database's unique features.
  2. Does not exceed the requirements of the XA standard protocol.
  3. Indeed solves the issue of read-only branch commit errors.

Therefore, I hope this PR can be accepted.

@funky-eyes funky-eyes added this to the 2.2.0 milestone Jun 26, 2024
@funky-eyes funky-eyes added type: bug Category issues or prs related to bug. DB: Oracle Relate to seata Oracle module/rm-datasource rm-datasource module first-time contributor first-time contributor labels Jun 26, 2024
@gongycn
Copy link
Contributor Author

gongycn commented Jun 29, 2024

@funky-eyes fix the incorrect code style: '{' is not preceded with whitespace.

@funky-eyes
Copy link
Contributor

@funky-eyes fix the incorrect code style: '{' is not preceded with whitespace.

OK, thank you for submitting this PR. However, due to the discussions mentioned above, I need to consult with other committers and PPMC members in the community. Please be patient while I do so.

Copy link

codecov bot commented Jul 15, 2024

Codecov Report

Attention: Patch coverage is 30.00000% with 7 lines in your changes missing coverage. Please review.

Project coverage is 50.82%. Comparing base (e8ea2cc) to head (ea14b2d).

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##                2.x    #6629      +/-   ##
============================================
- Coverage     50.84%   50.82%   -0.02%     
+ Complexity     5831     5828       -3     
============================================
  Files          1051     1051              
  Lines         36325    36333       +8     
  Branches       4317     4322       +5     
============================================
- Hits          18468    18465       -3     
- Misses        16003    16012       +9     
- Partials       1854     1856       +2     
Files Coverage Δ
...java/org/apache/seata/core/model/BranchStatus.java 100.00% <100.00%> (ø)
...ache/seata/rm/datasource/xa/ConnectionProxyXA.java 55.35% <25.00%> (-1.01%) ⬇️
...g/apache/seata/server/coordinator/DefaultCore.java 44.31% <0.00%> (-1.04%) ⬇️

... and 2 files with indirect coverage changes

@gongycn
Copy link
Contributor Author

gongycn commented Jul 29, 2024

@funky-eyes I tested four types of databases: MySQL (8), Oracle (12c), Postgres (16), and MSSQL Server (2022). Only Oracle has read-only optimization; the others do not provide read-only feedback. Based on these four databases, the database type check can be eliminated here.

@funky-eyes
Copy link
Contributor

This is awesome, but why are you closing this PR?

@gongycn
Copy link
Contributor Author

gongycn commented Jul 29, 2024

in

This is awesome, but why are you closing this PR?

@funky-eyes Misoperation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DB: Oracle Relate to seata Oracle first-time contributor first-time contributor module/rm-datasource rm-datasource module type: bug Category issues or prs related to bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

XA事务模式下写入的查询不到问题探讨
2 participants