Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore the portal-level snapshot after procedure COMMIT/ROLLBACK. #513

Merged
merged 1 commit into from
Jul 18, 2024

Conversation

Howard229
Copy link

@Howard229 Howard229 commented Jul 7, 2024

Restore the portal-level snapshot after procedure COMMIT/ROLLBACK.

COMMIT/ROLLBACK necessarily destroys all snapshots within the session.
The original implementation of intra-procedure transactions just
cavalierly did that, ignoring the fact that this left us executing in
a rather different environment than normal. In particular, it turns
out that handling of toasted datums depends rather critically on there
being an outer ActiveSnapshot: otherwise, when SPI or the core
executor pop whatever snapshot they used and return, it's unsafe to
dereference any toasted datums that may appear in the query result.
It's possible to demonstrate "no known snapshots" and "missing chunk
number N for toast value" errors as a result of this oversight.

Historically this outer snapshot has been held by the Portal code,
and that seems like a good plan to preserve. So add infrastructure
to pquery.c to allow re-establishing the Portal-owned snapshot if it's
not there anymore, and add enough bookkeeping support that we can tell
whether it is or not.

We can't, however, just re-establish the Portal snapshot as part of
COMMIT/ROLLBACK. As in normal transaction start, acquiring the first
snapshot should wait until after SET and LOCK commands. Hence, teach
spi.c about doing this at the right time. (Note that this patch
doesn't fix the problem for any PLs that try to run intra-procedure
transactions without using SPI to execute SQL commands.)

This makes SPI's no_snapshots parameter rather a misnomer, so in HEAD,
rename that to allow_nonatomic.

replication/logical/worker.c also needs some fixes, because it wasn't
careful to hold a snapshot open around AFTER trigger execution.
That code doesn't use a Portal, which I suspect someday we're gonna
have to fix. But for now, just rearrange the order of operations.
This includes back-patching the recent addition of finish_estate()
to centralize the cleanup logic there.

This also back-patches commit 2ecfeda into v13, to improve the
test coverage for worker.c (it was that test that exposed that
worker.c's snapshot management is wrong).

Per bug #15990 from Andreas Wicht. Back-patch to v11 where
intra-procedure COMMIT was added.

Discussion: https://postgr.es/m/[email protected]

cherry-pick ef94805096229ee3573624465a76ca11d2bd8529 from
PostgreSQL's REL_11_STANLE branch,and resolved some code conflicts。

image

@polardb-bot
Copy link

polardb-bot bot commented Jul 7, 2024

Hi @Howard229 ~ Thanks for your contribution in this PR. ❤️

Please make sure that your PR conforms the standard, and has passed all the checks.

We will review your PR as soon as possible.

@Howard229
Copy link
Author

#510
修改后,原测试用例正常执行未报错,结果符合预期
image

@polardb-bot
Copy link

polardb-bot bot commented Jul 7, 2024

Hey @Howard229 :

Something wrong occuried during the checks of your commit 😟, please check the detail:

⚠️ build-and-publish-rpm (rocky9) View more details

@polardb-bot polardb-bot bot added the ci/failure CI status is failure label Jul 7, 2024
@polardb-bot
Copy link

polardb-bot bot commented Jul 8, 2024

Hey @Howard229 :

Congratulations~ 🎉 Your commit has passed all the checks. Please wait for further manual review.

@polardb-bot polardb-bot bot added ci/success CI status is success and removed ci/failure CI status is failure labels Jul 8, 2024
@mrdrivingduck
Copy link
Member

@Howard229 hi, I find this commit is actually a community commit from PostgreSQL. Can you cherry-pick this commit to keep the original author of this patch? Also, you should pick the one which resides on PostgreSQL's REL_11_STABLE branch.

COMMIT/ROLLBACK necessarily destroys all snapshots within the session.
The original implementation of intra-procedure transactions just
cavalierly did that, ignoring the fact that this left us executing in
a rather different environment than normal.  In particular, it turns
out that handling of toasted datums depends rather critically on there
being an outer ActiveSnapshot: otherwise, when SPI or the core
executor pop whatever snapshot they used and return, it's unsafe to
dereference any toasted datums that may appear in the query result.
It's possible to demonstrate "no known snapshots" and "missing chunk
number N for toast value" errors as a result of this oversight.

Historically this outer snapshot has been held by the Portal code,
and that seems like a good plan to preserve.  So add infrastructure
to pquery.c to allow re-establishing the Portal-owned snapshot if it's
not there anymore, and add enough bookkeeping support that we can tell
whether it is or not.

We can't, however, just re-establish the Portal snapshot as part of
COMMIT/ROLLBACK.  As in normal transaction start, acquiring the first
snapshot should wait until after SET and LOCK commands.  Hence, teach
spi.c about doing this at the right time.  (Note that this patch
doesn't fix the problem for any PLs that try to run intra-procedure
transactions without using SPI to execute SQL commands.)

This makes SPI's no_snapshots parameter rather a misnomer, so in HEAD,
rename that to allow_nonatomic.

replication/logical/worker.c also needs some fixes, because it wasn't
careful to hold a snapshot open around AFTER trigger execution.
That code doesn't use a Portal, which I suspect someday we're gonna
have to fix.  But for now, just rearrange the order of operations.
This includes back-patching the recent addition of finish_estate()
to centralize the cleanup logic there.

This also back-patches commit 2ecfeda into v13, to improve the
test coverage for worker.c (it was that test that exposed that
worker.c's snapshot management is wrong).

Per bug #15990 from Andreas Wicht.  Back-patch to v11 where
intra-procedure COMMIT was added.

Discussion: https://postgr.es/m/[email protected]
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@Howard229
Copy link
Author

Howard229 commented Jul 18, 2024

@Howard229 hi, I find this commit is actually a community commit from PostgreSQL. Can you cherry-pick this commit to keep the original author of this patch? Also, you should pick the one which resides on PostgreSQL's REL_11_STABLE branch.

As you mentioned, I have retained the original author in this commit, but does the original author need to sign the CLA (Contributor License Agreement)? So, what should I do next?

@mrdrivingduck
Copy link
Member

As you said, I have retained the original author in this submission, but do I need the original author to sign a CLA? So what should I do next?

No need, forget about it. Let's wait until all regression passed.

@polardb-bot
Copy link

polardb-bot bot commented Jul 18, 2024

Hey @Howard229 :

Congratulations~ 🎉 Your commit has passed all the checks. Please wait for further manual review.

@polardb-bot polardb-bot bot added ci/success CI status is success and removed ci/success CI status is success labels Jul 18, 2024
@Howard229
Copy link
Author

As you said, I have retained the original author in this submission, but do I need the original author to sign a CLA? So what should I do next?

No need, forget about it. Let's wait until all regression passed.

All regressions have been passed except for license/CLA

@mrdrivingduck mrdrivingduck merged commit 3ebc3d9 into ApsaraDB:POLARDB_11_DEV Jul 18, 2024
13 of 14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci/success CI status is success
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants