
BUGFIX: Ensure user's content stream is never left closed after publication #5342

Merged

Conversation

mhsdesign
Member

@mhsdesign mhsdesign commented Nov 4, 2024

Follow-up to publishing version 3: #5301

There are currently three cases that might leave a closed content stream behind if $something goes wrong during publication.

Case 1: Ensure all events are published BEFORE catchup

Otherwise, a failure in a projection or catchup hook would immediately interrupt the process, leaving a broken state.

For example, a faulty redirect handler hook that just listens to live events would be called during publishing. At that point the part to publish is already committed to live, but we still have work to do: fork the new user content stream and apply the remaining events. If the hook threw as soon as the events were caught up on live, it would interrupt us right there, and we would be left with a CLOSED user content stream that contains the "same" events that went live during the rebase. Reopening would not help at that point. This is why we must ensure that all events are published BEFORE we run the first catchup.

Further implications (see the sketch after this list):

  • running catchup only once should be more performant
  • we cannot re-fetch the current content stream version for what were previously "sub-commands" (forkContentStream); instead, $expectedVersions must be passed around from the outside
  • we must not run constraint checks after the first yield, as they would still operate on the old state; thus all checks are combined up front
  • closing the content stream is no longer caught up directly, meaning another process will not get an error because the content stream is not writable; instead, because the version in the database is behind, it will get a concurrency exception
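
A minimal sketch of that ordering, with hypothetical callable names (the actual Neos handlers are generator-based and considerably more involved):

```php
<?php
// Sketch only: commit everything BEFORE the single catchup, so a failing
// projection or catchup hook can no longer interrupt mid-publication.
function publishSketch(
    callable $fetchExpectedVersions, // fetched once, up front
    callable $runConstraintChecks,   // all checks combined, against current state
    callable $commitEvents,          // publication + fork of the new user content stream
    callable $catchUpProjections     // runs exactly once, at the very end
): void {
    // 1. Expected versions are fetched once and passed around explicitly,
    //    because the projection state is stale until the catchup below ran.
    $expectedVersions = $fetchExpectedVersions();

    // 2. Constraint checks run before the first commit; afterwards they
    //    would still operate on the old state.
    $runConstraintChecks();

    // 3. Every event of the publication is committed in one go.
    $commitEvents($expectedVersions);

    // 4. Only now do projections and hooks catch up. If a hook throws here,
    //    the events are already safely published.
    $catchUpProjections();
}
```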

Case 2: Reopen content stream if base workspace was written to during publication

... and a ConcurrencyException is thrown

Introduces a WorkspacePublicationDuringWritingTest parallel test (with its own content repository) to assert that behaviour.
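
Roughly, the reopen behaviour looks like this (hypothetical helper names; a local stand-in class is used where the real ConcurrencyException comes from the event store package):

```php
<?php
// Sketch only: if the base workspace received new events while we were
// publishing, the commit fails and the previously closed user content
// stream must be reopened instead of being left closed.
final class ConcurrencyException extends \RuntimeException {} // stand-in

function publishWithReopenSketch(callable $commitPublication, callable $reopenContentStream): void
{
    try {
        $commitPublication(); // fails if the base workspace moved on meanwhile
    } catch (ConcurrencyException $e) {
        $reopenContentStream(); // undo the earlier close so the workspace stays usable
        throw $e;               // surface the conflict to the caller
    }
}
```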

Case 3: Close content stream a bit later instead of having to reopen it on unexpected errors

Previously, an error in extractFromEventStream (because a payload was not correct and still had to be migrated) would leave behind a closed content stream, which of course stays closed even after fixing the events via migration.

This is still safe to do, as closeContentStream commits the close against the initially fetched expected version.

Same guarantees, different error behaviour in rare cases.
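
In sketch form (hypothetical names again), the close now happens only after the rebaseable commands were extracted successfully:

```php
<?php
// Sketch only: extract first, close afterwards. Because the close is still
// committed against the expected version fetched at the start, a concurrent
// write in between fails the close exactly as before.
function preparePublicationSketch(
    callable $fetchExpectedVersion,
    callable $extractRebaseableCommands, // may throw on not-yet-migrated event metadata
    callable $closeContentStream
) {
    $expectedVersion = $fetchExpectedVersion();

    // If this throws, nothing has been committed yet - no closed content
    // stream is left behind for the user to repair.
    $commands = $extractRebaseableCommands();

    // Close against the initially fetched expected version.
    $closeContentStream($expectedVersion);

    return $commands;
}
```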

Upgrade instructions

Review instructions

Checklist

  • Code follows the PSR-2 coding style
  • Tests have been created, run and adjusted as needed
  • The PR is created against the lowest maintained branch
  • Reviewer - PR Title is brief but complete and starts with FEATURE|TASK|BUGFIX
  • Reviewer - The first section explains the change briefly for change-logs
  • Reviewer - Breaking Changes are marked with !!! and have upgrade-instructions

Member

@bwaidelich bwaidelich left a comment


I think it's a great start, and the LoC balance is even negative:
[screenshot: diff stats]

…ring publication

... and a ConcurrencyException is thrown

Introduces a `WorkspacePublicationDuringWritingTest` parallel test (with its own content repository) to assert that behaviour.
That allows us to use the same content repository. Previously, a super slow paratest run would mean that another test case had already been started, and its setup then ran twice at the end.

paratestphp/paratest#905
… version instead

Also re-adds lost documentation and simplifies `handle`

The ->throw logic was initially introduced via

neos#5315

but then removed again as we thought it was no longer needed.
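
For context, PHP's Generator::throw() is what makes this pattern work; a stripped-down illustration (not the actual `handle` implementation):

```php
<?php
// Sketch only: the handler is a generator that yields after committing
// events; if a later step fails, handle() throws the exception back INTO
// the generator so the handler's own try/catch can clean up.
function commandHandlerSketch(): \Generator
{
    try {
        yield 'events committed'; // control returns to handle() here
    } catch (\Throwable $e) {
        echo "handler cleans up: {$e->getMessage()}\n"; // e.g. reopen the content stream
        throw $e;
    }
}

$handler = commandHandlerSketch();
$handler->current(); // run the handler up to its first yield

try {
    $handler->throw(new \RuntimeException('catchup failed')); // resumes at the yield
} catch (\RuntimeException $e) {
    echo "handle() rethrows: {$e->getMessage()}\n";
}
```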
…re content stream is never left closed

During the beta phase it can happen that users forget to apply a migration to migrate the stored commands in the event metadata; upon publish this would close the content stream and fail directly afterwards.

Applying the migration then would not be enough, as the content stream is in a closed state and has to be repaired manually.

Even though this is not super likely, it is not unlikely either, and publication is exactly the case where we rely on things that might not be that way.

As an alternative we could discuss doing the closing after acquiring the rebaseable commands.
… many edge cases

Alternative fix for d27f83f

@mhsdesign mhsdesign marked this pull request as ready for review November 9, 2024 20:20
@mhsdesign mhsdesign changed the title from "BUGFIX: Ensure all events are published BEFORE catchup" to "BUGFIX: Ensure user's content stream is never left closed after publication" Nov 9, 2024
Member

@bwaidelich bwaidelich left a comment


Some, probably semi-helpful, comments. Looks great otherwise

Member

@bwaidelich bwaidelich left a comment


+1 by reading and running the tests

Contributor

@dlubitz dlubitz left a comment


Looks good by reading. But as a disclaimer ... I'm not deep enough into this topic to fully understand it. 🤷‍♂️

@kitsunet kitsunet merged commit 80d8750 into neos:9.0 Nov 12, 2024
9 checks passed
@mhsdesign mhsdesign deleted the bugfix/publishing-ensure-contentstream-not-closed branch November 12, 2024 20:02