Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CursorResourceManager.close #1440

Merged
merged 3 commits into from
Jul 12, 2024
Merged

Conversation

stIncMale
Copy link
Member

@stIncMale stIncMale commented Jul 9, 2024

This PR is an alternative to #1439.

The commit b7be8c8 introduces a draft reproducer of the bug reported in JAVA-5516. Then the following commit fixes it. The reproducer is not guaranteed to reproduce the error, but usually does so in just a few attempts on my machine. With the fix I was unable to reproduce the error.

The commit 5105484 introduces a reproducer that we can actually check into the codebase.

JAVA-5516

@stIncMale stIncMale self-assigned this Jul 9, 2024
@@ -182,7 +182,7 @@ void endOperation() {
void close() {
boolean doClose = withLock(lock, () -> {
State localState = state;
if (localState == State.OPERATION_IN_PROGRESS) {
if (localState.inProgress()) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with the original code:

If AsyncCommandBatchCursor.close is called concurrently more than once while AsyncCommandBatchCursor.next is running (not necessarily the next method itself, but the asynchronous logic of this method), then the following state transitions were possible:

  1. One of the close calls observes State.OPERATION_IN_PROGRESS, and changes it to State.CLOSE_PENDING. So far so good.
  2. The other close observes State.CLOSE_PENDING and changes it to State.CLOSED, which breaks one of the invariant that are supposed to be maintained by CursorResourceManager: while an operation is in progress, state.inProgress() must be true, i.e., state must be either OPERATION_IN_PROGRESS or CLOSE_PENDING (implies OPERATION_IN_PROGRESS).

Comment on lines 309 to 312
// Calling `close` twice is the key to reproducing.
// It does not matter whether we call `close` twice from the same thread or not.
ForkJoinPool.commonPool().execute(() -> cursor.close());
ForkJoinPool.commonPool().execute(() -> cursor.close());
Copy link
Member Author

@stIncMale stIncMale Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue this PR fixes is 100% a bug. And it manifests itself with an error that is very similar to the user-reported error (following is the error I reproduced locally):

java.lang.AssertionError: null
	at com.mongodb.assertions.Assertions.assertTrue(Assertions.java:158)
// note that with this reproducer, sometimes the assertion on line 224 succeeds, and the one on line 225 fails
	at com.mongodb.internal.operation.CursorResourceManager.setServerCursor(CursorResourceManager.java:224)
	at com.mongodb.internal.operation.AsyncCommandBatchCursor.lambda$getMoreLoop$2(AsyncCommandBatchCursor.java:186)
	at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:47)
	at com.mongodb.internal.connection.DefaultServer$DefaultServerProtocolExecutor.lambda$executeAsync$0(DefaultServer.java:249)
	at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:47)
	at com.mongodb.internal.connection.CommandProtocolImpl.lambda$executeAsync$0(CommandProtocolImpl.java:88)
	at com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection.lambda$sendAndReceiveAsync$1(DefaultConnectionPool.java:774)
	at com.mongodb.internal.connection.UsageTrackingInternalConnection.lambda$sendAndReceiveAsync$1(UsageTrackingInternalConnection.java:150)
	at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:47)
	at com.mongodb.internal.async.AsyncSupplier.lambda$finish$0(AsyncSupplier.java:65)
	at com.mongodb.internal.async.SingleResultCallback.complete(SingleResultCallback.java:67)
	at com.mongodb.internal.async.AsyncSupplier.lambda$onErrorIf$5(AsyncSupplier.java:122)
	at com.mongodb.internal.async.AsyncSupplier.lambda$finish$0(AsyncSupplier.java:65)
	at com.mongodb.internal.async.AsyncSupplier.lambda$finish$0(AsyncSupplier.java:65)
	at com.mongodb.internal.connection.InternalStreamConnection.lambda$sendCommandMessageAsync$7(InternalStreamConnection.java:613)
	at com.mongodb.internal.connection.InternalStreamConnection$MessageHeaderCallback$MessageCallback.onResult(InternalStreamConnection.java:894)

However, none of this guarantees that the bug the PR fixes is the bug the user is facing, because theoretically, I can imagine other bugs leading to the same error. However, I was unable to find them in our code, so I am saying that the fix proposed in this PR is my current best solution to JAVA-5516.

Copy link
Member

@rozza rozza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

try {
assertDoesNotThrow(() -> {
cursorResourceManager.close();
cursorResourceManager.close();
Copy link
Member Author

@stIncMale stIncMale Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good example of close being executed concurrently with an operation being in progress, without any parallelism involved. Examples like this emphasize how concurrency and parallelism are different.

};
cursorResourceManager.tryStartOperation();
try {
assertDoesNotThrow(() -> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the fix in CursorResourceManager, this assertion fails.

@stIncMale stIncMale requested a review from rozza July 9, 2024 17:35
@stIncMale
Copy link
Member Author

stIncMale commented Jul 9, 2024

@rozza, I replaced the bug reproducer with a simpler one that is also deterministic. Please take a look.

Copy link
Member

@rozza rozza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice - LGTM!

@stIncMale stIncMale merged commit 099ec1e into mongodb:master Jul 12, 2024
59 checks passed
@stIncMale stIncMale deleted the JAVA-5516 branch July 12, 2024 16:56
stIncMale added a commit to stIncMale/mongo-java-driver that referenced this pull request Jul 12, 2024
stIncMale added a commit to stIncMale/mongo-java-driver that referenced this pull request Jul 12, 2024
stIncMale added a commit that referenced this pull request Jul 12, 2024
stIncMale added a commit that referenced this pull request Jul 12, 2024
@thjaeckle
Copy link

Thanks for the fix.
Looking forward to get rid of the assertion errors 👍

stIncMale added a commit to stIncMale/mongo-java-driver that referenced this pull request Sep 4, 2024
stIncMale added a commit that referenced this pull request Sep 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants