-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix CursorResourceManager.close
#1440
Conversation
@@ -182,7 +182,7 @@ void endOperation() { | |||
void close() { | |||
boolean doClose = withLock(lock, () -> { | |||
State localState = state; | |||
if (localState == State.OPERATION_IN_PROGRESS) { | |||
if (localState.inProgress()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with the original code:
If AsyncCommandBatchCursor.close
is called concurrently more than once while AsyncCommandBatchCursor.next
is running (not necessarily the next
method itself, but the asynchronous logic of this method), then the following state transitions were possible:
- One of the
close
calls observesState.OPERATION_IN_PROGRESS
, and changes it toState.CLOSE_PENDING
. So far so good. - The other
close
observesState.CLOSE_PENDING
and changes it toState.CLOSED
, which breaks one of the invariant that are supposed to be maintained byCursorResourceManager
: while an operation is in progress,state.inProgress()
must betrue
, i.e.,state
must be eitherOPERATION_IN_PROGRESS
orCLOSE_PENDING
(impliesOPERATION_IN_PROGRESS
).
// Calling `close` twice is the key to reproducing. | ||
// It does not matter whether we call `close` twice from the same thread or not. | ||
ForkJoinPool.commonPool().execute(() -> cursor.close()); | ||
ForkJoinPool.commonPool().execute(() -> cursor.close()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue this PR fixes is 100% a bug. And it manifests itself with an error that is very similar to the user-reported error (following is the error I reproduced locally):
java.lang.AssertionError: null
at com.mongodb.assertions.Assertions.assertTrue(Assertions.java:158)
// note that with this reproducer, sometimes the assertion on line 224 succeeds, and the one on line 225 fails
at com.mongodb.internal.operation.CursorResourceManager.setServerCursor(CursorResourceManager.java:224)
at com.mongodb.internal.operation.AsyncCommandBatchCursor.lambda$getMoreLoop$2(AsyncCommandBatchCursor.java:186)
at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:47)
at com.mongodb.internal.connection.DefaultServer$DefaultServerProtocolExecutor.lambda$executeAsync$0(DefaultServer.java:249)
at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:47)
at com.mongodb.internal.connection.CommandProtocolImpl.lambda$executeAsync$0(CommandProtocolImpl.java:88)
at com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection.lambda$sendAndReceiveAsync$1(DefaultConnectionPool.java:774)
at com.mongodb.internal.connection.UsageTrackingInternalConnection.lambda$sendAndReceiveAsync$1(UsageTrackingInternalConnection.java:150)
at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:47)
at com.mongodb.internal.async.AsyncSupplier.lambda$finish$0(AsyncSupplier.java:65)
at com.mongodb.internal.async.SingleResultCallback.complete(SingleResultCallback.java:67)
at com.mongodb.internal.async.AsyncSupplier.lambda$onErrorIf$5(AsyncSupplier.java:122)
at com.mongodb.internal.async.AsyncSupplier.lambda$finish$0(AsyncSupplier.java:65)
at com.mongodb.internal.async.AsyncSupplier.lambda$finish$0(AsyncSupplier.java:65)
at com.mongodb.internal.connection.InternalStreamConnection.lambda$sendCommandMessageAsync$7(InternalStreamConnection.java:613)
at com.mongodb.internal.connection.InternalStreamConnection$MessageHeaderCallback$MessageCallback.onResult(InternalStreamConnection.java:894)
However, none of this guarantees that the bug the PR fixes is the bug the user is facing, because theoretically, I can imagine other bugs leading to the same error. However, I was unable to find them in our code, so I am saying that the fix proposed in this PR is my current best solution to JAVA-5516.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
try { | ||
assertDoesNotThrow(() -> { | ||
cursorResourceManager.close(); | ||
cursorResourceManager.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A good example of close
being executed concurrently with an operation being in progress, without any parallelism involved. Examples like this emphasize how concurrency and parallelism are different.
}; | ||
cursorResourceManager.tryStartOperation(); | ||
try { | ||
assertDoesNotThrow(() -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without the fix in CursorResourceManager
, this assertion fails.
@rozza, I replaced the bug reproducer with a simpler one that is also deterministic. Please take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice - LGTM!
JAVA-5516
JAVA-5516
Thanks for the fix. |
mongodb#1440 to 4.11.x JAVA-5516
This PR is an alternative to #1439.
The commit b7be8c8 introduces a draft reproducer of the bug reported in JAVA-5516. Then the following commit fixes it. The reproducer is not guaranteed to reproduce the error, but usually does so in just a few attempts on my machine. With the fix I was unable to reproduce the error.
The commit 5105484 introduces a reproducer that we can actually check into the codebase.
JAVA-5516