Refresh should not acquire readLock #48414

Merged on Oct 25, 2019 (4 commits)


dnhatn
Member

@dnhatn dnhatn commented Oct 23, 2019

Today, we hold the engine readLock while refreshing. Although this choice simplifies reasoning about correctness, it can block IndexShard from closing if warming an external reader takes a long time. The current implementation of refresh does not need to hold the readLock, because ReferenceManager handles errors correctly if the engine is closed midway through a refresh.

This PR is a prerequisite for solving #47186.

Relates #47186
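
For readers who want a concrete picture of the idea, here is a minimal sketch, not the actual engine code. All class and method names (`LockFreeRefreshSketch`, `EngineSketch`, `ReaderManagerSketch`, `ClosedException`) are hypothetical stand-ins; `ClosedException` only plays the role that an already-closed error would play in the real code. The sketch assumes that close takes the write lock, that refresh takes no lock at all, and that the reader manager fails cleanly once it has been closed.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; these are not Elasticsearch or Lucene classes.
class LockFreeRefreshSketch {

    /** Stand-in for the error raised when a closed resource is used. */
    static class ClosedException extends RuntimeException {
        ClosedException(String message) {
            super(message);
        }
    }

    /** Minimal stand-in for a reference manager: it refuses to refresh once closed. */
    static class ReaderManagerSketch {
        private final AtomicBoolean closed = new AtomicBoolean();
        private final AtomicInteger generation = new AtomicInteger();

        void maybeRefreshBlocking() {
            if (closed.get()) {
                throw new ClosedException("reader manager is closed");
            }
            generation.incrementAndGet(); // pretend a new reader was opened
        }

        void close() {
            closed.set(true);
        }
    }

    static class EngineSketch {
        private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
        private final ReaderManagerSketch readerManager = new ReaderManagerSketch();
        private final AtomicBoolean isClosed = new AtomicBoolean();

        /** Refreshes without taking the read lock; returns false if the engine was closed. */
        boolean refresh() {
            try {
                readerManager.maybeRefreshBlocking();
                return true;
            } catch (ClosedException e) {
                if (isClosed.get()) {
                    return false; // closed midway: treated as an expected outcome
                }
                throw e; // closed for some other reason: surface the error
            }
        }

        /** Close still takes the write lock, but a running refresh no longer delays it. */
        void close() {
            rwl.writeLock().lock();
            try {
                // Mark the engine closed before closing the reader manager, so a
                // concurrent refresh that fails can tell why it failed.
                if (isClosed.compareAndSet(false, true)) {
                    readerManager.close();
                }
            } finally {
                rwl.writeLock().unlock();
            }
        }
    }

    public static void main(String[] args) {
        EngineSketch engine = new EngineSketch();
        System.out.println("refresh while open succeeded: " + engine.refresh());
        engine.close();
        System.out.println("refresh after close succeeded: " + engine.refresh());
    }
}
```

The point of the sketch is the ordering in `close()`: because the closed flag is set before the reader manager is closed, a refresh that races with close can distinguish "engine was closed" from an unexpected failure without ever holding the lock.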

@dnhatn dnhatn added the >enhancement, :Distributed/Engine, v8.0.0, v7.5.0, and v7.6.0 labels Oct 23, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Engine)

@dnhatn
Member Author

dnhatn commented Oct 23, 2019

A known issue tracked at #46021.

@elasticmachine run elasticsearch-ci/1

Contributor

@henningandersen henningandersen left a comment


This looks good to me, but I am not sure my approval counts for much on this change. I did try to find a hole in the plot (and was unable to), but the other reviewers have the historical background needed for a qualified review.

Maybe also add a test to demonstrate that we do not hold the lock?

@dnhatn
Member Author

dnhatn commented Oct 24, 2019

> Maybe also add a test to demonstrate that we do not hold the lock?

+1. I added that test in 37e1558.
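
As a rough illustration of how such a check can be structured (this is not the test added in the commit above, and every name here is hypothetical), the sketch below parks a refresh inside a simulated slow warming step and then probes whether the write lock can be taken from another thread. The `lockDuringRefresh` flag exists only so that the old and new behaviour can be compared side by side.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; not the Elasticsearch test or engine code.
class RefreshLockCheckSketch {

    static class EngineSketch {
        final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
        final CountDownLatch warmingStarted = new CountDownLatch(1);
        final CountDownLatch warmingMayFinish = new CountDownLatch(1);
        private final boolean lockDuringRefresh;

        EngineSketch(boolean lockDuringRefresh) {
            this.lockDuringRefresh = lockDuringRefresh;
        }

        void refresh() throws InterruptedException {
            if (lockDuringRefresh) {
                rwl.readLock().lock(); // old behaviour: hold the read lock while refreshing
            }
            try {
                warmingStarted.countDown();
                warmingMayFinish.await(); // simulates a slow warming step
            } finally {
                if (lockDuringRefresh) {
                    rwl.readLock().unlock();
                }
            }
        }
    }

    /** Returns true if the write lock could be acquired while a refresh was in progress. */
    static boolean canCloseDuringRefresh(boolean lockDuringRefresh) {
        EngineSketch engine = new EngineSketch(lockDuringRefresh);
        Thread refresher = new Thread(() -> {
            try {
                engine.refresh();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        refresher.start();
        try {
            engine.warmingStarted.await(); // the refresh is now parked inside warming
            boolean acquired = engine.rwl.writeLock().tryLock();
            if (acquired) {
                engine.rwl.writeLock().unlock();
            }
            engine.warmingMayFinish.countDown(); // let the refresh complete
            refresher.join();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while probing the lock", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("close possible with locked refresh: " + canCloseDuringRefresh(true));
        System.out.println("close possible with lock-free refresh: " + canCloseDuringRefresh(false));
    }
}
```

Using latches rather than sleeps keeps the check deterministic: the probe only runs once the refresh is known to be inside the warming step.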

Contributor

@ywelsch ywelsch left a comment


LGTM. Can you hold off a bit (e.g. a week) before merging this into the 7.5 branch (fine to merge to other branches)? I would like to give this enough CI testing before it goes into our product.

@dnhatn
Member Author

dnhatn commented Oct 25, 2019

@henningandersen @jpountz @ywelsch Thanks for reviewing.

@dnhatn dnhatn merged commit 379e847 into elastic:master Oct 25, 2019
@dnhatn dnhatn deleted the refresh-without-lock branch October 25, 2019 21:31
dnhatn added a commit that referenced this pull request Oct 25, 2019
@dnhatn
Member Author

dnhatn commented Oct 26, 2019

> Can you hold off a bit (e.g. a week) before merging this into the 7.5 branch (fine to merge to other branches)?

+1

dnhatn added a commit that referenced this pull request Nov 3, 2019
Labels
:Distributed/Engine (Anything around managing Lucene and the Translog in an open shard), >enhancement, v7.5.0, v7.6.0, v8.0.0-alpha1
6 participants