Retry file locks after delay in case of failure #28544
Conversation
If we agree with the extra settings I'll add them to config.sample.php
 * @param int $retries number of retries before giving up
 * @param int $retryDelay delay to wait between retries, in milliseconds
 */
public function __construct($provider, $retries = 5, $retryDelay = 1000) {
Type hinting for the provider
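Presumably meaning something like the following sketch (assuming the wrapped provider implements the core `OCP\Lock\ILockingProvider` interface; the property assignments are my guess, not necessarily the PR's code):

```php
use OCP\Lock\ILockingProvider;

/**
 * @param ILockingProvider $provider the locking provider to wrap
 * @param int $retries number of retries before giving up
 * @param int $retryDelay delay to wait between retries, in milliseconds
 */
public function __construct(ILockingProvider $provider, $retries = 5, $retryDelay = 1000) {
	$this->provider = $provider;
	$this->retries = $retries;
	$this->retryDelay = $retryDelay;
}
```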
}

/**
 * {@inheritdoc}
indentation
I'm not sure whether it's worth parametrizing the methods that will be retried. I just wanted to avoid duplicating the while-loop logic, as in the sketch below.
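For illustration, a minimal sketch of that kind of shared retry loop, parametrized over a callable (names are mine, not necessarily the PR's; `LockedException` is the core `OCP\Lock\LockedException`):

```php
use OCP\Lock\LockedException;

/**
 * Run $operation, retrying up to $this->retries times with a pause of
 * $this->retryDelay milliseconds whenever a LockedException is thrown.
 *
 * @param callable $operation the lock call to retry
 */
private function retryOperation(callable $operation) {
	$attempt = 0;
	while (true) {
		try {
			return $operation();
		} catch (LockedException $e) {
			if (++$attempt > $this->retries) {
				// out of retries, propagate the original failure
				throw $e;
			}
			usleep($this->retryDelay * 1000); // retryDelay is in milliseconds
		}
	}
}

/** {@inheritdoc} */
public function acquireLock($path, $type) {
	$this->retryOperation(function () use ($path, $type) {
		$this->provider->acquireLock($path, $type);
	});
}
```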
Fixed the indents. I've also added the new params in config.sample.php.
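The config entries would presumably look something like this (the key names here are illustrative only; the actual names aren't shown in this thread, so check config.sample.php in the PR):

```php
// config/config.php (hypothetical key names)
'lock.retries' => 5,        // number of retries before giving up
'lock.retry.delay' => 1000, // delay between retries, in milliseconds
```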
Ok, code looks good 👍
WTF, I tried to test this and now it seems both processes get a lock... Here is how I did it:
Sometimes one of the processes gets 423 Locked, which is fine. However, too often, I see that both processes get 423 Locked... Maybe I need to randomize the wait time?
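Randomizing could look roughly like this (a sketch, not the PR's code; `random_int()` needs PHP 7, older versions would use `mt_rand()`):

```php
// Sketch: add random jitter to the retry delay so two colliding processes
// don't keep retrying in lock-step.
function sleepWithJitter(int $baseDelayMs): void {
	$jitter = random_int(0, $baseDelayMs);   // up to one extra base delay
	usleep(($baseDelayMs + $jitter) * 1000); // usleep() takes microseconds
}
```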
Happens here:
Both processes are trying to get an exclusive lock to finish writing the file (the rename from part file to final file). But the database shows that there is already an exclusive lock there:
The file in question is "files/9688085ed76345fbabcce6dc3027772c". It could be related to the fact that we first set shared locks, then an exclusive lock. Needs further research.
Maybe the DB provider doesn't use atomic operations to implement the locks. If the operations aren't atomic, race conditions could happen.
Next up: try with Redis to see whether it's the DB causing trouble.
Looks like setting the delay to 5 seconds (5000 ms) helps to solve problems like #28779. So it does work.
Still doesn't explain why my previous test cases had both processes lock each other out. Probably it's because both set a shared lock on the file first before upgrading to an exclusive lock, and of course when retrying, neither of them releases its shared lock first.
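If that hypothesis is right, it would be a classic lock-upgrade livelock, roughly (my reconstruction, not verified):

```
t0: A acquires LOCK_SHARED on the file    -> DB: shared count = 1
t1: B acquires LOCK_SHARED on the file    -> DB: shared count = 2
t2: A tries to upgrade to LOCK_EXCLUSIVE  -> fails, blocked by B's shared lock
t3: B tries to upgrade to LOCK_EXCLUSIVE  -> fails, blocked by A's shared lock
t4: both retry without releasing their shared locks -> neither can ever succeed
```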
Rebased and increased the default delay value to 5000.
Turns out that the new DAV endpoint is locking too much and parallel requests will cause lockouts: #28779. This might explain what I observed here when testing. Best would be to retest with the old DAV endpoint to validate this PR.
First test with the old DAV endpoint, same issue: both calls lock each other out. Back to the drawing board then...
Considering that this was not reproducible with Redis, I wonder if Redis already has a waiting system internally. If so, we'd only need the above logic for DB locking. But it would also mean that advising people to use Redis is the better way.
Abandoning this. Feel free to take over.
Description
Adds a decorator to retry acquiring file locks after a delay in case of failure.
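Roughly, usage would look like this (the class name is a placeholder, not necessarily the one in this PR):

```php
// Wrap the configured provider so every lock call gets retry behaviour.
// "LockingProviderRetryDecorator" is a hypothetical name.
$provider = new LockingProviderRetryDecorator(
	$dbLockingProvider, // the real \OCP\Lock\ILockingProvider
	5,                  // retries before giving up
	1000                // delay between retries, in milliseconds
);
$provider->acquireLock('files/foo.txt', \OCP\Lock\ILockingProvider::LOCK_SHARED);
```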
Related Issue
Fixes #17016
Motivation and Context
See issue.
Basically, cron tasks or other concurrent tasks might prevent an expensive upload from finishing due to a lock. Instead of giving up immediately, the code retries shortly after, giving the transaction a few more chances to finish so the upload doesn't have to be redone.
In my personal case, I often upload files with Android around the quarter hour, which conflicts with the cron run, and oftentimes the upload fails with locking issues, so I have to retry manually.
How Has This Been Tested?
Screenshots (if appropriate):
Types of changes
Checklist: