borg2: borg {compact,create,..} fails with "Failed to release the lock" #8390
The main issue here seems to be this happening while it was doing …

That other locking error is just something triggered by that, when it … So, it HAD the lock (and regularly refreshed it) while doing … So, did you use …? Did your machine go to sleep while doing …?
Nope and nope. Thinking about it, though, another possibility is that at that moment I had 2 instances of borg running. Though definitely into different repos (one SFTP, the other local). Is that something I shouldn't do? From the error, the lock appears to be repo-specific and not global.
Borgs dealing with different repos don't influence each other (except maybe via needed resources: CPU, RAM, I/O). With current borg2, a lot even works in parallel using the same repo. I'll have a look at the locking code later, the …
OK, it's at least not about the failing …
Can you try if this fixes it?
I suspect that if the backend is rather slow, the lock might time out while fetching LIST_SCAN_LIMIT (100000) items from …
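The failure mode described above can be sketched as follows: a lock with a stale timeout must be refreshed periodically inside any long-running listing loop, or a slow backend lets it expire mid-operation. This is an illustrative toy (class and function names are hypothetical, not borg's actual code):

```python
import time

class Lock:
    """Toy lock with a stale timeout; refresh() extends its lifetime."""
    STALE_SECONDS = 30.0

    def __init__(self):
        self.last_refresh = time.monotonic()

    def refresh(self):
        self.last_refresh = time.monotonic()

    def is_stale(self):
        return time.monotonic() - self.last_refresh > self.STALE_SECONDS

def list_all(items, lock, refresh_interval=10.0):
    """Iterate a potentially huge listing, refreshing the lock often
    enough that a slow backend cannot let it go stale mid-listing."""
    last = time.monotonic()
    out = []
    for item in items:
        out.append(item)
        now = time.monotonic()
        if now - last >= refresh_interval:
            lock.refresh()
            last = now
    return out
```

Without the refresh call inside the loop, a listing that takes longer than the stale timeout would lose its lock, which matches the symptom reported here.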
Backup completed overnight without any error, including the above proposed change. However, performance is unfortunately devastating.
Oh, that's slow. Maybe add the SFTP-related insights there: borgbackup/borgstore#44. Also add the connection speed and the round-trip time (ping) on that connection there. Default compression is lz4. Exhausting the CPU sounds interesting; can you determine with what?
You could also try a 64 MiB SFTP upload on that connection for comparison (not an ssh pipe, that works differently).
First, back to the original issue: I hit it again, this time with the local file backend: …
That's unexpected. But maybe update your local master branch, so that my recent repository.list fix gets in there. Can you say how long it ran within that borg invocation before crashing?
2a20ebe does not seem to differ from what you proposed earlier in this ticket and what I applied (and still have applied) manually.

real 67m17.522s
11.9 MB/s via …
#8396 should help with debugging the locking issue; use …
Ongoing: what I also noticed when using the SFTP backend, and now see with …

That at least explains why locks could time out when not refreshed within the respective loop.
@mirko That's when it queries all the chunk IDs from the remote; there is no cache for that yet. It distributes the objects using 2 levels of directories (e.g. 12/34/ contains all chunks that have an ID starting with 1234), so that is up to 65536+256+1 dirs. It could be configured to only use 1 level of directories (that would be 256+1 dirs), but that would potentially put a lot of objects into a single directory, so I thought 2 levels are more appropriate for that kind of storage.
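The 2-level sharding scheme described above can be sketched in a few lines. This is an illustrative function (the name and signature are hypothetical, not borgstore's actual API):

```python
def shard_path(chunk_id_hex, levels=2, chars_per_level=2):
    """Map a hex chunk ID to a nested directory path.

    With levels=2 and 2 hex chars per level, '1234abcd...' lands in
    '12/34/', giving up to 256**2 = 65536 leaf directories (plus the
    256 intermediate dirs and the root, hence 65536+256+1 dirs total).
    """
    parts = [chunk_id_hex[i * chars_per_level:(i + 1) * chars_per_level]
             for i in range(levels)]
    return "/".join(parts + [chunk_id_hex])
```

The trade-off mentioned in the comment is visible here: levels=1 would mean only 256+1 directories to enumerate, but each leaf directory would hold far more objects.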
Fair enough; it's still listing, though, and it had already been 2h+ when I wrote the last comment. Mind, we're talking about a repo on an SFTP backend, which for some reason appears to be really slow already (with paramiko and/or borg). So we're now at 6h+ and the backup didn't even start :)

EDIT: Now borg is doing something at 100% CPU for a while already without printing any (debug) statement.

EDIT2: Started backing up the next VM into the same repo. Again, going through that …
Yeah, that is unbearable. Yes, it currently always ad-hoc builds the chunks index only in memory (after querying all object IDs from the repo), no local cache. I have some ideas on how to optimize this, and will do that after finishing the files cache work.
Notable: ^^^ that may call repository.put(), which refreshes the lock, except if the chunk is already in the repo.

Hmm, a logging formatter with timestamps to the left of all log output would have been useful here.
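Such a timestamped formatter is a few lines with Python's standard logging module (the logger name here is illustrative, not borg's actual logging setup):

```python
import logging

# Put an ISO-like timestamp to the left of every log message.
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    fmt="%(asctime)s %(levelname)s %(name)s: %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
))
logger = logging.getLogger("borg.locking.demo")
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

logger.debug("lock refreshed")
```

With timestamps attached to each debug line, it would be easy to see whether the gap between two lock refreshes exceeded the stale timeout.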
BTW, that looks a bit stupid (like it's shooting itself in the foot by killing its own lock), but it is actually useful for finding locking issues, because the lock could also have been killed by some other borg process, due to the same root cause.
Maybe #8399 helps.
Slightly off-topic: is this expected behaviour for the SFTP backend?

Unfortunately I can't check anymore whether …
The interaction of the … Guess this is the reason for what you see. I only made it work "correctly", but it is not really efficient yet for large and/or slow stores, and it has quite some potential for improvements.
BTW, I locally tested the lock refreshing by reducing the stale time in the …
Since I'm not sure how exactly I trigger(ed) it, and the time that passed until it happened varied heavily, I was hesitant with updates. But I can tell that since I applied #8399, I have not experienced it again so far.
OK, so I am closing this - reopen if you see it again. |
Here we are again; I didn't update since then: …
@mirko If that was after a longer time processing unchanged files (and thus not accessing the repo at all), it might already be fixed in b12.
Have you checked borgbackup docs, FAQ, and open GitHub issues?
Yes
Is this a BUG / ISSUE report or a QUESTION?
BUG / QUESTION
System information. For client/server mode post info for both machines.
Your borg version (borg -V).
borg2 2.0.0b11.dev3+gc278a8e8.d20240915 with latest master/HEAD of borgstore
Full borg commandline that led to the problem (leave away excludes and passwords)
Describe the problem you're observing.
Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.
I scripted around borg (without checking exit codes) and had a typo in the borg prune line, causing it to fail due to the typo in the passed args. The next command to be executed was borg compact, which then surprisingly fails with the error:

NotLocked: Failed to release the lock <Store(url='sftp://HOST//storage/backup_mirko/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).
Not sure what's going on here; borg compact failing to release a lock feels rather strange.

Include any warning/errors/backtraces from the system logs