backup: async backup-compact #296
Conversation
Force-pushed from 7e06d43 to 51b1dc9
Force-pushed from 8af90fc to 61409c8
548dd09 fixes the pip error: after 548dd09 and …
The failing integration tests (not …
(see Lines 66 to 70 in 2292372)
Initially I thought maybe … edit: …
@cdecker I think this is ready for review/merge (if desired). The failing integration tests are not caused by this PR, see above.
ACK 548dd09
Just two very minor things about comparing ints and Nones and a clarification for my benefit.
On the design: this makes both the compaction and the RPC result asynchronous, i.e., the RPC command returns useful information only in a few cases (when compaction is not needed); otherwise it just states "doing something". Wouldn't it be better to make the response synchronous and report size in bytes and number of changes before and after, which would be more informative? Keeping the request pending even for a long time is practically free, and we wouldn't have to poll.
@Mergifyio rebase
Force-pushed from 548dd09 to 6577bf0
✅ Branch has been successfully rebased
Force-pushed from 6577bf0 to df1c96a
Rebased on master; not sure what the mergify rebase did. Only added two commits, to address type safety and to make it easier to review (I hope!).
I looked into that, but it probably requires yet another thread (to keep handling
Is there maybe an example of this with
But the above suggestions are interesting to explore; however, let's first see how this performs in comparison to other backup methods. Any idea how broadly …
No worries, mergify just took your changes and rebased them on the current …
Definitely good to review. Personally I go with fixups, but our review bot helps here: https://bot.bitcoinstats.com/lightningd/plugins/pull/296/range_diff/548dd09a2b00589a8921f3b9b134d49ddc5b870b..df1c96ade1709e31d371a5be4b4579293f656c65
So you are already creating a new thread here: making the response asynchronous would just mean that we pass the …
One example could be the code at Lines 152 to 160 in 6f2b8fb.
It shows both the pseudo-sync flow and the async flow: if we already have a message ready to return when we get called, we set the result immediately, effectively emulating a … (Line 158 in 6f2b8fb).
If on the other hand we don't yet have a message to return, we need to defer returning the result and free the main thread so we can get hook calls, notifications etc., which are used to fill the message queue. We do this by simply remembering that there is a pending request in a list of subscribers (Line 160 in 6f2b8fb).
When we later get an … (Lines 205 to 212 in 6f2b8fb) …
This is with just a single thread, because we either return immediately, or we stash the request away, waiting for a hook call. In your case we'd simply pass the … Any chance we could keep the …
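A minimal, self-contained sketch of the single-threaded pattern described above (class and method names are illustrative, not taken from the plugin's code):

```python
class FakeRequest:
    """Stand-in for a JSON-RPC request object with a deferred result."""
    def __init__(self):
        self.result = None

    def set_result(self, result):
        self.result = result


class MessageBroker:
    """Either answer an incoming request immediately (pseudo-sync flow)
    or stash it in a list of pending subscribers and answer it later,
    when a message arrives (async flow) -- all on a single thread."""

    def __init__(self):
        self.messages = []  # messages waiting for a requester
        self.pending = []   # requests waiting for a message

    def on_request(self, request):
        if self.messages:
            # A message is already queued: set the result right away,
            # effectively emulating a synchronous call.
            request.set_result(self.messages.pop(0))
        else:
            # No message yet: defer, freeing the main thread so hook
            # calls and notifications can keep filling the queue.
            self.pending.append(request)

    def on_message(self, msg):
        if self.pending:
            # A requester is already waiting: complete it now.
            self.pending.pop(0).set_result(msg)
        else:
            self.messages.append(msg)
```

In pyln-client terms, `request` would be the object handed to an `@plugin.async_method` handler, whose response is only sent back to lightningd once `set_result` is called.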
Thanks a lot for the explanation.
For a local … (plugins/backup/socketbackend.py, Lines 215 to 219 in b65fbc5)
Yes, just lock the backup file when writing data, or when reading metadata. But all of the above requires …
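As a sketch of the locking just described (all names hypothetical, not the plugin's real FileBackend): the main thread appends under a lock, and the compaction thread only takes the same lock briefly to read metadata:

```python
import threading


class LockedBackend:
    """Illustrative sketch: writes and metadata reads share one lock,
    so a compaction thread can safely snapshot the current
    data_version before starting its bulk copy."""

    def __init__(self):
        self.lock = threading.Lock()
        self.changes = []      # stand-in for the append-only backup file
        self.data_version = 0

    def add_change(self, entry):
        # Main thread (db_write hook): append under the lock.
        with self.lock:
            self.changes.append(entry)
            self.data_version += 1

    def read_metadata(self):
        # Compaction thread: holds the lock only long enough to read.
        with self.lock:
            return self.data_version
```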
Agreed, but who needs this info? As a user (myself), I just want my backup to not fill up my USB or SSD drive, which got to 7 GB after a few weeks running clboss. Maybe the backup-compact method can be removed entirely and an auto-compact option with a size parameter added instead, so that …
Good point: since the backend itself controls storage medium access, we need to have it also drive the async compaction; I hadn't thought about that. So the …
That makes the compaction synchronous again, i.e., the
Though maybe the compaction must be split into two steps: …
Anyway, I like discussing these more architectural things, but I'm happy to merge as is to unblock others waiting on this ^^
Yeah, I just thought that it'd be nice for the user to see progress (there's the custom notification support that can report progress to the CLI), or to indicate that the compaction is still pending. I'd get suspicious if my laptop burned through its battery even though the command I'd blame for it returned in just a second. Just cosmetics at this point really :-)
That is incorrect; the above comment should be read more precisely: look at the code inside the …
But I forgot to mention the last part where it also locks:
I think it is already about what you suggest here :-)
As you wish, I already run this on my own node. But I was a little worried about users running the current …
…n restore, cleanup of some redundant init values in FileBackend, improve comments and error msg
…tBackend

Minor fixes:
- fix duplicate log entries: modify the existing root handler instead of adding one
- remove obsolete flush, i.e. leaving the `with` context already ensures close + flush
- fix typos, improve comments

Async/concurrent `compact` method using threading, all contained in the FileBackend class. Some wrapping and a minor API change (see README.md) were needed to keep it compatible with SocketBackend (which uses FileBackend on its remote side). A lock is used to make backup file access thread-safe. The main thread handles db_writes, which modify metadata but otherwise only _append_ to the backup file. The compact thread only reads, up to what it knows from the metadata. Calculation of the stats diff `before` vs `after` is removed; it doesn't make sense with moving parts. An early version check is done to maybe subscribe to the "shutdown" notification (v0.10.2+) to clean up a running compaction when we shut down, but older versions (v0.9.2+) still work.
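The duplicate-log fix mentioned in the commit message boils down to reconfiguring the root logger's existing handler instead of stacking a new one on every setup; a hedged sketch of that idea (not the plugin's exact code):

```python
import logging

root = logging.getLogger()
if root.handlers:
    # Reuse the handler that is already installed. Adding a second
    # handler to the root logger would make every line appear twice.
    handler = root.handlers[0]
else:
    # First-time setup: no handler installed yet.
    handler = logging.StreamHandler()
    root.addHandler(handler)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
```

Running this block repeatedly leaves exactly one handler in place, which is the property issue #232 was about.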
A single Change can contain a snapshot and a transaction
Force-pushed from df1c96a to d5ffe90
Rebased on latest master (nothing else).
Needs rebase, CI should be working now
@cdecker @SimonVrouwe are we still interested in merging this? definitely seems useful...
Closing for now as it seems dead. Feel free to reopen if work resumes! |
Continues #286

With this PR, calling `backup-compact` now returns (almost) immediately, without blocking c-lightning. Compaction runs in a separate thread, while CL operates as normal.

Basic principle:
- note the `data_version` of the current backup
- copy versions up to `data_version` into a clone (here is the bulk of the work, with thread switching to handle the `db_write` hook)
- `add_change` in the main thread (this blocks CL), refresh the latest `data_version` and make the clone catch up with it by adding the missing versions
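The three steps above could be sketched roughly like this (illustrative code only, not the plugin's actual implementation; the real clone is a file and the copy step squashes versions into a snapshot, which is omitted here for brevity):

```python
import threading


class CompactionSketch:
    """Toy model of catch-up compaction: note the version, bulk-copy
    without blocking writers, then briefly lock to add what was missed."""

    def __init__(self):
        self.lock = threading.Lock()
        self.backup = []   # append-only list of (version, change) pairs

    def add_change(self, change):
        # Main thread (db_write hook): brief lock, append only.
        with self.lock:
            self.backup.append((len(self.backup) + 1, change))

    def compact(self):
        # Step 1: note the data_version of the current backup.
        with self.lock:
            target = len(self.backup)
        # Step 2: bulk copy up to that version -- the slow part, done
        # without holding the lock, which is safe because the backup
        # is append-only and existing entries never change.
        clone = list(self.backup[:target])
        # Step 3: lock briefly, catch the clone up with anything that
        # was appended while we were copying, then swap it in.
        with self.lock:
            clone.extend(self.backup[target:])
            self.backup = clone
            return len(self.backup)
```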
Before this PR, a `backup-compact` call would return a dict `stats` when compaction completed. Now it returns immediately with `{"result": "compaction started"}`, `{"result": "compaction still in progress"}`, or `{"result": {"backupsize": <size_in_bytes>, "version_count": 2}}` when there is nothing to compact. The `stats` can be found in the log; no idea who uses these.

All pretty much self-contained in `FileBackend`, so it also works with `SocketBackend` (which uses FileBackend on the server side). The double logging issue #232 is also fixed; maybe this can be improved further, see task list. Also includes some minor fixes/cleanups and comments, mostly for my own understanding.
I manually tested compacting + restoring a backup, ran `sqldiff` against the original, and also tested with a remote SocketBackend. With ld versions (v0.10.2+) that support "shutdown", an unfinished compaction (`.compacting` file) is cleaned up when shutting down. For users running v0.10.2 before fix #4959, the plugin waits to be killed last by the timeout in shutdown, which seems prudent. Also retired the deprecated `--backup-destination` option.

Todo:
- make the entire `FileBackend` thread safe, i.e. also `rewind` (in retrospect, this is overkill)
- make all logs in FileBackend be forwarded by `SocketBackend` (out of scope, the server should handle these)
- `systemd` script for weekly compacting

Other ideas:
- … `backup.py` and compare with other backup methods
- … `threading` with `multiprocessing`