Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Peer storage feature #5361

Merged
merged 21 commits into from
Feb 8, 2023

Conversation

adi2011
Copy link
Collaborator

@adi2011 adi2011 commented Jun 30, 2022

PEER STORAGE BACKUP

This PR implements peer storage backup which will enable nodes to exchange their respective SCBs (Static channel backup), This will be useful in case of complete data loss.

WHY

One of the major features of lightning is that the transactions are off-chain and are stored in a local database, which makes this DB highly dynamic and complex to backup.
Peer storage backup will allow users to send their encrypted backup to their peers, which can be used to recover their funds in case of complete data loss.

HOW

The strategy is that we'll store the data received in the datastore.
So to implement this I've introduced 2 new messages i.e. PEER_STORAGE and YOUR_PEER_STORAGE, these will be used to exchange the data stored between the nodes.
-PEER_STORAGE will be used to send our own encrypted backup to the peer.
-YOUR_PEER_STORAGE will be used to send the most recent backup of the peer.

The general flow of messages every time we connect is:

Alice ----------------------- ♥️ ------------------------------ Bob

PEER_STORAGE 📤-------------📨-------------> 📥
YOUR_PEER_STORAGE 📤---------📨------------> 📥

📥 <------------📨--------------PEER_STORAGE 📤
📥 <-----------📨------------YOUR_PEER_STORAGE 📤


On receiving YOUR_PEER_STORAGE bob will verify if it's correct and Alice has not changed anything in his last sent backup.
On receiving PEER_STORAGE bob will update the backup he has stored for Alice in his own datastore.

Every time we open a new channel or close an old one we will send PEER_STORAGE to every peer we're connected to (Haven't figured out yet, maybe a new RPC (sendcustommsgmulti) which will enable plugins to send a single message to multiple users at once, because the interaction between lightningd and plugins is single-threaded)

They can choose to ignore the messages since they are odd (It's okay to be odd)

In case of complete data loss, the user will reconnect to their peers and hope that they get a YOUR_PEER_STORAGE message. Then they can directly use the RPC specified in the plugin to recover the channels.

@adi2011 adi2011 force-pushed the peer-storage-feature branch 2 times, most recently from 3627845 to 1589e98 Compare July 2, 2022 01:00
@adi2011 adi2011 marked this pull request as ready for review July 2, 2022 01:01
@adi2011 adi2011 requested a review from cdecker as a code owner July 2, 2022 01:01
@adi2011 adi2011 force-pushed the peer-storage-feature branch 2 times, most recently from b4ca14f to cc0eb71 Compare July 4, 2022 03:40
Copy link
Collaborator

@vincenzopalazzo vincenzopalazzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ack I just read the code, it requires more love in the review from my side!

tests/test_misc.py Outdated Show resolved Hide resolved
tests/test_misc.py Outdated Show resolved Hide resolved
lightningd/peer_control.c Outdated Show resolved Hide resolved
lightningd/peer_control.c Outdated Show resolved Hide resolved
plugins/chanbackup.c Outdated Show resolved Hide resolved
@adi2011
Copy link
Collaborator Author

adi2011 commented Jul 6, 2022

Thanks for the review @vincenzopalazzo, I have mostly fixed all the mentioned changes in the SCB PR, This one will get in after that :)

@vincenzopalazzo
Copy link
Collaborator

Ops! my bad I did not see that this was built on top of another PR! my bad!

@adi2011 adi2011 marked this pull request as draft July 8, 2022 13:05
@niftynei niftynei added this to the v22.10 milestone Jul 11, 2022
@adi2011 adi2011 force-pushed the peer-storage-feature branch 3 times, most recently from d790c0a to 9779edc Compare July 27, 2022 11:54
Copy link
Collaborator

@niftynei niftynei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Half done review, pushing up comments.

In general, smaller commits for each independent change would be great.

plugins/chanbackup.c Outdated Show resolved Hide resolved
}
},
{
"peerstoragebkp",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commands should always include a verb. Maybe populate-disk-scb-from-peer?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restore-from-peer?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restorefrompeer looks good 👍, - is not allowed I think, they create problems in other languages.

{
"peerstoragebkp",
"recovery",
"Grabs the scb from datastore (if exists)",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be more descriptive of what this does?

Checks if i have got a backup from a peer, and if so, will stub those channels in the database and if is successful, will return list of channels that have been successfully stubbed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This command reads the latest backup storage data from our datastore and populates our database w/ stubs from it.

A few things about this are odd to me.

  1. How would someone know when to call this command?
  2. This seems like it could/should happen automatically when we get a scb from a peer?
  3. How do we know the scb we have in the datastore is valid/up to date? What's the worst case scenario if an out of date scb is applied?
  4. What happens for channels in the datastore scb that are already in our db?

Copy link
Collaborator Author

@adi2011 adi2011 Sep 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, these are some really nice questions:
Ans. 1) This would be used in case of data loss. The user could connect to their peers and collect their storage.

Ans. 2) Hmm, That's an excellent suggestion, everytime we receive scb we can automatically stub the channels in the DB (which are not already into it). What's your opinion on this @rustyrussell ?

Ans. 3) This feature would be used as a last resort to recover the funds in case of severe data loss, This can only help users to recover funds in case of severe data loss, This would never harm existing channels and funds.

Ans. 4) We skip over them and do not stub them in the DB.

after_latestbkp,
&forward_error,
NULL);
json_add_string(req->js, "key", "latestbkp");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: call it scb not bkp


}

static struct command_result *json_peerstoragebkp(struct command *cmd,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename bkp to scb

return send_outreq(cmd->plugin, req);
}

static struct command_result *after_datastore(struct command *cmd,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe there's a generic option for htis? if not you can add one.

struct stat st;
struct node_id *node_id = tal(cmd, struct node_id);

int fd = open("emergency.recover", O_RDONLY);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this really could be a separate function? don't you do this other places?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filenames in line is not great either fwiw.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filenames in line is not great either fwiw.

Defined a global var for it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this really could be a separate function? don't you do this other places?

Hmm, Done!

plugin_err(cmd->plugin, "closing: %s", strerror(errno));
}

peers = json_get_member(buf, params, "peers");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

look at json_scan?

plugins/chanbackup.c Outdated Show resolved Hide resolved
json_to_node_id(buf, nodeid, node_id);

req = jsonrpc_request_start(cmd->plugin,
cmd,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're gonna want to make this NULL.

Copy link
Collaborator

@niftynei niftynei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments!

{
"peerstoragebkp",
"recovery",
"Grabs the scb from datastore (if exists)",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This command reads the latest backup storage data from our datastore and populates our database w/ stubs from it.

A few things about this are odd to me.

  1. How would someone know when to call this command?
  2. This seems like it could/should happen automatically when we get a scb from a peer?
  3. How do we know the scb we have in the datastore is valid/up to date? What's the worst case scenario if an out of date scb is applied?
  4. What happens for channels in the datastore scb that are already in our db?

plugins/chanbackup.c Outdated Show resolved Hide resolved
json_to_scb_chan(buf, scbs, &scb_chan);
plugin_log(cmd->plugin, LOG_INFORM, "Updating the SCB");

update_scb(cmd->plugin, scb_chan);
return notification_handled(cmd);
plugin_log(cmd->plugin, LOG_INFORM, "Updating the SCB2");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's an "SCB2"? maybe make LOG_DBG

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, Was using it for debugging

tal_bytelen(serialise_scb));

send_outreq(cmd->plugin, req);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if not connected?

@@ -388,18 +389,113 @@ static struct command_result *after_send_scb(struct command *cmd,
return send_outreq(cmd->plugin, req);
}

struct info {
size_t idx;
const char *buf;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct node_id *peers;

{
plugin_log(cmd->plugin, LOG_INFORM, "Peer storage sent to");
info->idx += 1;
return after_listpeers(cmd, info->buf, json_parse_simple(cmd, info->buf, tal_bytelen(info->buf)), info);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would build a list of connected peers and iterate thru that, rather than holding onto the buffer w/ all the data in memory for the duration of this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, we just need the index. Libplugin automatically handles multiple requests one by one

@cdecker
Copy link
Member

cdecker commented Nov 1, 2022

@adi2011: this is marked for release 22.11 for which we're trying to publish a first RC this week, but it is also marked as a draft. What's the current status of this? Happy to review and merge it if it is ready, otherwise we can also postpone it to the next release, which'll happen in ~2 months again.

@cdecker
Copy link
Member

cdecker commented Nov 2, 2022

After talking to @adi2011 we decided that we'll be pushing this PR to the next release.

@cdecker cdecker removed this from the v22.11 milestone Nov 2, 2022
@adi2011
Copy link
Collaborator Author

adi2011 commented Nov 3, 2022

Thanks! Will get onto it once my semester exams are over.

rustyrussell and others added 17 commits February 8, 2023 14:02
…peer_connected.

This is needed for the next patch, which does this from the peer_connected hook!

Signed-off-by: Rusty Russell <[email protected]>
Changelog-Changed: JSON-RPC: `sendcustommsg` can now be called by a plugin from within the `peer_connected` hook.
Add msg type peer_storage and your_peer_storage
We are now going to have messages which we know about, but yet we
don't handle ourselves.

[ I reversed this from Adi's, as that was clearer! --RR ]
And we should always represent them as is, not as optional: it's
possible in future we could *require* "WANT_PEER_BACKUP_STORAGE".

Signed-off-by: Rusty Russell <[email protected]>
node_id can be on the stack, avoiding a tal call.

Signed-off-by: Rusty Russell <[email protected]>
@rustyrussell
Copy link
Contributor

Dumb typo completely broke peer_connected hook!

Ack 5805664

When you return an allocated pointer, you should always hand in the
context you want it allocated from.  This is more explicit, because it may
really matter to the caller!

This also folds some simple operations, and avoids doing too much
variable assignment in the declarations themselves: some coding styles
prohibit such initializers, but that's a bit exteme.

Signed-off-by: Rusty Russell <[email protected]>
Since it's not spec-final yet (hell, it's not even properly specified
yet!) we need to put it behind an experimental flag.

Unfortunately, we don't have support for doing this in a plugin; a
plugin must present features before parsing options.  So we need to do
it in core.

Signed-off-by: Rusty Russell <[email protected]>
@rustyrussell
Copy link
Contributor

Ack 081a1d8

@endothermicdev
Copy link
Collaborator

Congrats @adi2011! It was a long road, but this is a very cool feature!

@endothermicdev endothermicdev merged commit a71bd3e into ElementsProject:master Feb 8, 2023
@adi2011
Copy link
Collaborator Author

adi2011 commented Feb 9, 2023

Thanks and congratulations to you too! :)

@instagibbs
Copy link
Collaborator

Test seems a bit flakey: https://pipelines.actions.githubusercontent.com/serviceHosts/e5392bd7-7462-4d15-b5db-89c31513dbe4/_apis/pipelines/1/runs/13853/signedlogcontent/81?urlExpires=2023-02-09T19%3A58%3A59.4670493Z&urlSigningMethod=HMACV1&urlSignature=foSYv9C4W2Xftr1Mop2sBqk9lnt%2B0hVTsxthcUGd62Y%3D

2023-02-09T19:56:59.1042925Z E           pyln.client.lightning.RpcError: RPC call failed: method: connect, payload: {'id': '022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59', 'host': 'localhost', 'port': 39159}, error: {'code': 402, 'message': 'disconnected during connection'}

@Sjors
Copy link
Contributor

Sjors commented Mar 6, 2023

Is there any documentation for this, other than the name of the option?

In particular it would be useful to illustrate how recovery works, e.g. "Start a new node with the same secret, find one of your old peers e.g. using a public archive like mempool . space, call lightning-cli restore-from-peer, etc".

man lightning-restore-from-peer is not working for me either.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants