Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugins: db_write hook and safe shutdown, back to square one? #5859

Open
SimonVrouwe opened this issue Dec 28, 2022 · 2 comments
Open

plugins: db_write hook and safe shutdown, back to square one? #5859

SimonVrouwe opened this issue Dec 28, 2022 · 2 comments

Comments

@SimonVrouwe
Copy link
Collaborator

SimonVrouwe commented Dec 28, 2022

By change chance I discovered that lightningd's shutdown sequence has been changed again in #5577, which seems to partly revert the effects of #4897 and #4959.

#4897 and #4959 where carefully crafted to guarantee that no database write can occur after plugins are shutdown, so that a backup plugin could never miss db_write hook calls.

Most notably e7fe59b moves closing of the database back to the bottom again, breaking that guarantee.

/* Tell plugins we're shutting down, use force if necessary. */
shutdown_plugins(ld);
/* Now kill any remaining connections */
jsonrpc_stop_all(ld);
/* Get rid of major subdaemons. */
shutdown_global_subdaemons(ld);
/* Clean up internal peer/channel/htlc structures. */
free_all_channels(ld);
/* Now close database */
ld->wallet->db = tal_free(ld->wallet->db);

In my opinion this makes use of backup.py plugin unsafe (again).

@rustyrussell Why the change?

#5577 mentions an issue, but has no further reference, about:

bookkeeper gets upset that its commands are rejected during shutdown

Seriously?

edit Above issue with the db_write hook also applies in general to other hooks. When plugins are freed during shutdown their hooks are unregistered and one expects that these hooks cannot be called after that (i.e. that lightningd is sufficiently dead), this was the case before e7fe59b. Now some subdaemons are kept alive longer, for example connectd happily accepts incoming connections during the 30s shutdown window. But any plugin that registered the peer_connected hook and maybe would reject the peer, is gone!

This was already discussed in #4883 and so the answer to the title seems (sadly) 'Yes'.
How unsafe is it? I don't know, missing a peer_connected hook doesn't seem very harmful, just ugly and inconsistent. What about other hooks? And why is hsmd still running, will it sign anything?

So far no response from the author of e7fe59b, perhaps @niftynei can help me out what the issues where with bookkeeper before that commit?

@SimonVrouwe
Copy link
Collaborator Author

As you can probably tell from the title, there was a some frustration:

Plugin documentation wasn't updated (No more error code -5).

The meager Changelog entry:

Plugins: RPC operations are now still available during shutdown. (#5577)

is confusing and seems to contradict /* Stop *new* JSON RPC requests. */ (below)

In reality a plugin cannot make new RPC calls during shutdown (like datastore, which is good):

/* Stop *new* JSON RPC requests. */
jsonrpc_stop_listening(ld->jsonrpc);

but when the io_loop is resurrected inside shutdown_plugins:

void *ret = io_loop(timer, &expired);

plugins can still receive the response of existing RPC calls.
Also three subdaemons (connectd, gossip and hsm) are still running, why?
Can we be sure activity does not trigger a db_write?

Anyway, please have a look at #5872.

@SimonVrouwe
Copy link
Collaborator Author

Below test (see this branch) shows that connectd accepts incoming connections during shutdown, circumventing any previously registered peer_connected hook:

def test_connected_hook_shutdown(node_factory, executor):
    """l1 connects to l2 which is shutting down while handling the connected hook
    """
    opts = [{'may_reconnect': True},
            {'may_reconnect': True,
             'important-plugin': os.path.join(os.getcwd(), 'tests/plugins/reject.py'),
             'plugin': os.path.join(os.getcwd(), 'tests/plugins/delay_shutdown.py')
             }]

    l1, l2 = node_factory.get_nodes(2, opts=opts)
    l2.rpc.reject(l1.info['id'])

    # l2's connected hook reject l1's connect attempt correctly
    l1.connect(l2)
    l1.daemon.wait_for_log(r"You are in reject list")
    assert not len(l1.rpc.listpeers()['peers'])

    # l2 should still reject it while shutting down
    f_stop = executor.submit(l2.rpc.stop)
    l2.daemon.wait_for_log(r"delaying shutdown with 10s")
    l1.connect(l2)
    assert not len(l1.rpc.listpeers()['peers'])

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants