Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bubble up errors from the Bitcoin interface instead of panic'ing there #909

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

darosior
Copy link
Member

We assume a bitcoind is always running beneath Liana. As such, we make a reasonable attempt to stay connected on spurious errors but we panic if we can't reach it for more than a minute. We also panic on unexpected API breaks (if an expected field isn't present or of the right type).
The rationale for this is pretty simple:

  1. We simply can't continue without a running bitcoind. On a bitcoind failure there is nothing else to than stopping the program.
  2. Failing directly in the bitcoind module (or any other implementation of the Bitcion interface) avoids cluttering the whole API with Result's.

That said, we manifested some interest recently in actually bearing the cost of cumbersome Result's in the API in order to:

  1. Have nicer failures. Sure, we can't continue without bitcoind. But it doesn't mean we should straight away crash from under the user who would probably be using the GUI and have no clue what's going on.
  2. On some non-fatal failures (which ones? cc @edouardparis), try to restart Liana. (Maybe for Support bitcoind being restarted from under us #566? If so there is nicer way to achieve this..)

I open this as a draft as it's based on #906 and i still have open questions:

  • Is it worth it? Not going to pretend i don't have a huge sunk cost here since i've implemented it all already, but still worth asking ourselves this question.
  • What do you think of the API?
    • The poller thread now needs to be externally managed, which i think makes the most sense.
    • I couldn't come up with anything better than Result<T, Box<dyn std::error::Error>> for the Bitcoin interface trait. I haven't looked too hard into it but if you've got a better suggestion i'd love to hear it.

@edouardparis
Copy link
Member

Failing directly in the bitcoind module (or any other implementation of the Bitcion interface) avoids cluttering the whole API with Result's.

I personally believe Result is not cluttering the API. Rust has great ways to write error handling code in contrary of GO where you have if err != nil everywhere, and even with GO I was told to never use panic except in the internal of encapsulated code that have very clear invariants. Bitcoin node is totally external to the code and should not be considered a such.

Have nicer failures. Sure, we can't continue without bitcoind. But it doesn't mean we should straight away crash from under the user who would probably be using the GUI and have no clue what's going on.
On some non-fatal failures (which ones? cc @edouardparis), try to restart Liana. (Maybe for
#566? If so there is nicer way to achieve this..)

We agree about the user UX/UI, better to show him the issue. I see three kinds of errors:

  • specific errors returned by commands (client request is not good, context changed, coin was already spent...): the error is already displayed to the user in a orange banner.
  • Failure error in the commands (means the poller is still running): we can display an error to the user to debug in an orange banner.
  • Failure error in the poller (the poller stopped): we can redirect user to a full page like the one when the bitcoin node is not running with the same layout and a restart button.

Is it worth it? Not going to pretend i don't have a huge sunk cost here since i've implemented it all already, but still worth asking ourselves this question.

Yes, especially in the long term. If we have multiple backend support we may have errors that we want to propagate instead of panicking.

What do you think of the API? poller thread now needs to be externally managed, which i think makes the most sense.

I like that we have a final error handling in the main.rs.

I couldn't come up with anything better than Result<T, Box> for the Bitcoin interface trait. I haven't looked too hard into it but if you've got a better suggestion i'd love to hear it.

I believe it is fine for now, we can always rework on top if we need fined grained error matching.

@darosior darosior force-pushed the 2401_remove_hard_dep_bitcoind branch from a557a2b to fbeb693 Compare January 11, 2024 10:03
@darosior darosior marked this pull request as ready for review January 11, 2024 10:04
@darosior
Copy link
Member Author

specific errors returned by commands (client request is not good, context changed, coin was already spent...): the error is already displayed to the user in a orange banner.

If we are making a bad request i don't see how restarting could make a difference. We should stop as soon as possible. (Except maybe in rare mishandled edge cases where the delay might solve the issue.)

Failure error in the poller (the poller stopped): we can redirect user to a full page like the one when the bitcoin node is not running with the same layout and a restart button.

I'm not sure. It seems to be very specific while the error could be due to many other things than the connection with bitcoind?

Maybe if the poller crash, a modal with the error should be displayed. We could let the user change their bitcoind settings there and try to restart but we shouldn't make it feel as if it's necessarily going to fix the issue.

@darosior darosior force-pushed the 2401_remove_hard_dep_bitcoind branch from fbeb693 to 72920b2 Compare January 11, 2024 10:26
@edouardparis
Copy link
Member

edouardparis commented Jan 17, 2024

If we are making a bad request i don't see how restarting could make a difference. We should stop as soon as possible. (Except maybe in rare mishandled edge cases where the delay might solve the issue.)

Then not a banner, but a full page error (not in ugly orange) with a warning icon and all the details.

I'm not sure. It seems to be very specific while the error could be due to many other things than the connection with bitcoind?

Maybe if the poller crash, a modal with the error should be displayed. We could let the user change their bitcoind settings there and try to restart but we shouldn't make it feel as if it's necessarily going to fix the issue.

In both case, it is good to have an error message to the user and maybe a button "retrieve the logs", that open the file system management window and let user register the logs of the day in a file he chose the location.
(Instead of pointing him toward the not trivial "datadir/network/liana.log")

And I agree that we shouldn't make it feel as if it's necessarily going to fix the issue.

@darosior
Copy link
Member Author

Yes, that's all good points.

@@ -208,13 +208,11 @@ fn new_tip(bit: &impl BitcoinInterface, current_tip: &BlockChainTip) -> TipUpdat
}

fn updates(
db_conn: &mut Box<dyn DatabaseConnection>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit a8ca850 "poller: hold the same database lock across one update round" is confused. This is not a lock, it's a connection.

darosior added a commit to darosior/liana that referenced this pull request Mar 15, 2024
This is inspired from the work in
wizardsardine#909 (specifically
wizardsardine@d8c59e3)
to externalize the management of the poller thread. However, there may
be only one poller thread. Starting more than one can lead to a crash or
potentially to data corruption. Therefore it feels safer to manage it
internally.

Instead of exposing the management of the poller to the user of the
library, we manage both threads inside the `DaemonHandle` data structure
and expose a way for a user to check for errors which may have occured
in any of the threads.

This makes it possible to:
1. Eventually propagate errors from the threads to the user of the
   daemon (wizardsardine#909);
2. Communicate internally with the poller thread, for instance to
   trigger a poll immediately (following commits).
darosior added a commit to darosior/liana that referenced this pull request Mar 20, 2024
This is inspired from the work in
wizardsardine#909 (specifically
wizardsardine@d8c59e3)
to externalize the management of the poller thread. However, there may
be only one poller thread. Starting more than one can lead to a crash or
potentially to data corruption. Therefore it feels safer to manage it
internally.

Instead of exposing the management of the poller to the user of the
library, we manage both threads inside the `DaemonHandle` data structure
and expose a way for a user to check for errors which may have occured
in any of the threads.

This makes it possible to:
1. Eventually propagate errors from the threads to the user of the
   daemon (wizardsardine#909);
2. Communicate internally with the poller thread, for instance to
   trigger a poll immediately (following commits).
darosior added a commit that referenced this pull request Mar 22, 2024
58c71c7 lib: gate the RPC server availability on the 'daemon' feature (Antoine Poinsot)
b7fde6a commands: update our state immediately after broadcasting a tx (Antoine Poinsot)
1cf42d9 poller: introduce a communication channel with the poller thread (Antoine Poinsot)
f6ce85c lib: remove the panic hook. (Antoine Poinsot)
b4fe963 lib: encapsulate the handling of both threads (poller and RPC server) (Antoine Poinsot)
fd5387f poller: use the same database connection across one update round (Antoine Poinsot)
ea6923e poller: make the updating process into its own function. (Antoine Poinsot)

Pull request description:

  Fixes #887.

  This takes a couple commits from #909 but takes the approach from there in another direction: we don't externalize the poller, since only a single instance must be ran. Instead we properly keep track of the (up to) two threads we manage in the `DaemonHandle` and provide a way for a user of the library to check for errors in any of the threads.

  This approach allows us to 1) communicate with the poller thread from inside the Liana library/daemon (here we leverage this to tell it to poll) 2) eventually (#909) expose all internal errors from the library to the user instead of panic'ing internally.

  See the commit messages for details.

ACKs for top commit:
  darosior:
    ACK 58c71c7 -- did another pass and Edouard tested this in the GUI.

Tree-SHA512: 0ab436b2a187f9d124ed8861a47f03bb1e9252cdc4f3b5c4308db07be738c78b2ea3f07dc0a9586e3d5bd34f071a1e2a2569cad30676c9cc004e39260ebb94ca
@nondiremanuel nondiremanuel added the Nice to have If it's not completed in time for the current version, it can be postponed label Jun 19, 2024
@nondiremanuel nondiremanuel added this to the Liana v7 milestone Jun 25, 2024
@nondiremanuel nondiremanuel modified the milestones: v7 - Liana, v8 - Liana Aug 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Nice to have If it's not completed in time for the current version, it can be postponed
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

3 participants