Documentation #503
# Ledger catchup

A node uses a process called `catchup` to sync its ledgers with other nodes. A node performs catchup:
- after starting up: a node communicates the state of its ledgers to every node it connects to and learns whether it is ahead of, behind, or at the same state as the others.
- after a view change: once a view change starts, nodes again communicate the state of their ledgers to other nodes.
- when it realises it has missed some transactions: nodes periodically send checkpoints indicating how many transactions they have processed recently; if a node finds that it has missed some transactions, it performs catchup.
Catchup is managed by an object called `LedgerManager`. The `LedgerManager` maintains a `LedgerInfo` object for each ledger and references each ledger by a unique id called `ledger_id`.
When a ledger is initialised, the `addLedger` method of `LedgerManager` is called; this method registers the ledger with the `LedgerManager`.
`addLedger` also accepts callbacks which are called on the occurrence of different events, such as before/after starting catchup for a ledger,
before/after marking catchup complete for a ledger, and after adding any transaction received during catchup to the ledger.
The `LedgerManager` performs catchup of each ledger sequentially: until catchup for one ledger is complete, catchup for the others will not start.
This is mostly done for simplicity and could be optimised, but the pool ledger needs to be caught up first, since the pool ledger knows how many nodes there are
in the network. Catchup for any ledger involves these steps:
- Learn the correct state (number of txns, merkle roots) of the ledger by using `LedgerStatus` messages from other nodes.
- Once sufficient (`Quorums.ledger_status`) and consistent `LedgerStatus` messages are received, the node knows the latest state of the ledger. If it finds its own ledger to be
up to date, it marks the ledger catchup complete; otherwise it waits for `ConsistencyProof` messages from other nodes until a timeout.
- When a node receives a `LedgerStatus` and finds the sending node's ledger behind, it sends a `ConsistencyProof` message. This message is like a
merkle subset proof of the sending node's ledger and the receiving node's ledger, e.g. if the sending node A's ledger has size 20 and merkle root `x` and
the receiving node B's ledger has size 30 with merkle root `y`, B sends a proof to A that A's ledger with size 20 and root `x` is a subset of B's ledger with size
30 and root `y`.
- After receiving a `ConsistencyProof`, the node verifies it.
- Once the node that is catching up has sufficient (`Quorums.consistency_proof`) and consistent `ConsistencyProof` messages, it knows how many transactions it needs to catch up and
requests those transactions from other nodes, distributing the load equally: e.g. if a node has to catch up 1000 txns and there are 5 other nodes in the
network, the node will request 200 txns from each. The node catching up sends a `CatchupReq` message to other nodes and expects a corresponding `CatchupRep`.
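
The load distribution in the last step can be sketched as follows. This is an illustrative helper, not the actual plenum API; the real logic lives in `plenum/common/ledger_manager.py`, and the tuple below merely stands in for a `CatchupReq` message.

```python
def build_catchup_reqs(ledger_id, have, target, nodes):
    """Split the missing txn range (have, target] evenly across nodes.

    Illustrative sketch only: returns {node_name: (ledger_id, from_seq_no, to_seq_no)}.
    """
    missing = target - have
    if missing <= 0 or not nodes:
        return {}
    per_node, extra = divmod(missing, len(nodes))
    reqs, frm = {}, have + 1
    for i, node in enumerate(nodes):
        count = per_node + (1 if i < extra else 0)
        if count == 0:
            continue
        to = frm + count - 1
        reqs[node] = (ledger_id, frm, to)  # stand-in for a CatchupReq
        frm = to + 1
    return reqs

# 1000 missing txns split across 5 nodes -> 200 txns requested from each
reqs = build_catchup_reqs(1, 0, 1000, ["A", "B", "C", "D", "E"])
```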
Apart from this, if the node does not receive sufficient or consistent `ConsistencyProof`s within a timeout, it requests them using `request_CPs_if_needed`.
Similarly, if the node does not receive sufficient or consistent `CatchupRep`s within a timeout, it requests them using `request_txns_if_needed`.
These timeouts can be configured from the config file using `ConsistencyProofsTimeout` and `CatchupTransactionsTimeout` respectively.
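
As a rough illustration, tuning these in the config might look like the following; the attribute names come from the text above, but the values are made up for the example, not the shipped defaults.

```python
# Illustrative config fragment (values are examples, not plenum's defaults):
ConsistencyProofsTimeout = 5        # seconds to wait for ConsistencyProof messages
CatchupTransactionsTimeout = 30     # seconds to wait for CatchupRep messages
```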
Relevant code:
- LedgerManager: `plenum/common/ledger_manager.py`
- LedgerInfo: `plenum/common/ledger_info.py`
- Catchup message types: `plenum/common/messages/node_messages.py`
- Catchup quorum values: `plenum/server/quorums.py`
---
Outline of the system
Node

---
# Request handling

To handle requests sent by clients, the nodes require a ledger and/or a state as a data store. Request handling logic lives in `RequestHandler`.
`RequestHandler` is extended by new classes to support new requests. A node provides the methods `register_new_ledger` to register a new `Ledger` and `register_req_handler` to register new `RequestHandler`s.
A callback can be registered, using `register_executer`, to execute after a write request has been successfully executed.
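
The registration pattern described above can be sketched as below. The class bodies and exact signatures are simplified assumptions for illustration; the real interfaces are in `plenum/server/node.py` and `plenum/server/req_handler.py`.

```python
# Minimal sketch of the registration pattern (assumed signatures, not the real API).

class RequestHandler:
    """Stub mirroring the hooks mentioned in this doc."""
    def doStaticValidation(self, request): pass   # state-free checks
    def validate(self, request): pass             # dynamic (stateful) checks
    def apply(self, request): pass                # optimistic ledger/state update
    def commit(self): pass                        # make the update permanent

class Node:
    def __init__(self):
        self.req_handlers = {}   # ledger_id -> RequestHandler
        self.executers = {}      # ledger_id -> post-execution callback

    def register_req_handler(self, handler, ledger_id):
        self.req_handlers[ledger_id] = handler

    def register_executer(self, ledger_id, executer):
        # run after a write request for this ledger executes successfully
        self.executers[ledger_id] = executer

node = Node()
node.register_req_handler(RequestHandler(), ledger_id=1)
node.register_executer(1, lambda result: None)
```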
There are 2 types of requests a client can send:
- Query:
Here the client requests transactions or some state variables from the node(s). The client can either send a `GET_TXN` to get any transaction by its sequence number from any ledger,
or it can send specific `GET_*` transactions for queries.
- Write:
Here the client asks the nodes to write a new transaction to the ledger and change some state variable. This requires the nodes to run a consensus protocol (currently RBFT).
If the protocol run is successful, the client's proposed changes are written.
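
To make the two request types concrete, here are two illustrative payloads. The field values and operation types are examples only, not an exact wire format.

```python
# Hypothetical request payloads (shapes are illustrative, not a spec).

query_req = {
    "identifier": "client-did-123",  # example client identifier
    "reqId": 1,
    "operation": {"type": "GET_TXN", "ledgerId": 1, "data": 9},  # read txn #9
}

write_req = {
    "identifier": "client-did-123",
    "reqId": 2,
    "operation": {"type": "NYM", "dest": "some-target-did"},  # a write op
    "signature": "...",  # writes must be signed; queries need no signature
}
```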
#### Request handling
Below is a description of how a request is processed.
On receiving a client request, in `validateClientMsg`:
- The node performs static validation checks (validation that does not require any state, e.g. that mandatory fields are present); it uses `ClientMessageValidator` and
`RequestHandler`'s `doStaticValidation` for this.
- If static validation passes, it checks whether a signature check is required (not required for queries) and, if needed, does it in `verifySignature`. More on this later.
- It checks if the request is a generic transaction query (`GET_TXN`). If so, it queries the ledger for that particular sequence number and returns the result. A `REQNACK` might be sent if the query is not correctly constructed.
- It checks if the request is a specific query (needing a specific `RequestHandler`). If so, it uses `process_query`, which uses the specific `RequestHandler` to respond to the query. A `REQNACK` might be sent if the query is not correctly constructed.
- If the request is a write, the node checks whether it has already processed the request by checking the uniqueness of the `identifier` and `reqId` fields of the request.
- If it has already processed the request, it sends the corresponding `Reply` to the client.
- If the `Request` is already in process, it sends an acknowledgement to the client as a `REQACK`.
- If the node has not seen the `Request` before, it broadcasts the `Request` to all nodes in a `PROPAGATE`.
- Once a node receives sufficient (`Quorums.propagate`) `PROPAGATE`s for a request, it forwards the request to its replicas.
- A primary replica, on receiving a forwarded request, performs dynamic validation (validation requiring state, e.g. whether the request violates a unique constraint or performs an unauthorised action)
on the request using `doDynamicValidation`, which in turn calls the corresponding `RequestHandler`'s `validate` method. If the validation succeeds, the
`apply` method of the `RequestHandler` is called, which optimistically applies the changes to the ledger and state.
If the validation fails, the primary sends a `REJECT` to the client. The primary then sends a `PRE-PREPARE` to other nodes, which contains the merkle roots and some other information for all such requests.
- The non-primary replicas, on receiving the `PRE-PREPARE`, perform the same dynamic validation that the primary performed on each request. They also check whether the merkle roots match.
If all checks pass, the replicas send a `PREPARE`; otherwise they reject the `PRE-PREPARE`.
- Once the consensus protocol is successfully executed on the request, the replicas send an `ORDERED` message to their node and the node updates the monitor.
- If the `ORDERED` message above was sent by the master replica, the node executes the request, meaning it commits any changes made to the ledger and state by that
request by calling the corresponding `RequestHandler`'s `commit` method, and sends a `Reply` back to the client.
- The node also tracks the request's `identifier` and `reqId` in a key-value database in `updateSeqNoMap`.
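
The quorum arithmetic behind the `PROPAGATE` step can be sketched as follows, using the standard BFT relation between pool size and tolerated faults. The actual quorum values are defined in `plenum/server/quorums.py`; the `f + 1` figure here is an assumption for illustration.

```python
def faults_tolerated(n_nodes: int) -> int:
    # A BFT pool of n nodes tolerates f = (n - 1) // 3 faulty nodes
    return (n_nodes - 1) // 3

def propagate_quorum(n_nodes: int) -> int:
    # f + 1 matching PROPAGATEs guarantee at least one came from an honest node
    return faults_tolerated(n_nodes) + 1

def should_forward(propagates_seen: int, n_nodes: int) -> bool:
    # Forward the request to replicas once the quorum is reached
    return propagates_seen >= propagate_quorum(n_nodes)

# In a 4-node pool (f = 1), 2 matching PROPAGATEs are enough to forward
```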
#### Signature verification
Each node has a `ReqAuthenticator` object which allows registering `ClientAuthNr` objects using the `register_authenticator` method. During signature verification,
a node runs each registered authenticator over the request, and if any authenticator raises an exception, signature verification is considered failed.
A node has at least 1 authenticator called `CoreAuthNr`, whose `authenticate` method is called over the serialised request data to verify the signature.
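
The "fail if any authenticator raises" rule can be sketched like this. The class and method names are modeled on the ones above, but the bodies are simplified stand-ins for the real code in `plenum/server/req_authenticator.py` and `plenum/server/client_authn.py`.

```python
class InvalidSignature(Exception):
    pass

class ReqAuthenticator:
    """Runs every registered authenticator; any exception fails verification."""
    def __init__(self):
        self._authenticators = []

    def register_authenticator(self, authnr):
        self._authenticators.append(authnr)

    def authenticate(self, req_data):
        for authnr in self._authenticators:
            authnr.authenticate(req_data)  # raises on a bad signature
        return True

class AcceptSigned:
    # stand-in for CoreAuthNr; a real authenticator verifies the signature
    def authenticate(self, req_data):
        if "identifier" not in req_data:
            raise InvalidSignature("unsigned request")

ra = ReqAuthenticator()
ra.register_authenticator(AcceptSigned())
ra.authenticate({"identifier": "client1"})  # passes; {} would raise
```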
Relevant code:
- Node: `plenum/server/node.py`
- Replica: `plenum/server/replica.py`
- Propagator: `plenum/server/propagator.py`
- Request: `plenum/common/request.py`
- Request structure validation: `plenum/common/messages/client_request.py`
- RequestHandler: `plenum/server/req_handler.py`
- Request Authenticator: `plenum/server/req_authenticator.py`
- Core Authenticator: `plenum/server/client_authn.py`
- Quorums: `plenum/server/quorums.py`