Documentation #503

Merged · 13 commits · Jan 18, 2018
37 changes: 37 additions & 0 deletions docs/catchup.md
@@ -0,0 +1,37 @@
# Ledger catchup

A node uses a process called `catchup` to sync its ledgers with other nodes. It performs catchup:
- after starting up: a node communicates the state of its ledgers to every node it connects to and learns whether it is ahead of, behind, or at the same state as the others.
- after a view change: once a view change starts, nodes again communicate the state of their ledgers to other nodes.
- if it realises it has missed some transactions: nodes periodically send checkpoints indicating how many transactions they have processed
recently; if a node finds that it has missed some txns, it performs catchup (a rough sketch of this check follows the list).
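
A minimal sketch of the third trigger, assuming a hypothetical helper that compares the node's own ordered state with the checkpoints reported by its peers (names here are illustrative, not the actual plenum API):

```python
# Hypothetical sketch: decide whether to start catchup based on peers' checkpoints.
# Names are illustrative, not the actual plenum API.
def should_start_catchup(own_last_ordered: int,
                         peer_checkpoint_seq_nos: list,
                         quorum: int) -> bool:
    # If enough peers report a checkpoint beyond what this node has ordered,
    # the node has missed transactions and should catch up.
    peers_ahead = [s for s in peer_checkpoint_seq_nos if s > own_last_ordered]
    return len(peers_ahead) >= quorum
```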

The catchup is managed by an object called `LedgerManager`. The `LedgerManager` maintains a `LedgerInfo` object for each ledger and references each ledger by its unique id called `ledger_id`.
When a `ledger` is initialised, the `addLedger` method of `LedgerManager` is called; this method registers the ledger with the `LedgerManager`.
`addLedger` also accepts callbacks which are called on the occurrence of different events, such as before/after starting catchup for a ledger,
before/after marking catchup complete for a ledger, and after adding any transaction received during catchup to the ledger (an illustrative registration snippet appears at the end of this document).
The `LedgerManager` performs catchup of each ledger sequentially, which means that catchup for one ledger must complete before catchup for another starts.
This is done mostly for simplicity and can be optimised, but the pool ledger needs to be caught up first since the pool ledger records how many nodes are
in the network. Catchup for any ledger involves these steps:
- Learn the correct state (how many txns, merkle roots) of the ledger by using `LedgerStatus` messages from other nodes.
Contributor

Please mention how and when LedgerStatus is sent.

Contributor Author

It is mentioned below.

- Once sufficient (`Quorums.ledger_status`) and consistent `LedgerStatus` messages are received, the node knows the latest state of the ledger: if it finds its own ledger to be
up to date, it marks the ledger catchup complete; otherwise it waits for `ConsistencyProof` messages from other nodes until a timeout.
- When a node receives a `LedgerStatus` and finds the sending node's ledger behind, it sends a `ConsistencyProof` message. This message is a
merkle subset proof between the sending node's ledger and the receiving node's ledger, e.g. if the sending node A's ledger has size 20 and merkle root `x` and
the receiving node B's ledger has size 30 with merkle root `y`, B sends a proof to A that A's ledger with size 20 and root `x` is a subset of B's ledger with size
30 and root `y`.
- After receiving a `ConsistencyProof`, the node verifies it.
- Once the node that is catching up has sufficient (`Quorums.consistency_proof`) and consistent `ConsistencyProof` messages, it knows how many transactions it needs to catch up and
requests those transactions from other nodes, distributing the load equally, e.g. if a node has to catch up 1000 txns and there are 5 other nodes in the
network, the node requests 200 txns from each (a rough sketch of this splitting follows the list). The node catching up sends a `CatchupReq` message to each of these nodes and expects a corresponding `CatchupRep`.
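
A minimal sketch of that splitting, assuming the catching-up node already knows its own ledger size and the target size (the function and variable names are assumptions, not the actual `LedgerManager` code):

```python
# Illustrative sketch of dividing the missing txn range into per-node CatchupReq
# batches; names are assumptions, not the actual LedgerManager implementation.
def split_catchup_requests(start_seq_no: int, end_seq_no: int, node_names: list) -> dict:
    """Assign each node a contiguous slice of [start_seq_no, end_seq_no] to fetch."""
    total = end_seq_no - start_seq_no + 1
    per_node = total // len(node_names)
    batches, frm = {}, start_seq_no
    for i, name in enumerate(node_names):
        to = end_seq_no if i == len(node_names) - 1 else frm + per_node - 1
        batches[name] = (frm, to)   # would become a CatchupReq(ledger_id, frm, to)
        frm = to + 1
    return batches

# e.g. 1000 missing txns (seq 21..1020) over 5 peers -> 200 txns requested from each
print(split_catchup_requests(21, 1020, ['A', 'B', 'C', 'D', 'E']))
```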

Apart from this, if the node does not receive sufficient or consistent `ConsistencyProof`s within a timeout, it requests them using `request_CPs_if_needed`.
Similarly, if the node does not receive sufficient or consistent `CatchupRep`s within a timeout, it requests them using `request_txns_if_needed`.
These timeouts can be configured through `ConsistencyProofsTimeout` and `CatchupTransactionsTimeout` respectively in the config file.
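
For illustration, overriding these in the config file might look like this (the values below are examples, not necessarily the defaults):

```python
# Example overrides in the plenum config file; the values are illustrative only.
ConsistencyProofsTimeout = 5         # seconds to wait for a ConsistencyProof quorum
CatchupTransactionsTimeout = 30      # seconds to wait for the requested CatchupReps
```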


Relevant code:
- LedgerManager: `plenum/common/ledger_manager.py`
- LedgerInfo: `plenum/common/ledger_info.py`
- Catchup message types: `plenum/common/messages/node_messages.py`
Contributor

We should also mention Quorums (quorums.py) containing all quorum values including catch-up quorums.

Contributor Author

Done

- Catchup quorum values: `plenum/server/quorums.py`
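
As a rough illustration of the `addLedger` registration described earlier, a pool ledger might be registered roughly like this (the keyword-argument names are assumptions, not the exact plenum signature):

```python
# Rough illustration of registering a ledger with the LedgerManager; the
# keyword-argument names are assumptions, not the exact plenum signature.
ledger_manager.addLedger(
    POOL_LEDGER_ID,
    pool_ledger,
    preCatchupStartClbk=on_pool_catchup_start,      # before catchup begins
    postCatchupCompleteClbk=on_pool_catchup_done,   # after catchup is marked complete
    postTxnAddedToLedgerClbk=on_pool_txn_added)     # after each caught-up txn is added
```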
2 changes: 2 additions & 0 deletions docs/main.md
@@ -0,0 +1,2 @@
Outline of the system
Node
57 changes: 57 additions & 0 deletions docs/request_handling.md
@@ -0,0 +1,57 @@
# Request handling

To handle requests sent by clients, the nodes require a ledger and/or a state as a data store. Request handling logic is written in `RequestHandler`.
`RequestHandler` is extended by new classes to support new requests. A node provides the method `register_new_ledger` to register a new `Ledger` and `register_req_handler` to register new `RequestHandler`s (a rough registration sketch follows).
A callback to run after a write request has been successfully executed can also be registered using `register_executer`.
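
A minimal sketch of extending and registering a handler, assuming a hypothetical ledger id and placeholder method bodies (the argument order of the registration calls is also illustrative):

```python
from plenum.server.req_handler import RequestHandler

# Minimal sketch of a custom handler; the ledger id and method bodies are placeholders.
class MyRequestHandler(RequestHandler):
    def doStaticValidation(self, request):
        pass  # checks that need no state, e.g. mandatory fields are present

    def validate(self, request):
        pass  # dynamic validation against the current state

    def apply(self, request):
        pass  # optimistically apply the change to the ledger and state

    def commit(self, *args, **kwargs):
        pass  # persist the optimistically applied changes

MY_LEDGER_ID = 1001  # placeholder id
node.register_new_ledger(MY_LEDGER_ID, my_ledger)          # illustrative call
node.register_req_handler(MY_LEDGER_ID, MyRequestHandler(my_ledger, my_state))
```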


There are 2 types of requests a client can send:
- Query:
Here the client requests transactions or some state variables from the node(s). The client can either send a `GET_TXN` to get any transaction with a given sequence number from any ledger,
or it can send specific `GET_*` transactions for queries (an illustrative `GET_TXN` payload follows this list).
- Write:
Here the client asks the nodes to write a new transaction to the ledger and change some state variable. This requires the nodes to run a consensus protocol (currently RBFT).
If the protocol run is successful, the client's proposed changes are written.
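
For example, a generic query might look roughly like this (the field names follow plenum's request structure, but treat the exact layout and values as illustrative):

```python
# Roughly what a GET_TXN query carries; treat the exact layout as illustrative.
get_txn_request = {
    'identifier': '<client identifier / verkey>',
    'reqId': 1516116718275843,
    'operation': {
        'type': '3',    # assumed GET_TXN txn type
        'ledgerId': 1,  # which ledger to read from
        'data': 9,      # sequence number of the wanted transaction
    },
}
```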


#### Request handling
Below is a description of how a request is processed.
On receiving a client request in `validateClientMsg`, the node:
- Performs static validation checks (validation that does not require any state, e.g. that mandatory fields are present); it uses `ClientMessageValidator` and
`RequestHandler`'s `doStaticValidation` for this.
- If static validation passes, checks whether a signature check is required (not required for queries) and, if needed, performs it in `verifySignature`. More on this later.
- Checks if it is a generic transaction query (`GET_TXN`). If so, it queries the ledger for that particular sequence number and returns the result. A `REQNACK` might be sent if the query is not correctly constructed.
- Checks if it is a specific query (one needing a specific `RequestHandler`); if so, it uses `process_query`, which uses the specific `RequestHandler` to respond to the query. A `REQNACK` might be sent if the query is not correctly constructed.
- If it is a write, the node checks whether it has already processed the request by checking the uniqueness of the `identifier` and `reqId` fields of the Request.
- If it has already processed the request, it sends the corresponding `Reply` to the client.
- If the `Request` is already in process, it sends an acknowledgement to the client as a `REQACK`.
- If the node has not seen the `Request` before, it broadcasts the `Request` to all nodes in a `PROPAGATE`.
- Once a node receives sufficient (`Quorums.propagate`) `PROPAGATE`s for a request, it forwards the request to its replicas.
- A primary replica, on receiving a forwarded request, does dynamic validation (validation requiring state, e.g. whether the request violates a unique constraint or performs an unauthorised action)
Contributor

Should we mention batching here? That Primary applies dynamic validation for each request in the batch, and that the batch can still be processed if there are invalid requests there. Reject is sent just for invalid requests from the batch.

Contributor Author

Batch is an optimisation for consensus, I didn't want to digress. I think a detailed doc explaining our differences in 3PC with RBFT should list batching, merkle roots and checks in 3PC messages, etc.

Contributor

Ok, that's fine. Probably it's sufficient here to just mention this doc so that one can have a look at how this all works with batches.

on the request using `doDynamicValidation`, which in turn calls the corresponding `RequestHandler`'s `validate` method. If the validation succeeds, the
`apply` method of the `RequestHandler` is called, which optimistically applies the changes to the ledger and state.
If the validation fails, the primary sends a `REJECT` to the client. The primary then sends a `PRE-PREPARE` to the other nodes, containing the merkle roots and some other information for all such requests.
- The non-primary replicas, on receiving the `PRE-PREPARE`, perform the same dynamic validation that the primary performed on each request. They also check whether the merkle roots match.
If all checks pass, the replicas send a `PREPARE`; otherwise they reject the `PRE-PREPARE`.
- Once the consensus protocol has been successfully executed for the request, the replicas send an `ORDERED` message to their node and the node updates the monitor.
- If the `ORDERED` message was sent by the master replica, the node executes the request, meaning it commits any changes made to the ledger and state by that
request by calling the corresponding `RequestHandler`'s `commit` method, and sends a `Reply` back to the client (a condensed sketch of this step follows the list).
- The node also tracks the request's `identifier` and `reqId` in a key value database in `updateSeqNoMap`.
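
A condensed sketch of the execution step after ordering, assuming simplified message fields and hypothetical helper names (this is not the actual `Node` code):

```python
# Condensed sketch of executing an ORDERED request; message fields and helper
# names are assumptions, not the actual Node implementation.
MASTER_INSTANCE_ID = 0

def on_ordered(node, ordered):
    node.monitor.requestOrdered(ordered)        # every ORDERED updates the monitor
    if ordered.instId != MASTER_INSTANCE_ID:
        return                                  # only the master replica's order executes
    handler = node.get_req_handler(ordered.ledgerId)
    for req_key in ordered.reqIdr:
        request = node.requests[req_key].request
        handler.commit(request)                 # persist the optimistically applied changes
        node.updateSeqNoMap([request])          # track (identifier, reqId) in the KV store
        node.send_reply_to_client(request)      # hypothetical helper that sends the Reply
```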


#### Signature verification
Each node has a `ReqAuthenticator` object which allows `ClientAuthNr` objects to be registered using the `register_authenticator` method. During signature verification,
Contributor

Should we mention that we support both single signature and multi-signatures now?

Contributor Author

Maybe? Do you?

Contributor

I think we need to mention this if we have such a section as signature verification.

a node runs each registered authenticator over the request, and if any authenticator raises an exception, signature verification is considered failed.
A node has at least 1 authenticator called `CoreAuthNr`, whose `authenticate` method is called over the serialised request data to verify the signature (a simplified sketch follows).
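
A simplified sketch of that loop, assuming a hypothetical accessor for the registered authenticators (not the exact `ReqAuthenticator` API):

```python
# Simplified sketch of running every registered authenticator over a request;
# the accessor and argument shapes are assumptions, not the exact API.
def verify_signature(req_authenticator, request) -> bool:
    for authnr in req_authenticator.get_authnrs():   # includes at least CoreAuthNr
        try:
            authnr.authenticate(request.as_dict)      # raises on a bad/missing signature
        except Exception:
            return False                              # any failure fails verification
    return True
```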


Relevant code:
- Node: `plenum/server/node.py`
- Replica: `plenum/server/replica.py`
- Propagator: `plenum/server/propagator.py`
- Request: `plenum/common/request.py`
- Request structure validation: `plenum/common/messages/client_request.py`
- RequestHandler: `plenum/server/req_handler.py`
- Request Authenticator: `plenum/server/req_authenticator.py`
- Core Authenticator: `plenum/server/client_authn.py`
- Quorums: `plenum/server/quorums.py`
3 changes: 1 addition & 2 deletions docs/storage.md
@@ -17,8 +17,7 @@ Each new transaction is added to the ledger (log) and is also hashed (sha256), which
results in a new merkle root. Thus for each transaction a merkle proof of presence, called `inclusion proof` or `audit_path`, can be created by
using the root hash and a few (`O(lgn)`, `n` being the total number of leaves/transactions in the tree/ledger) intermediate hashes.

Hashes of all the
leaves and intermediate nodes of the tree are stored in a `HashStore`, enabling the creating of inclusion proofs and subset proofs. A subset proof
Hashes of all the leaves and intermediate nodes of the tree are stored in a `HashStore`, enabling the creation of inclusion proofs and subset proofs. A subset proof
proves that a particular merkle tree is a subset of another (usually larger) merkle tree; more on this in the Catchup doc. The `HashStore` has 2 separate storages for storing leaves
and intermediate nodes; each leaf or node hash of the tree is 32 bytes. Each of these storages can be a binary file or a key value store.
The leaf or node hashes are queried by their number. When a client write request completes or it requests a transaction with a particular sequence number from a ledger,
9 changes: 5 additions & 4 deletions plenum/server/node.py
@@ -1957,17 +1957,18 @@ def processRequest(self, request: Request, frm: str):
ledgerId = self.ledger_id_for_request(request)
ledger = self.getLedger(ledgerId)

if self.is_query(request.operation[TXN_TYPE]):
self.process_query(request, frm)
return

reply = self.getReplyFromLedger(ledger, request)
if reply:
logger.debug("{} returning REPLY from already processed "
"REQUEST: {}".format(self, request))
self.transmitToClient(reply, frm)
return

if self.is_query(request.operation[TXN_TYPE]):
self.process_query(request, frm)
return

# If the node is not already processing the request
if not self.isProcessingReq(*request.key):
self.startedProcessingReq(*request.key, frm)
# If not already got the propagate request(PROPAGATE) for the