Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to TLA+ specs for model-based testing #546

Merged
merged 16 commits into from
Sep 24, 2020
Merged
Show file tree
Hide file tree
Changes from 15 commits
Commits
Show all changes
16 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions docs/spec/lightclient/verification/Blockchain_A_1.tla
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,7 @@ LBT == [header |-> BT, Commits |-> {NT}]

(* the header is still within the trusting period *)
InTrustingPeriod(header) ==
now <= header.time + TRUSTING_PERIOD
now < header.time + TRUSTING_PERIOD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the <= actually appears to contradict the english spec which says

 - *LightStore* always contains a verified header whose age is less than the
 trusting period,  

So it seems clear that what we ant is "less than", but perhaps this inconsistency was due in part to the wording and terminology here. E.g., in this paragraph

In order to ensure liveness, LightStore always needs to contain a
verified (or initially trusted) header whose time is within the
trusting period.

"trust period" is spoken of as a duration that the header needs to be within not as a limit point that the header time needs to be less than. I wonder if calling this TRUST_PERIOD_LIMIT might help prevent such confusion i the future. I suspect a problem with the current wording may arise due to "trusting period" sounds like a span within which things are valid, when it's actually meant to be the outer limit beyond which things are invalid. iiuc, the trust period is that span of time up to but not including the limit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right: <= does contradict the English spec, so this is the TLA+ modeling artifact. That there can be ambiguities in the English spec, that's of course possible -- English is not a formal language after all;) I still think that our English spec does an amazing job of clarifying the requirements at the semi-formal (and as close to formal as possible) level. But exactly because of this non-strictness of the English language the TLA+ spec should, in my opinion, be the ultimate source of truth for all parties.

One of the things we were discussing with @josef-widder and @konnov is that now, when we have the three artifacts: the English spec, the TLA+ spec, and the implementation, and also now that we have the MBT, which serves as the "missing link" helping to synchronize them, we need to establish a process (in the VDD sense) that will help us to resolve the conflicts.

In this PR I've fixed the TLA+ spec, because it was the most easy way for me to go, but it's not clear at all whether that's the right thing to do. We need to figure out a joint understanding between the research and the development teams on how to proceed in such cases.

Copy link
Member

@ebuchman ebuchman Sep 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, given the larger context, I don't believe it matters much whether we use < or <= here since we are dealing with time and already assume clocks can drift some amount. The important thing is that TRUST_PERIOD is a fraction of the UNBONDING_PERIOD. This ensures that light clients have the duration of UNBONDING_PERIOD - TRUST_PERIOD to try and submit evidence to the chain and get faulty validators punished. And since all these periods are on the order of days/weeks, the < vs. <= here shouldn't make a difference in the end.

Also while UNBONDING_PERIOD is a global consensus parameter (all nodes agree on), the TRUST_PERIOD is not, it's a local subjective parameter that clients can set depending on their risk tolerance. So again, the plank second difference between < and <= shouldn't matter there.

Of course that might be an unsatisfactory answer - I don't mean to be flippant about it and we should certainly be consistent and precise, but it's important to keep the practical context in mind so we know what details are most important to focus on. Of course this is just input from my feeble human mind so possibly machine will prove that it does matter, but thought it was worth providing the extra context here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback! I totally agree, from the practical perspective these differences between < and <= do not matter at all in this particular context, so this case we could fix however we like, and it would make no real difference. There are two things to keep in mind though:

  • scientifically, we can't be approximately precise. And I would like to think about our specs as being precise, so we should clarify and fix any ambiguities.
  • this particular example is only a manifestation of a general problem about the artifacts we are having, and I am sure we will encounter more instances of it, with probably higher level of severity. So it's better if we are prepared for that.

Also I tend to perceive us having this problem as rather a good thing, demonstrating that from now on we will have a living specification, which is in sync with the implementation.

This is only the input from another feeble human mind, just a bit different;) And I am sure fixing this will not take much of either our focus or time.


(*
Given a function pVotingPower \in D -> Powers for some D \subseteq AllNodes
Expand Down Expand Up @@ -110,7 +110,7 @@ FaultAssumption(pFaultyNodes, pNow, pBlockchain) ==
(* Can a block be produced by a correct peer, or an authenticated Byzantine peer *)
IsLightBlockAllowedByDigitalSignatures(ht, block) ==
\/ block.header = blockchain[ht] \* signed by correct and faulty (maybe)
\/ block.Commits \subseteq Faulty /\ block.header.height = ht \* signed only by faulty
\/ block.Commits \subseteq Faulty /\ block.header.height = ht /\ block.header.time >= 0 \* signed only by faulty

(*
Initialize the blockchain to the ultimate height right in the initial states.
Expand Down
51 changes: 38 additions & 13 deletions docs/spec/lightclient/verification/Lightclient_A_1.tla
Original file line number Diff line number Diff line change
Expand Up @@ -29,8 +29,28 @@ VARIABLES
lightBlockStatus, (* a function from heights to block statuses *)
latestVerified (* the latest verified block *)

(* the variables of the lite client *)
lcvars == <<state, nextHeight, fetchedLightBlocks, lightBlockStatus, latestVerified>>
(* the variables of the lite client *)
lcvars == <<state, nextHeight, fetchedLightBlocks, lightBlockStatus, latestVerified>>

(* the light client previous state components, used for monitoring *)
VARIABLES
prevVerified,
prevCurrent,
prevNow,
prevVerdict

InitMonitor(verified, current, now, verdict) ==
/\ prevVerified = verified
/\ prevCurrent = current
/\ prevNow = now
/\ prevVerdict = verdict

NextMonitor(verified, current, now, verdict) ==
/\ prevVerified' = verified
/\ prevCurrent' = current
/\ prevNow' = now
/\ prevVerdict' = verdict


(******************* Blockchain instance ***********************************)

Expand Down Expand Up @@ -74,7 +94,9 @@ ValidAndVerifiedPre(trusted, untrusted) ==
/\ BC!InTrustingPeriod(thdr)
/\ thdr.height < uhdr.height
\* the trusted block has been created earlier (no drift here)
/\ thdr.time <= uhdr.time
/\ thdr.time < uhdr.time
\* the untrusted block is not from the future
/\ uhdr.time < now
/\ untrusted.Commits \subseteq uhdr.VS
/\ LET TP == Cardinality(uhdr.VS)
SP == Cardinality(untrusted.Commits)
Expand Down Expand Up @@ -106,7 +128,7 @@ SignedByOneThirdOfTrusted(trusted, untrusted) ==
*)
ValidAndVerified(trusted, untrusted) ==
IF ~ValidAndVerifiedPre(trusted, untrusted)
THEN "FAILED_VERIFICATION"
THEN "INVALID"
ELSE IF ~BC!InTrustingPeriod(untrusted.header)
(* We leave the following test for the documentation purposes.
The implementation should do this test, as signature verification may be slow.
Expand All @@ -115,8 +137,8 @@ ValidAndVerified(trusted, untrusted) ==
THEN "FAILED_TRUSTING_PERIOD"
ELSE IF untrusted.header.height = trusted.header.height + 1
\/ SignedByOneThirdOfTrusted(trusted, untrusted)
THEN "OK"
ELSE "CANNOT_VERIFY"
THEN "SUCCESS"
ELSE "NOT_ENOUGH_TRUST"

(*
Initial states of the light client.
Expand All @@ -135,6 +157,7 @@ LCInit ==
/\ lightBlockStatus = [h \in {TRUSTED_HEIGHT} |-> "StateVerified"]
\* the latest verified block the the trusted block
/\ latestVerified = trustedLightBlock
/\ InitHistory(trustedLightBlock, trustedLightBlock, now, "SUCCESS")

\* block should contain a copy of the block from the reference chain, with a matching commit
CopyLightBlockFromChain(block, height) ==
Expand Down Expand Up @@ -207,8 +230,9 @@ VerifyToTargetLoop ==
/\ nprobes' = nprobes + 1
\* Verify the current block
/\ LET verdict == ValidAndVerified(latestVerified, current) IN
NextHistory(latestVerified, current, now, verdict) /\
\* Decide whether/how to continue
CASE verdict = "OK" ->
CASE verdict = "SUCCESS" ->
/\ lightBlockStatus' = LightStoreUpdateStates(lightBlockStatus, nextHeight, "StateVerified")
/\ latestVerified' = current
/\ state' =
Expand All @@ -219,7 +243,7 @@ VerifyToTargetLoop ==
/\ CanScheduleTo(newHeight, current, nextHeight, TARGET_HEIGHT)
/\ nextHeight' = newHeight

[] verdict = "CANNOT_VERIFY" ->
[] verdict = "NOT_ENOUGH_TRUST" ->
(*
do nothing: the light block current passed validation, but the validator
set is too different to verify it. We keep the state of
Expand All @@ -244,7 +268,8 @@ VerifyToTargetLoop ==
VerifyToTargetDone ==
/\ latestVerified.header.height >= TARGET_HEIGHT
/\ state' = "finishedSuccess"
/\ UNCHANGED <<nextHeight, nprobes, fetchedLightBlocks, lightBlockStatus, latestVerified>>
/\ UNCHANGED <<nextHeight, nprobes, fetchedLightBlocks, lightBlockStatus, latestVerified>>
/\ UNCHANGED <<prevVerified, prevCurrent, prevNow, prevVerdict>>

(********************* Lite client + Blockchain *******************)
Init ==
Expand Down Expand Up @@ -310,7 +335,7 @@ CorrectnessInv ==
fetchedLightBlocks[h].header = blockchain[h]

(**
Check that the sequence of the headers in storedLightBlocks satisfies ValidAndVerified = "OK" pairwise
Check that the sequence of the headers in storedLightBlocks satisfies ValidAndVerified = "SUCCESS" pairwise
This property is easily violated, whenever a header cannot be trusted anymore.
*)
StoredHeadersAreVerifiedInv ==
Expand All @@ -322,7 +347,7 @@ StoredHeadersAreVerifiedInv ==
\/ \E mh \in DOMAIN fetchedLightBlocks:
lh < mh /\ mh < rh
\* or we can verify the right one using the left one
\/ "OK" = ValidAndVerified(fetchedLightBlocks[lh], fetchedLightBlocks[rh])
\/ "SUCCESS" = ValidAndVerified(fetchedLightBlocks[lh], fetchedLightBlocks[rh])

\* An improved version of StoredHeadersAreSound, assuming that a header may be not trusted.
\* This invariant candidate is also violated,
Expand All @@ -336,7 +361,7 @@ StoredHeadersAreVerifiedOrNotTrustedInv ==
\/ \E mh \in DOMAIN fetchedLightBlocks:
lh < mh /\ mh < rh
\* or we can verify the right one using the left one
\/ "OK" = ValidAndVerified(fetchedLightBlocks[lh], fetchedLightBlocks[rh])
\/ "SUCCESS" = ValidAndVerified(fetchedLightBlocks[lh], fetchedLightBlocks[rh])
\* or the left header is outside the trusting period, so no guarantees
\/ ~BC!InTrustingPeriod(fetchedLightBlocks[lh].header)

Expand All @@ -359,7 +384,7 @@ ProofOfChainOfTrustInv ==
\* or the left header is outside the trusting period, so no guarantees
\/ ~(BC!InTrustingPeriod(fetchedLightBlocks[lh].header))
\* or we can verify the right one using the left one
\/ "OK" = ValidAndVerified(fetchedLightBlocks[lh], fetchedLightBlocks[rh])
\/ "SUCCESS" = ValidAndVerified(fetchedLightBlocks[lh], fetchedLightBlocks[rh])

(**
* When the light client terminates, there are no failed blocks. (Otherwise, someone lied to us.)
Expand Down
8 changes: 4 additions & 4 deletions light-client/tests/model_based.rs
Original file line number Diff line number Diff line change
Expand Up @@ -115,8 +115,8 @@ fn model_based_test(
output_env.logln(" failed to find necessary programs; consider adding them to your PATH. skipping the test");
return;
}
env.copy_file_from_env(root_env, "Lightclient_A_1.tla");
env.copy_file_from_env(root_env, "Blockchain_A_1.tla");
env.copy_file_from_env(root_env, "Lightclient_002_draft.tla");
env.copy_file_from_env(root_env, "Blockchain_002_draft.tla");
env.copy_file_from_env(root_env, "LightTests.tla");
env.copy_file_from_env(root_env, &test.model);

Expand All @@ -128,6 +128,8 @@ fn model_based_test(
},
Err(e) => panic!("failed to run Apalache; reason: {}", e),
}
output_env.copy_file_from_env(env, "counterexample.tla");
output_env.copy_file_from_env(env, "counterexample.json");

let transform_spec = root_env
.full_canonical_path("_jsonatr-lib/apalache_to_lite_test.json")
Expand All @@ -143,8 +145,6 @@ fn model_based_test(
let tc: SingleStepTestCase = env.parse_file("test.json").unwrap();
println!(" > running auto-generated test...");
single_step_test(tc, env, root_env, output_env);
output_env.copy_file_from_env(env, "counterexample.tla");
output_env.copy_file_from_env(env, "counterexample.json");
}

fn model_based_test_batch(batch: ApalacheTestBatch) -> Vec<(String, String)> {
Expand Down
171 changes: 171 additions & 0 deletions light-client/tests/support/model_based/Blockchain_002_draft.tla
Original file line number Diff line number Diff line change
@@ -0,0 +1,171 @@
------------------------ MODULE Blockchain_002_draft -----------------------------
(*
This is a high-level specification of Tendermint blockchain
that is designed specifically for the light client.
Validators have the voting power of one. If you like to model various
voting powers, introduce multiple copies of the same validator
(do not forget to give them unique names though).
*)
EXTENDS Integers, FiniteSets

Min(a, b) == IF a < b THEN a ELSE b

CONSTANT
AllNodes,
(* a set of all nodes that can act as validators (correct and faulty) *)
ULTIMATE_HEIGHT,
(* a maximal height that can be ever reached (modelling artifact) *)
TRUSTING_PERIOD
(* the period within which the validators are trusted *)

Heights == 1..ULTIMATE_HEIGHT (* possible heights *)

(* A commit is just a set of nodes who have committed the block *)
Commits == SUBSET AllNodes

(* The set of all block headers that can be on the blockchain.
This is a simplified version of the Block data structure in the actual implementation. *)
BlockHeaders == [
height: Heights,
\* the block height
time: Int,
\* the block timestamp in some integer units
lastCommit: Commits,
\* the nodes who have voted on the previous block, the set itself instead of a hash
(* in the implementation, only the hashes of V and NextV are stored in a block,
as V and NextV are stored in the application state *)
VS: SUBSET AllNodes,
\* the validators of this bloc. We store the validators instead of the hash.
NextVS: SUBSET AllNodes
\* the validators of the next block. We store the next validators instead of the hash.
]

(* A signed header is just a header together with a set of commits *)
LightBlocks == [header: BlockHeaders, Commits: Commits]

VARIABLES
now,
(* the current global time in integer units *)
blockchain,
(* A sequence of BlockHeaders, which gives us a bird view of the blockchain. *)
Faulty
(* A set of faulty nodes, which can act as validators. We assume that the set
of faulty processes is non-decreasing. If a process has recovered, it should
connect using a different id. *)

(* all variables, to be used with UNCHANGED *)
vars == <<now, blockchain, Faulty>>

(* The set of all correct nodes in a state *)
Corr == AllNodes \ Faulty

(* APALACHE annotations *)
a <: b == a \* type annotation

NT == STRING
NodeSet(S) == S <: {NT}
EmptyNodeSet == NodeSet({})

BT == [height |-> Int, time |-> Int, lastCommit |-> {NT}, VS |-> {NT}, NextVS |-> {NT}]

LBT == [header |-> BT, Commits |-> {NT}]
(* end of APALACHE annotations *)

(****************************** BLOCKCHAIN ************************************)

(* the header is still within the trusting period *)
InTrustingPeriod(header) ==
now < header.time + TRUSTING_PERIOD

(*
Given a function pVotingPower \in D -> Powers for some D \subseteq AllNodes
and pNodes \subseteq D, test whether the set pNodes \subseteq AllNodes has
more than 2/3 of voting power among the nodes in D.
*)
TwoThirds(pVS, pNodes) ==
LET TP == Cardinality(pVS)
SP == Cardinality(pVS \intersect pNodes)
IN
3 * SP > 2 * TP \* when thinking in real numbers, not integers: SP > 2.0 / 3.0 * TP

(*
Given a set of FaultyNodes, test whether the voting power of the correct nodes in D
is more than 2/3 of the voting power of the faulty nodes in D.
*)
IsCorrectPower(pFaultyNodes, pVS) ==
LET FN == pFaultyNodes \intersect pVS \* faulty nodes in pNodes
CN == pVS \ pFaultyNodes \* correct nodes in pNodes
CP == Cardinality(CN) \* power of the correct nodes
FP == Cardinality(FN) \* power of the faulty nodes
IN
\* CP + FP = TP is the total voting power, so we write CP > 2.0 / 3 * TP as follows:
CP > 2 * FP \* Note: when FP = 0, this implies CP > 0.

(* This is what we believe is the assumption about failures in Tendermint *)
FaultAssumption(pFaultyNodes, pNow, pBlockchain) ==
\A h \in Heights:
pBlockchain[h].time + TRUSTING_PERIOD > pNow =>
IsCorrectPower(pFaultyNodes, pBlockchain[h].NextVS)

(* Can a block be produced by a correct peer, or an authenticated Byzantine peer *)
IsLightBlockAllowedByDigitalSignatures(ht, block) ==
\/ block.header = blockchain[ht] \* signed by correct and faulty (maybe)
\/ block.Commits \subseteq Faulty /\ block.header.height = ht /\ block.header.time >= 0 \* signed only by faulty

(*
Initialize the blockchain to the ultimate height right in the initial states.
We pick the faulty validators statically, but that should not affect the light client.
*)
InitToHeight ==
/\ Faulty \in SUBSET AllNodes \* some nodes may fail
\* pick the validator sets and last commits
/\ \E vs, lastCommit \in [Heights -> SUBSET AllNodes]:
\E timestamp \in [Heights -> Int]:
\* now is at least as early as the timestamp in the last block
/\ \E tm \in Int: now = tm /\ tm >= timestamp[ULTIMATE_HEIGHT]
\* the genesis starts on day 1
/\ timestamp[1] = 1
/\ vs[1] = AllNodes
/\ lastCommit[1] = EmptyNodeSet
/\ \A h \in Heights \ {1}:
/\ lastCommit[h] \subseteq vs[h - 1] \* the non-validators cannot commit
/\ TwoThirds(vs[h - 1], lastCommit[h]) \* the commit has >2/3 of validator votes
/\ IsCorrectPower(Faulty, vs[h]) \* the correct validators have >2/3 of power
/\ timestamp[h] > timestamp[h - 1] \* the time grows monotonically
/\ timestamp[h] < timestamp[h - 1] + TRUSTING_PERIOD \* but not too fast
\* form the block chain out of validator sets and commits (this makes apalache faster)
/\ blockchain = [h \in Heights |->
[height |-> h,
time |-> timestamp[h],
VS |-> vs[h],
NextVS |-> IF h < ULTIMATE_HEIGHT THEN vs[h + 1] ELSE AllNodes,
lastCommit |-> lastCommit[h]]
] \******


(* is the blockchain in the faulty zone where the Tendermint security model does not apply *)
InFaultyZone ==
~FaultAssumption(Faulty, now, blockchain)

(********************* BLOCKCHAIN ACTIONS ********************************)
(*
Advance the clock by zero or more time units.
*)
AdvanceTime ==
\E tm \in Int: tm >= now /\ now' = tm
/\ UNCHANGED <<blockchain, Faulty>>

(*
One more process fails. As a result, the blockchain may move into the faulty zone.
The light client is not using this action, as the faults are picked in the initial state.
However, this action may be useful when reasoning about fork detection.
*)
OneMoreFault ==
/\ \E n \in AllNodes \ Faulty:
/\ Faulty' = Faulty \cup {n}
/\ Faulty' /= AllNodes \* at least process remains non-faulty
/\ UNCHANGED <<now, blockchain>>
=============================================================================
\* Modification History
\* Last modified Wed Jun 10 14:10:54 CEST 2020 by igor
\* Created Fri Oct 11 15:45:11 CEST 2019 by igor
1 change: 0 additions & 1 deletion light-client/tests/support/model_based/Blockchain_A_1.tla

This file was deleted.

25 changes: 24 additions & 1 deletion light-client/tests/support/model_based/LightTests.tla
Original file line number Diff line number Diff line change
@@ -1,6 +1,29 @@
------------------------- MODULE LightTests ---------------------------

EXTENDS Lightclient_A_1
EXTENDS Lightclient_002_draft

(* The light client history, which is the function mapping states 1..nprobes to the record with fields:
- verified: the latest verified block in the previous state
- current: the block that is being checked in the previous state
- now: the time point in the previous state
- verdict: the light client verdict in the previous state
*)
VARIABLE
history

historyState ==
[ verified |-> prevVerified, current |-> prevCurrent, now |-> prevNow, verdict |-> prevVerdict ]

(* APALACHE annotations *)
a <: b == a \* type annotation

InitTest ==
/\ Init
/\ history = [ n \in {} <: {Int} |-> historyState ]

NextTest ==
/\ Next
/\ history' = [ n \in DOMAIN history \union {nprobes} |-> IF n = nprobes THEN historyState ELSE history[n]]

TestFailure ==
/\ state = "finishedFailure"
Expand Down
Loading