Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to TLA+ specs for model-based testing #546

Merged
merged 16 commits into from
Sep 24, 2020

Conversation

andrey-kuprianov
Copy link
Contributor

@andrey-kuprianov andrey-kuprianov commented Sep 2, 2020

Part 1 of #529. See also #414. Changes to the TLA+ specs needed for model-based testing

  • introduced history variable into the LightClient spec

  • fixed a couple of divergences between the spec and the implementation, that resulted in failing model-based tests.

  • Referenced an issue explaining the need for the change

  • Updated all relevant documentation in docs

  • Updated all code comments where relevant

  • Wrote tests

  • Updated CHANGELOG.md

@andrey-kuprianov andrey-kuprianov added light-client Issues/features which involve the light client verification tests labels Sep 2, 2020
@andrey-kuprianov andrey-kuprianov self-assigned this Sep 2, 2020
@andrey-kuprianov
Copy link
Contributor Author

Here is the relevant reviews from the original PR: #529 (review)

@andrey-kuprianov andrey-kuprianov changed the title #414 changes to TLA+ specs for model-based testing Changes to TLA+ specs for model-based testing Sep 2, 2020
@andrey-kuprianov
Copy link
Contributor Author

The discussion from the original PR:

@josef-widder

I think OK and CANNOT_VERIFY are used in the English spec. Are SUCCESS and NOT_ENOUGH_TRUST used in the implementation? If yes, we should adapt I guess the English spec.

@andrey-kuprianov

I've taken the verdicts from the implementation. There is another issue I've got, namely that according to the TLA+ constraints the verdict FAILED_TRUSTING_PERIOD cannot ever be produced. At least it's impossible to obtain a test with such verdict from Apalache, and my quick manual check on the TLA+ constraints tells me it's indeed logically impossible. Could you please verify that? In case it's true, it may be makes sense to remove such a verdict. And I would actually update the English spec as well, as I find the implementation names a bit more informative

@konnov

I don't see why FAILED_TRUSTING_PERIOD cannot be produced. It is produced in line 115. However, it is not propagated into state. Do you like to see it in state?

@andrey-kuprianov
Copy link
Contributor Author

The problem I have is with the following model-based test:

\* This test never produces a counterexample; so the model should be corrected
TestFailedTrustingPeriod ==
   \E s \in DOMAIN history :
      history[s].verdict = "FAILED_TRUSTING_PERIOD"

@andrey-kuprianov
Copy link
Contributor Author

@konnov The model is such that this verdict is logically excluded

@konnov
Copy link
Contributor

konnov commented Sep 11, 2020

@konnov The model is such that this verdict is logically excluded

We have to investigate this. Can it be that this verdict is overwritten by another block later? Since you are only checking the history post-factum, this can happen.

@andrey-kuprianov
Copy link
Contributor Author

I am pretty sure it is excluded by the combination of logical constraints in the model... yes, let's investigate.

@@ -110,7 +110,7 @@ FaultAssumption(pFaultyNodes, pNow, pBlockchain) ==
(* Can a block be produced by a correct peer, or an authenticated Byzantine peer *)
IsLightBlockAllowedByDigitalSignatures(ht, block) ==
\/ block.header = blockchain[ht] \* signed by correct and faulty (maybe)
\/ block.Commits \subseteq Faulty /\ block.header.height = ht \* signed only by faulty
\/ block.Commits \subseteq Faulty /\ block.header.height = ht /\ block.header.time > 0 \* signed only by faulty
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
\/ block.Commits \subseteq Faulty /\ block.header.height = ht /\ block.header.time > 0 \* signed only by faulty
\/ block.Commits \subseteq Faulty /\ block.header.height = ht /\ block.header.time >= 0 \* signed only by faulty

/\ thdr.time <= uhdr.time
/\ thdr.time < uhdr.time
\* the untrusted block is not from the future
/\ uhdr.time <= now
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/\ uhdr.time <= now
/\ uhdr.time < now + CLOCK_DRIFT

@andrey-kuprianov andrey-kuprianov linked an issue Sep 11, 2020 that may be closed by this pull request
Copy link
Contributor

@shonfeder shonfeder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just reading the PR for my own edification and to start to get some exposure to our TLA specs.

I offered one naive suggestion about wording asked one question to help my reading of the code :)

@@ -75,7 +75,7 @@ LBT == [header |-> BT, Commits |-> {NT}]

(* the header is still within the trusting period *)
InTrustingPeriod(header) ==
now <= header.time + TRUSTING_PERIOD
now < header.time + TRUSTING_PERIOD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the <= actually appears to contradict the english spec which says

 - *LightStore* always contains a verified header whose age is less than the
 trusting period,  

So it seems clear that what we ant is "less than", but perhaps this inconsistency was due in part to the wording and terminology here. E.g., in this paragraph

In order to ensure liveness, LightStore always needs to contain a
verified (or initially trusted) header whose time is within the
trusting period.

"trust period" is spoken of as a duration that the header needs to be within not as a limit point that the header time needs to be less than. I wonder if calling this TRUST_PERIOD_LIMIT might help prevent such confusion i the future. I suspect a problem with the current wording may arise due to "trusting period" sounds like a span within which things are valid, when it's actually meant to be the outer limit beyond which things are invalid. iiuc, the trust period is that span of time up to but not including the limit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right: <= does contradict the English spec, so this is the TLA+ modeling artifact. That there can be ambiguities in the English spec, that's of course possible -- English is not a formal language after all;) I still think that our English spec does an amazing job of clarifying the requirements at the semi-formal (and as close to formal as possible) level. But exactly because of this non-strictness of the English language the TLA+ spec should, in my opinion, be the ultimate source of truth for all parties.

One of the things we were discussing with @josef-widder and @konnov is that now, when we have the three artifacts: the English spec, the TLA+ spec, and the implementation, and also now that we have the MBT, which serves as the "missing link" helping to synchronize them, we need to establish a process (in the VDD sense) that will help us to resolve the conflicts.

In this PR I've fixed the TLA+ spec, because it was the most easy way for me to go, but it's not clear at all whether that's the right thing to do. We need to figure out a joint understanding between the research and the development teams on how to proceed in such cases.

Copy link
Member

@ebuchman ebuchman Sep 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, given the larger context, I don't believe it matters much whether we use < or <= here since we are dealing with time and already assume clocks can drift some amount. The important thing is that TRUST_PERIOD is a fraction of the UNBONDING_PERIOD. This ensures that light clients have the duration of UNBONDING_PERIOD - TRUST_PERIOD to try and submit evidence to the chain and get faulty validators punished. And since all these periods are on the order of days/weeks, the < vs. <= here shouldn't make a difference in the end.

Also while UNBONDING_PERIOD is a global consensus parameter (all nodes agree on), the TRUST_PERIOD is not, it's a local subjective parameter that clients can set depending on their risk tolerance. So again, the plank second difference between < and <= shouldn't matter there.

Of course that might be an unsatisfactory answer - I don't mean to be flippant about it and we should certainly be consistent and precise, but it's important to keep the practical context in mind so we know what details are most important to focus on. Of course this is just input from my feeble human mind so possibly machine will prove that it does matter, but thought it was worth providing the extra context here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback! I totally agree, from the practical perspective these differences between < and <= do not matter at all in this particular context, so this case we could fix however we like, and it would make no real difference. There are two things to keep in mind though:

  • scientifically, we can't be approximately precise. And I would like to think about our specs as being precise, so we should clarify and fix any ambiguities.
  • this particular example is only a manifestation of a general problem about the artifacts we are having, and I am sure we will encounter more instances of it, with probably higher level of severity. So it's better if we are prepared for that.

Also I tend to perceive us having this problem as rather a good thing, demonstrating that from now on we will have a living specification, which is in sync with the implementation.

This is only the input from another feeble human mind, just a bit different;) And I am sure fixing this will not take much of either our focus or time.

- verdict: the light client verdict for the `current` block in this state
*)
VARIABLE
history
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question for my learning, I don't see the history function actually getting used anywhere except on https://github.com/informalsystems/tendermint-rs/pull/546/files#diff-04b72e322e067c19d149fbabcfe5a830R272 where is is UNCHANGED. Could you maybe point out to me where we actually apply the function to make use of the records in its codomain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used here:

It was part of another PR, which is already merged.

Copy link
Contributor Author

@andrey-kuprianov andrey-kuprianov Sep 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, as we discussed with @josef-widder and @konnov, the way I introduce the history variable needs to be changed, because this feature is not needed for all users of the spec. So we need to figure out how to modularize the spec, and to allow to add the history as an extension.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, got it. Thanks very much for the pointers!

@andrey-kuprianov
Copy link
Contributor Author

andrey-kuprianov commented Sep 22, 2020

@josef-widder @konnov Here is the logical explanation why "FAILED_TRUSTING_PERIOD" verdict is impossible:

For it to be issued the following should hold:

  • ValidAndVerifiedPre(trusted, untrusted)
  • ~BC!InTrustingPeriod(untrusted.header)

Denoting trusted.header.time as t, and untrusted.header.time as u, we get these constraints

  • now < t + TRUSTING_PERIOD
  • t < u
  • u <= now
  • now >= u + TRUSTING_PERIOD

which, combined, give this inequality:

  • u <= now - TRUSTING_PERIOD < t < u <= now

which is impossible.

@andrey-kuprianov
Copy link
Contributor Author

To be more precise, this is impossible taking into account the changes to the specs I propose, because those changes make strict inequalities out of two non-strict ones. Without those changes this verdict would be possible with a single valuation, namely:

  • u = now - TRUSTING_PERIOD = t = u

@konnov
Copy link
Contributor

konnov commented Sep 22, 2020

Looks like we have to add clock drift, to make sense of error condition.

@andrey-kuprianov
Copy link
Contributor Author

yes, as we've discussed here

Also, this will make a lot of sense from the implementation and testing points of view, as the implementation does have it

@andrey-kuprianov
Copy link
Contributor Author

@konnov @josef-widder I've implemented our discussions, in particular factored out building the history chain from the main LightClient spec into the testing spec. Could you please review the changes to *.tla files in this PR?

For now this is still shown as the changes to the TLA files from docs/spec, to facilitate the review. After you review them, I will rename them, as we discussed, as Blockchain_0_0_2_draft.tla and Lightclient_0_0_2_draft.tla, place into the local LightClient folder for MBT-purposes, and open an issue against tendermint/spec repo.

Thanks!

@josef-widder
Copy link
Member

Files names should be like Blockchain_002_draft.tla.

konnov
konnov previously approved these changes Sep 24, 2020
Copy link
Contributor

@konnov konnov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good to me. We should follow up with adding a local clock and CLOCK_DRIFT in tendermint/170.

docs/spec/lightclient/verification/Lightclient_A_1.tla Outdated Show resolved Hide resolved
docs/spec/lightclient/verification/Lightclient_A_1.tla Outdated Show resolved Hide resolved
docs/spec/lightclient/verification/Lightclient_A_1.tla Outdated Show resolved Hide resolved
josef-widder
josef-widder previously approved these changes Sep 24, 2020
Copy link
Member

@josef-widder josef-widder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thanks!

@andrey-kuprianov andrey-kuprianov merged commit 856084e into master Sep 24, 2020
@andrey-kuprianov andrey-kuprianov deleted the andrey/mbt-spec-changes branch September 24, 2020 16:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
light-client Issues/features which involve the light client tests verification
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Develop a set of model-based tests for the LightClient
5 participants