Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make algebra interfaces verified by default (#4739) #4841

Closed
wants to merge 1 commit into from

Conversation

nickdrozd
Copy link
Contributor

@nickdrozd nickdrozd commented Apr 6, 2020

Previously there were interfaces for Semigroup, Monoid, etc, that were
just syntactic in nature, requiring only that operations like <+> or
elements like neutral exist. There was a corresponding set of
"verified" interfaces requiring that the expected laws actually hold.
Thus Semigroup required that the <+> operation exist, while
VerifiedSemigroup required that that operation actually be
associative.

Such an arrangement is annoying for two reasons.

First, extra paperwork. If I want to show that something really is a
semigroup, I have to write two implementation blocks, the "plain"
version and the "verified" version. On the other side, if I want to
add a new interface, then in order to keep with the existing style I
need to write both a "plain" version and a "verified" version. In some
cases, like AbelianGroup, the "plain" interface is empty, since it
doesn't add any new syntactic elements. This is pointless.

Second, proof. Suppose a type has been declared to be a Semigroup, but
VerifiedSemigroup hasn't been implemented for it. What can be assumed
about that type? Is its <+> operation associative or not? If it isn't,
why call it a semigroup? If it is, why not prove it? And if it can't
be proved, how do we know that it's true? The big selling point of
Idris is its expressive type system and strong compile-time
guarantees, so let's take advantage of that and verify by default.

Fixing the interfaces is easy -- just move the "verified" requirements
into the "plain" interface, and that's it.

In some cases there already existed implementations of the "verified"
interfaces, and those could also easily be copied over. More
challenging are the cases for which there do not exist verified
implementations. There are four ways of dealing with them:

(1) Come up with the required proofs. This was done, for example,
with Semigroup a => Semigroup (Vect n a) etc in
contrib/Data/Matrix/Algebraic.idr, and also with easy ones like
Maybe and Endomorphism. A really nice one is the new
implementation of Ring t => Ring (Complex t).

(2) Assert without proof that the laws hold using believe_me. This
was done for all primitive types, including String and Integer,
since primitive types are not amenable to proof.

(3) Add postulates asserting that the laws hold. This was done for
nonprimitive datatypes for which I couldn't figure out proofs,
including Isomorphism and Morphism. I am pretty sure that the
laws hold for these cases.

(4) Revoke the name Semigroup, Monoid, etc. This was done for a few
data structures in contrib, like SortedMap and its descendants.
I am skeptical that these structures really are what they say
they are.

Note that updating an existing interface to be verified does not
require writing any proofs -- it can be done just by asserting that
the required properties hold, either with explicit postulates or with
belive_me (or even really_believe_me). There are a variety of examples
of updated implementations in this PR.

This change only applies to those interfaces that belong to "abstract
algebra": semigroups, monoids, groups, and so on. There is a whole
other set of "plain"/"verified" interfaces dealing with structures
from "category theory", like monad and functor. Those should also
undergo this unification, but I don't know how to do it just yet.

[updated]

@LeifW
Copy link
Contributor

LeifW commented Apr 7, 2020

Not everything can be proven in the type system, e.g. when dealing with external things like String, Int, Float, etc. Think I'd prefer postulate to believe_me for asserting those proofs. Example at https://stackoverflow.com/a/28008422

@nickdrozd
Copy link
Contributor Author

@LeifW Sounds good, I'll replace the believe_me invocations and the "unverified" interfaces with postulates.

@nickdrozd
Copy link
Contributor Author

@Sventimir

@melted
Copy link
Contributor

melted commented Apr 8, 2020

I don't see how we can merge this. It's a very invasive API change that will break a lot of code.

It has also been discussed and rejected before, as Idris is about allowing people to write verified code, not forcing them to (@edwinb can correct me on the formulation of that).

And finally, Idris 2 is where the action happens and any big changes should be done there.

I'll leave this open for a while for feedback, but I'd need an OK from @edwinb to merge it in any case.

@nickdrozd
Copy link
Contributor Author

Do you have any links to past discussions? All I could find was #4739, which doesn't have much.

Previously there were interfaces for Semigroup, Monoid, etc, that were
just syntactic in nature, requiring only that operations like <+> or
elements like neutral exist. There was a corresponding set of
"verified" interfaces requiring that the expected laws actually hold.
Thus Semigroup required that the <+> operation exist, while
VerifiedSemigroup required that that operation actually be
associative.

Such an arrangement is annoying for two reasons.

First, extra paperwork. If I want to show that something really is a
semigroup, I have to write two implementation blocks, the "plain"
version and the "verified" version. On the other side, if I want to
add a new interface, then in order to keep with the existing style I
need to write both a "plain" version and a "verified" version. In some
cases, like AbelianGroup, the "plain" interface is empty, since it
doesn't add any new syntactic elements. This is pointless.

Second, proof. Suppose a type has been declared to be a Semigroup, but
VerifiedSemigroup hasn't been implemented for it. What can be assumed
about that type? Is its <+> operation associative or not? If it isn't,
why call it a semigroup? If it is, why not prove it? And if it can't
be proved, how do we know that it's true? The big selling point of
Idris is its expressive type system and strong compile-time
guarantees, so let's take advantage of that and verify by default.

Fixing the interfaces is easy -- just move the "verified" requirements
into the "plain" interface, and that's it.

In some cases there already existed implementations of the "verified"
interfaces, and those could also easily be copied over. More
challenging are the cases for which there do not exist verified
implementations. There are four ways of dealing with them:

  (1) Come up with the required proofs. This was done, for example,
      with `Semigroup a => Semigroup (Vect n a)` etc in
      contrib/Data/Matrix/Algebraic.idr, and also with easy ones like
      Maybe and Endomorphism. A really nice one is the new
      implementation of `Ring t => Ring (Complex t)`.

  (2) Assert without proof that the laws hold using believe_me. This
      was done for all primitive types, including String and Integer,
      since primitive types are not amenable to proof.

  (3) Add postulates asserting that the laws hold. This was done for
      nonprimitive datatypes for which I couldn't figure out proofs,
      including Isomorphism and Morphism. I am pretty sure that the
      laws hold for these cases.

  (4) Revoke the name Semigroup, Monoid, etc. This was done for a few
      data structures in contrib, like SortedMap and its descendants.
      I am skeptical that these structures really are what they say
      they are.

Note that updating an existing interface to be verified does not
require writing any proofs -- it can be done just by asserting that
the required properties hold, either with explicit postulates or with
belive_me (or even really_believe_me). There are a variety of examples
of updated implementations in this PR.

This change only applies to those interfaces that belong to "abstract
algebra": semigroups, monoids, groups, and so on. There is a whole
other set of "plain"/"verified" interfaces dealing with structures
from "category theory", like monad and functor. Those should also
undergo this unification, but I don't know how to do it just yet.
@nickdrozd
Copy link
Contributor Author

@LeifW @melted @Sventimir Updated to do away with unverified interfaces, use explicit postulates for nonprimitive types, fill in a bunch more proofs, and fix a few errors.

@nickdrozd
Copy link
Contributor Author

@melted This change does not require anyone to write proofs. Instead, it's perfectly possible to write postulates or use believe_me to satisfy the interface requirements.

@Sventimir
Copy link
Contributor

Sventimir commented Apr 13, 2020

@nickdrozd I perfectly understand your point and agree with it with my whole heart! However, backwards compatibility is important too. Perhaps one of the reasons of success of Haskell is that its creators did not change things too dramatically at any point. Any wrong decision they made (and they made quite a lot of them), it stuck. Annoying as it may be, it's probably necessary if you want people to actually use the language.

Fortunately, Idris2 is already being developed and I believe we can fix this mistake in that release.

@nickdrozd
Copy link
Contributor Author

Proposal to do this

@ziman
Copy link
Contributor

ziman commented Apr 15, 2020

The proofs are cool and I can see you put a lot of work into this. However, I feel that there are some contradictions in the "merge the interfaces" part of the pull request.

First, extra paperwork. If I want to show that something really is a semigroup, I have to write two implementation blocks, the "plain" version and the "verified" version.

The two implementations are not really two versions of one thing. The implementation blocks are disjoint and they implement different things; one implements the operations, the other implements the properties.

Regarding the paperwork, an extra implementation block costs you 1) one blank line plus 2) one implementation header line. So you complain about having to add two lines of code, but at the same time you propose forcing everyone else to add (many lines of) proofs in their code. Are mandatory proof obligations not extra paperwork with size significantly greater than two lines of code for each verified interface?

In some cases, like AbelianGroup, the "plain" interface is empty

I don't think you would add a "plain" interface for abelian groups. Why not VerifiedGroup g => VerifiedAbelianGroup g?

More generally, Group implements the operations, VerifiedGroup and VerifiedAbelianGroup implement the properties. If you have multiple options for properties of the same set of operations, you'll have multiple property interfaces.

Second, proof. Suppose a type has been declared to be a Semigroup, but VerifiedSemigroup hasn't been implemented for it. What can be assumed about that type?

Is its <+> operation associative or not?

It's called "semigroup" so the library author wants you to think about it as associative. However, the lack of a formal proof says that it's either hard to prove, or noone has done it yet, or maybe it's false.

If it is, why not prove it?

For the same reason that your pull request is full of believe_mes.

The big selling point of Idris is its expressive type system and strong compile-time guarantees, so let's take advantage of that and verify by default.

I don't believe you should do things just because something lets you do them. And I'm quite sure you should not force other people to do these things.

Verification is costly, so you usually want to find the best balance of verification and productivity. I find that usually the best approach is to pick an invariant that is a frequent source of bugs, have the type system keep track of it for you, and let the rest be.

(2) Assert without proof that the laws hold using believe_me.

semigroupOpIsAssociative = believe_me Integer

Oh, this is not the right way to use believe_me. You want postulates here.

believe_me is a coercion, so your declaration above says "the runtime representation of Integer can be interpreted as the runtime representation of a proof of associativity of this semigroup". The runtime representation of Integer is the null pointer and if you ever try to inspect this proof in compiled code, my guess is that your program will segfault. (I haven't tried it, though.)

I'll take an unverified non-segfaulting program over a verified segfaulting one anytime. ;)

I get what you're saying, though; I suggest using postulates for this.

(3) Add postulates asserting that the laws hold. This was done for nonprimitive datatypes for which I couldn't figure out proofs, including Isomorphism and Morphism. I am pretty sure that the laws hold for these cases.

By the nature of your proposal, I'd expect that you would be the first person to reject "I am pretty sure" in lieu of a formal proof.

The big selling point of Idris is its expressive type system and strong compile-time guarantees

...so let's not subvert them by postulating stuff haphazardly!

(4) Revoke the name Semigroup, Monoid, etc. This was done for a few data structures in contrib, like SortedMap and its descendants. I am skeptical that these structures really are what they say they are.

Once these structures are filled with believe_mes, will you be less skeptical? :)

Names are are meant to be suggestive. Proofs are meant to be reliable (and, rightly, do not depend on how you name things). The name Semigroup gives intuition to the human but does not affect the formal properties of the interface. So you can't use the semigroup-ness in formal proofs, unless you really prove it.

Thus I see nothing wrong in naming something Semigroup (that you honestly believe to be a semigroup), while I find it wrong to provide blanket formal proofs of semigroupness.

Note that updating an existing interface to be verified does not require writing any proofs -- it can be done just by asserting that the required properties hold, either with explicit postulates or with belive_me (or even really_believe_me). There are a variety of examples of updated implementations in this PR.

What's the point of having proofs at all if you're going to fill them with assertions by default, anyway?

To sum up, the proposal is to have verified properties included in interfaces because:

  • "First, extra paperwork."
  • "The big selling point of Idris is its expressive type system and strong compile-time guarantees, so let's take advantage of that and verify by default."

I feel that the two points above are in contradiction with the later parts of the same proposal:

  • that we force everyone to, at the very least, add several mandatory lines of postulates to their implementations, adding a significant amount of cruft for everyone everywhere
  • that these properties be implemented by bogus proofs, which prove nothing; they just subvert the "expressive type system and strong compile-time guarantees". Adding these "proofs" does not make the system more correct. But it makes the system say about itself that it is more correct.

Saying something that you can't rely on is worse than saying nothing. And if this landed in the stdlib, you could never be sure that any proof based on this is sound.

You've implemented some cool proofs, though; those about the structure of the complex numbers look quite neat, so even though I can't back the interface merging proposal, I suggest that you pack the proofs up and submit them in the stdlib. They'd be nice to have!

@andrevidela
Copy link
Contributor

I support the arguments listed before to not merge this into master namely:

  • Idris is a language with strong type garantees and encouraging users to use escape hatches like believe_me or really_believe_me (specially when it's misused like in this PR) goes against this core principle
  • Idris is a programming language first, users should be able to write programs before writing proofs. Requiring users to write proofs before their program runs is counter to this core principle
  • Idris is a programming language first, this change adds an extra cost in development time and in learning curve, for user coming from existing programming languages and expecting a programming language. Though this isn't necessarily counter to the design goals of Idris, it definitly eats into its transition costs.

@nickdrozd
Copy link
Contributor Author

@ziman

If it is, why not prove it?

For the same reason that your pull request is full of believe_mes.

The believe_mes are only there for primitive types, for which
properties cannot be proved.

I don't believe you should do things just because something lets you
do them. And I'm quite sure you should not force other people to do
these things.

I don't think anybody needs to be forced to prove anything. Likewise,
nobody is forced to claim that any type is a semigroup. Proofs would
remain opt-in because the interfaces are also opt-in. Or is that not
that case? One thing I would still like to see is some real-world
implementations of these interfaces. Does anybody have any links???

semigroupOpIsAssociative = believe_me Integer

Oh, this is not the right way to use believe_me. You want
postulates here.

believe_me is a coercion, so your declaration above says "the
runtime representation of Integer can be interpreted as the
runtime representation of a proof of associativity of this
semigroup". The runtime representation of Integer is the null
pointer and if you ever try to inspect this proof in compiled code,
my guess is that your program will segfault. (I haven't tried it,
though.)

I'll take an unverified non-segfaulting program over a verified
segfaulting one anytime. ;)

I get what you're saying, though; I suggest using postulates for this.

Yeah, I'm not wed to believe_me. Mainly it saved me some (keyboard)
typing, but I would be happy to change it. Is there documentation
somewhere that goes into detail about the uses and relative merits of
believe_me, really_believe_me, and postulate?

By the nature of your proposal, I'd expect that you would be the
first person to reject "I am pretty sure" in lieu of a formal proof.

Not at all! Stating a postulate makes it explicit in the code that
while there is no proof available, the programmer testifies that it is
true. Remember, a statement can be true despite being unproven, or
even unprovable.

The current system makes the same assurance, but implicitly. If the
docstring says "The following laws must hold...", then the programmer
is implicitly asserting that those laws hold by virtue of having
implemented that interface.

As the Zen of Python says, "Explicit is better than implicit." :)

(4) Revoke the name Semigroup, Monoid, etc. This was done for a
few data structures in contrib, like SortedMap and its descendants.
I am skeptical that these structures really are what they say they
are.

Once these structures are filled with believe_mes, will you be
less skeptical? :)

Well, it would mean that whoever added those believe_mes was
personally testifying that those laws hold to their best judgment.
Mathematicians makes conjectures all the time, and they prove further
results assuming unproven conjectures. There have been times where I
myself have said "I'm sure this is true, but I can't remember how the
proof worked, so just take my word for it for now." believe_me /
postulate makes that explicit in the code. (Again, if believe_me
is not the appropriate operator to use for what I'm describing, I
would be happy to change all of them.)

In the particular case of SortedMap, I attempted to add postulates
stating the required properties, and I couldn't get the type-checker
to accept even those. That is a RED FLAG to me that something is
wrong, and I wouldn't want to put my name on that believe_me.
Contrast that with the postulate for Morphism, which is clearly true
but apparently unprovable, because of something about function
extensionality. In that case, I, Nick Drozd, would be willing to put
my name on that postulate.

What's the point of having proofs at all if you're going to fill
them with assertions by default, anyway?

I don't actually advocate doing that; I only did it here so as to not
break any existing code (see the concerns of earlier reviewers). That
is a point of compromise, and it could be done differently.

Saying something that you can't rely on is worse than saying
nothing. And if this landed in the stdlib, you could never be sure
that any proof based on this is sound.

A certain amout of trust is always required. How do we know the
type-checker itself is sound? Let's hope that the type-checker can't
prove that the type-checker is sound, because that would have some
unpleasant consequences...

You've implemented some cool proofs, though; those about the
structure of the complex numbers look quite neat, so even though I
can't back the interface merging proposal, I suggest that you pack
the proofs up and submit them in the stdlib. They'd be nice to have!

That's fair. I'll get the nonbreaking changes together in a separate
PR. This one can stay open as a reference for further discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants