Kwxm/bls12-381/prototype (PLT-192, PLT-1557, PLT-1554, etc). #5231

kwxm · 2023-03-24T01:19:12Z

Introduction

This PR adds new built-in types and functions to Plutus Core and PlutusTx for pairings over BLS12-381, as specified in CIP-0381. These depend on Haskell bindings to the blst C library which are currently the subject of an open PR in cardano-base. In order not to depend on a branch, the code for these has been copied into plutus-core/cbits and plutus-core/src/plutus-core/Crypto/External. We should be able to merge this PR as-is, but ideally we'd wait until the cardano-base PR is merged and remove the copied files from plutus-core.

Details

The PR introduces three new PLC types:

bls12_381_G1_element
bls12_381_G2_element
bls12_381_mlresult

and seventeen new built-in functions:

bls12_381_G1_add                : [ bls12_381_G1_element, bls12_381_G1_element ] -> bls12_381_G1_element
bls12_381_G1_neg                : [ bls12_381_G1_element ] -> bls12_381_G1_element
bls12_381_G1_scalarMul          : [ integer, bls12_381_G1_element ] -> bls12_381_G1_element
bls12_381_G1_equal              : [ bls12_381_G1_element, bls12_381_G1_element ] -> bool
bls12_381_G1_hashToGroup        : [ bytestring, bytestring ] -> bls12_381_G1_element
bls12_381_G1_compress           : [ bls12_381_G1_element ] -> bytestring
bls12_381_G1_uncompress         : [ bytestring ] -> bls12_381_G1_element

bls12_381_G2_add                : [ bls12_381_G2_element, bls12_381_G2_element ] -> bls12_381_G2_element
bls12_381_G2_neg                : [ bls12_381_G2_element ] -> bls12_381_G2_element
bls12_381_G2_scalarMul          : [ integer, bls12_381_G2_element ] -> bls12_381_G2_element
bls12_381_G2_equal              : [ bls12_381_G2_element, bls12_381_G2_element ] -> bool
bls12_381_G2_hashToGroup        : [ bytestring, bytestring ] -> bls12_381_G2_element
bls12_381_G2_compress           : [ bls12_381_G2_element ] -> bytestring
bls12_381_G2_uncompress         : [ bytestring ] -> bls12_381_G2_element

bls12_381_millerLoop            : [ bls12_381_G1_element, bls12_381_G2_element ] -> bls12_381_mlresult
bls12_381_mulMlResult           : [ bls12_381_mlresult, bls12_381_mlresult ] -> bls12_381_mlresult
bls12_381_finalVerify           : [ bls12_381_mlresult, bls12_381_mlresult ] -> bool

The basic definitions of all of these things are in plutus-core/plutus-core/src/PlutusCore/Crypto/BLS12_381/, in the files G1.hs, G2.hs, and Pairing.hs. The following sections will attempt to give an idea of what all of these do, but some mathematical background will be required first.

Background

Abelian groups

An abelian group is a commutative monoid in which every element has an inverse. These are usually written in an additive style where the operation is +, the identity element is 0, and the inverse of an element a is -a, but they are sometimes written in a multiplicative style where the operation is *, the identity element is 1, and the inverse of an element a is a⁻¹. The types G1 and G2 above represent finite additive abelian groups and MlResult represents a multiplicative one.

For any abelian group one can define a notion of scalar multiplication by integers. Given an element a of an abelian group and an integer n, n*a is defined to be 0 if n=0, a+...+a n times if n>0, and (-a) + ... + (-a) -n times if n<0. In multiplicative notation this scalar "multiple" is written as a power aⁿ.
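As a rough illustration of this definition, here is a Python sketch that computes n*a by double-and-add in a toy additive group (the integers modulo 17). The name scalar_mul and the toy group are assumptions made for the example; they are not part of this PR, where the real work is done by blst.

```python
def scalar_mul(n, a, add, neg, zero):
    """Compute n*a in an additive abelian group by double-and-add."""
    if n < 0:
        # For negative n, n*a is (-n) copies of -a.
        return scalar_mul(-n, neg(a), add, neg, zero)
    result = zero
    addend = a
    while n > 0:
        if n & 1:
            result = add(result, addend)
        addend = add(addend, addend)  # a, 2a, 4a, 8a, ...
        n >>= 1
    return result

# Toy group: integers modulo 17 under addition (illustrative only).
MODULUS = 17
add_mod = lambda x, y: (x + y) % MODULUS
neg_mod = lambda x: (-x) % MODULUS

five_times_three = scalar_mul(5, 3, add_mod, neg_mod, 0)      # 3+3+3+3+3 mod 17
minus_two_times_three = scalar_mul(-2, 3, add_mod, neg_mod, 0)  # (-3)+(-3) mod 17
```

Double-and-add needs a number of group additions proportional to the bit length of n rather than to n itself, which is why multiplying by a 255-bit scalar is feasible at all.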

The add, neg, and scalarMul operations above represent these operations in the groups G1 and G2; in addition there is an equality operation, serialisation (compress) and deserialisation (uncompress) operations, and a hashToGroup operation which takes an arbitrary bytestring and a domain separation tag and produces a hash in the form of an element of G1 or G2.

This leaves three functions: mulMlResult, which performs group multiplication in the group MlResult; millerLoop, which takes an element of G1 and an element of G2 and produces an element of MlResult (this is the actual pairing operation), and finalVerify, which essentially compares two elements of MlResult for equality. The last two operations (millerLoop and finalVerify) are very expensive and are typically only expected to be called a few times in on-chain code.

To explain more of the details of the new functions we need some extra background.

Finite fields

A field F is a set equipped with two operations + and * and elements 0 and 1 such that

(F, 0, +) is an abelian group
(F, 1, *) is a commutative monoid such that every nonzero element of F has a multiplicative inverse (so F\{0} is an abelian group under multiplication).
Multiplication is distributive over addition: a*(b+c) = a*b + a*c for all a,b,c in F.

Familiar examples of fields are ℚ, ℝ, and ℂ (but not ℤ since not every element has a multiplicative inverse in ℤ).

There are also finite fields, and these can be completely classified. It can be shown that for every prime number p and every positive integer n there is a field of order pⁿ (the order of a field or group is its cardinality, the number of elements it contains), and these account for all of the finite fields up to isomorphism. These fields are usually denoted by 𝔽_{pⁿ} or GF(pⁿ) or GF(p,n) or something similar; I'll use GF(pⁿ) because it's easier to type.

For n=1, GF(p) is just ℤₚ with the usual operations, the important point being that every nonzero element there has a multiplicative inverse modulo p, unlike ℤₙ if n is not prime (what would be the inverse of 6 in ℤ₂₄, for example?).
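The difference between a prime and a composite modulus can be checked directly. This is a small Python sketch (the helper name inverse_mod is made up for the example) that relies on the built-in three-argument pow, which accepts a negative exponent from Python 3.8 onwards.

```python
def inverse_mod(a, m):
    """Return the multiplicative inverse of a modulo m, or None if there is none."""
    try:
        return pow(a, -1, m)  # raises ValueError when a is not invertible mod m
    except ValueError:
        return None

# Modulo the prime 23, every nonzero element has an inverse.
inverses_mod_23 = {a: inverse_mod(a, 23) for a in range(1, 23)}

# Modulo 24, 6 has no inverse because gcd(6, 24) = 6.
inverse_of_6_mod_24 = inverse_mod(6, 24)
```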

For n>1, we can regard GF(pⁿ) as consisting of n-tuples of elements of ℤₚ; addition is the usual coordinatewise addition, but multiplication is something much more convoluted (and expensive), where the i-th entry in a product is a complicated combination of all of the elements of the two tuples being multiplied.
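To make the n=2 case concrete, here is a sketch of GF(7²) in Python, with elements represented as pairs (a0, a1) standing for a0 + a1·i where i² = -1. This particular construction works because 7 ≡ 3 (mod 4), so -1 has no square root in GF(7); the choice of p = 7 and all function names are assumptions made for the example.

```python
P = 7  # toy prime with P % 4 == 3, so x^2 + 1 has no root in GF(P)

def fp2_add(a, b):
    """Coordinatewise addition in GF(P^2)."""
    return ((a[0] + b[0]) % P, (a[1] + b[1]) % P)

def fp2_mul(a, b):
    """(a0 + a1*i) * (b0 + b1*i) with i*i = -1: each output mixes all four inputs."""
    return ((a[0] * b[0] - a[1] * b[1]) % P,
            (a[0] * b[1] + a[1] * b[0]) % P)

def fp2_inv(a):
    """Inverse via the conjugate: 1/(a0 + a1*i) = (a0 - a1*i) / (a0^2 + a1^2)."""
    norm_inv = pow(a[0] * a[0] + a[1] * a[1], -1, P)
    return ((a[0] * norm_inv) % P, (-a[1] * norm_inv) % P)

product = fp2_mul((1, 2), (3, 4))
```

Even in this smallest case a single multiplication costs several multiplications in the base field, which hints at why arithmetic in a degree-12 extension is expensive.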

Elliptic curves

For the purposes of this PR, an elliptic curve E over a field F can be regarded as an equation of the form y² = x³+ax+b, where a and b are constants from F. The set of all pairs (x,y) ∈ F×F which satisfy the equation is also referred to as "the curve". When the field is ℝ an elliptic curve is what you'd usually think of as a curve (see Google), but when F is a finite field the so-called curve is an apparently random set of pairs of field elements. However, although the geometric structure has vanished, there is an extremely rich algebraic structure that still makes sense.

A fundamental property of elliptic curves is that they are also abelian groups: given two points P and Q on an elliptic curve E one can define another point P+Q which also lies on the curve. There's a geometric picture for this over ℝ (see Google again), but also an algebraic form which applies over any field. The coordinates of P+Q are complicated algebraic functions of the coordinates of P and Q. In particular, if you're given a point P and the point nP (= P+...+P) for some integer n, it's not easy to work out what n is: this is the discrete logarithm problem on E and is believed to be computationally intractable in general, which is one reason why elliptic curves are useful in cryptography.
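The algebraic form of the group law can be sketched in a few lines of Python. The example below uses the toy curve y² = x³ + 4 over GF(11), with None standing for the identity element; the field size and all names are assumptions chosen to keep the numbers small, and real implementations use very different coordinate systems and constant-time code.

```python
PRIME = 11      # toy field GF(11)
A, B = 0, 4     # toy curve y^2 = x^3 + A*x + B

def on_curve(pt):
    if pt is None:                      # None represents the identity element
        return True
    x, y = pt
    return (y * y - (x ** 3 + A * x + B)) % PRIME == 0

def ec_neg(pt):
    if pt is None:
        return None
    x, y = pt
    return (x, (-y) % PRIME)

def ec_add(p1, p2):
    """Chord-and-tangent addition on the toy curve."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % PRIME == 0:
        return None                     # P + (-P) = identity
    if p1 == p2:                        # tangent slope for doubling
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % PRIME, -1, PRIME) % PRIME
    else:                               # chord slope for distinct points
        lam = (y2 - y1) * pow((x2 - x1) % PRIME, -1, PRIME) % PRIME
    x3 = (lam * lam - x1 - x2) % PRIME
    y3 = (lam * (x1 - x3) - y1) % PRIME
    return (x3, y3)

total = ec_add((0, 2), (1, 4))
```

On this tiny curve the discrete logarithm can be found by trying every multiple, so it offers no security at all; the point is only to show that the sum of two points is again a point on the curve.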

Pairings

A pairing is a function e: G₁ × G₂ → G_T where G₁, G₂ and G_T are abelian groups (the T's supposed to be a subscript, but I can't find a Unicode character for that). In our case, G₁ and G₂ are written additively, but to complicate matters, G_T is written multiplicatively. A pairing is required to be bilinear: for p, p' ∈ G₁, q, q' ∈ G₂, and n ∈ ℤ we should have

e(p+p',q) = e(p,q) * e(p',q) (remember that the operation in G_T is multiplication)
e(p,q+q') = e(p,q) * e(p,q')
e(n*p,q) = e(p,q)ⁿ = e(p,n*q)

The final property here is one reason that pairings are useful in cryptography: it allows one person to perform a computation in G₁ and another to perform a computation in G₂ and then the results can be compared in G_T without having to share information (or at least that's my simplistic interpretation: the details are complicated).
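The bilinearity laws can be checked on a deliberately trivial model. In the Python sketch below, G₁ and G₂ are both the integers modulo 11 under addition, G_T is the subgroup of order 11 inside the nonzero elements of GF(23), and the "pairing" is e(a,b) = 2^(ab). All of these choices are assumptions made for illustration; this toy has no cryptographic value and bears no resemblance to how the BLS12-381 pairing is computed.

```python
Q = 23   # toy field order; G_T lives inside the nonzero elements of GF(23)
R = 11   # order of the toy groups; R divides Q - 1
G = 2    # element of order R in GF(23)*, since 2**11 % 23 == 1

def e(a, b):
    """Toy pairing from (Z_R, +) x (Z_R, +) into the order-R subgroup of GF(Q)*."""
    return pow(G, (a * b) % R, Q)

a, a2, b, n = 3, 5, 7, 4

# e(p + p', q) = e(p, q) * e(p', q)
lhs_sum = e((a + a2) % R, b)
rhs_sum = (e(a, b) * e(a2, b)) % Q

# e(n*p, q) = e(p, q)^n = e(p, n*q)
scaled_left = e((n * a) % R, b)
powered = pow(e(a, b), n, Q)
scaled_right = e(a, (n * b) % R)
```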

Back to Plutus

The situation in BLS12-381 is as follows:

Everything happens over a field of order q where q is a 381-bit prime (approximately 10^115)
G₁ is a subgroup of an elliptic curve E₁ defined over GF(q)
G₂ is a subgroup of an elliptic curve E₂ defined over GF(q²)
G_T is a subgroup of the multiplicative group of nonzero elements of GF(q¹²) (the elements of this field are huge: you need 12x48 = 576 bytes to hold a single element)
G₁, G₂, and G_T are all isomorphic to each other and are of order p where p is a 255-bit prime (about 10^77). In fact, all of these groups are isomorphic to (ℤₚ, +), but in very obfuscated ways.
It is also the case that p*a = 0 for all a ∈ G₁, and similarly for G₂, so scalar multiplication is periodic modulo p. This means that the efficiency of computing a scalar multiple n*a can be improved by reducing n modulo p before performing that multiplication (although scalar multiplication is still quite expensive since we've got up to 255 bits to multiply by). The reduction is performed automatically by the scalarMul functions.
There is a pairing e: G₁ × G₂ → G_T, and combining millerLoop and finalVerify allows one to check whether e(a,b) = e(a',b') for elements a,a' ∈ G₁ and b,b' ∈ G₂. The fact that such a pairing exists is not obvious and depends on some rather complicated mathematics, which itself has undergone a lot of refinement to enable efficient computation.
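The sizes quoted in this list can be reproduced from the standard BLS12 parametrisation, which is not stated in this PR: with curve parameter x = -0xd201000000010000, the group order is p = x⁴ - x² + 1 and the field order is q = (x-1)²·p/3 + x. The Python sketch below uses those formulas (variable names follow the text, and reduce_scalar is a made-up helper) to check the bit lengths and to show the scalar reduction that scalarMul performs.

```python
x = -0xd201000000010000              # BLS12-381 curve parameter (assumed, see lead-in)
p = x**4 - x**2 + 1                  # order of G1, G2 and G_T
q = (x - 1)**2 * p // 3 + x          # order of the base field

q_bits = q.bit_length()              # size of one GF(q) element in bits
p_bits = p.bit_length()              # size of a reduced scalar in bits
gt_element_bytes = 12 * 48           # one GF(q^12) element: 12 coordinates of 48 bytes

def reduce_scalar(n):
    """Reduce an arbitrary integer scalar into the range [0, p)."""
    return n % p

# A scalar far larger than p acts on the groups exactly like its remainder.
huge_scalar = 10**200 * p + 5
reduced = reduce_scalar(huge_scalar)
```

Since Python's % always returns a non-negative result for a positive modulus, negative scalars are handled by the same reduction.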

Serialisation and compression

We can now explain how elements of G₁ and G₂ can be serialised as bytestrings.

It follows from the facts above that every element of G₁ can be represented as a pair (x,y) with x,y ∈ GF(q), and each of x and y takes up 381 bits, so each fits into 48 bytes with 3 bits left over. Thus (x,y) pairs can be serialised in this form, a pair taking up a total of 96 bytes. However, this is inefficient in terms of space: remember that the points satisfy an equation y² = x³ + ax + b, so if you know the x-coordinate of a point and can compute square roots in GF(q) then you can calculate the y-coordinate of the point and save yourself 48 bytes (with some complications that we'll see later). A similar situation holds for elements of G₂, except that each coordinate requires 96 bytes. There are thus two possibilities for how group elements can be serialised into bytestrings:

You can write both coordinates and then when you deserialise you read them in and check that the x and y values satisfy the equation of the curve.
You can just save the x coordinate and calculate the y coordinate from that using the equation of the curve. However there are two more complications here:
- Not every element of GF(q) has a square root, so your serialised value may not correspond to a point on the curve.
- The nonzero points which have square roots have two distinct square roots, so you have a choice of two points on the curve. One of the three spare bits in the 48-byte representation is used to decide which point you want.
A further complication is that the zero point on the curve is actually an extra point called "the point at infinity" that effectively has an infinite y coordinate. Another of the spare bits is used to indicate the encoding of the point at infinity.
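The second method can be sketched on a toy curve. The Python below uses y² = x³ + 4 over GF(11) and stores a point as its x-coordinate plus one boolean saying whether y is the larger of the two square roots; because 11 ≡ 3 (mod 4), a square root of v (when one exists) is v^((11+1)/4). The field, the names, and the "larger root" convention are assumptions for the example, and the point at infinity is left out for brevity.

```python
PRIME = 11   # toy field with PRIME % 4 == 3, so square roots are a single exponentiation
B = 4        # toy curve y^2 = x^3 + B

def compress(point):
    """Keep x and one bit recording which of the two roots y is."""
    x, y = point
    larger = y > (PRIME - y) % PRIME
    return (x, larger)

def decompress(x, larger):
    """Recover (x, y), or return None if x is not the x-coordinate of a curve point."""
    v = (x ** 3 + B) % PRIME
    y = pow(v, (PRIME + 1) // 4, PRIME)   # candidate square root of v
    if (y * y) % PRIME != v:
        return None                        # v has no square root: "off-curve"
    other = (PRIME - y) % PRIME
    return (x, max(y, other) if larger else min(y, other))

round_trip = decompress(*compress((1, 7)))
off_curve = decompress(4, False)
```

The exponentiation in decompress is the square-root extraction that makes uncompression slower than reading both coordinates and checking the curve equation.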

The blst library supports both of these formats, using the final spare bit to indicate which serialisation method is intended (see the specification here); the "spare" bits are the most significant (leftmost) bits in the bytestring encoding. The first method is called "serialisation" and the second "compression". There's a trade-off here: the serialised format is space-inefficient but deserialisation is fast, whereas for the compressed format only half the space is required but uncompression is slower because of the need to extract a square root. In fact, experiments show that uncompression takes about 20 times longer than deserialisation for G₁ and about 30 times longer for G₂. Despite this we have chosen to use the compressed format in Plutus Core, providing uncompression functions bls12_381_G1_uncompress and bls12_381_G2_uncompress, along with corresponding compression functions. We may wish to rethink this at some point, possibly adding serialisation and deserialisation if the time penalty for uncompression is too great.
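For readers who want to inspect an encoded point by hand, the following Python sketch pulls the three flag bits out of the first byte. It assumes the ZCash-style layout (most significant bit: compressed; next bit: point at infinity; third bit: which of the two y values was chosen); that ordering is not spelled out in this PR, and the function name is made up for the example.

```python
def parse_g1_flags(encoded):
    """Split a 48-byte G1 encoding into its three flag bits and the x-coordinate."""
    first = encoded[0]
    flags = {
        "compressed": bool(first & 0x80),  # assumed: top bit marks the compressed format
        "infinity": bool(first & 0x40),    # assumed: second bit marks the point at infinity
        "larger_y": bool(first & 0x20),    # assumed: third bit selects one of the two roots
    }
    # Mask the three spare bits away before reading the remaining 381 bits as x.
    x = int.from_bytes(bytes([first & 0x1F]) + encoded[1:], "big")
    return flags, x

# Assumed encoding of the compressed point at infinity: 0xc0 followed by zero bytes.
infinity_flags, infinity_x = parse_g1_flags(bytes([0xC0]) + bytes(47))
```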

Our representation is more restrictive than that of blst: blst allows serialisation of any point on the curve E₁ (or E₂), but during uncompression we reject any points which do not lie in the subgroup G₁ (or G₂). The subgroups are much smaller than the entire curve group (|E₁|/|G₁| ≈ 7×10^37 and |E₂|/|G₂| ≈ 3×10^152) so the chance of being able to successfully decompress a random bytestring to get a point in a subgroup is negligible.

Concrete syntax for `G₁` and `G₂` elements.

We have also used the compressed representation for the concrete syntax for G₁ and G₂ elements. A constant of type G₁ is represented by (con bls12_381_G1_element 0x...) where ... consists of 96 hex digits, and a G₂ element is similar but with 192 hex digits.
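As a quick sanity check on the digit counts, this Python sketch formats a compressed bytestring as a constant in the syntax just described. The helper name is hypothetical, and the example input (0xc0 followed by zero bytes, which is the compressed point at infinity in the ZCash-style format) is an assumption rather than something taken from this PR.

```python
SIZES = {"G1": 48, "G2": 96}   # compressed sizes in bytes

def plc_constant(group, compressed):
    """Render a compressed point as (con bls12_381_<group>_element 0x...)."""
    if len(compressed) != SIZES[group]:
        raise ValueError(f"{group} elements must be exactly {SIZES[group]} bytes")
    return f"(con bls12_381_{group}_element 0x{compressed.hex()})"

g1_infinity = plc_constant("G1", bytes([0xC0]) + bytes(47))
g2_infinity = plc_constant("G2", bytes([0xC0]) + bytes(95))
```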

Hashing

We also provide two functions bls12_381_G1_hashToGroup and bls12_381_G2_hashToGroup which, as the names suggest, take an arbitrary bytestring and produce an element in the relevant group (not just on the curve). Hashing and uncompression are the only means of obtaining group elements directly. The other group operations preserve the property of being in the subgroup, so none of our functions can give us arbitrary curve elements.

Pairing operations and `bls12_381_mlresult`

In the BLS12-381 situation, what one usually wishes to do is to calculate two pairings to get two elements of G_T and then compare these results for equality. This is an expensive computation and it is considerably more efficient (though still expensive) to calculate values of a so-called non-reduced pairing in an intermediate group closely related to G_T and then at the end perform a calculation (final verification) that checks that two elements of the intermediate group become equal when lifted into G_T.

In Plutus Core, the intermediate group is called bls12_381_mlresult, and the operations we can perform on this type are (deliberately) very limited. The only way to obtain a value of type bls12_381_mlresult is as a result of calling bls12_381_millerLoop (hence the name), or by multiplying together two existing elements of that type. We do not provide any concrete syntax, textual printing method, or serialisation format (including flat) for values of type bls12_381_mlresult.
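The split between an unreduced result and a final check can be mimicked in a toy setting. In the Python sketch below the "Miller loop" lands anywhere in the nonzero elements of GF(23), and the final step raises to the power (23-1)/11 to land in the subgroup of order 11 before comparing. Everything here (the numbers, the function bodies, the snake_case names) is an assumption for illustration and is unrelated to the real Miller loop algorithm.

```python
Q = 23   # toy field order
R = 11   # toy group order; R divides Q - 1
H = 5    # generator of the whole multiplicative group of GF(23)

def miller_loop(a, b):
    """Toy unreduced pairing: the result is in GF(Q)*, not yet in the order-R subgroup."""
    return pow(H, a * b, Q)

def mul_ml_result(u, v):
    return (u * v) % Q

def final_exponentiation(u):
    """Project into the order-R subgroup playing the role of G_T."""
    return pow(u, (Q - 1) // R, Q)

def final_verify(u, v):
    """Compare two unreduced results after projecting both into G_T."""
    return final_exponentiation(u) == final_exponentiation(v)

# 3*4 = 12 and 1*1 = 1 agree modulo R, so the pairings agree in G_T
# even though the unreduced values differ.
u = miller_loop(3, 4)
v = miller_loop(1, 1)
same_pairing = final_verify(u, v)

# Product check: e(2,3) * e(1,5) = e(1,0), because 2*3 + 1*5 = 11 is 0 modulo R.
product_ok = final_verify(mul_ml_result(miller_loop(2, 3), miller_loop(1, 5)),
                          miller_loop(1, 0))
```

In this toy the expensive part would be the projection, which is done once per side inside final_verify, while the multiplications happen on cheap unreduced values; the costs listed below show the same shape for the real builtins.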

Note also that the operations on bls12_381_mlresult are quite expensive: the (constant) CPU costs are (currently) as follows:

 bls12_381_millerLoop: 402099373
 bls12_381_mulMlResult: 2544991
 bls12_381_finalVerify: 388656972

Fortunately this is in line with the expected on-chain usage patterns: in realistic applications one would tend to perform a few calls to millerLoop, some multiplications, and one call to bls12_381_finalVerify, and costing benchmarks (cabal run bls-costs) suggest that this can be done in a reasonable time. For example, a realistic zk-snark verification takes about 23% of the on-chain script execution limit (memory costs are negligible).
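Using the constant costs listed above, the CPU cost of a typical usage pattern is simple arithmetic. The Python sketch below adds up the cost of a given number of Miller loops and multiplications followed by one final verification; the function name and the two example patterns are illustrative assumptions, and the total ignores every other builtin a real script would call.

```python
COSTS = {
    "bls12_381_millerLoop": 402099373,
    "bls12_381_mulMlResult": 2544991,
    "bls12_381_finalVerify": 388656972,
}

def pairing_check_cost(miller_loops, multiplications):
    """CPU cost of some Miller loops and multiplications plus one finalVerify."""
    return (miller_loops * COSTS["bls12_381_millerLoop"]
            + multiplications * COSTS["bls12_381_mulMlResult"]
            + COSTS["bls12_381_finalVerify"])

# Checking e(a,b) = e(a',b'): two Miller loops, no multiplications.
simple_check = pairing_check_cost(2, 0)

# Hypothetical check with four pairings combined two against two.
four_pairing_check = pairing_check_cost(4, 2)
```

The multiplications contribute well under one percent of either total, so the number of Miller loops is what matters for budgeting.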

`flat` serialisation

The canonical storage format for Plutus Core scripts is the flat format (see Appendix E of the Plutus Core specification (pdf here)). We use the compression format described earlier for flat serialisation of elements of type bls12_381_G1_element and bls12_381_G2_element: these are compressed to bytestrings which are then encoded using the normal flat encoding (but with tags 9 and 10 respectively). Decoding can fail for a number of reasons: the bytestring can be the wrong length (it must be exactly 48 bytes long for bls12_381_G1_element and 96 for bls12_381_G2_element); the encoding can be invalid because the three format bits are used improperly; the encoding may specify a valid field element but one which is not the x-coordinate of a point on the curve ("off-curve"); and the encoding may specify a genuine point on the curve but one which is not in the relevant subgroup ("out-of-group"). All of these are described in the BLSTError type defined in the blst Haskell bindings.

There is no means of serialising bls12_381_mlresult values. The tag 11 has been allocated to this type, but if it is encountered while decoding a flat file then a failure occurs.

Costing

Fortunately costing is quite simple for the BLS12-381 operations: costing benchmarks indicate that everything is constant cost apart from the hashing functions and the scalar multiplication functions, which are linear in the size of their input.

kwxm · 2023-05-26T18:54:20Z

Thanks for the comments @effectfully, especially the one about hash collisions. I think I've addressed all of them except for the one about explicit dictionaries: I might come back to that later.

kwxm · 2023-05-30T12:36:20Z

@angerman I think this PR is blocked because you requested changes because of the Windows build problem. I believe we decided that you'd look into that separately, so is it OK to merge this now? I had some trouble merging the nix files with those in master (but eventually managed to get things into a state where nix develop worked), so you might want to check that they look reasonable.

angerman · 2023-05-31T05:02:04Z

@kwxm here's the haskell.nix bump PR: #5366

I've also tried to just bump haskell.nix on this PR, but that resulted in

       > untyped-plutus-core/src/UntypedPlutusCore/Transform/Inline.hs:284:29: error:
       >     • Variable not in scope:
       >         asum
       >           :: [Maybe (Term name1 uni1 fun1 ann)]
       >              -> Maybe (Term name1 uni1 fun1 ann)
       >     • Perhaps you meant ‘sum’ (imported from Prelude)
       >     |
       > 284 |         Constr _ _ ts    -> asum $ goTerm <$> ts
       >     |

after I pulled the latest changes 😕

angerman · 2023-05-31T09:13:50Z

So this fails on windows with

[168 of 171] Compiling UntypedPlutusCore.Transform.Inline ( untyped-plutus-core/src/UntypedPlutusCore/Transform/Inline.hs, dist/build/UntypedPlutusCore/Transform/Inline.o )

untyped-plutus-core/src/UntypedPlutusCore/Transform/Inline.hs:284:29: error:
    • Variable not in scope:
        asum
          :: [Maybe (Term name1 uni1 fun1 ann)]
             -> Maybe (Term name1 uni1 fun1 ann)
    • Perhaps you meant ‘sum’ (imported from Prelude)
    |
284 |         Constr _ _ ts    -> asum $ goTerm <$> ts
    |

too.

I don't think the Darwin -> Windows pipeline works, and I'm not sure why this is in CI.

angerman · 2023-05-31T09:17:14Z

Notably I don't see that failure on master (with bumped haskell.nix): #5366

angerman

I think we should revert this back to the original for now.

nix/cells/automation/ciJobs.nix

kwxm · 2023-05-31T22:04:22Z

@kwxm here's the haskell.nix bump PR: #5366

I've also tried to just bump haskell.nix on this PR, but that resulted in

       > untyped-plutus-core/src/UntypedPlutusCore/Transform/Inline.hs:284:29: error:
       >     • Variable not in scope:
       >         asum
       >           :: [Maybe (Term name1 uni1 fun1 ann)]
       >              -> Maybe (Term name1 uni1 fun1 ann)
       >     • Perhaps you meant ‘sum’ (imported from Prelude)
       >     |
       > 284 |         Constr _ _ ts    -> asum $ goTerm <$> ts
       >     |

after I pulled the latest changes 😕

That's weird: asum should be imported from Control.Applicative here. I have no idea why that would have happened (and the file that's causing the error is completely untouched by this PR). Also, I don't get an error on Linux after pulling your changes and restarting nix develop.

angerman · 2023-06-01T06:07:54Z

@kwxm yes, I'm also a bit surprised by this. Even more so that I don't see this in master 😅

michaelpj · 2023-06-01T08:53:50Z

Looks like it succeeded but the status isn't getting reported for some reason.

angerman · 2023-06-01T14:09:41Z

I think this is good to merge. IDK why hydra complains. We could forcepush. I'll force push once.

old

kwxm · 2023-06-01T17:53:43Z

Slightly premature, since I was just about to push some more changes! I'll open a new branch for that.

kwxm added 30 commits December 13, 2022 00:09

Initial version of BLS pairing builtins

0020b29

WIP

7e0de43

WIP

3927e15

WIP: implementations of GT operations

4ade080

Tidy up instances

56589d9

More-or-less complete implementation for UPLC

a592101

Initial costing benchmarks for BLS builtins

26b01ec

Update benchmarks

1bb2753

Update R code

099ba8c

Forgot source files

bf30d3f

Wrong denotation for GT_mul

4956ed9

Partial updates to CreateBuiltinCostModel

b1b1b0d

Merge branch 'master' into kwxm/BLS12_381/prototype

2354352

Fix typos in function names

a81447e

Update memory models for BLS12_381 builtins

3cfe6db

Update memory models for BLS12_381 builtins

51bbb93

WIP

dd3b6d3

Rename BLS (de)serialise -> (un)compress

48a0f18

Reformat

aaa8ec9

inline-r workaround; corrections to cost model generation code

378264f

inline-r workaround

39d1804

Update cost model tests for BLS functions

236b2cb

Update benching results and cost model file for BLS

c977559

Update benching results and cost model file for BLS

0b0048e

Update comment

c17d0b8

Update comment

cd8c1c5

Add some extra stuff for the benefit of the QuickCheck shrinker

994b478

Add new builtins to plutus-tx

fd18459

Add a few Haskell BLS examples

1c219e7

WIP

ba59233

michaelpj and others added 4 commits May 26, 2023 17:08

WIP updating things

19563fa

bump iohk-nix

651f525

Improve hash collision tests

aa41b37

Merge branch 'kwxm/BLS12_381/prototype' of github.com:input-output-hk…

310cfe0

…/plutus into kwxm/BLS12_381/prototype

kwxm added 4 commits May 26, 2023 19:55

Update benching.csv to new format to make merge easier

57dc567

Merge branch 'master' into kwxm/BLS12_381/prototype

fcc4551

Resolve some remaining conflicts

9a07a5a

Update ciJobs.nix

e88aa44

bump haskell.nix

17ff8f3

angerman previously requested changes May 31, 2023

nix/cells/automation/ciJobs.nix (outdated, resolved)

nix/cells/automation/ciJobs.nix (outdated, resolved)

angerman added 2 commits May 31, 2023 17:20

Update nix/cells/automation/ciJobs.nix

df8f19b

Update nix/cells/automation/ciJobs.nix

e67e029

Update plutus-metatheory/src/Algorithmic/Erasure.lagda

29a78e3

Merge branch 'master' into kwxm/BLS12_381/prototype

646b7e6

angerman force-pushed the kwxm/BLS12_381/prototype branch from 643900a to 646b7e6 Compare June 1, 2023 14:11

catch-21 mentioned this pull request Jun 1, 2023

Add BLS tests IntersectMBO/antaeus#15

Merged

michaelpj merged commit 996c7fc into master Jun 1, 2023

michaelpj deleted the kwxm/BLS12_381/prototype branch June 1, 2023 15:31

kwxm mentioned this pull request Jun 1, 2023

Kwxm/bls12 381/minor fixes #5367

Merged
