diff --git a/CIP-XXXX/README.md b/CIP-XXXX/README.md
new file mode 100644
index 000000000..898efa2d8
--- /dev/null
+++ b/CIP-XXXX/README.md
@@ -0,0 +1,270 @@
+---
+CIP: ?
+Title: Merkelised Plutus Scripts
+Authors: Las Safin
+Status: Draft
+Type: Standards
+Created: 2022-11-29
+License:
+---
+
+## Abstract
+
+Currently, the hash of a script is simply the hash of its
+[serialisation](https://github.com/input-output-hk/plutus/blob/a645d1ee0dd5efcd7a7da24678461e07396ad26e/plutus-ledger-api/src/PlutusLedgerApi/Common/SerialisedScript.hs#L88).
+This CIP proposes changing this so that the hash of a script (term)
+is a function of its immediate children's hashes, forming a Merkle tree from the AST.
+This allows one to shallowly verify a script's hash, which is useful on Cardano
+because it allows scripts to **check that a script hash is an instantiation of a parameterised script**.
+
+In addition, a `blake2b_224` built-in function must be added.
+
+This is inspired by [BIP-114](https://github.com/bitcoin/bips/blob/master/bip-0114.mediawiki),
+but the motivations are very different.
+
+## Motivation
+
+Given some core logic expressible as a script, it is common to have parameters in
+the form of constants, e.g. fees, references to other scripts, magic numbers.
+
+These parameters can either be put in a datum somewhere, or be put into the
+script itself, either by inlining them or by applying the unapplied script to the constants.
+
+On-chain, it is currently hard to check that one script is an applied form
+of another script. In cases where that is necessary, datums are used instead.
+
+Merkelising the hashing makes it possible to
+check that a script is an application of another script to some parameter.
+
+Example reasons to apply the parameters to the script:
+- Staking validators currently don't support datums, and all staking validators
+  share a single rewards account.
+  Being able to check applied parameters
+  makes staking validators much more powerful. (More about this below.)
+- Constants can be included in the reference script, leading to less CPU and memory use,
+  since they don't have to be parsed from the adjacent datum (somewhat cheap)
+  or the script context (very expensive).
+- A script address plus a datum can't fit in an address;
+  if you want that, you also need this (or need to change what an address is).
+
+## Specification
+
+The hash of a script will be derived directly from the AST, rather than from its serialisation.
+Currently, it's formed by hashing the serialisation prefixed with a byte that represents its version, e.g. 0x02 for Plutus V2.
+
+The hash of a script becomes the hash of the version prefix prepended to the hash of the term.
+
+[`Term`](https://github.com/input-output-hk/plutus/blob/a645d1ee0dd5efcd7a7da24678461e07396ad26e/plutus-core/untyped-plutus-core/src/UntypedPlutusCore/Core/Type.hs#L69)
+currently has 8 constructors. On-chain, annotations are always the unit type,
+and are hence ignored for this algorithm. Each case/constructor is generally handled by
+hashing the concatenation of a prefix (a single byte corresponding to the
+constructor index) and the hashes of the arguments passed to the constructor.
+
+Similar code can be found [in Plutarch](https://github.com/Plutonomicon/plutarch-plutus/blob/95e40b42a1190191d0a07e3e4e938b72e6f75268/Plutarch/Internal.hs#L100) (for a slightly different AST).
+
+To avoid giving a single script two hashes,
+this system must be used exclusively, starting with some version after Plutus V2.
+
+The algorithm for checking a script hash against a supplied script (of a new version)
+in the ledger will change slightly: rather than hashing the supplied serialised
+script directly, the decoding of the serialised script must be hashed.
+(NB: the hashing and decoding can be fused to avoid intermediary structures.)
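
The top-level rule above can be sketched as follows, in Python for the sake of a runnable illustration. Only `hashlib.blake2b` is a real API here; the function names, the `0x03` version byte standing in for some future Plutus version, and the placeholder inputs are assumptions made for this example and are not fixed by the CIP.

```python
import hashlib


def blake2b_224(data: bytes) -> bytes:
    """BLAKE2b with a 28-byte (224-bit) digest."""
    return hashlib.blake2b(data, digest_size=28).digest()


PLUTUS_V2 = b"\x02"  # version prefix byte mentioned in the text


def current_script_hash(version: bytes, serialised_script: bytes) -> bytes:
    # Today: hash the version byte followed by the whole serialisation.
    return blake2b_224(version + serialised_script)


def merkelised_script_hash(version: bytes, term_hash: bytes) -> bytes:
    # Proposed: hash the version byte followed by the hash of the root term.
    return blake2b_224(version + term_hash)


# Placeholder inputs, not a real script. The 0x03 byte is hypothetical.
example_term_hash = blake2b_224(b"placeholder root term")
new_hash = merkelised_script_hash(b"\x03", example_term_hash)
```

Under the proposed scheme, a verifier that already knows the root term's hash never needs the serialised script to recompute the script hash.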
+
+To allow computing the hash in scripts, we must support `blake2b_224` in Plutus scripts,
+as it's what is currently used. The algorithm used might change in the future, but that is
+not relevant for this CIP.
+
+### Hashing `Error`
+
+Since there are no children, the hash of the `Error` term is the
+hash of the prefix byte for the `Error` constructor.
+You could theoretically choose any random number as the hash,
+but it would have to be provably random, hence hashing the prefix byte
+is the best option.
+
+In pseudocode: `hash prefix`
+
+### Hashing `Builtin`, `Var`
+
+The hash of a `Builtin` is the hash of the prefix prepended to the base-256 encoded
+(i.e. serialised to bytestring) index of the built-in function.
+Because there are fewer than 256 built-ins, this is currently the same
+as hashing the prefix byte prepended to the byte containing the index of
+the built-in.
+
+`Var` is handled in exactly the same way (with a different prefix),
+but in this case it's feasible for the index to be more than 255.
+
+In pseudocode: `hash $ prefix <> serialiseBase256 index`
+
+### Hashing `Apply`, `Force`, `Delay`
+
+These are hashed by hashing the result of prepending the prefix
+byte to the concatenation of the hashes of the children.
+
+In pseudocode: `hash $ foldl' (<>) prefix (hash <$> children)`
+
+### Hashing `LamAbs`
+
+This works exactly the same way as above; notably, the _name_ is excluded,
+as it's a constant in the de Bruijn encoding.
+
+In pseudocode: `hash $ prefix <> hash body`
+
+### Hashing `Constant`
+
+The universe of types used on-chain is always `DefaultUni`.
+Each possible data type is handled differently, with each having
+a different prefix. The total number of prefixes does not exceed
+255. If it did, the prefix would have to be increased to two bytes.
+
+In addition:
+Negative integers and non-negative integers have separate prefixes.
+False and True also have separate prefixes.
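
The per-constructor rules above (for everything except `Constant`) can be sketched in Python as below. This is an illustration under stated assumptions, not the CIP's reference implementation: the concrete prefix values (taken here to be constructor indices 0 to 7), the big-endian base-256 encoding with zero encoded as a single zero byte, and the `Hashed` helper (a subterm known only by its digest, which also stands in for constants) are all hypothetical choices made for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Union


def blake2b_224(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=28).digest()


# Assumed one-byte prefixes; the CIP does not fix concrete values.
VAR, LAM_ABS, APPLY, FORCE, DELAY, CONSTANT, BUILTIN, ERROR = (
    bytes([i]) for i in range(8)
)


def serialise_base256(index: int) -> bytes:
    # Big-endian base-256 digits; zero is assumed to encode as one zero byte.
    length = max(1, (index.bit_length() + 7) // 8)
    return index.to_bytes(length, "big")


@dataclass(frozen=True)
class Var:
    index: int


@dataclass(frozen=True)
class Builtin:
    index: int


@dataclass(frozen=True)
class Error:
    pass


@dataclass(frozen=True)
class LamAbs:
    body: "Term"  # the bound name is not stored: it is excluded from the hash


@dataclass(frozen=True)
class Apply:
    fun: "Term"
    arg: "Term"


@dataclass(frozen=True)
class Force:
    body: "Term"


@dataclass(frozen=True)
class Delay:
    body: "Term"


@dataclass(frozen=True)
class Hashed:
    digest: bytes  # hypothetical helper: a subterm known only by its hash


Term = Union[Var, Builtin, Error, LamAbs, Apply, Force, Delay, Hashed]


def hash_term(t: Term) -> bytes:
    if isinstance(t, Hashed):
        return t.digest
    if isinstance(t, Error):
        return blake2b_224(ERROR)
    if isinstance(t, Var):
        return blake2b_224(VAR + serialise_base256(t.index))
    if isinstance(t, Builtin):
        return blake2b_224(BUILTIN + serialise_base256(t.index))
    if isinstance(t, LamAbs):
        return blake2b_224(LAM_ABS + hash_term(t.body))
    if isinstance(t, Force):
        return blake2b_224(FORCE + hash_term(t.body))
    if isinstance(t, Delay):
        return blake2b_224(DELAY + hash_term(t.body))
    if isinstance(t, Apply):
        return blake2b_224(APPLY + hash_term(t.fun) + hash_term(t.arg))
    raise TypeError(f"unsupported term: {t!r}")


# Shallow verification: knowing only the digest of the function subterm
# is enough to recompute the hash of the application.
identity = LamAbs(Var(1))
applied = Apply(identity, Error())
opened = Apply(Hashed(hash_term(identity)), Error())
```

Here `hash_term(applied)` and `hash_term(opened)` agree, which is the property that lets a script hash be checked without supplying the full body of the function being applied.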
+
+#### Hashing non-negative integers
+
+The serialisation according to [CIP-58](https://github.com/cardano-foundation/CIPs/blob/a1b9ff0190ad9f3e51ae23e85c7a8f29583278f0/CIP-%3F/README.md#representation-of-builtininteger-as-builtinbytestring-and-conversions),
+prefixed with the two-byte prefix, is hashed.
+
+In pseudocode: `hash $ prefix <> prefix' <> serialiseCIP58 n`
+
+#### Hashing negative integers
+
+The same algorithm as above is used, but the number hashed is `1 - n`.
+
+In pseudocode: `hash $ prefix <> prefix' <> serialiseCIP58 (1 - n)`
+
+#### Hashing bytestrings
+
+The bytestring is hashed as-is.
+We use the blake2b-256 hash here, so that we can usefully check that
+the script refers to a bytestring of which we know only the hash.
+
+In pseudocode: `hash $ prefix <> blake2b_256 bs`
+
+#### Hashing strings
+
+The flat encoding is hashed.
+
+In pseudocode: `hash $ prefix <> flat x`
+
+#### Hashing lists, pairs
+
+Lists and pairs are hashed like a Merkle tree,
+much the same way that terms are.
+The children have a known type, and are hashed according to how that
+type should be hashed, i.e. with the correct algorithm and prefix.
+
+In pseudocode: `hash $ foldl' (<>) prefix (hash <$> children)`
+
+#### Hashing `()`, `False`, `True`
+
+Each has its own separate prefix, like `Error`, hence:
+
+In pseudocode: `hash prefix`
+
+#### Hashing `Data`
+
+The CBOR encoding is used; notably, it must be compatible with the `serialiseData`
+built-in to be useful on-chain.
+We use the blake2b-256 hash here, so that we can usefully check that
+the script refers to a datum of which we know only the hash.
+If the hashing algorithm for data changes, we must also change it here.
+
+In pseudocode: `hash $ prefix <> blake2b_256 (serialiseData d)`
+
+## Rationale
+
+Given this minor change, we can now check that one script is the application of another script.
+Concretely, given hash `script`, hash `original`, parameter `d` (as data),
+intermediate hashes `h0`, `h1`, hashing prefixes `ver_prefix`, `app_prefix`, `const_data_prefix`, we check:
+```
+script == blake2b_224 $ ver_prefix <> h0
+h0 == blake2b_224 $ app_prefix <> original <> h1
+h1 == blake2b_224 $ const_data_prefix <> blake2b_256 (serialiseData d)
+```
+
+We essentially open the Merkle tree commitment partially and check that the supplied path is correct.
+
+### Relation with CIP-58
+
+This CIP does not _depend_ on CIP-58, but to hash integers on-chain
+the way it's done here, CIP-58's integer-to-bytestring serialisation built-in
+must be available in Plutus.
+
+### Relation with BIP-114
+
+BIP-114 uses this trick to avoid submitting the parts of the script that aren't used.
+Given that reference scripts are common on Cardano, this isn't a big win for efficiency,
+but it might be worth implementing for the sake of scripts used only once.
+This CIP, however, doesn't require that to be implemented.
+
+The argument for privacy doesn't apply: private smart contracts can be achieved through
+the use of non-interactive zero-knowledge probabilistic proofs.
+
+### Reference scripts
+
+Currently, different instances of the same script need their own reference inputs,
+since their hashes are different. It seems feasible to allow sharing of a single reference script,
+given the parameters and language version as witnesses, but given the complexity
+involved, it is not specified in this CIP.
+
+### Staking
+
+This makes staking validators much more powerful, since a single protocol can
+now manage many rewards accounts (by instantiating the script with a numeric identifier).
+However, it is arguably not the optimal solution due to the reference
+script problem described above.
+Even if the reference script problem
+is solved as described above, it seems logical to allow supplying a datum
+to a staking validator, or somehow combining the payment address and staking address for scripts
+and using the same datum for both, while somehow solving the separate accounts problem.
+
+Given the heavy complexity of fixing staking validators, Merkelising script hashing seems much more feasible.
+
+### Alternatives
+
+#### Parameterised Reference Scripts
+
+See https://github.com/cardano-foundation/CIPs/pull/354.
+
+Seemingly, Merkelisation is a less invasive and possibly cleaner change.
+
+#### Changing how constants are hashed
+
+The hashing of constants might not have a clear best solution. Most notably,
+it is not clear how much of the hashing to Merkelise.
+E.g., the hashing of data itself could be Merkelised. This is not done in this CIP.
+The hashing of a `Data` constant could also prepend the prefix directly to the serialisation,
+rather than to the hash of the `Data`. It is not clear what is best.
+
+##### Hashing strings, lists, pairs differently
+
+Strings are not very useful in Plutus.
+Hence, the hashing algorithm for them isn't optimised for
+easy verification.
+
+Strings have essentially no purpose on-chain, since they're only used
+for tracing, which should not be used in production.
+
+In the context of checking applied parameters, it is likely that only
+`Data`, `Integer`, `Bool` and `ByteString` will be used as parameters,
+since they cover all useful behaviour in an efficient way.
+If you want to parameterise your script by a pair of integers,
+it is likely best to unwrap that into two separate integer parameters
+for the sake of efficiency of _running_ the script, which is likely
+to be more common than checking the parameters.
+
+Built-in lists and pairs are not commonly used as parameters, but it's plausible
+that they might still be the most efficient method in some scenarios.
+Hence, they have been included.
+They use Merkle-tree hashing since that's the simplest and most useful in this case.
+
+## Path to Active
+
+### Implementation plan
+
+Las Safin will implement this if IOG don't have time.
+
+## Copyright
+
+This CIP is licensed under CC-BY-4.0.