Preamble

SIP Number: 008

Title: Clarity Parsing and Analysis Cost Assessment

Author: Aaron Blankstein aaron@blockstack.com

Consideration: Technical

Type: Consensus

Status: Ratified

Created: 5 March 2020

License: BSD 2-Clause

Sign-off: Jude Nelson jude@stacks.org, Technical Steering Committee Chair

Discussions-To: https://github.com/stacksgov/sips

Abstract

This document describes the measured costs and asymptotic costs assessed for parsing Clarity code into an abstract syntax tree (AST) and the static analysis of that Clarity code (type-checking and read-only enforcement). This will not specify the constants associated with those asymptotic cost functions. Those constants will necessarily be measured via benchmark harnesses and regression analyses.

Introduction

The cost of analyzing Clarity code is measured using the same 5 categories described in SIP-006 for the measurement of execution costs:

Runtime cost: captures the number of cycles that a single processor would require to process the Clarity block. This is a unitless metric, so it will not correspond directly to cycles, but rather is meant to provide a basis for comparison between different Clarity code blocks.
Data write count: captures the number of independent writes performed on the underlying data store (see SIP-004).
Data read count: captures the number of independent reads performed on the underlying data store.
Data write length: the number of bytes written to the underlying data store.
Data read length: the number of bytes read from the underlying data store.

Importantly, these costs are used to set a block limit for each block. When it comes to selecting transactions for inclusion in a block, miners are free to make their own choices based on transaction fees, however, blocks may not exceed the block limit. If they do so, the block is considered invalid by the network --- none of the block's transactions will be materialized and the leader forfeits all rewards from the block.

Costs for static analysis are assessed during the type check pass. The read-only and trait-checking passes perform work which is strictly less than the work performed during type checking, and therefore, the cost assessment can safely fold any costs that would be incurred during those passes into the type checking pass.

Specification

Common Analysis Metrics and Costs

AST Parsing

The Clarity parser has a runtime that is linear with respect to the Clarity program length.

a*X+b

where a and b are constants, and

X := the program length in bytes

Dependency cycle detection

Clarity performs cycle detection for intra-contract dependencies (e.g., functions that depend on one another). This detection is linear in the number of dependency edges in the smart contract:

a*X+b

where a and b are constants, and X := the total number of dependency edges in the smart contract

Dependency edges are created anytime a top-level definition refers to another top-level definition.

Type signature size

Types in Clarity may be described using type signatures. For example, (tuple (a int) (b int)) describes a tuple with two keys a and b of type int. These type descriptions are used by the Clarity analysis passes to check the type correctness of Clarity code. Clarity type signatures have varying size, e.g., the signature int is smaller than the signature for a list of integers.

The signature size of a Clarity type is defined as follows:

type_signature_size(x) :=
  if x = 
     int      => 1
    uint      => 1
    bool      => 1
    principal => 1
    buffer    => 2
    optional  => 1 + type_signature_size(entry_type)
    response  => 1 + type_signature_size(ok_type) + type_signature_size(err_type)
    list      => 2 + type_signature_size(entry_type)
    tuple     => 1 + 2*(count(entries)) 
                   + sum(type_signature_size for each entry)
                   + sum(len(key_name) for each entry)

Type annotation

Each node in a Clarity contract's AST is annotated with the type value for that node during the type checking analysis pass.

The runtime cost of type annotation is:

a + b*X

where a and b are constants, and X is the type signature size of the type being annotated.

Variable lookup

Looking up variables during static analysis incurs a non-constant cost -- the stack depth and the length of the variable name affect this cost. However, variable names in Clarity have bounded length -- 128 characters. Therefore, the cost assessed for variable lookups may safely be constant with respect to name length.

The stack depth affects the lookup cost because the variable must be checked for in each context on the stack.

Cost Function:

a*X+b*Y+c

where a, b, and c are constants, X := stack depth Y := the type size of the looked up variable

Function Lookup

Looking up a function incurs a constant cost with respect to name length (for the same reason as variable lookup). However, because functions may only be defined in the top-level contract context, stack depth does not affect function lookup.

Cost Function:

a*X + b

where a and b are constants, X := the sum of the type sizes for the function signature (each argument's type size, as well as the function's return type)

Name Binding

The cost of binding a name in Clarity -- in either a local or the contract context is constant with respect to the length of the name, but linear in the size of the type signature.

binding_cost = a + b*X

where a and b are constants, and X := the size of the bound type signature

Type check cost

The cost of a static type check is linear in the size of the type signature:

type_check_cost(expected, actual) :=
  a + b*X

where a and b are constants, and

X := max(type_signature_size(expected), type_signature_size(actual))

Function Application

Static analysis of a function application in Clarity requires type checking the function's expected arguments against the supplied types.

The cost of applying a function is:

a + sum(type_check_cost(expected, actual) for each argument)

where a is a constant.

This is also the entire cost of type analysis for most function calls (e.g., intra-contract function calls, most simple native functions).

Iterating the AST

Static analysis iterates over the entire program's AST in the type checker, the trait checker, and in the read-only checker. This cost is assessed as a constant cost for each node visited in the AST during the type checking pass.

Special Function Costs

Some functions require additional work from the static analysis system.

Functions on sequences (e.g., map, filter, fold)

Functions on sequences need to perform an additional check that the supplied type is a list or buffer before performing the normal argument type checking. This cost is assessed as:

where a is a constant.

Functions on options/responses

Similarly to the functions on sequences, option/response functions must perform a simple check to see if the supplied input is an option or response before performing additional argument type checking. This cost is assessed as:

Data functions (ft balance checks, nft lookups, map-get?, ...)

Static checks on intra-contract data functions do not require database lookups (unlike the runtime costs of these functions). Rather, these functions incur normal type lookup (i.e., fetching the type of an NFT, data map, or data var) and type checking costs.

get

Checking a tuple get requires accessing the tuple's signature for the specific field. This has runtime cost:

a*log(N) + b

where a and b are constants, and

N := the number of fields in the tuple type

tuple

Constructing a tuple requires building the tuple's BTree for accessing fields. This has runtime cost:

a*N*log(N) + b

where a and b are constants, and

N := the number of fields in the tuple type

use-trait

Importing a trait imposes two kinds of costs on the analysis. First, the import requires a database read. Second, the imported trait is included in the static analysis output -- this increases the total storage usage and write length of the static analysis.

The costs are defined as:

read_count = 1
write_count = 0
runtime = a*X+b
write_length = c*X+d
read_length = c*X+d

where a, b, c, and d are constants, and

X := the total type size of the trait (i.e., the sum of the type sizes of each function signature).

contract-call?

Checking a contract call requires a database lookup to inspect the function signature of a prior smart contract.

The costs are defined as:

read_count = 1
read_length = a*X+b
runtime = c*X+d

where a, b, c, and d are constants, and

X := the total type size of the function signature

let

Let bindings require the static analysis system to iterate over each let binding and ensure that they are syntactically correct.

This imposes a runtime cost:

a*X + b

where a and b are constants, and

X := the number of entries in the let binding.

Related Work

This section will be expanded upon after this SIP is ratified.

Backwards Compatibility

Not applicable.

Activation

At least 20 miners must register a name in the .miner namespace in Stacks 1.0. Once the 20th miner has registered, the state of Stacks 1.0 will be snapshotted. 300 Bitcoin blocks later (Bitcoin block 666050), the Stacks 2.0 blockchain will launch. Stacks 2.0 implements this SIP.

Reference Implementations

Implemented in Rust. See https://github.com/blockstack/stacks-blockchain.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sip-008-analysis-cost-assessment.md

sip-008-analysis-cost-assessment.md

Preamble

Abstract

Introduction

Specification

Common Analysis Metrics and Costs

AST Parsing

Dependency cycle detection

Type signature size

Type annotation

Variable lookup

Function Lookup

Name Binding

Type check cost

Function Application

Iterating the AST

Special Function Costs

Functions on sequences (e.g., map, filter, fold)

Functions on options/responses

Data functions (ft balance checks, nft lookups, map-get?, ...)

get

tuple

use-trait

contract-call?

let

Related Work

Backwards Compatibility

Activation

Reference Implementations

Files

sip-008-analysis-cost-assessment.md

Latest commit

History

sip-008-analysis-cost-assessment.md

File metadata and controls

Preamble

Abstract

Introduction

Specification

Common Analysis Metrics and Costs

AST Parsing

Dependency cycle detection

Type signature size

Type annotation

Variable lookup

Function Lookup

Name Binding

Type check cost

Function Application

Iterating the AST

Special Function Costs

Functions on sequences (e.g., map, filter, fold)

Functions on options/responses

Data functions (ft balance checks, nft lookups, map-get?, ...)

get

tuple

use-trait

contract-call?

let

Related Work

Backwards Compatibility

Activation

Reference Implementations