
AVM: Adding bmodexp #6140

Open
wants to merge 3 commits into master
Conversation

mangoplane

Summary

Adds the new opcode bmodexp as described in issue #6139 to support modular exponentiation involving byte strings of up to 4096 bytes. Closes #6139

Test Plan

  • Relevant tests added to assembler_test.go, evalStateful_test.go, & eval_test.go
  • Opcode is tested with a range of test vectors with function TestBytesModExp, covering panic cases, edge cases, acceptance cases and failure cases. Test vectors were generated manually or with Python.

@CLAassistant
CLAassistant commented Sep 21, 2024

CLA assistant check: all committers have signed the CLA.

@mangoplane mangoplane changed the title New opcode modexp AVM: Adding bmodexp Sep 21, 2024

codecov bot commented Sep 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 56.25%. Comparing base (2a02530) to head (7ea8644).
Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6140      +/-   ##
==========================================
- Coverage   56.27%   56.25%   -0.02%     
==========================================
  Files         494      494              
  Lines       69943    69957      +14     
==========================================
- Hits        39358    39354       -4     
- Misses      27910    27925      +15     
- Partials     2675     2678       +3     

@jannotti (Contributor) left a comment

This is looking quite good so far.

data/transactions/logic/eval.go (outdated)
prev := last - 1 // y
pprev := last - 2 // x

if len(cx.Stack[last].Bytes) > maxStringSize || len(cx.Stack[prev].Bytes) > maxStringSize || len(cx.Stack[pprev].Bytes) > maxStringSize {
@jannotti (Contributor)

I'm doing this from phone, but maxStringSize is the AVM max of 4096? That's unusual for the bmath functions, they normally have a maximum of 128 bytes. I suppose you want more because RSA uses really large keys?

At any rate, it's impossible to supply inputs that are larger than this, so there's no need to check in the opcode.

@mangoplane (Author)

That makes sense. Yes, the size is intended to support RSA which has really large keys. Is it okay if we allow the opcode to support this size?

@jannotti (Contributor)

It's ok with me, assuming we can get the cost function to properly account for long inputs. It seems like bmodexp wouldn't be very interesting if it can't handle common RSA key sizes.

I'd like to support bigger inputs on the other b opcodes too, if we can adjust the cost functions appropriately. They were first done before we could make cost depend on size. b+, for example, would almost certainly be easy to adjust; b* might be more complex than simple length dependence. I'm somewhat worried that bmodexp is going to be tricky. Anyway, that should be a separate PR someday. (Do your RSA implementations require operating on RSA keys with any other operations?)

@jannotti (Contributor)

If it ends up being fast enough, we could pick a cost that accounts for the worst case, which I suppose would be three very long inputs.

Or perhaps only scales based on one of the parameters, but accounts for the worst case with the others. (I think bmodexp should be linear with respect to the length of the exponent, since it basically performs one operation per bit.)

@jannotti (Contributor)

I just read your discussion of cost in the issue more closely. The dependence on the square of the lengths is unfortunate because it implies a custom cost function. Currently, all costs are "data directed", which is nice because it means we can create specs automatically: we can generate text that describes the cost from the constants provided in the opcode table.

I suppose we can add a way to provide both a Go function and a textual description directly in the .go source code. That is somewhat more fragile from the standpoint of modifications to the way we present the spec, but it doesn't seem so bad. It's probably also necessary if we ever want to support larger inputs to b*, which I suspect is where this is really coming from.

@mangoplane (Author)

> If it ends up being fast enough, we could pick a cost that accounts for the worst case, which I suppose would be three very long inputs.
>
> Or perhaps only scales based on one of the parameters, but accounts for the worst case with the others. (I think bmodexp should be linear with respect to the length of the exponent, since it basically performs one operation per bit.)

This is very tempting, and would mean minimal complexity. I anticipate bmodexp rarely being used, except for applications where the inputs are large such as RSA.

> Or perhaps only scales based on one of the parameters, but accounts for the worst case with the others. (I think bmodexp should be linear with respect to the length of the exponent, since it basically performs one operation per bit.)

This is a good idea that might simplify the calculation to allow an existing linear cost model to be used.

> Do your RSA implementations require operating on RSA keys with any other operations?

The only operators needed for RSA besides modexp are the less-than and equals operators. These are efficiently implemented with a 512-bit digit partitioning algorithm available in Puya-Bignumber.

I will explore the suggestions to try to linearise the cost model and work out the maximum cost, to see whether it's cheap enough to use a constant. In my opinion, the complexity a non-linear cost model would introduce, as highlighted above, isn't worth it if we can find a bounding linear model that's good enough. I'll present the results of the linear model in the discussion thread and go from there.

data/transactions/logic/evalStateful_test.go (outdated)
data/transactions/logic/eval_test.go (outdated)
data/transactions/logic/opcodes.go (outdated)
@jannotti jannotti self-assigned this Sep 21, 2024
@mangoplane (Author)

I have addressed your feedback with a recent update, per my understanding. Let me know if there's anything that I may have misinterpreted.

@giuliop

giuliop commented Sep 24, 2024

bmodexp is also useful in other scenarios with smaller inputs, so we should not penalize those cost-wise. For instance, all ZKP protocols based on elliptic curves use it, and for efficiency they try to operate on the smallest field possible while preserving security.

As a concrete example, smart contract verifiers for zk-proofs based on the Plonk protocol, generated by AlgoPlonk, call a TEAL implementation of it 5 times, using 32-byte inputs for both curve BN254 and BLS12-381.

Considering that a BN254 verifier consumes ~145,000 opcode budget and a BLS12-381 verifier ~185,000, bmodexp would really help bring down that cost.

@giuliop

giuliop commented Sep 24, 2024

> I'm doing this from phone, but maxStringSize is the AVM max of 4096? That's unusual for the bmath functions, they normally have a maximum of 128 bytes. I suppose you want more because RSA uses really large keys?

Isn't the maximum 64 bytes?

If we can have a plausible linear model for the smaller inputs currently supported by the b-operations (64 bytes?) that breaks down for very large inputs, a solution we might consider for consistency is to offer a bmodexp that operates on bigints like the other b-operations, and to add a separate fixed-cost opcode that operates on 4096-byte strings, e.g. modexpstring.

@mangoplane (Author)

mangoplane commented Sep 24, 2024

> > I'm doing this from phone, but maxStringSize is the AVM max of 4096? That's unusual for the bmath functions, they normally have a maximum of 128 bytes. I suppose you want more because RSA uses really large keys?
>
> Isn't the maximum 64 bytes?
>
> If we can have a plausible linear model for the smaller inputs currently supported by the b-operations (64 bytes?) which breaks for very large inputs, a solution we might consider for consistency is to offer bmodexp that operates on bigint like the other b-operations and add a separate fixed-cost opcode that operates on 4096 byte strings, e.g. modexpstring

You make some good points. However, in my opinion, we should offer a single opcode to reduce complexity. With a sufficiently accurate cost model, such as the one proposed in this GitHub comment, the estimated cost will closely reflect the actual cost within a small margin of error. For example, using the log-polynomial model, the cost for 64-byte inputs is 105, which is intuitively reasonable and relatively small. To simplify the cost in that range, we could use a piecewise function where the cost is 105 for all inputs up to 64 bytes in length and, for longer inputs, follows the advanced cost model. The log-polynomial model also accurately describes the cost for much larger inputs.

Additionally, this seems to align with the long-term vision of allowing byte math opcodes, such as b+, to support up to 4096 bytes each, if I'm not mistaken based on the above discussion.

The opcode should work for inputs up to 4096 bytes, with several test cases exceeding the 64-byte limit. As a sanity check I think it's worth adding another test case to verify the maximum supported length of 4096 bytes.

@jannotti (Contributor)

Just to close the loop, is the suggestion that x = exponent_length * max(base_length, modulus_length)**2 will work as the cost function (with appropriate scaling)?

I would be on board with that, and I'll just have to write an "escape hatch" to allow an arbitrary cost function to be written.

@jannotti jannotti mentioned this pull request Sep 25, 2024
@algorandskiy (Contributor)

Based on this eval, some log-poly formula gives a better approximation.

@giuliop

giuliop commented Sep 25, 2024

> Just to close the loop, is the suggestion that this: x = exponent_length * max(base_length, modulus_length)**2 will work for the cost function (with appropriate scaling)?

Looks reasonable to me: the number of iterations is exponent_length, and the amount of work performed per iteration is, to a first approximation, (base * base) mod modulus, so proportional to max(base_length, modulus_length)**2.

@mangoplane (Author)

mangoplane commented Sep 25, 2024

> Just to close the loop, is the suggestion that this: x = exponent_length * max(base_length, modulus_length)**2 will work for the cost function (with appropriate scaling)?
>
> I would be on board with that, and I'll just have to write an "escape hatch" to allow an arbitrary cost function to be written.

I tried out that model after reading EIP-2565, but I found it highly inaccurate ($R^2$ of 0.5) compared to the log-poly formula ($R^2$ of 0.96):

x = c1*log(C) + c2*log(A)log(C) + c3*log(B)log(C) + c4*log(A)log(B)log(C)

where c1, c2, c3, and c4 are coefficients calculated by linear regression.

I think this is because the log transformation causes any exponents and multiplications to become coefficients and additions, respectively. The algorithm used for modexp is likely very advanced, making use of all the best approaches known.

It's possible that my data isn't accurate, although I don't see how that could be. Perhaps it's worth trying to reproduce my results. I can provide my benchmark code upon request.
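The log-polynomial formula quoted in the comment above is straightforward to evaluate. The sketch below is illustrative only: the variable-to-input mapping and the coefficients are placeholders, not the fitted regression values from the actual benchmark data:

```go
package main

import (
	"fmt"
	"math"
)

// logPolyCost evaluates the log-polynomial model
//   x = c1*log(C) + c2*log(A)log(C) + c3*log(B)log(C) + c4*log(A)log(B)log(C)
// where a, b, c are the three input lengths in bytes and c1..c4 are
// regression coefficients. The coefficients used in main are placeholders.
func logPolyCost(a, b, c, c1, c2, c3, c4 float64) float64 {
	lA, lB, lC := math.Log(a), math.Log(b), math.Log(c)
	return c1*lC + c2*lA*lC + c3*lB*lC + c4*lA*lB*lC
}

func main() {
	// With unit coefficients the cost grows only with the logs of the
	// input lengths; after the log transform, exponents in the underlying
	// running time become multiplicative coefficients, which is the point
	// made in the comment above.
	fmt.Printf("%.1f\n", logPolyCost(64, 64, 64, 1, 1, 1, 1))
}
```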

@giuliop

giuliop commented Sep 26, 2024

I benchmarked bmodexp on my own and exponent_length * max(base_length, modulus_length)**2 looks reasonable to me.

I benchmarked using the same byte length for all three inputs (base, exp, and mod) and I get:

| Benchmark | Iterations | Time (ns/op) | Extra (extra/op) |
| --- | --- | --- | --- |
| BenchmarkBModExp/modexp_32_bytes_inputs-12 | 78,057 | 15,019 | 4.000 |
| BenchmarkBModExp/modexp_64_bytes_inputs-12 | 18,218 | 66,383 | 4.000 |
| BenchmarkBModExp/modexp_128_bytes_inputs-12 | 3,776 | 322,610 | 4.000 |
| BenchmarkBModExp/modexp_256_bytes_inputs-12 | 556 | 2,239,315 | 4.000 |
| BenchmarkBModExp/modexp_512_bytes_inputs-12 | 63 | 18,215,421 | 4.000 |
| BenchmarkBModExp/modexp_1024_bytes_inputs-12 | 8 | 142,717,656 | 4.000 |
| BenchmarkBModExp/modexp_2048_bytes_inputs-12 | 1 | 1,147,616,375 | 4.000 |
| BenchmarkBModExp/modexp_4096_bytes_inputs-12 | 1 | 9,566,156,542 | 4.000 |

If we divide the ns/op by inputs_length**3 we get:

| Inputs Len | Cost (ns/op) | Inputs Len³ | Cost / Inputs Len³ |
| --- | --- | --- | --- |
| 32 | 15,019 | 32,768 | 0.46 |
| 64 | 66,383 | 262,144 | 0.25 |
| 128 | 322,610 | 2,097,152 | 0.15 |
| 256 | 2,239,315 | 16,777,216 | 0.13 |
| 512 | 18,215,421 | 134,217,728 | 0.14 |
| 1024 | 142,717,656 | 1,073,741,824 | 0.13 |
| 2048 | 1,147,616,375 | 8,589,934,592 | 0.13 |
| 4096 | 9,566,156,542 | 68,719,476,736 | 0.14 |

It looks like the cost stabilizes after an input length of 128 bytes. I think we can make the cost proportional to exponent_length * max(base_length, modulus_length)**2, and perhaps add a constant factor to account for the higher relative cost of small inputs.

This is the benchmark function I'm using:

func BenchmarkBModExp(b *testing.B) {
	for _, byteLen := range []int{32, 64, 128, 256, 512, 1024, 2048, 4096} {
		var base, exp, mod []byte
		for _, input := range []*[]byte{&base, &exp, &mod} {
			// create random inputs of the given length, we are using math/rand without seeding,
			// so it will generate the same pseudorandom numbers for each run
			*input = make([]byte, byteLen)
			for i := range *input {
				(*input)[i] = byte(rand.Intn(256))
			}
		}
		ops := fmt.Sprintf("byte 0x%x; byte 0x%x; byte 0x%x; bmodexp; pop", base, exp, mod)
		b.Run(fmt.Sprintf("modexp_%d_bytes_inputs", byteLen), func(b *testing.B) {
			benchmarkOperation(b, "", ops, "int 1")
		})
	}
}
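A cost function along the lines discussed above might be sketched as follows. Note this is a sketch only: the scaling divisor and the small-input floor are illustrative guesses, not values agreed in this thread:

```go
package main

import "fmt"

// bmodexpCost sketches the proposed model
// exponent_length * max(base_length, modulus_length)^2.
// The scaling divisor and the small-input floor below are
// hypothetical, chosen only to illustrate the shape of the function.
func bmodexpCost(baseLen, expLen, modLen int) int {
	n := baseLen
	if modLen > n {
		n = modLen
	}
	cost := expLen * n * n / 1024 // hypothetical scaling divisor
	if cost < 200 {               // hypothetical floor for small inputs
		cost = 200
	}
	return cost
}

func main() {
	fmt.Println(bmodexpCost(32, 32, 32))    // small inputs hit the floor: 200
	fmt.Println(bmodexpCost(512, 512, 512)) // 512*512*512/1024 = 131072
}
```

The floor mirrors the table above, where the cost-per-length³ ratio is noticeably higher for 32- and 64-byte inputs than for 128 bytes and beyond.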

@algorandskiy (Contributor)

algorandskiy commented Sep 26, 2024

@mangoplane could you commit the benchmarks into crypto_test.go (that's where other cost-evaluation benchmarks for crypto opcodes live)? I'll try to repro/replay the notebook.

@giuliop how about re-running modexp_1024+ (like -count=64 I guess) to get a better avg value?

@giuliop

giuliop commented Sep 27, 2024

> @giuliop how about re-running modexp_1024+ (like -count=64 I guess) to get a better avg value?

Sure, I re-ran using -benchtime=64x to get 64 runs (except for 4096 bytes, which would time out, so I ran it 32 times), and the results are in line with before:

| Benchmark | Trials | Time per op (ns/op) | Extra ops |
| --- | --- | --- | --- |
| BenchmarkBModExp1/modexp_1024_bytes_inputs-12 | 64 | 142,368,141 | 4.000 |
| BenchmarkBModExp1/modexp_2048_bytes_inputs-12 | 64 | 1,139,738,240 | 4.000 |
| BenchmarkBModExp1/modexp_4096_bytes_inputs-12 | 32 | 9,443,533,249 | 4.000 |

It looks to me like we can use exponent_length * max(base_length, modulus_length)**2 and not make it more complicated than that.

@mangoplane (Author)

Thanks @giuliop for the insightful benchmarks. I replicated your results when the base length is at least that of the modulus. For smaller bases, the cost is overestimated, explaining the low $R^2$ I had for my test data. Since most bmodexp applications (like RSA) involve base length ≥ modulus length, I agree that exponent_length * max(base_length, modulus_length)**2 is a sufficient approximation.

And @algorandskiy, I have provided my benchmark code to reproduce the results of the notebook. Note that it isn't using a seed, so each run will produce slightly different results, but the overall trend should be the same.

Successfully merging this pull request may close these issues.

New opcode: modexp
5 participants