Implement Keccak sponge #1291
Conversation
@@ -1,106 +1,136 @@
import { Field } from './field.js';
Now that we have a fully end-to-end functional Keccak, could you also add that (and some of the other variants, where you see fit) to the verification key regression tests?
+1
Looks good to me. I left a few comments asking for clarification regarding the exact constraints that are being added behind the scenes, but otherwise it all looks good. Will be glad to approve as soon as we resolve those concerns.
src/lib/keccak.ts
Outdated
// AUXILIARY FUNCTIONS

// Auxiliary function to check composition of 8 bytes into a 64-bit word
function checkBytesToWord(word: Field, wordBytes: Field[]): void {
I must assume `wordBytes` is expected in little-endian order then?
I guess this is to behave as in `getKeccakStateOfBytes` for each of the words?
// Return a keccak state where all lanes are equal to 0
const getKeccakStateZeros = (): Field[][] =>
  Array.from(Array(KECCAK_DIM), (_) => Array(KECCAK_DIM).fill(Field.from(0)));

// Converts a list of bytes to a matrix of Field elements
This one does not create constraints for the byte length? Asking because `keccakStateToBytes` does. I guess this is because `getKeccakStateOfBytes` is only used at the beginning of Keccak, where the inputs are already constrained, but `keccakStateToBytes` is called at the end, where you might want to check fresh bytes of the hash output?
  return state;
}

// Converts a state of Fields to a list of bytes as Fields and creates constraints for it
Okay, after reading further I can see that the constraints referred to here correspond to the decomposition of each word, but not to the individual bytes composing each word. Wouldn't this need further calls to `rangeCheck8` to constrain that this is indeed the case? Or is this handled somewhere else in the code?
}

// TODO(jackryanservia): Use lookup argument once issue is resolved
// Checks in the circuit that a list of Fields are at most 8 bits each
Oh, okay, so the code indeed still needs some calls to `checkBytes` here and there.
Where exactly? I've looked carefully, and I think all words that are split up by `keccakStateToBytes`, and where the bytes are actually used, are constrained in the final `checkBytes` call on the output in the `hash()` function.

But I fully agree that not having the creation of bytes and the checks on them co-located is confusing and error-prone.
> in the final `checkBytes` call on the output in the `hash()` function

Yeah, I saw that when I reached the end of the review.
src/lib/keccak.ts
Outdated
// Check each Field input is 8 bits at most if it was not done before at creation time
if (byteChecks) {
  checkBytes(messageFormatted);
nice
I'm against having the `checkBytes` option. We never put constraints like these on inputs to a gadget. Instead, message bytes should be range-checked by whatever function creates them.

The original PR by @MartinMinkov added a `UInt8` class and made that the input type: #999. That was a nice solution, because the `UInt8`s would already be constrained when passed to a circuit or witnessed. So there would be no (easy) way to even create a `UInt8` that isn't range-checked (or constant, and thus verifiably within a range).
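For illustration, a minimal sketch of that idea, assuming the `rangeCheck8` gadget referenced elsewhere in this thread (the class name and shape follow the idea from #999, not code in this PR):

```ts
import { Field, Gadgets } from 'o1js';

// Sketch: values are range-checked at creation, so gadgets receiving
// UInt8s never need to re-check their inputs.
class UInt8 {
  value: Field;

  private constructor(value: Field) {
    this.value = value;
  }

  // the only way to build a UInt8 from a Field adds the 8-bit check
  static from(value: Field): UInt8 {
    if (value.isConstant()) {
      // constants are verifiably in range without adding constraints
      if (value.toBigInt() > 255n) throw Error('UInt8: value out of range');
    } else {
      Gadgets.rangeCheck8(value); // assumes the rangeCheck8 gadget from this thread
    }
    return new UInt8(value);
  }
}
```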
src/lib/keccak.unit-test.ts
Outdated
)(
  (x) => {
    let thing = x.map(Number);
    let result = sha3_256(new Uint8Array(thing));
Oh, so this is using another implementation of Keccak to compare against what the gadget provides? Nice.

I wonder if it could make sense to create tests using some of the test vectors proposed by the Keccak Team. Maybe not to be run inside CI all the time, but at least run once before shipping it for production.
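As a sketch of such a known-answer test (the import paths assume the @noble/hashes package that the existing test appears to use; the constant is the standard SHA3-256 digest of the empty message from the published FIPS 202 test vectors):

```ts
import { sha3_256 } from '@noble/hashes/sha3';
import { bytesToHex } from '@noble/hashes/utils';

// Known-answer test: SHA3-256 of the empty message
let digest = bytesToHex(sha3_256(new Uint8Array([])));
console.assert(
  digest === 'a7ffc6f8bf1ed76651c14756a0061d236038c27711995509cbdbe3fd19a5e5ac'
);
// the same constant would then be asserted against the gadget's output bytes
```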
Got about half of the way through; will continue the review tomorrow morning.
    return byte;
  })
);
Generally we constrain values where they are created, so I would've expected a `rangeCheck8` on the contents of `bytestring` here.
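A sketch of what constraining-at-creation could look like (a hypothetical helper, assuming o1js's `Provable.witness` and the `rangeCheck8` gadget from this thread):

```ts
import { Field, Provable, Gadgets } from 'o1js';

// Witness `length` bytes and range-check them immediately,
// so that creation and constraint are co-located.
function witnessBytes(length: number, compute: () => bigint[]): Field[] {
  let bytes = Provable.witness(Provable.Array(Field, length), () =>
    compute().map(Field.from)
  );
  for (let byte of bytes) Gadgets.rangeCheck8(byte); // constrain where created
  return bytes;
}
```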
src/lib/keccak.ts
Outdated
for (let z = 0; z < BYTES_PER_WORD; z++) {
  // Field element containing value 2^(8*z)
  const shift = Field.from(2n ** BigInt(8 * z));
  state[x][y] = state[x][y].add(shift.mul(wordBytes[z]));
}
I find it annoying how this duplicates the `checkBytesToWord` logic, and even with a different programming style (loop vs. reduce).

Proposal: change `checkBytesToWord(word: Field, wordBytes: Field[])` to `bytesToWord(wordBytes: Field[]): Field` and make it return the composed value. Use it here:

state[x][y] = bytesToWord(wordBytes);

and add the extra assertion inline wherever you used `checkBytesToWord` before.
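A sketch of that refactor, under the little-endian byte order used in the loop above:

```ts
import { Field } from 'o1js';

// Compose 8 little-endian bytes into a 64-bit word and return it
function bytesToWord(wordBytes: Field[]): Field {
  return wordBytes.reduce(
    (word, byte, z) => word.add(byte.mul(2n ** BigInt(8 * z))),
    Field.from(0)
  );
}

// at this call site:                 state[x][y] = bytesToWord(wordBytes);
// where checkBytesToWord was used:   word.assertEquals(bytesToWord(wordBytes));
```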
Still not done reviewing, but sending this now because I found a vulnerability.

Besides that, I wanted to call out: this Keccak implementation handles a fixed-length input. For example, the padding and the number of sponge absorb calls are hard-coded based on the message length.

I assume many applications will want to hash variable-length messages. This needs a non-trivial amount of extra work, so we might want to schedule it as a follow-up task. (cc @nicc)
src/lib/keccak.ts
Outdated
// Create an array containing the 8 bytes starting at idx that correspond to the word in [x,y]
const word_bytes = bytestring.slice(idx, idx + BYTES_PER_WORD);
// Assert correct decomposition of bytes from state
checkBytesToWord(state[x][y], word_bytes);
Suggested change:
- checkBytesToWord(state[x][y], word_bytes);
+ state[x][y].assertEquals(bytesToWord(word_bytes));
Thanks @mitschabaude. Agreed that we should follow up with this. Have created a new issue here.
src/lib/keccak.ts
Outdated
inpEndian: 'Big' | 'Little' = 'Big',
outEndian: 'Big' | 'Little' = 'Big',
I dislike the use of additional config options to support something as simple as reversing the input/output bytes. If there is a default endianness that is typically used with Keccak, I would prefer that you just use it and remove the parameter (and document the endianness used at the interface). Users who need the nonstandard endianness can just reverse the bytes themselves.

Btw, little endian is good, big endian is bad. But if Keccak usually takes big endian, then we should go with it.
I'm done with the review, and this last round is mostly about efficiency.

There's a higher-level design question which also affects efficiency: do all use cases need to pass the inputs as bytes? If someone wanted to pass in data which already comes in 64-bit chunks, that would obviously be much more efficient: we could save all the bytes-to-words conversion and the external range-checking of the input bytes.

The same observation applies to the output, and there's definitely a use case which is more efficient when returning the output as 64-bit chunks: ECDSA.

For ECDSA, the output of this needs to become a 256-bit field element which is then represented as three 88-bit limbs. So, if we return 4 64-bit words directly, we can convert them into limbs as follows:

input: words w0, w1, w2, w3
output: limbs l0, l1, l2

l0 = w0 | slice(w1, 0, 24)
l1 = slice(w1, 24, 64) | slice(w2, 0, 48)
l2 = slice(w2, 48, 64) | w3

So we can even use two output words directly without cutting them apart, and only need to cut the middle two words into larger pieces, which is much cheaper than cutting them into bytes.

Therefore, my ask is to definitely make some version of this which returns full 64-bit words. For consistency, taking 64-bit words as input then also makes sense.

Maybe the following would be a useful collection of interfaces:

- a low-level Keccak interface which takes and returns 64-bit fields
- a high-level ECDSA interface which takes a byte string, does both Keccak and the EC part, and handles the word-to-limb conversion internally
- helper functions which split an array of 64-bit fields into 8-bit fields and vice versa

This is just a suggestion - pick something you like, with the constraint that it doesn't force us to go through intermediate bytes in ECDSA. (A sketch of the word-to-limb conversion follows below.)
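For illustration, a sketch of that word-to-limb conversion, assuming `w0` is the least-significant word and a hypothetical range-checked bit-slicing helper `slice(x, start, end)` (not an existing o1js API):

```ts
import { Field } from 'o1js';

// hypothetical helper: returns bits [start, end) of x, range-checked
declare function slice(x: Field, start: number, end: number): Field;

// convert 4 64-bit words into 3 limbs of 88, 88, and 80 bits,
// following the layout in the comment above
function wordsToLimbs([w0, w1, w2, w3]: Field[]): [Field, Field, Field] {
  let l0 = w0.add(slice(w1, 0, 24).mul(1n << 64n)); // 64 + 24 = 88 bits
  let l1 = slice(w1, 24, 64).add(slice(w2, 0, 48).mul(1n << 40n)); // 40 + 48 = 88 bits
  let l2 = slice(w2, 48, 64).add(w3.mul(1n << 16n)); // 16 + 64 = 80 bits
  return [l0, l1, l2];
}
```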
);
const blockState = getKeccakStateOfBytes(paddedBlock);
// xor the state with the padded block
const stateXor = keccakStateXor(state, blockState);
This uses more constraints than necessary, correct me if I'm wrong: `blockState` always consists of `rate / 64` words that come from the input message, followed by `capacity / 64` words which are all zero.

For example, for length 256 we have 17 message words followed by 8 zero words. XORing the 8 zero words with the current state will create a bunch of XOR gates, which are all no-ops, so they could just be left out.

This could be handled nicely by making `Gadgets.xor()` do nothing if one of its inputs is the zero constant!
> XORing the 8 zero words with the current state will create a bunch of XOR gates, which are all no-ops, so they could just be left out.

Yes, this could be an optimization: leaving out those XORs avoids extra rows that do nothing to the inputs.
> by making Gadgets.xor() do nothing

If this is possible, then it sounds like a good idea to me. (A sketch of the shortcut is below.)
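A sketch of that shortcut as a wrapper (the actual suggestion is to build it into `Gadgets.xor` itself):

```ts
import { Field, Gadgets } from 'o1js';

// Skip the XOR gate when one input is the constant zero, since x ^ 0 = x.
// Note: this also skips xor's length check on the other input, so the caller
// must already know it fits in `length` bits (true for Keccak state words).
function xorSkippingZero(a: Field, b: Field, length: number): Field {
  if (a.isConstant() && a.toBigInt() === 0n) return b;
  if (b.isConstant() && b.toBigInt() === 0n) return a;
  return Gadgets.xor(a, b, length);
}
```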
src/lib/keccak.ts
Outdated
}

// Obtain the hash by selecting the first bitlength/8 bytes of the output array
const hashed = outputArray.slice(0, length / 8);
In `hash()`, but it should be done in this function already.
src/lib/keccak.ts
Outdated
const bytestring = keccakStateToBytes(state);
const outputBytes = bytestring.slice(0, bytesPerSqueeze);
You're converting the full state to bytes, and paying the constraints for doing that, but at the end you only use `length / 8` of them.

For example, for length 256, you're converting 25 words to bytes but only using 4 of those words! That's wasteful -- better to only convert the first `length / 64` words of the state to bytes.
> better to only convert the first length / 64 words of the state to bytes.

Indeed, the new zkvm Keccak only converts the needed 4 words at the end.
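A sketch of squeezing only the words that are needed (it assumes a hypothetical `wordToBytes` helper that splits one 64-bit word into 8 constrained bytes, and the usual column-major lane order of the Keccak state):

```ts
import { Field } from 'o1js';

// hypothetical helper: split one 64-bit word into 8 constrained bytes
declare function wordToBytes(word: Field): Field[];

// only convert the first length/64 words of the state, not all 25
function squeezeBytes(state: Field[][], length: number): Field[] {
  let bytes: Field[] = [];
  for (let i = 0; i < length / 64; i++) {
    // lanes are read in order state[0][0], state[1][0], ..., state[4][0], state[0][1], ...
    bytes.push(...wordToBytes(state[i % 5][Math.floor(i / 5)]));
  }
  return bytes;
}
```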
Reminder to remove "coming soon" from the docs when this is merged.
I think this is good to merge as it stands. Constraint optimizations and the addition of a 64-bit API can be done in another PR.
Implementation of the Keccak sponge and the preNist, ethereum, and SHA-3 hash functions.

This PR currently uses a workaround in `checkBytes` because lookup arguments are not working in Pickles, but that will be updated as soon as a fix is available.