
Implement Keccak sponge #1291

Merged: 14 commits merged into main on Dec 12, 2023
Conversation

Contributor

@jackryanservia jackryanservia commented Dec 4, 2023

Implementation of the Keccak sponge and the preNist, ethereum, and SHA-3 hash functions.

This PR currently uses a workaround in checkBytes because lookup arguments are not yet working in pickles; it will be updated as soon as a fix is available.
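As an illustration of the kind of lookup-free byte check meant here (a sketch assuming Field.toBits constrains the decomposition in-circuit; not necessarily the exact workaround in this PR):

import { Field } from 'o1js';

// Constrain each Field to at most 8 bits without a lookup table, by forcing
// an 8-bit little-endian binary decomposition; toBits(8) fails for anything larger.
function checkBytesSketch(bytes: Field[]): void {
  for (const byte of bytes) {
    byte.toBits(8);
  }
}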

@jackryanservia jackryanservia marked this pull request as draft December 4, 2023 18:17
@jackryanservia jackryanservia marked this pull request as ready for review December 6, 2023 15:40
@@ -1,106 +1,136 @@
import { Field } from './field.js';
Member

Now that we have a fully functional, end-to-end Keccak, could you also add that (and some of the other variants, where you see fit) to the verification key regression tests?

Collaborator

+1

Member
@querolita querolita left a comment

Looks good to me. I left a few comments asking for clarification on the exact constraints that are being added behind the scenes, but other than that it all looks good. I'll be glad to approve as soon as we resolve those concerns.

// AUXILIARY FUNCTIONS

// Auxiliary function to check composition of 8 bytes into a 64-bit word
function checkBytesToWord(word: Field, wordBytes: Field[]): void {
Member

I must assume wordBytes is expected in little endian order then?

Member

I guess this is to behave as in getKeccakStateOfBytes for each of the words?


// Return a keccak state where all lanes are equal to 0
const getKeccakStateZeros = (): Field[][] =>
Array.from(Array(KECCAK_DIM), (_) => Array(KECCAK_DIM).fill(Field.from(0)));

// Converts a list of bytes to a matrix of Field elements
Member

This one does not create constraints for the byte length? Asking because keccakStateToBytes does. I guess this is because getKeccakStateOfBytes is only used at the beginning of Keccak, where inputs are already constrained, but keccakStateToBytes is called at the end, where you might want to check fresh bytes of the hash output?

return state;
}

// Converts a state of Fields to a list of bytes as Fields and creates constraints for it
Member

Okay, after reading further I can see that the constraints referred to here correspond to the decomposition of each word, but not to the individual bytes composing each word. Wouldn't this need further calls to rangeCheck8 to constrain that this is indeed the case? Or is this handled somewhere else in the code?

}

// TODO(jackryanservia): Use lookup argument once issue is resolved
// Checks in the circuit that a list of Fields are at most 8 bits each
Member

Oh, okay so the code indeed still needs some calls to checkBytes here and there

Collaborator

Where exactly? I've looked carefully, and I think all words that are split up by keccakStateToBytes, and whose bytes are actually used, are constrained in the final checkBytes call on the output in the hash() function.

But I fully agree that not having the creation of bytes and the checks on them co-located is confusing and error-prone.

Member

"in the final checkBytes call on the output in the hash() function"

Yeah, I saw that when I reached the end of the review.


// Check each Field input is 8 bits at most if it was not done before at creation time
if (byteChecks) {
checkBytes(messageFormatted);
Member

nice

Collaborator

I'm against having the checkBytes option. We never put constraints like these on inputs to a gadget. Instead, message bytes should be range-checked by whatever function creates them.

The original PR by @MartinMinkov added a UInt8 class and made that the input type: #999

That was a nice solution, because the UInt8s would already be constrained when passed to a circuit or witnessed. So there would be no (easy) way to even create a UInt8 that isn't range-checked (or constant, and thus verifiably within a range).
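As a rough illustration of that design (hypothetical names, not the UInt8 from #999 nor the type that later landed in o1js): a byte type that range-checks itself when created, so gadgets can accept it without extra checks.

import { Field, Gadgets } from 'o1js';

// A self-constraining byte wrapper: the 8-bit check happens at creation time,
// so there is no (easy) way to construct an unconstrained value of this type.
class Byte {
  value: Field;
  constructor(value: Field) {
    Gadgets.rangeCheck8(value); // assumed gadget, as discussed in this thread
    this.value = value;
  }
}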

)(
(x) => {
let thing = x.map(Number);
let result = sha3_256(new Uint8Array(thing));
Member

Oh, so this is using another implementation of Keccak to compare against what the gadget produces? Nice.

I wonder if it would make sense to create tests using some of the test vectors proposed by the Keccak Team. Maybe not run inside CI all the time, but at least run once before shipping it for production.

Collaborator
@mitschabaude mitschabaude left a comment

Got about halfway through; will continue the review tomorrow morning.

Comment on lines +166 to +168
return byte;
})
);
Collaborator

Generally we constrain values where they are created, so I would've expected a rangeCheck8 on the contents of bytestring here.

Comment on lines 138 to 142
for (let z = 0; z < BYTES_PER_WORD; z++) {
// Field element containing value 2^(8*z)
const shift = Field.from(2n ** BigInt(8 * z));
state[x][y] = state[x][y].add(shift.mul(wordBytes[z]));
}
Collaborator

I find it annoying that this duplicates the checkBytesToWord logic, even with a different programming style (loop vs. reduce).

Proposal: change checkBytesToWord(word: Field, wordBytes: Field[]) to bytesToWord(wordBytes: Field[]): Field and make it return the composed value. Use it here:

state[x][y] = bytesToWord(wordBytes);

and add the extra assertion inline wherever you used checkBytesToWord before.
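A minimal sketch of that proposed helper (reduce style, assuming the Field API already imported in this file; not necessarily the code that landed):

// Compose 8 little-endian bytes into one 64-bit word: sum of byte_i * 2^(8*i).
function bytesToWord(wordBytes: Field[]): Field {
  return wordBytes.reduce(
    (word, byte, i) => word.add(byte.mul(Field.from(2n ** BigInt(8 * i)))),
    Field.from(0)
  );
}

// Call sites that previously used checkBytesToWord(word, wordBytes) would then
// assert equality inline, e.g. word.assertEquals(bytesToWord(wordBytes));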

Collaborator
@mitschabaude mitschabaude left a comment

Still not done reviewing, but sending this now because I found a vulnerability.

Besides that, I wanted to call out: this Keccak implementation handles a fixed-length input. For example, the padding and the number of sponge absorb calls are hard-coded based on the message length.

I assume many applications will want to hash variable-length messages. That needs a non-trivial amount of extra work, so we might want to schedule it as a follow-up task. (cc @nicc)

// Create an array containing the 8 bytes starting on idx that correspond to the word in [x,y]
const word_bytes = bytestring.slice(idx, idx + BYTES_PER_WORD);
// Assert correct decomposition of bytes from state
checkBytesToWord(state[x][y], word_bytes);
Collaborator

Suggested change
checkBytesToWord(state[x][y], word_bytes);
state[x][y].assertEquals(bytesToWord(word_bytes));

Collaborator

Thanks @mitschabaude. Agreed that we should follow up with this. Have created a new issue here.

Comment on lines 513 to 514
inpEndian: 'Big' | 'Little' = 'Big',
outEndian: 'Big' | 'Little' = 'Big',
Collaborator
@mitschabaude mitschabaude Dec 7, 2023

I dislike the use of additional config options to support something as simple as reversing the input/output bytes. If there is a default endianness that is typically used with Keccak, I would prefer that you just use that and remove the parameter (and document the endianness used at the interface). Users who need the nonstandard endianness can just reverse the bytes themselves.

Btw, little endian is good, big endian is bad. But if Keccak usually takes big endian, then we should go with it.


Collaborator
@mitschabaude mitschabaude left a comment

I'm done with the review, and this last round is mostly about efficiency.

There's a higher-level design question which also affects efficiency: do all use cases need to pass the inputs as bytes? If someone wanted to pass in data which already comes in 64-bit chunks, that would obviously be much more efficient: we could save all the bytes-to-words conversion and the external range checks on the input bytes.

The same observation applies to the output, and there's definitely a use case which is more efficient when returning the output as 64-bit chunks: ECDSA.

For ECDSA, the output of this needs to become a 256-bit field element, which is then represented as three 88-bit limbs. So, if we return 4 64-bit words directly, we can convert them into limbs as follows:

input: words w0, w1, w2, w3
output: limbs l0, l1, l2

l0 = w0 | slice(w1, 0, 24)
l1 = slice(w1, 24, 64) | slice(w2, 0, 48)
l2 = slice(w2, 48, 64) | w3

So we can even use two of the output words directly without cutting them apart, and only need to cut the middle two words into larger pieces, which is much cheaper than cutting them into bytes.

Therefore, my ask is to definitely make some version of this which returns full 64-bit words. For consistency, taking 64-bit words as input then also makes sense.

Maybe the following would be a useful collection of interfaces:

  • a low-level Keccak interface which takes and returns 64-bit fields
  • a high-level ECDSA interface which takes a byte string, does both Keccak and the EC part, and handles the word-to-limb conversion internally
  • helper functions which split an array of 64-bit fields into 8-bit fields and vice versa

This is just a suggestion - pick something you like, with the constraint that it doesn't force us to go through intermediate bytes in ECDSA.
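As a plain-bigint illustration of that limb layout (out-of-circuit, for clarity only; the in-circuit version would use range-checked slices, and the word order is assumed little-endian with w0 least significant):

// Recombine four 64-bit output words into three 88-bit limbs, following the
// l0 / l1 / l2 layout above. Plain bigint math, for illustration only.
function wordsToLimbs(w: [bigint, bigint, bigint, bigint]): [bigint, bigint, bigint] {
  const mask = (bits: number) => (1n << BigInt(bits)) - 1n;
  const l0 = w[0] | ((w[1] & mask(24)) << 64n);          // w0 | slice(w1, 0, 24)
  const l1 = (w[1] >> 24n) | ((w[2] & mask(48)) << 40n); // slice(w1, 24, 64) | slice(w2, 0, 48)
  const l2 = (w[2] >> 48n) | (w[3] << 16n);              // slice(w2, 48, 64) | w3
  return [l0, l1, l2];
}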

);
const blockState = getKeccakStateOfBytes(paddedBlock);
// xor the state with the padded block
const stateXor = keccakStateXor(state, blockState);
Collaborator

This uses more constraints than necessary (correct me if I'm wrong):

blockState always consists of

  • rate / 64 words that come from the input message, followed by
  • capacity / 64 words which are all zero

For example, for length 256 we have 17 message words followed by 8 zero words.

XORing the 8 zero words with the current state will create a bunch of XOR gates, which are all no-ops, so they could just be left out.

This could be handled nicely by making Gadgets.xor() do nothing if one of its inputs is the zero constant!
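A sketch of that idea (not the actual Gadgets change), where an xor wrapper returns the other operand whenever one input is a constant zero:

import { Field, Gadgets } from 'o1js';

// Skip the XOR gate entirely when one operand is the constant 0, since
// x ^ 0 = x; otherwise fall back to the regular xor gadget.
function xorOrSkip(a: Field, b: Field, length: number): Field {
  if (a.isConstant() && a.toBigInt() === 0n) return b;
  if (b.isConstant() && b.toBigInt() === 0n) return a;
  return Gadgets.xor(a, b, length);
}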

Member

"XORing the 8 zero words with the current state will create a bunch of XOR gates, which are all no-ops, so they could just be left out."

Yes, this could be an optimization: leaving out those XORs avoids extra rows that do nothing to the inputs.

Member
@querolita querolita Dec 7, 2023

"by making Gadgets.xor() do nothing"

If this is possible, then it sounds like a good idea to me.


}

// Obtain the hash by selecting the first length / 8 bytes of the output array
const hashed = outputArray.slice(0, length / 8);
Collaborator

In hash(), but it should be done in this function already.

Comment on lines 451 to 452
const bytestring = keccakStateToBytes(state);
const outputBytes = bytestring.slice(0, bytesPerSqueeze);
Collaborator

You're converting the full state to bytes, and paying the constraints for doing that, but at the end you only use length / 8 of them.

For example, for length 256, you're converting 25 words to bytes but only using 4 of these words!

That's wasteful -- better to only convert the first length / 64 words of the state to bytes.

Member

"better to only convert the first length / 64 words of the state to bytes."

Indeed, the new zkVM Keccak only converts the needed 4 words at the end.
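A sketch of that squeeze-side change, with hypothetical helpers: instead of converting the whole 25-word state, convert only the first length / 64 lanes (usual Keccak lane order, i = x + 5*y with x varying fastest) and concatenate their bytes. wordToBytes is an assumed per-lane helper that splits one 64-bit word into 8 byte-sized Fields; adapt to the actual keccakStateToBytes layout.

import { Field } from 'o1js';

// Convert only the lanes actually needed for a `length`-bit digest.
function squeezeBytes(
  state: Field[][],
  length: number,
  wordToBytes: (word: Field) => Field[]
): Field[] {
  const wordsNeeded = length / 64;
  const outputBytes: Field[] = [];
  for (let i = 0; i < wordsNeeded; i++) {
    const x = i % 5;
    const y = Math.floor(i / 5);
    outputBytes.push(...wordToBytes(state[x][y]));
  }
  return outputBytes; // length / 8 bytes, no wasted conversions
}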

Collaborator

nicc commented Dec 7, 2023

Reminder to remove "coming soon" from the docs when this is merged

Collaborator
@mitschabaude mitschabaude left a comment

I think this is good to merge as it stands. Constraint optimizations and the addition of a 64-bit API can be done in another PR.

@jackryanservia jackryanservia merged commit ffce6b3 into main Dec 12, 2023
13 checks passed
@jackryanservia jackryanservia deleted the feature/keccak-sponge branch December 12, 2023 10:44