Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Builtin SHA256 hashing #6977

Open
wants to merge 31 commits into
base: main
Choose a base branch
from
Open

Conversation

MatthewJohnHeath
Copy link

@MatthewJohnHeath MatthewJohnHeath commented Aug 9, 2024

Expose the functionality of Zig's std.crypto.hash.sha2.Sha256 through a pure-functional interface to Roc.

@lukewilliamboswell
Copy link
Collaborator

@MatthewJohnHeath I tried to push to your branch but I got permission denied. Made a PR in case that helps.

@MatthewJohnHeath
Copy link
Author

MatthewJohnHeath commented Aug 20, 2024 via email

@smores56
Copy link
Collaborator

The shape of this PR is a good start, but I'm seeing a potential issue if we add other cryptographic code to this module, which would be quite unsurprising. You have emptySha256, addBytes, and digest. The first is named based on the hash, but the second and third are nominally agnostic to the hash, but only work for SHA-256. Could you rename them to addBytesSha256 or something similar?

@smores56
Copy link
Collaborator

Otherwise, I'm attempting to get things running

@smores56
Copy link
Collaborator

Okay, I'm almost done getting this working, just need to coordinate the Zig and Roc builtin types.

@smores56
Copy link
Collaborator

The REPL worked for me! We'll still need some tests, but it should be good to go, above comments notwithstanding.

@smores56
Copy link
Collaborator

Also, there's no code currently to extract state from the new Digest256 opaque type. Could you add a function or something to do that? Maybe Crypt.digest256ToBytes : Digest256 -> List U8?

@MatthewJohnHeath
Copy link
Author

Also, there's no code currently to extract state from the new Digest256 opaque type. Could you add a function or something to do that? Maybe Crypt.digest256ToBytes : Digest256 -> List U8?

I was thinking of (only) having the type implement Eq. Can you think of use cases where the individual bytes would be needed?

@smores56
Copy link
Collaborator

smores56 commented Sep 2, 2024

I was thinking of (only) having the type implement Eq. Can you think of use cases where the individual bytes would be needed?

If I want to store the hash for future comparison. Not that using SHA-256 would be recommended over Argon2, but the idea holds. There are a few algorithms out there that use SHA-256 digest bytes for computation (Wikipedia), at least UUID v5 uses SHA-1 in that way.

@smores56
Copy link
Collaborator

smores56 commented Sep 9, 2024

Tests are currently failing. @MatthewJohnHeath do you have time to implement the above fixes any time soon? If not, I can help, but it's your PR, so I'll defer to you.

@MatthewJohnHeath
Copy link
Author

Tests are currently failing. @MatthewJohnHeath do you have time to implement the above fixes any time soon? If not, I can help, but it's your PR, so I'll defer to you.

@smores56 It's not really clear to me why the build is hanging (which seems to be what is happening). I will try a few things, but might need to ask for help

@MatthewJohnHeath
Copy link
Author

@smores56 It's building now. Hopefully I can do those last 2 renamings of functions you asked for tomorrow.

@MatthewJohnHeath
Copy link
Author

If this gets through the tests, would it be to merge and then do docs and tests? Or do those need to be there first?

@smores56
Copy link
Collaborator

It's always preferable to do at least testing, but ideally also docs, in the same PR. We'd usually avoid them if they're going to make the PR too big, or if they're blocking us merging an important change/bug fix. Since it's neither of those, I'd prefer if we can get them in this change. I'm happy to help with either the tests or docs!

Sha256 := { location : U64 }

## Represents the digest of soem data produced by the SHA256 cryptographic hashing function as an opaque type.
## `Digest256`implements the `Eq` ability.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: Digest256(space)implements ...

also, consider adding an empty doc comment between these two doc comments, that'll follow the "one-line summary on top, context beneath" structure for docs.

Digest256 := { firstHalf : U128, secondHalf : U128 } implements [Eq]

## Returns a `Sha256` to which no data have been added.
emptySha256 : {} -> Sha256
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend "An empty SHA-256 digest."

And it's "to which no data has been added." https://www.thesaurus.com/e/grammar/data-is-or-data-are/

Copy link
Author

@MatthewJohnHeath MatthewJohnHeath Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The word shouldn't be "digest", I think. We should save that for the Digest256 returned after "finalize" has been called on the zig object.
I am struggling to find a good term to describe what the Sha256 object does. My mental model of it is as the state of the algorithm, but that doesn't seem useful wording here. Something like "An empty SHA-256 hasher", maybe?

I agree that data (EDIT)is in this context; I've had too much time in settings where I get corrected the other way.

import Result
import Str

## Represents, as an opaque type, the state of a SHA256 cryptographic hashing function, after some (or no) data have been added to the hash.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: it being an opaque type is implied, I don't think we need to explicitly mention that here.

smores56
smores56 previously approved these changes Sep 13, 2024
@smores56
Copy link
Collaborator

smores56 commented Oct 8, 2024

So I made progress on running things. Tests are running for the builtins, and there is some LLVM issue I'll have to investigate tomorrow:

smores@smortress:~/dev/roc$ cargo run --release --bin roc -- test crates/compiler/builtins/roc/Crypt.roc
    Finished release [optimized] target(s) in 0.16s
     Running `target/release/roc test crates/compiler/builtins/roc/Crypt.roc`
Call parameter type does not match function signature!
%list.RocList %"#arg2"
 ptr  %call_builtin = call i64 @roc_builtins.crypt.sha256AddBytes(i64 %"#arg1", %list.RocList nocapture nonnull readonly byval(%list.RocList) %"#arg2"), !dbg !1635


Function "Crypt_sha256AddBytes_ade2dc9e385e74c61c8d36210907131d7823a7fe8d06c7bd978c1f46a9d3830" failed LLVM verification in NON-OPTIMIZED build. Its content was:

define internal fastcc i64 @Crypt_sha256AddBytes_ade2dc9e385e74c61c8d36210907131d7823a7fe8d06c7bd978c1f46a9d3830(i64 %"#arg1", %list.RocList %"#arg2") !dbg !1634 {
entry:
  %call_builtin = call i64 @roc_builtins.crypt.sha256AddBytes(i64 %"#arg1", %list.RocList nocapture nonnull readonly byval(%list.RocList) %"#arg2"), !dbg !1635
  ret i64 %call_builtin, !dbg !1635
}
thread 'main' panicked at crates/compiler/gen_llvm/src/llvm/build.rs:5810:21:
😱 LLVM errors when defining function "Crypt_sha256AddBytes_ade2dc9e385e74c61c8d36210907131d7823a7fe8d06c7bd978c1f46a9d3830"; I wrote the full LLVM IR to "/tmp/nix-shell.7F740v/test.ll"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@smores56
Copy link
Collaborator

I've been distracted, but I'll be looking at this again today. I might have to ask for help from @bhansconnect since I'm not much of an LLVM guy, which seems to be the last issue here. Needless to say, @MatthewJohnHeath there's nothing left that you need to do on this PR. If you have any changes you want to make, feel free, but otherwise I'm planning on merging as soon as I fix the LLVM issue.

};

pub fn emptySha256() callconv(.C) Sha256 {
const allocation = utils.allocateWithRefcount(@sizeOf(sha2.Sha256), @alignOf(sha2.Sha256), false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this allocated with a refcount?

const utils = @import("utils.zig");

const Sha256 = extern struct {
location: [*]u8,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the extra pointer indirection. I think we should just expose the zig sha256 directly. Cast it to an array probably so it can potentially be inline instead of requiring an allocation and extra indirection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the roc side can just make it a tuple of integers that end up being sized the same as the zig type.

Just add a comptime assert on the size here. That way on zig update we can correct. I think it is currently 112 bytes before padding. So a tuple of 7 u128s should work to represent it on the roc size. On this zig side, it should be a [7]u128 align(16)

}
CryptSha256Digest => {
arguments!(sha);
call_bitcode_fn(env, &[sha], bitcode::CRYPT_SHA256_DIGEST)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to use call_list_bitcode_fn and put the list as the first argument to the function. I think that should fix the current bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants