Use sha2-asm for feature compress on aarch64? #220

dgbo · 2021-01-26T07:40:29Z

Hi,

We met real performance issue when we tested the application lotus (dependent on sha2 with feature compress) on aarch64 platform.
The root cause is, for aarch64 only with feature compress, the sha2::compress256 does not use the aarch64 accelerators, e.g. sha256su0, it degradates to soft_compress.

We tried three ways to solve this.

A. Add feature compress, asm in applications for aarch64 when using sha2::compress256.
To keep x86 platforms using the intrinsics other than asm, modifications to Cargo.toml of applications would be:

+[target."cfg(target_arch = \"aarch64\")".dependencies]
+sha2 = { version = "0.9.1", features = [ "compress", "asm" ] }
+[target."cfg(target_arch = \"x86_64\")".dependencies]
+sha2 = { version = "0.9.1", features = [ "compress" ] }

However, this will enable both "compress" and "asm" no matter on aarch64 or x86.
This is a known BUG of cargo, see [1][2], the solution is still unstable.

B. Implement the instrinsic like we did for x86. We need core::arch::aarch64::* to do this.
It is still unstable for aarch64, we investigated the assembly generated for vsha256su0q_u32,
there are extra two memory loads before the actual sha256su0.
So as of now, the intrinsic can not deliver the best performance as the asm did.

C. Use sha2-asm for aarch64 with feature compress, like:

diff --git a/sha2/Cargo.toml b/sha2/Cargo.toml
index 98bd626..10f9ed1 100644
--- a/sha2/Cargo.toml
+++ b/sha2/Cargo.toml
@@ -35,6 +35,6 @@ hex-literal = "0.2"
 default = ["std"]
 std = ["digest/std"]
 asm = ["sha2-asm", "libc"]
-compress = [] # Expose compress function
+compress = ["sha2-asm", "libc"] # Expose compress function
 force-soft = [] # Force software implementation
 asm-aarch64 = ["asm"] # DEPRECATED: use `asm` instead
diff --git a/sha2/src/sha256.rs b/sha2/src/sha256.rs
index 3eef745..56c5c7a 100644
--- a/sha2/src/sha256.rs
+++ b/sha2/src/sha256.rs
@@ -142,7 +142,7 @@ cfg_if::cfg_if! {
     if #[cfg(feature = "force-soft")] {
         mod soft;
         use soft::compress;
-    } else if #[cfg(all(feature = "asm", target_arch = "aarch64", target_os = "linux"))] {
+    } else if #[cfg(all(any(feature = "compress", feature = "asm"), target_arch = "aarch64", target_os = "linux"))] {
         mod soft;
         mod aarch64;
         use aarch64::compress;

We want to fix the performance issue via C, it would be OK for both aarch64 and x86.
With it, the package is easy to import by applications and delivers same performance with feature compress and asm.

Suggestions?
Regards.

[1] rust-lang/cargo#2524
[2] rust-lang/cargo#7914

The text was updated successfully, but these errors were encountered:

newpavlov · 2021-01-26T08:03:44Z

IIUC your use case, you can use sha2-asm directly on aarch64 and fall back to sha2::compress256 on other targets:

# Cargo.toml
[target."cfg(not(target_arch = \"aarch64\"))".dependencies]
sha2 = { version = "0.9.1", features = [ "compress" ] }
[target."cfg(target_arch = \"aarch64\")".dependencies]
sha2-asm = "0.5"

// code
#[cfg(not(target_arch = "aarch64"))]
fn compress256(state: &mut [u32; 8], blocks: &[Block]) {
    sha2::compress256(state, blocks);
}

#[cfg(target_arch = "aarch64")]
fn compress256(state: &mut [u32; 8], blocks: &[Block]) {
    for block in blocks {
        sha2_asm::compress256(state, block);
    }
}

You will need some glue code to handle conversion between GenericArray<u8, U64> and [u8; 64], but nothing too difficult.

we investigated the assembly generated for vsha256su0q_u32, there are extra two memory loads before the actual sha256su0.

It could be worth to open an issue in the Rust language repository for that. The current assembly version is mostly intended as a temporary workaround around instability of the ARM intrinsics. Also note that for the best possible performance we probably have to work on RustCrypto/asm-hashes#26.

dgbo · 2021-01-26T09:01:07Z

IIUC your use case, you can use sha2-asm directly on aarch64 and fall back to sha2::compress256 on other targets:

# Cargo.toml
[target."cfg(not(target_arch = \"aarch64\"))".dependencies]
sha2 = { version = "0.9.1", features = [ "compress" ] }
[target."cfg(target_arch = \"aarch64\")".dependencies]
sha2-asm = "0.5"

// code
#[cfg(not(target_arch = "aarch64"))]
fn compress256(state: &mut [u32; 8], blocks: &[Block]) {
    sha2::compress256(state, blocks);
}

#[cfg(target_arch = "aarch64")]
fn compress256(state: &mut [u32; 8], blocks: &[Block]) {
    for block in blocks {
        sha2_asm::compress256(state, block);
    }
}

You will need some glue code to handle conversion between GenericArray<u8, U64> and [u8; 64], but nothing too difficult.

Correctly, this would solve my problem.
But as the glue code, we have to see if we have sha256 implemented by the running hardware first.
Seems we need the exact same code sha2::compress256 and all code in src/sha256/aarch64.rs as the glue code.
It's a little bit weird for me that we add the code we already have in a library into the applications.

we investigated the assembly generated for vsha256su0q_u32, there are extra two memory loads before the actual sha256su0.

It could be worth to open an issue in the Rust language repository for that. The current assembly version is mostly intended as a temporary workaround around instability of the ARM intrinsics.

Totally agree, it is definitely an issue.
As of now, I'm not sure the extra memory loads are instroduced by rust or llvm.
I am still working on it, this needs some time.
Before we have an aarch64 instrinsic, can we provide an easily-used workaround (maybe C)? :)

Also note that for the best possible performance we probably have to work on RustCrypto/asm-hashes#26.

Great idea! I think I can do it for aarch64.

newpavlov · 2021-01-26T09:37:23Z

Hm, I think a better solution would be to check availability of the SHA extension on x86 and use the intrinsic backend even if the asm feature is enabled, i.e. enabling it will simply replace the software fallback with the ASM one.

dgbo · 2021-01-27T01:39:09Z

Sounds reasonable. Is this OK?

--- a/sha2/src/sha256.rs
+++ b/sha2/src/sha256.rs
@@ -146,6 +146,10 @@ cfg_if::cfg_if! {
         mod soft;
         mod aarch64;
         use aarch64::compress;
+    } else if #[cfg(all(feature = "compress", any(target_arch = "x86", target_arch = "x86_64")))] {
+        mod soft;
+        mod x86;
+        use x86::compress;
     } else if #[cfg(all(feature = "asm", any(target_arch = "x86", target_arch = "x86_64")))] {
         fn compress(state: &mut [u32; 8], blocks: &[[u8; 64]]) {
             for block in blocks {

dgbo · 2021-01-28T07:34:29Z

Hm, I think a better solution would be to check availability of the SHA extension on x86 and use the intrinsic backend even if the asm feature is enabled, i.e. enabling it will simply replace the software fallback with the ASM one.

Hi, does the code below match your thoughts?

--- a/sha2/src/sha256.rs
+++ b/sha2/src/sha256.rs
@@ -146,6 +146,10 @@ cfg_if::cfg_if! {
         mod soft;
         mod aarch64;
         use aarch64::compress;
+    } else if #[cfg(all(feature = "compress", any(target_arch = "x86", target_arch = "x86_64")))] {
+        mod soft;
+        mod x86;
+        use x86::compress;
     } else if #[cfg(all(feature = "asm", any(target_arch = "x86", target_arch = "x86_64")))] {
         fn compress(state: &mut [u32; 8], blocks: &[[u8; 64]]) {
             for block in blocks {

The SHA extension on x86 is checked in x86::compress.
We use the x86 intrinsic backend even with -- features "compress, asm".
If only enabling asm, the ASM one is used as before.

With this, applications can just use --features "compress, asm" for x86 and aarch64, x86 stays the same as before to use the intrinsic backend, while aarch64 use the ASM one to better performance.
It would also be fine if only use --features "compress" for x86.

I'am sorry if I misunderstand what you mean.

Regards.

newpavlov mentioned this issue Jan 30, 2021

Use the SHA-NI extension backend with enabled asm feature #224

Merged

newpavlov closed this as completed in #224 Jan 30, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use sha2-asm for feature compress on aarch64? #220

Use sha2-asm for feature compress on aarch64? #220

dgbo commented Jan 26, 2021

newpavlov commented Jan 26, 2021

dgbo commented Jan 26, 2021 •

edited

Loading

newpavlov commented Jan 26, 2021 •

edited

Loading

dgbo commented Jan 27, 2021

dgbo commented Jan 28, 2021 •

edited

Loading

Use sha2-asm for feature compress on aarch64? #220

Use sha2-asm for feature compress on aarch64? #220

Comments

dgbo commented Jan 26, 2021

newpavlov commented Jan 26, 2021

dgbo commented Jan 26, 2021 • edited Loading

newpavlov commented Jan 26, 2021 • edited Loading

dgbo commented Jan 27, 2021

dgbo commented Jan 28, 2021 • edited Loading

dgbo commented Jan 26, 2021 •

edited

Loading

newpavlov commented Jan 26, 2021 •

edited

Loading

dgbo commented Jan 28, 2021 •

edited

Loading