Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use REP MOVSB/STOSB when the ERMSB feature is present #392

Merged
merged 3 commits into from
Nov 3, 2020

Conversation

josephlr
Copy link
Contributor

@josephlr josephlr commented Nov 3, 2020

Followup to #365, part of addressing #339

The first commit reorganizes the mem functions to reduce the amount of platform specific code. Specifically, the following #[inline(always)] Rust functions:

  • impls::copy_forward
  • impls::copy_backward
  • impls::set_bytes

The second commit uses the functionality added in rust-lang/rust#78396 to check if the "ermsb" feature is enabled at compile-time (as we can't detect the features at runtime). If so, we use REP MOVSB/STOSB instead of REP MOVSQ/STOSQ. This is what Linux does.

The final commit adds memcpy/memset tests that are not 8-byte aligned.

Performance

See the "ERMSB" tab of this spreadsheet. For Intel chips with the "ermsb" feature, we get a 25%-28% performance improvement against the existing implementation, reaching parity with glibs's memcpy and memset.

Note this improvement only seems to help for small-ish (4 KiB) copies, 1 MiB copies show no effect from the change.

Also note that this feature doesn't exist on AMD CPUs, so we don't have to worry abound performance degradation on those chips.

This reduces the amount of platform-specific code

Signed-off-by: Joe Richey <[email protected]>
Signed-off-by: Joe Richey <[email protected]>
@alexcrichton
Copy link
Member

This looks great to me, thanks for this!

Out of curiosity, what x86 targets do we have that end up using this?

@alexcrichton alexcrichton merged commit 63c0091 into rust-lang:master Nov 3, 2020
@josephlr josephlr deleted the check branch November 3, 2020 22:00
@josephlr
Copy link
Contributor Author

josephlr commented Nov 3, 2020

Out of curiosity, what x86 targets do we have that end up using this?

My use-case is custom no_std x86 targets, I tested by just using target-cpu=native. I don't think very many of our builtin-targets use this (as the mem functions are usually provided by the libc).

I think the SGX targets, which do use these mem functions, could set +ermsb in their target features, similar to what they already do for +rdrnd and +rdseed. ERMSB was introduced in Ivy Bridge, while SGX was introduced in Skylake.

AaronKutch pushed a commit to AaronKutch/compiler-builtins that referenced this pull request Nov 28, 2020
* Reorganize mem functions

This reduces the amount of platform-specific code

Signed-off-by: Joe Richey <[email protected]>

* Use ERMSB implementations if the feature is set

Signed-off-by: Joe Richey <[email protected]>

* Add non-aligned benchmarks

Signed-off-by: Joe Richey <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants