
Optimize decimal formatting of 128-bit integers #81484

Merged (1 commit) on Jan 31, 2021


Kogia-sima
Contributor

Description

This PR optimizes the udivmod_1e19 function, which is used for formatting 128-bit integers, based on the algorithm provided in [1]. This optimization improves performance of formatting 128-bit integers, especially on 64-bit architectures. It also slightly reduces the output binary size.
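A short derivation may help connect [1] to the `FACTOR` constant and the `n < 1 << 83` check quoted in the review excerpt. It is a sketch under stated assumptions: the total shift of 190 bits (the high 128 bits of the product, then a further right shift by 62) is inferred from `FACTOR` being ⌈2^190 / 10^19⌉, not copied from the merged source.

```latex
\begin{aligned}
d &= 10^{19}, \qquad F = \left\lceil \frac{2^{190}}{d} \right\rceil = 156927543384667019095894735580191660403, \qquad e = F d - 2^{190}, \quad 0 < e < d \\
n &= q d + r, \qquad 0 \le r < d, \qquad n < 2^{128} \\
\frac{n F}{2^{190}} &= \frac{n}{d} + \frac{n e}{d \, 2^{190}} = q + \frac{r + n e / 2^{190}}{d} \\
e &\approx 4.41 \times 10^{18} < 2^{62} \approx 4.61 \times 10^{18}
  \;\Rightarrow\; \frac{n e}{2^{190}} < \frac{2^{128} \cdot 2^{62}}{2^{190}} = 1
  \;\Rightarrow\; r + \frac{n e}{2^{190}} < r + 1 \le d \\
\left\lfloor \frac{n}{d} \right\rfloor &= \left\lfloor \frac{n F}{2^{190}} \right\rfloor = \operatorname{mulhi}_{128}(n, F) \gg 62 \\
n < 2^{83} &\;\Rightarrow\; \left\lfloor \frac{n}{d} \right\rfloor = \left\lfloor \frac{\lfloor n / 2^{19} \rfloor}{5^{19}} \right\rfloor, \qquad \lfloor n / 2^{19} \rfloor < 2^{64}
\end{aligned}
```

The last line is what makes a fast path possible: because 10^19 = 2^19 · 5^19, an input below 2^83 can be shifted right by 19 bits into a `u64` and divided by 5^19 with a single 64-bit division.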

Assembler comparison

https://godbolt.org/z/YrG5zY

Performance

previous results

```
test fmt::write_u128_max                                        ... bench:         552 ns/iter (+/- 4)
test fmt::write_u128_min                                        ... bench:         125 ns/iter (+/- 2)
```

new results

```
test fmt::write_u128_max                                        ... bench:         205 ns/iter (+/- 13)
test fmt::write_u128_min                                        ... bench:         129 ns/iter (+/- 5)
```

Reference

[1] T. Granlund and P. Montgomery, “Division by Invariant Integers Using Multiplication,” in Proc. ACM SIGPLAN '94 Conference on Programming Language Design and Implementation (PLDI), 1994, pp. 61–72.

@rust-highfive
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @dtolnay (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 28, 2021
@nagisa
Member

nagisa commented Jan 29, 2021

r? @nagisa

@rust-highfive rust-highfive assigned nagisa and unassigned dtolnay Jan 29, 2021
```rust
// Review context: a diff fragment around `udivmod_1e19` (not compilable on its own).
((q << 1) | carry as u128, r as u64)
const FACTOR: u128 = 156927543384667019095894735580191660403;

let quot = if n < 1 << 83 {
```
Member

Is there actual value in this condition? I think on the majority of targets we support, u128_mulhi will be faster than a 64-bit division anyway.

Contributor Author

Of course, on a laptop or desktop PC u128_mulhi is very fast, so this branch may be unnecessary there. However, integer multiplication is still an expensive operation on some modern processors. For example, on the Intel Knights Landing microarchitecture (which is widely used in supercomputers and high-performance workstations), the MUL instruction takes more than 7 cycles to produce its result, and there is only one execution port that handles multiplication.

https://agner.org/optimize/

In addition, as you can see in the assembler output, this conditional branch compiles to only 2 instructions, and the second call to this function always takes the fast path because u128::MAX / 10^19 < 2^83. That means that with the fast path the two calls compile to 44 instructions in total, versus 58 without it.
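The trade-off discussed in this thread can be made concrete with a self-contained sketch. It is not the merged code: `u128_mulhi` here is a hypothetical reimplementation named after the helper mentioned in the review (four 64×64→128-bit multiplications), `udivmod_1e19` uses the `FACTOR` constant from the excerpt with an assumed right shift of 62, and `decimal_chunks` / `to_decimal` are hypothetical helpers that split a `u128` into base-10^19 chunks with two calls, so the second call's argument is always below 2^83 and takes the single-`u64`-division fast path.

```rust
/// Upper 128 bits of the 256-bit product `x * y`, computed from 64-bit limbs.
fn u128_mulhi(x: u128, y: u128) -> u128 {
    let (x_lo, x_hi) = (x as u64 as u128, x >> 64);
    let (y_lo, y_hi) = (y as u64 as u128, y >> 64);

    // Carry out of the lowest partial product.
    let carry = (x_lo * y_lo) >> 64;
    // No overflow: x_lo * y_hi <= (2^64 - 1)^2 and carry < 2^64.
    let m = x_lo * y_hi + carry;
    let high1 = m >> 64;
    let high2 = (x_hi * y_lo + (m as u64 as u128)) >> 64;

    x_hi * y_hi + high1 + high2
}

/// Sketch: returns `(n / 10^19, n % 10^19)` without a full 128-bit division.
fn udivmod_1e19(n: u128) -> (u128, u64) {
    const DIV: u64 = 10_000_000_000_000_000_000;
    // ceil(2^190 / 10^19); constant taken from the review excerpt.
    const FACTOR: u128 = 156927543384667019095894735580191660403;

    let quot = if n < 1 << 83 {
        // Fast path: 10^19 = 2^19 * 5^19 and n >> 19 < 2^64,
        // so one 64-bit division by 5^19 is enough.
        ((n >> 19) as u64 / (DIV >> 19)) as u128
    } else {
        // Slow path: floor(n * FACTOR / 2^190) = mulhi(n, FACTOR) >> 62 (assumed shift).
        u128_mulhi(n, FACTOR) >> 62
    };
    // quot * 10^19 <= n, so the subtraction cannot underflow.
    let rem = (n - quot * DIV as u128) as u64;
    (quot, rem)
}

/// Hypothetical helper: splits `n` into (high, middle, low) base-10^19 chunks.
fn decimal_chunks(n: u128) -> (u128, u64, u64) {
    let (q1, low) = udivmod_1e19(n);
    // q1 <= u128::MAX / 10^19 < 2^83, so the second call always takes the fast path.
    debug_assert!(q1 < 1 << 83);
    let (high, mid) = udivmod_1e19(q1);
    (high, mid, low)
}

/// Hypothetical helper: decimal string from the chunks, zero-padding inner chunks to 19 digits.
fn to_decimal(n: u128) -> String {
    match decimal_chunks(n) {
        (0, 0, low) => low.to_string(),
        (0, mid, low) => format!("{}{:019}", mid, low),
        (high, mid, low) => format!("{}{:019}{:019}", high, mid, low),
    }
}
```

Comparing `udivmod_1e19(n)` against native `n / 10^19` and `n % 10^19` on `u128`, including values on both sides of 2^83 and `u128::MAX`, is a cheap way to check the constant and the assumed shift; `to_decimal(n)` should likewise agree with `n.to_string()`.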

@nagisa
Member

nagisa commented Jan 29, 2021

Broadly LGTM, pending answer to the q above.

@nagisa
Member

nagisa commented Jan 30, 2021

@bors r+

@bors
Contributor

bors commented Jan 30, 2021

📌 Commit ada714d has been approved by nagisa

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 30, 2021
m-ou-se added a commit to m-ou-se/rust that referenced this pull request Jan 30, 2021
… r=nagisa

bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 31, 2021
…as-schievink

Rollup of 18 pull requests

Successful merges:

 - rust-lang#78044 (Implement io::Seek for io::Empty)
 - rust-lang#79285 (Stabilize Arc::{increment,decrement}_strong_count)
 - rust-lang#80053 (stabilise `cargo test -- --include-ignored`)
 - rust-lang#80279 (Implement missing `AsMut<str>` for `str`)
 - rust-lang#80470 (Stabilize by-value `[T; N]` iterator `core::array::IntoIter`)
 - rust-lang#80945 (Add Box::downcast() for dyn Any + Send + Sync)
 - rust-lang#81048 (Stabilize `core::slice::fill_with`)
 - rust-lang#81198 (Remove requirement that forces symmetric and transitive PartialEq impls to exist)
 - rust-lang#81422 (Account for existing `_` field pattern when suggesting `..`)
 - rust-lang#81472 (Clone entire `TokenCursor` when collecting tokens)
 - rust-lang#81484 (Optimize decimal formatting of 128-bit integers)
 - rust-lang#81491 (Balance sidebar `Deref` cycle check with main content)
 - rust-lang#81509 (Add a regression test for ICE of bad_placeholder_type)
 - rust-lang#81547 (Edit rustc_typeck top-level docs)
 - rust-lang#81550 (Replace predecessor with range in collections documentation)
 - rust-lang#81558 (Fix ascii art text wrapping in mobile)
 - rust-lang#81562 (Clarify that InPlaceIterable guarantees extend to all advancing iterator methods.)
 - rust-lang#81563 (Improve docblock readability on small screen)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit fd868d0 into rust-lang:master Jan 31, 2021
@rustbot rustbot added this to the 1.51.0 milestone Jan 31, 2021
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.

6 participants