rand_chacha and reducing dependencies #872

dhardy · 2019-08-28T11:05:03Z

This is basically forked from #850: a request to reduce the number of dependencies. This issue is opened for discussion since previously it did not have much. Also see: #667, #789.

As can be seen, this crate has several dependencies of its own, and is also the only required dependency of rand (besides getrandom, rand_core and on some platforms libc):

    ├── rand_chacha v0.2.1
    │   ├── c2-chacha v0.2.2
    │   │   ├── lazy_static v1.4.0
    │   │   └── ppv-lite86 v0.2.5
    │   └── rand_core v0.5.0
    │       └── getrandom v0.1.11 (*)

Probable resolution: no change

We don't need to change anything.

Possible resolution: subsume dependencies

We could copy some/all dependencies up into rand_chacha. Note that currently c2-chacha has no other dependent crates, while ppv-lite86 has two (both only used by the powhasher miner, as far as I can tell).

CC @kazcw

Other resolutions

We could revert back to HC-128 which has a fast implementation without use of SIMD intrinsics.

We could revert to the old ChaCha implementation or use the chacha20 crate, however this would not be good for performance.

The text was updated successfully, but these errors were encountered:

newpavlov · 2019-08-29T03:06:20Z

I think @tarcieri plans to implement SIMD acceleration for chacha20 crate. After it will be done we could make dependency on stream-cipher optional to reduce number of dependencies when used from rand_chacha. We even could implement BlockRngCore directly by adding an optional dependency on rand_core.

BTW in RustCrypto we were thinking about introducing a generic wrapper à la BlockRng. Maybe it's worth to revisit the suggestion of moving the block stuff outside of rand_core? This wrapper could implement both stream-cipher and rand_core traits behind feature gates and we will not have any code duplication.

This way dependency tree will look like:

├── rand_chacha
│   └── chacha20
│       └── block-stuff-crate
│           ├── rand_core
│           │   └── getrandom
│           └── stream-cipher (optional, disabled by default)

UPD: We also could deprecate rand_chacha and use chacha20 or c2-chacha directly, assuming they will implement rand_core traits.

tarcieri · 2019-08-29T03:27:11Z

SIMD support in the chacha20 crate would take me ~1 day. The algorithm is trivial.

I really like the idea of having a "raw" API for the ChaCha20 core and making the rest of the crate dependencies optional (but default-enabled).

dhardy · 2019-08-29T08:47:04Z

Having another fast implementation of ChaCha20 does give us options.

Does it not still suffer the problem of language SIMD support being experimental?

newpavlov · 2019-08-29T09:02:15Z

Well, x86(-64) intrinsics have been stable for a relatively long time (since 1.27), so I wouldn't call them "experimental". One noticeable issue is the absence of is_..._feature_detected! macros in core (see rust-lang/rfcs#2725). But since initially we will support only x86(-64) (since intrinsics for other arches are still only available in Nightly), it shouldn't be a problem. We can use an approach similar to one we use for RDRAND in getrandom, i.e. manual call to CPUID and cashing target feature check result in an atomic variable.

tarcieri · 2019-08-29T14:13:17Z

Does it not still suffer the problem of language SIMD support being experimental?

Just to +1 @newpavlov here, while there are a lot of unstable parts of core::arch::x86/x86_64, SIMD ChaCha20 can be implemented entirely in terms of the stable APIs.

dhardy · 2019-10-04T07:20:23Z

BTW in RustCrypto we were thinking about introducing a generic wrapper à la BlockRng. Maybe it's worth to revisit the suggestion of moving the block stuff outside of rand_core? This wrapper could implement both stream-cipher and rand_core traits behind feature gates and we will not have any code duplication.

BlockRng is the basis for minimal overhead in ReseedingRng and fork protection. If you propose removing that from rand_core, please also propose how reseeding should be implemented.

bjorn3 · 2019-10-10T16:19:43Z

The c2-chacha crate hasn't been updated for 2 months and has two PRs for 2 cq 5 months. It also doesnt work without at least SSE2 when the simd feature is enabled. (enabled by default) cryptocorrosion/cryptocorrosion#25

tarcieri · 2019-10-10T16:37:55Z

I'd be happy to look into adding SIMD support which "just works" and isn't gated on a feature to the chacha20 crate, or if anyone else would like to contribute it, I can review / merge PRs and release it.

It's definitely on my list of things to do (as part of starting to use the chacha20poly1305 crate in earnest and I hope to get around to it soon if no one else does.

dhardy · 2019-10-10T19:09:53Z

That would be brilliant @tarcieri.

In the mean-time it does appear that @kazcw and the whole cryptocorrosion project has gone silent, but I am not aware of any significant issues other than the build issue with Cranelift mentioned by @bjorn3 (hopefully something you can work around for now?), thus I suggest we wait for the SIMD work on chacha20. If necessary we can switch the implementation, though in complaince with our portability rules we should stick with ChaCha20 for the Rand 0.7.x series.

bjorn3 · 2019-10-10T19:30:25Z

but I am not aware of any significant issues other than the build issue with Cranelift mentioned by @bjorn3 (hopefully something you can work around for now?)

Patching c2-chacha works.

tarcieri · 2019-10-20T03:34:04Z

We just landed SSE support in the chacha20 crate, resulting in a ~25% performance improvement (I measure it as going from 6.7 cycles/byte to 5.3) and I plan on cutting another release in the next day or so:

RustCrypto/stream-ciphers#61

We'll hopefully be adding an AVX2 backend soon as well.

dhardy · 2019-10-20T11:21:39Z

I did a little hacking, for testing purposes.

This lets me run comparative benchmarks (std and thread are using the chacha20 crate; chacha benches are using c2-chacha via rand_chacha):

test gen_bytes_std           ... bench:   1,961,243 ns/iter (+/- 55,531) = 522 MB/s
test gen_u32_std             ... bench:       8,207 ns/iter (+/- 256) = 487 MB/s
test gen_u64_std             ... bench:      15,804 ns/iter (+/- 386) = 506 MB/s

test thread_rng_u32          ... bench:       8,667 ns/iter (+/- 752) = 461 MB/s
test thread_rng_u64          ... bench:      16,137 ns/iter (+/- 1,489) = 495 MB/s

test reseeding_chacha20_16k  ... bench:   9,884,385 ns/iter (+/- 183,632) = 1697 MB/s
test reseeding_chacha20_1M   ... bench:   9,223,385 ns/iter (+/- 1,112,501) = 1818 MB/s
test reseeding_chacha20_256k ... bench:   9,207,397 ns/iter (+/- 234,138) = 1822 MB/s
test reseeding_chacha20_32k  ... bench:   9,577,018 ns/iter (+/- 312,612) = 1751 MB/s
test reseeding_chacha20_4k   ... bench:  12,100,123 ns/iter (+/- 365,937) = 1386 MB/s
test reseeding_chacha20_64k  ... bench:   9,437,745 ns/iter (+/- 223,226) = 1777 MB/s

test gen_bytes_chacha12      ... bench:     404,481 ns/iter (+/- 23,979) = 2531 MB/s
test gen_bytes_chacha20      ... bench:     610,738 ns/iter (+/- 42,359) = 1676 MB/s
test gen_bytes_chacha8       ... bench:     302,190 ns/iter (+/- 64,398) = 3388 MB/s
test gen_u32_chacha12        ... bench:       1,949 ns/iter (+/- 66) = 2052 MB/s
test gen_u32_chacha20        ... bench:       2,506 ns/iter (+/- 96) = 1596 MB/s
test gen_u32_chacha8         ... bench:       1,535 ns/iter (+/- 132) = 2605 MB/s
test gen_u64_chacha12        ... bench:       4,129 ns/iter (+/- 131) = 1937 MB/s
test gen_u64_chacha20        ... bench:       5,701 ns/iter (+/- 252) = 1403 MB/s
test gen_u64_chacha8         ... bench:       3,367 ns/iter (+/- 84) = 2376 MB/s

As you can see, there's a long way to go to catch up. One possible part of this is that chacha20 uses [u32; 16] blocks returned via copy while the rand_chacha crate uses a [u32; 64] output block filled via reference.

These benches include 8- and 12-round versions of ChaCha, which in my opinion make for excellent RNGs where security is less important. The code in the chacha20 crate would need some refactoring in order to support these alternatives.

Another thing to think about is the title of this thread: and reducing dependencies (which, as we all know, is not a very useful metric but never-the-less one that many users bring up). This is a cut from the cargo tree output for rand with this hack:

rand v0.7.2 (/home/dhardy/projects/rand/rand)
├── chacha20 v0.2.1 (/home/dhardy/projects/rand/stream-ciphers/chacha20)
│   ├── byteorder v1.3.2
│   ├── rand_core v0.5.1 (/home/dhardy/projects/rand/rand/rand_core)
│   │   └── getrandom v0.1.12
│   │       ├── cfg-if v0.1.9
│   │       └── libc v0.2.62
│   ├── salsa20-core v0.2.2 (/home/dhardy/projects/rand/stream-ciphers/salsa20-core)
│   │   └── stream-cipher v0.3.2
│   │       └── generic-array v0.12.3
│   │           └── typenum v1.11.2
│   │   [dev-dependencies]
│   │   └── stream-cipher v0.3.2 (*)
│   └── stream-cipher v0.3.2 (*)

Thanks for the optimisation work @tarcieri, but I am skeptical about switching to chacha20 over c2-chacha.

tarcieri · 2019-10-20T16:35:14Z

Thanks for looking into it. We can probably reverse the relationship between the chacha20 crate and the salsa20-core crate, so the latter isn't needed when the construction isn't using the traits from the stream-cipher crate.

We've also talked about operating on several blocks in parallel as part of adding an AVX2 backend.

dhardy · 2019-10-21T08:49:21Z

I noticed several constants being pulled from the salsa20-core crate; if those are copied then the remaining dependency could probably be optional (i.e. opt-in to the RNG, opt-in to the cipher). That may be the simplest solution there?

tarcieri · 2019-10-21T14:21:25Z

Yup, we can potentially have both salsa20-core (and vicariously stream-cipher) as well as rand_core as optional dependencies

tarcieri · 2019-10-21T15:04:19Z

Also re: the reduced round variants i.e. ChaCha8/ChaCha12, yes, we can potentially add those, probably gated under an off-by-default Cargo feature.

kazcw · 2019-10-21T19:17:28Z

I eliminated the lazy_static dependency:

rand_chacha v0.2.1
├── c2-chacha v0.2.3
│   └── ppv-lite86 v0.2.6
└── rand_core v0.5.1

kazcw · 2019-10-21T19:19:00Z

The API surface coupling c2-chacha to ppv-lite86 dwarfs that coupling rand_chacha to c2-chacha (a SIMD library vs a hash function), so moving it between workspaces would lead to maintenance headaches. If package count is more important than logic-abiding modularity, I see a couple things we could do: I could move the core implementation from c2-chacha into ppv-lite86 ("have some chacha with your SIMD"), and we could move the contents of rand_chacha into rand so that rand would depend only on ppv-lite86--as minimal a dependency tree as could sanely be obtained without turning to a crate-bundling script. rand_chacha would depend on rand and pub-export the chacha rngs (which would technically be pub in rand, but could be in a namespace called private_dont_look). If that's worth doing then the cargo system is broken, but I don't see any lower-hanging fruit in the dependency-count-reduction field.

kazcw · 2019-10-21T19:37:59Z

IMO the more reasonable of those two dependency eliminations (if we were to stop at one) would be splaying the rand/rand_chacha dependency--it would be a quirk for rand maintainers that the rand_chacha code actually lives in rand, but forcing rand on rand_chacha users seems like it subverts the crate concept less than forcing a chacha backend on ppv-lite86 users.

vks · 2019-10-22T08:15:35Z

I eliminated the lazy_static dependency:

rand_chacha v0.2.1
├── c2-chacha v0.2.3
│   └── ppv-lite86 v0.2.6
└── rand_core v0.5.1

I think this is good enough! If anything, we could get rid of rand_chacha and add rand_core support to c2-chacha behind a feature flag.

dhardy · 2019-10-22T08:46:13Z

Thanks @kazcw for taking another look at this and removing the lazy_static dependency.

We have in the past duplicated the OsRng code to avoid a dependency, and have at least discussed reversing dependencies (rand_chacha depending on rand), but neither is desirable.

If that's worth doing then the cargo system is broken, but I don't see any lower-hanging fruit in the dependency-count-reduction field.

I agree; actually this is less about "Cargo is broken" than "people's metrics of complexity are broken".

At this point my preference is actually no further changes, except possibly the one @vks mentions: drop rand_chacha and add optional rand_core integration within c2-chacha behind a feature flag (approx 500 extra lines to a 900 line crate; in contrast ppv-lite has over 3500 lines and rand over 9000).

dhardy · 2020-01-13T17:47:32Z

drop rand_chacha and add optional rand_core integration within c2-chacha behind a feature flag

In fact I believe this is the wrong approach; rather we should copy the c2-chacha code we need into rand_chacha and depend directly on ppv-lite86. Reasons are two:

rand_chacha depends on only about 300 lines (1/3) of the code in c2-chacha
rand_chacha already has 44 reverse-dependencies; c2-chacha has 2

tarcieri · 2020-01-13T18:53:09Z

@dhardy while you’re looking into that, you might consider using ChaCha12 over ChaCha20:

https://eprint.iacr.org/2019/1492

ChaCha12 still provides 5 rounds of safety margin over the best known attack, whereas ChaCha20 provides 13 which is excessive when the best attack is 7 (and even then that attack doesn’t result in a catastrophic breakage).

The cited paper suggests ChaCha8 which only provides a single round of safety margin. I think ChaCha12 is at a happy medium of enough rounds to feel confident it’s highly unlikely to be broken by future attacks, and improving the performance through a lower number of rounds.

dhardy · 2020-01-13T19:05:53Z

@tarcieri sounds great! But considering that is a security issue, I think it deserves its own thread.
So maybe copy+paste that to a new issue?

Also, it's not "while you're looking into that". rand_chacha already exports all three common variants; this change would be to Rand's main lib.

tarcieri · 2020-01-13T19:20:57Z

Sure, I can open a separate issue (edit: opened #932)

tarcieri · 2020-01-16T16:48:55Z

I'm back, with something actually relevant to this thread.

I just released v0.3 of the chacha20 crate. It includes the following changes:

All of its dependencies (stream-cipher, rand_core, and zeroize) are now optional, and the rng feature now depends exclusively on the rand_core crate. With chacha20 = { version = "0.3", default-features = false, features = ["rng"] }, the only dependency is the rand_core crate.
The buffering logic has been dramatically improved. The RNG no longer performs any buffering whatsoever.
ChaCha8 / ChaCha8Rng / ChaCha8RngCore and ChaCha12 / ChaCha12Rng / ChaCha12RngCore are now available.
It now supports an AVX2 backend (implemented in terms of core::arch intrinsics) which brings its performance to ~1.4 cycles/byte when using that backend.

dhardy added the E-question Participation: opinions wanted label Aug 28, 2019

dhardy mentioned this issue Aug 28, 2019

RFC: remove dependency on rand ecosystem BurntSushi/quickcheck#241

Closed

This was referenced Sep 13, 2019

Key erasure AES variant #299

Open

Tracker: 0.8 release #889

Closed

dhardy mentioned this issue Jan 13, 2020

Chacha: integrate c2-chacha dependency #931

Merged

dhardy mentioned this issue Jan 17, 2020

Replacing rand_chacha with chacha20 #934

Open

dhardy closed this as completed in #931 Jan 20, 2020

vks added this to the 0.8 release milestone Aug 31, 2020

nstilt1 mentioned this issue Nov 7, 2023

chacha20: returning rand_core feature RustCrypto/stream-ciphers#333

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rand_chacha and reducing dependencies #872

rand_chacha and reducing dependencies #872

dhardy commented Aug 28, 2019

newpavlov commented Aug 29, 2019 •

edited

Loading

tarcieri commented Aug 29, 2019

dhardy commented Aug 29, 2019

newpavlov commented Aug 29, 2019 •

edited

Loading

tarcieri commented Aug 29, 2019

dhardy commented Oct 4, 2019

bjorn3 commented Oct 10, 2019

tarcieri commented Oct 10, 2019

dhardy commented Oct 10, 2019

bjorn3 commented Oct 10, 2019

tarcieri commented Oct 20, 2019

dhardy commented Oct 20, 2019 •

edited

Loading

tarcieri commented Oct 20, 2019

dhardy commented Oct 21, 2019

tarcieri commented Oct 21, 2019

tarcieri commented Oct 21, 2019 •

edited

Loading

kazcw commented Oct 21, 2019

kazcw commented Oct 21, 2019

kazcw commented Oct 21, 2019 •

edited

Loading

vks commented Oct 22, 2019

dhardy commented Oct 22, 2019

dhardy commented Jan 13, 2020

tarcieri commented Jan 13, 2020 •

edited

Loading

dhardy commented Jan 13, 2020

tarcieri commented Jan 13, 2020 •

edited

Loading

tarcieri commented Jan 16, 2020 •

edited

Loading

rand_chacha and reducing dependencies #872

rand_chacha and reducing dependencies #872

Comments

dhardy commented Aug 28, 2019

Probable resolution: no change

Possible resolution: subsume dependencies

Other resolutions

newpavlov commented Aug 29, 2019 • edited Loading

tarcieri commented Aug 29, 2019

dhardy commented Aug 29, 2019

newpavlov commented Aug 29, 2019 • edited Loading

tarcieri commented Aug 29, 2019

dhardy commented Oct 4, 2019

bjorn3 commented Oct 10, 2019

tarcieri commented Oct 10, 2019

dhardy commented Oct 10, 2019

bjorn3 commented Oct 10, 2019

tarcieri commented Oct 20, 2019

dhardy commented Oct 20, 2019 • edited Loading

tarcieri commented Oct 20, 2019

dhardy commented Oct 21, 2019

tarcieri commented Oct 21, 2019

tarcieri commented Oct 21, 2019 • edited Loading

kazcw commented Oct 21, 2019

kazcw commented Oct 21, 2019

kazcw commented Oct 21, 2019 • edited Loading

vks commented Oct 22, 2019

dhardy commented Oct 22, 2019

dhardy commented Jan 13, 2020

tarcieri commented Jan 13, 2020 • edited Loading

dhardy commented Jan 13, 2020

tarcieri commented Jan 13, 2020 • edited Loading

tarcieri commented Jan 16, 2020 • edited Loading

newpavlov commented Aug 29, 2019 •

edited

Loading

newpavlov commented Aug 29, 2019 •

edited

Loading

dhardy commented Oct 20, 2019 •

edited

Loading

tarcieri commented Oct 21, 2019 •

edited

Loading

kazcw commented Oct 21, 2019 •

edited

Loading

tarcieri commented Jan 13, 2020 •

edited

Loading

tarcieri commented Jan 13, 2020 •

edited

Loading

tarcieri commented Jan 16, 2020 •

edited

Loading