Impl Distribution<u8> for Alphanumeric #935
Sampling a random alphanumeric string by collecting chars (which are known to be ASCII) into a `String` involves emitting code for re-allocation, because `String` encodes each char to UTF-8, as in the example:

```rust
use rand::{distributions::Alphanumeric, thread_rng, Rng};
use std::iter;

let mut rng = thread_rng();
let chars: String = iter::repeat(())
    .map(|()| rng.sample(Alphanumeric))
    .take(7)
    .collect();
```

I wanted to get rid of the clearly unnecessary re-allocation branches in my applications, so I needed to be able to access the ASCII characters as plain bytes. It seems that this was already what was going on inside `Alphanumeric`; it was just internal. This PR changes the `Distribution<char>` impl to provide `u8`s (which it already generates internally) instead, and implements the previous `Distribution<char>` using it. One could then, for example, do this:

```rust
let mut rng = thread_rng();
let bytes = (0..7).map(|_| rng.sample(ByteAlphanumeric)).collect();
let chars = unsafe { String::from_utf8_unchecked(bytes) };
```
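If avoiding `unsafe` matters more than one validation pass, a minimal std-only sketch of the safe route is below. It assumes the bytes come from an ASCII-only source; the helper name `ascii_bytes_into_string` is hypothetical, and the fixed byte string merely stands in for sampled output. `String::from_utf8` takes ownership of the `Vec<u8>` and reuses its buffer, so the only extra cost over `from_utf8_unchecked` is the UTF-8 check.

```rust
/// Converts bytes that are known to be ASCII into a `String` without `unsafe`.
/// `String::from_utf8` takes ownership of the `Vec<u8>` and reuses its buffer;
/// the only extra work is one validation pass over the bytes.
fn ascii_bytes_into_string(bytes: Vec<u8>) -> String {
    debug_assert!(bytes.is_ascii(), "expected ASCII input");
    String::from_utf8(bytes).expect("ASCII is always valid UTF-8")
}

fn main() {
    // Stand-in for bytes sampled from an alphanumeric distribution.
    let bytes: Vec<u8> = b"aZ3kQ9x".to_vec();
    let ptr = bytes.as_ptr();
    let s = ascii_bytes_into_string(bytes);
    // The String owns the very same heap buffer: no copy, no reallocation.
    assert_eq!(s.as_ptr(), ptr);
    println!("{}", s);
}
```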
Per the test failure, however, this is blocking, as
If there was a way to directly convert an ASCII char to … I guess you could test the following, but it's throwing considerably more work at the optimiser:

```rust
let bytes: Vec<u8> = (0..7)
    .map(|_| {
        let mut buf = [0u8; 4];
        let len = rng.sample(Alphanumeric).encode_utf8(&mut buf).len();
        debug_assert_eq!(len, 1);
        buf[0]
    })
    .collect();
```
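For reference, a std-only sketch of what that workaround relies on: encoding an ASCII `char` with `char::encode_utf8` always produces exactly one byte, and that byte equals the plain `as u8` cast. The helper name `ascii_char_to_byte` is hypothetical.

```rust
/// Returns the single UTF-8 byte of an ASCII `char` by encoding it into a
/// small stack buffer, mirroring the `encode_utf8` workaround.
fn ascii_char_to_byte(c: char) -> u8 {
    let mut buf = [0u8; 4];
    let encoded = c.encode_utf8(&mut buf);
    // Every ASCII char encodes to exactly one byte.
    debug_assert_eq!(encoded.len(), 1);
    buf[0]
}

fn main() {
    for c in ['a', 'Z', '7'] {
        // For ASCII the encoded byte equals the plain `as u8` cast.
        assert_eq!(ascii_char_to_byte(c), c as u8);
    }
    // A non-ASCII char needs more than one byte, which the assertion guards against.
    assert_eq!('é'.len_utf8(), 2);
}
```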
You can use

```rust
let bytes: Vec<u8> = (0..7).map(|_| rng.sample(Alphanumeric) as u8).collect();
```

Can you test whether this is efficient enough for you? If so, there's no need to extend the API.
That would also work, though, at least for me, it involves the mental/safety-review overhead of
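One way to reduce that review overhead, sketched here with a hypothetical helper `checked_ascii_byte`, is to make the ASCII precondition explicit with `char::is_ascii` before casting. On its own, `as u8` silently keeps only the low 8 bits of the code point, so a non-ASCII char would not be rejected.

```rust
/// Converts a `char` to a byte only if it is ASCII, making the intent of the
/// cast explicit instead of relying on the reader to check that `as u8` is safe.
fn checked_ascii_byte(c: char) -> Option<u8> {
    // `as u8` keeps only the low 8 bits, so guard it with `is_ascii` first.
    if c.is_ascii() {
        Some(c as u8)
    } else {
        None
    }
}

fn main() {
    assert_eq!(checked_ascii_byte('x'), Some(b'x'));
    // 'é' is U+00E9; a bare `as u8` cast would silently yield 0xE9.
    assert_eq!('é' as u8, 0xE9);
    assert_eq!(checked_ascii_byte('é'), None);
}
```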
API design is a trade-off between convenience, simplicity, completeness, minimality, and avoiding breaking changes. Adding two APIs that do essentially the same thing, for what is already a tangential aspect of a much larger library, may not be justified (based on a single request and apparently low usage of
Failing significant interest, I will close this with the recommendation that users needing this behaviour implement the distribution themselves or use a workaround like the one above. This PR will remain open for comment until closer to the 0.8 release.
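For users who do implement this themselves, here is a self-contained sketch that does not depend on `rand`. The generator `XorShift32` and all function names are hypothetical stand-ins (xorshift is not cryptographically secure), and the sampling uses a shift-and-reject technique: take the top 6 bits of a `u32` (values 0 to 63) and retry on the two values outside the 62-byte charset, which keeps the choice uniform. `rand` appears to use a similar approach internally for `Alphanumeric`, but this is not its code.

```rust
/// Minimal xorshift32 generator, a hypothetical stand-in for a real RNG
/// (not cryptographically secure; for illustration only).
struct XorShift32(u32);

impl XorShift32 {
    fn next_u32(&mut self) -> u32 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.0 = x;
        x
    }
}

/// The 62 alphanumeric ASCII bytes.
const CHARSET: &[u8] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

/// Samples one alphanumeric byte uniformly: the top 6 bits give 0..64,
/// and the two values >= 62 are rejected and redrawn.
fn sample_alphanumeric_byte(rng: &mut XorShift32) -> u8 {
    loop {
        let idx = (rng.next_u32() >> (32 - 6)) as usize;
        if idx < CHARSET.len() {
            return CHARSET[idx];
        }
    }
}

/// Builds a random alphanumeric `String` of `len` bytes with a single allocation:
/// the exact-size iterator sizes the `Vec` once, and `from_utf8` reuses its buffer.
fn random_alphanumeric(rng: &mut XorShift32, len: usize) -> String {
    let bytes: Vec<u8> = (0..len).map(|_| sample_alphanumeric_byte(rng)).collect();
    String::from_utf8(bytes).expect("charset is ASCII")
}

fn main() {
    let mut rng = XorShift32(0x9E37_79B9); // any non-zero seed
    let s = random_alphanumeric(&mut rng, 7);
    assert_eq!(s.len(), 7);
    assert!(s.bytes().all(|b| b.is_ascii_alphanumeric()));
    println!("{}", s);
}
```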
I think it might make sense to make
It's not really equivalent, though, is it, @vks? Numeric distributions are usually able to deduce their type from arguments, … There isn't a strong argument either way in this case, so either resolution is acceptable IMO.
@dhardy for future reference, a GitHub search probably isn't sufficient to determine usage of an API, although I don't really have an alternative to offer here. The dropping of … Granted, the fix is pretty easy, although not immediately obvious. I wasn't actually aware you could cast a
@abonander sorry to hear it wasn't so clear how to use the new … I guess you missed the example in the docs?
We also mention it in our update guide. Maybe it's not visible enough?
I honestly glanced past it. The heading could be a bit more eye-catching, such as:
I wasn't aware that existed, actually. I see it's mentioned in the README, but it's kind of buried in a wall of text. The change to the impl isn't really a satisfying solution to the initial problem, though, because it requires using … What would be really convenient, performant, and easy to enforce as secure would be an inherent method on `Alphanumeric`:

```rust
impl Alphanumeric {
    // `CryptoRng` is only a marker trait, so it is combined with `Rng` here
    // for `rng` to be usable as a generator.
    pub fn gen_string(&self, rng: &mut (impl CryptoRng + Rng), len: usize) -> String {
        // there's no real difference in implementation here;
        // the different names and bounds are just to make the user think twice about which RNG they're using
        self.gen_string_insecure(rng, len)
    }

    /// An alternative to `gen_string` when the random string isn't being used in secure contexts, e.g. as random input for tests.
    pub fn gen_string_insecure(&self, rng: &mut impl Rng, len: usize) -> String {
        let bytes: Vec<u8> = self.sample_iter(rng).take(len).collect();
        if cfg!(debug_assertions) {
            // sanity-check the distribution in testing
            String::from_utf8(bytes).expect("BUG: Alphanumeric generates invalid bytes")
        } else {
            unsafe { String::from_utf8_unchecked(bytes) }
        }
    }
}
```

Projects could even use
Out of curiosity, was there a specific reason to remove
Type inference. Since conversion to `char` is easy, supporting only `u8` seems like the best choice.
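To illustrate the type-inference point with a toy trait (hypothetical names, not `rand`'s `Distribution`): when a type implements a generic sampling trait for only one output type, the compiler can infer the result type without annotations, and a `char` is then one `char::from` call away. If a second impl for `char` were added, the unannotated `let` below would fail with "type annotations needed".

```rust
/// Toy stand-in for a generic sampling trait (hypothetical; not rand's API).
trait Sample<T> {
    fn sample(&self, seed: u32) -> T;
}

/// Toy distribution with a single output type.
struct AsciiDigit;

impl Sample<u8> for AsciiDigit {
    fn sample(&self, seed: u32) -> u8 {
        b'0' + (seed % 10) as u8
    }
}

fn main() {
    // Only `Sample<u8>` exists, so the compiler infers `b: u8` with no annotation.
    let b = AsciiDigit.sample(42);
    // Converting the byte to a `char` afterwards is a lossless `From` call.
    let c = char::from(b);
    assert_eq!(c, '2');
    println!("{}", c);
}
```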