Skip to content

drtconway/permutation-rs

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Permutation-rs

This library provides a constant space, approximately constant time method for producing permutations.

Rationale

There are many situations where we want to process items in a randomised order. The simplest way to do this is typically to "shuffle" a vector of items. However there are circumstances when this is undesirable - for instance, if the vector of items is read-only. In such a case, we could make a copy of it, but if it is very large, this may be impractical.

A special instance of when we want a permutation is sampling without replacement. We may have a very large vector from which we want to sample just a small number of items, we can avoid generating a full permutation by selecting random items and collecting them in a set data structure to avoid duplicates, until we have the required number of items, however this becomes inefficient as the number of items to be sampled grows and approaches the size of the original vector.

What we present here is an alternative approach which algorithmically generates the permutation without actually having to store it. The intuition is straightforward.

First, let us make the observation that for the purposes of generating a permutation, we don't need the items per se, just the indexes of the items: [0, n).

A block cipher uses a key to create a one-to-one mapping from plain-text blocks of b bits to encrypted blocks of b bits. It has to be a one-to-one mapping or we wouldn't be able to decrypt it! So in a simple case where n = 2**b, we could just use a b bit block cipher to encrypt indexes to generate a permutation. This way of generating a permutation also has the neat property that you can invert the permutation by decrypting.

Now if n < 2**b, it is possible that when we encrypt an index i, the encrypted value j could be greather than or equal to n. However, if j >= n, then it will never be a source index, so we can encypt j (with the same key), until we get an encrypted value less than n. Obviously if n << 2**b, we may have to do this many times. In fact, we can say the expected number of encryptions required till we get j < n is 2**b / n. So if we choose b such that 2**b is the smallest power of two >= n, then the average number of encryptions is less than 2.

Now this technique will work with any block cipher, but since we want to be flexible on the block size b, the most convenient choice is a Feistel Cipher. Feistel ciphers are built on top of a hash function. For a good cypher, a high quality hash function is required, but for our purposes any reasonable hash function will do. By default, the library will work with the Rust standard library default hash function, but it is easy to substitute in an alternative. Our Feistel cipher implementation is fairly simple and only works for block sizes where b is an even number. As a result, the expected number of encryption cycles is between 3 and 4, so it is slower than optimal, but this is still O(1), so it is satisfactory for our purposes.

Quick Start

To use this library in an existing project:

cargo add feistel-permutation-rs

To use it in your code:

let n = 1000000;
let seed = 29;
let perm = Permutation::new(n, seed, DefaultBuildHasher::new());
for i in 0..10 {
  println!("{} -> {}", i, perm.get(i));
}

Or if iterators are your thing:

let n = 1000000;
let seed = 29;
let perm = Permutation::new(n, seed, DefaultBuildHasher::new());
for j in perm.range(100, 200) {
  println!("{}", j);
}

for j in perm.iter().take(10) {
  println!("{}", j);
}

About

A Rust library for creating random access permutations.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages