Zhang Wang Fast Approximate Quantiles Algorithm in Rust: link to reference. It's the original algorithm inspired the techniques in Tensorflow Boosted Trees. You can find a quick context on my blog.
Add this to your Cargo.toml
:
[dependencies]
zw-fast-quantile = "0.2"
then you are good to go. If you are using Rust 2015 you have to extern crate zw-fast-quantile
to your crate root as well.
There are two implementations in this library: FixedSizeEpsilonSummary
is with the size of the stream known before hand, and UnboundEpsilonSummary
is for the stream with unknown size. You can adjust the error rate epsilon
of your own to trade-off between space and accuracy.
let epsilon = 0.1;
let n = 10;
let mut s = FixedSizeEpsilonSummary::new(n, epsilon);
for i in 1..=n {
s.update(i);
}
let ans = s.query(0.0);
let expected = 1;
assert!(expected == ans);
let epsilon = 0.1;
let n = 10;
let mut s = UnboundEpsilonSummary::new(epsilon);
for i in 1..=n {
s.update(i);
}
let ans = s.query(0.0);
let expected = 1;
assert!(expected == ans);
We benchmark against GK01 implemented in quantiles. To test the UPDATE, we insert 5000 values in order with error rate of 0.01. GK01 is roughly 2.6x slower than ZW.
zw unbound quantile update
time: [60.780 us 60.855 us 60.936 us]
change: [-1.4032% -0.9510% -0.5005%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) high mild
6 (6.00%) high severe
gk quantile update time: [156.84 us 157.02 us 157.24 us]
change: [-0.1907% -0.0503% +0.0969%] (p = 0.50 > 0.05)
No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe
As for the query time, we query 11 quantiles from 0.0, 0.1 ... 1.0. GK01 is about 1.5x slower than ZW.
zw unbound quantile query
time: [229.62 ns 230.16 ns 230.77 ns]
change: [+1.3422% +1.8105% +2.2504%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
3 (3.00%) high mild
8 (8.00%) high severe
gk quantile query time: [350.21 ns 350.48 ns 350.76 ns]
change: [-0.4638% -0.3109% -0.1670%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low severe
2 (2.00%) high mild
5 (5.00%) high severe