Compress documentation uploaded to S3 #379

Closed
pietroalbini opened this issue Jul 18, 2019 · 20 comments · Fixed by #780

Comments

@pietroalbini
Member

pietroalbini commented Jul 18, 2019

At the moment we don't compress the documentation uploaded to S3, which wastes a lot of space. While money for S3 isn't an issue right now, the lack of compression could hurt docs.rs's sustainability in the future.

I ran some very rough benchmarks on reqwest 0.9.3, compressing each .html file separately:

Bad benchmark

| Algorithm | Size | Compression time | Decompression time | Options |
|---|---|---|---|---|
| Plaintext | 33.9 MB | - | - | - |
| Gzip | 12.0 MB | 3.2s | 3.4s | -9 (best) |
| Gzip | 12.8 MB | 2.5s | 3.4s | -1 (fast) |
| Zstd | 11.7 MB | 7.8s | 2.4s | -19 (best) |
| Zstd | 12.5 MB | 2.5s | 2.4s | -1 (fast) |
| Brotli | 11.5 MB | 5.5s | 2.3s | -9 (best) |
| Brotli | 13.0 MB | 2.3s | 2.3s | -0 (fast) |

Looking at the numbers, if we compress the uploaded docs we're going to save about 63% of storage space on average, which is great from a sustainability point of view. I think we should compress all the uploaded docs going forward, and try to compress (part of) the initial import as well.

For the algorithm choice, I'd say we can go with gzip: there isn't much difference between the resulting sizes, and the compression time delta between gzip's fast and best modes is the smallest. We can compress the initial import with -1 to speed it up, and all the new crates with -9.
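
To make the fast/best trade-off concrete, here is a minimal sketch (an addition, not from the thread) of gzipping a page at the two levels; it assumes the flate2 crate, which the issue doesn't actually name:

```rust
// Minimal sketch: illustrates the fast (-1) vs best (-9) gzip levels discussed
// above, using the flate2 crate (an assumption; no library is named in the issue).
use flate2::{read::GzDecoder, write::GzEncoder, Compression};
use std::io::{Read, Write};

fn gzip(data: &[u8], level: Compression) -> std::io::Result<Vec<u8>> {
    let mut encoder = GzEncoder::new(Vec::new(), level);
    encoder.write_all(data)?;
    encoder.finish()
}

fn gunzip(data: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut out = Vec::new();
    GzDecoder::new(data).read_to_end(&mut out)?;
    Ok(out)
}

fn main() -> std::io::Result<()> {
    let html = b"<html><body>example rustdoc page</body></html>".repeat(100);
    let fast = gzip(&html, Compression::fast())?; // roughly the `-1` runs above
    let best = gzip(&html, Compression::best())?; // roughly the `-9` runs above
    assert_eq!(gunzip(&best)?, html);
    println!("plain={} fast={} best={}", html.len(), fast.len(), best.len());
    Ok(())
}
```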

cc @Mark-Simulacrum @QuietMisdreavus

Benchmark method

Installed compression tools on Ubuntu 18.04 LTS:

$ sudo apt install gzip brotli zstd

Downloaded locally the reqwest documentation:

$ aws s3 cp --recursive s3://rust-docs-rs/rustdoc/reqwest/0.9.3/ .

Compressed every .html file with find:

$ time find <dir> -name "*.html" -exec <command> {} \;
@Mark-Simulacrum
Member

Could you populate the decompression times as well? In some sense those are considerably more important for deciding whether this is viable. The compression times for all of these are not great, though :/

@GuillaumeGomez
Member

@Mark-Simulacrum Why would we need decompression? You can send compressed files as-is (I think only gzip compression is supported by web browsers, though?).

@Mark-Simulacrum
Member

That's true in general, certainly. I guess presumably ~all clients send Accept-Encoding: gzip, or whatever the header value is?

I do think this shouldn't be that hard to add -- presumably we'd add a "compressed" column to the files table and, if it's set, access files in S3 at file.gzip or something along those lines.
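
A hypothetical sketch of that storage convention (the "compressed" column and the `file.gzip` naming are just the suggestion above, not the eventual implementation; the helper name is made up):

```rust
/// Hypothetical helper: if the `compressed` flag from the `files` table is set,
/// the object is looked up at `<path>.gz` in S3, otherwise at `<path>`.
fn s3_key(path: &str, compressed: bool) -> String {
    if compressed {
        format!("{}.gz", path)
    } else {
        path.to_owned()
    }
}

fn main() {
    assert_eq!(
        s3_key("rustdoc/reqwest/0.9.3/reqwest/index.html", true),
        "rustdoc/reqwest/0.9.3/reqwest/index.html.gz"
    );
}
```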

@pietroalbini
Member Author

Added decompression times: gzip is slower than zstd and brotli, and those two are roughly the same. Of course production decompression performance is going to be way better, as you don't have to write to my hard disk (:sweat_smile:), everything is in memory, the binary is not restarted every time, and we will decompress way fewer files.

Considering the decompression time, brotli is probably the best choice?

> Why would we need decompression? You can send compressed files as-is (I think only gzip compression is supported by web browsers, though?).

No, because docs.rs needs to tweak those HTML files (for example to add the top bar).

@Mark-Simulacrum
Member

Given those decompression times, this seems problematic. Even if they were an order of magnitude lower, that's still ~250ms/file we read - and slowing down all requests to docs.rs by that much seems poor. In very unscientific benchmarking (web developer tools), it looks like we respond in about ~60ms on js/css content today and ~150ms on HTML content -- this would significantly increase that.

@pietroalbini
Member Author

Let me create a proper benchmark.

@pietroalbini
Member Author

> Given those decompression times, this seems problematic. Even if they were an order of magnitude lower, that's still ~250ms/file we read

By the way, the numbers are for compressing and decompressing 2040 files, not a single file.

@Mark-Simulacrum
Member

Ah, yeah, that makes sense. I did think the numbers were awfully large :)

We probably want single-file timing information, as that's what's important for the primary use case of serving them.

@pietroalbini
Member Author

Ok, scratch that: I wrote a proper benchmark, and the results are way more accurate:

| Algorithm | Level | Size | Comp. | Comp. (one) | Dec. | Dec. (one) |
|---|---|---|---|---|---|---|
| plain | - | 25.2 MB | - | - | - | - |
| gzip | 9 | 3.8 MB | 426.9ms | 194.4µs | 70.1ms | 31.9µs |
| gzip | 5 | 3.9 MB | 259.7ms | 118.3µs | 71.0ms | 32.3µs |
| gzip | 1 | 5.0 MB | 119.0ms | 54.2µs | 82.3ms | 37.5µs |
| zstd | 9 | 3.6 MB | 348.2ms | 158.6µs | 28.8ms | 13.1µs |
| zstd | 5 | 3.8 MB | 151.9ms | 69.2µs | 24.4ms | 11.1µs |
| zstd | 1 | 4.2 MB | 68.3ms | 31.1µs | 24.9ms | 11.3µs |
| brotli | 9 | 3.2 MB | 3.6s | 1.7ms | 49.7ms | 22.6µs |
| brotli | 5 | 3.3 MB | 389.4ms | 177.3µs | 49.8ms | 22.7µs |
| brotli | 1 | 4.4 MB | 97.2ms | 44.2µs | 51.3ms | 23.4µs |
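
For reference, a rough sketch of how per-file numbers like "Comp. (one)" and "Dec. (one)" can be derived (this is not the benchmark code used for the table above; it assumes the zstd crate and a local `docs/` directory of rustdoc HTML):

```rust
// Rough sketch: compress and decompress every .html file, then divide the total
// wall-clock time by the file count. The `docs/` directory and the choice of
// zstd level 9 are assumptions made for illustration.
use std::{fs, time::Instant};

fn main() -> std::io::Result<()> {
    let mut files = Vec::new();
    for entry in fs::read_dir("docs")? {
        let path = entry?.path();
        if path.extension().map_or(false, |ext| ext == "html") {
            files.push(fs::read(&path)?);
        }
    }

    let start = Instant::now();
    let compressed = files
        .iter()
        .map(|f| zstd::encode_all(&f[..], 9)) // level 9, as in the table
        .collect::<std::io::Result<Vec<_>>>()?;
    let comp = start.elapsed();

    let start = Instant::now();
    for c in &compressed {
        zstd::decode_all(&c[..])?;
    }
    let dec = start.elapsed();

    println!(
        "comp: {:?} total, {:?} per file; dec: {:?} total, {:?} per file",
        comp,
        comp / files.len() as u32,
        dec,
        dec / files.len() as u32
    );
    Ok(())
}
```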

@Mark-Simulacrum
Member

Okay, those numbers look great! We can definitely afford microseconds of decompression time.

I'll investigate doing this legwork, though I'll keep the upload going in the meantime. I think re-compressing existing files and such can happen in parallel and over time if needed, not sure.

@pietroalbini
Member Author

Did some more brotli benchmarks:

| Algorithm | Level | Size | Comp. | Comp. (one) | Dec. | Dec. (one) |
|---|---|---|---|---|---|---|
| plain | - | 25.2 MB | - | - | - | - |
| brotli | 6 | 3.3 MB | 409.2ms | 186.4µs | 49.8ms | 22.7µs |
| brotli | 5 | 3.3 MB | 383.8ms | 174.8µs | 49.9ms | 22.7µs |
| brotli | 4 | 3.6 MB | 264.0ms | 120.2µs | 50.1ms | 22.8µs |
| brotli | 3 | 3.9 MB | 178.0ms | 81.0µs | 47.6ms | 21.7µs |
| brotli | 2 | 4.1 MB | 145.6ms | 66.3µs | 52.3ms | 23.8µs |
| brotli | 1 | 4.4 MB | 97.1ms | 44.2µs | 51.1ms | 23.3µs |

Based on that, I think the two options we should consider are zstd 9 and brotli 5:

  • Both take roughly the same time to compress, though zstd 9 is slightly faster
  • zstd 9 takes half the time of brotli 5 to decompress
  • brotli 5 uses 10% less storage

I'm leaning towards brotli 5, as the saved storage is nice, but I don't feel too strongly about it.

@najamelan

najamelan commented Sep 18, 2019

Putting the cached content in an iframe avoids decompression and post-processing. It would require an extra HTTP request, but since the content is cached it's possible to give it a unique name and Cache-Control: immutable so browsers never try to reload it if it's already in cache. That might well make up for the extra request.

@namibj

namibj commented Apr 2, 2020

zstd supports dictionary compression, where you pre-create the dictionary for a collection of files. This is generally beneficial in the <1MB size range, and should IMO be used for this application.
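
A minimal sketch of that approach, assuming the zstd crate's dictionary APIs (`dict::from_samples`, `Encoder::with_dictionary`, `Decoder::with_dictionary`); the synthetic samples and the 16 KiB dictionary size are stand-ins for real rustdoc HTML:

```rust
// Minimal sketch: train one shared dictionary for a collection of files, then
// compress and decompress each file individually against it. In practice the
// samples would be real rustdoc HTML; training can fail if the sample set is
// too small or too uniform.
use std::io::{Read, Write};

fn main() -> std::io::Result<()> {
    // Stand-in training set.
    let samples: Vec<Vec<u8>> = (0..1000)
        .map(|i| {
            format!(
                "<html><head><title>crate{}</title></head><body>\
                 <h1>Struct Example{}</h1><p>Documentation for item {}.</p></body></html>",
                i, i, i
            )
            .repeat(20)
            .into_bytes()
        })
        .collect();

    // Train one shared dictionary for the whole collection.
    let dict = zstd::dict::from_samples(&samples, 16 * 1024)?;

    // Compress a single file against the shared dictionary.
    let page = b"<html><body><h1>Struct Example42</h1></body></html>";
    let mut encoder = zstd::stream::Encoder::with_dictionary(Vec::new(), 9, &dict)?;
    encoder.write_all(page)?;
    let compressed = encoder.finish()?;

    // Decompression needs the exact same dictionary, which is why a dictionary
    // has to be kept for as long as any data compressed with it.
    let mut decoder = zstd::stream::Decoder::with_dictionary(&compressed[..], &dict)?;
    let mut out = Vec::new();
    decoder.read_to_end(&mut out)?;
    assert_eq!(&out[..], &page[..]);

    println!("plain={} compressed={}", page.len(), compressed.len());
    Ok(())
}
```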

@namibj

namibj commented Apr 13, 2020

Preliminary results on just a few hundred crates (specifically, cargo doc on declarative-dataflow) suggest average compression ratios of >20, with ~0.6 B/cycle (Broadwell, single-threaded) decompression speed for the HTML files, each handled individually, using a (precisely) 110 KiB dictionary shared between the ~59 MiB (uncompressed) of HTML files. Testing recorded at https://asciinema.org/a/kpxnXvV9fk6d7jagcIo3NvKZf

@jyn514
Member

jyn514 commented May 28, 2020

> Putting the cached content in an iframe avoids decompression and post-processing. It would require an extra HTTP request, but since the content is cached it's possible to give it a unique name and Cache-Control: immutable so browsers never try to reload it if it's already in cache. That might well make up for the extra request.

Unfortunately it isn't that simple; see #679 (comment) for why iframes aren't a good option.

@Nemo157
Member

Nemo157 commented May 28, 2020

Benchmarks based on those @pietroalbini linked above (code here), adding in "zstd custom", which uses a 10 MB dictionary I generated from ~4 GB of docs that @namibj provided, compressing the winapi-0.3.8 docs:

| Algorithm | Level | Size | Comp. | Comp. (one) | Dec. | Dec. (one) |
|---|---|---|---|---|---|---|
| plain | - | 946.8 MB | - | - | - | - |
| gzip | 9 | 244.1 MB | 12.9s | 88.3µs | 2.6s | 17.6µs |
| gzip | 5 | 244.4 MB | 10.1s | 69.6µs | 2.6s | 17.7µs |
| gzip | 1 | 283.3 MB | 5.2s | 35.7µs | 2.9s | 19.9µs |
| zstd | 9 | 247.7 MB | 32.2s | 220.9µs | 1.2s | 8.4µs |
| zstd | 5 | 251.8 MB | 15.0s | 102.8µs | 1.2s | 8.6µs |
| zstd | 1 | 271.2 MB | 2.9s | 19.7µs | 1.2s | 8.4µs |
| zstd custom | 9 | 50.5 MB | 6.7s | 46.1µs | 246.3ms | 1.7µs |
| zstd custom | 5 | 61.7 MB | 3.0s | 20.4µs | 271.6ms | 1.9µs |
| zstd custom | 1 | 84.3 MB | 928.5ms | 6.4µs | 348.5ms | 2.4µs |
| brotli | 9 | 204.2 MB | 21.6s | 148.5µs | 1.9s | 12.7µs |
| brotli | 5 | 205.7 MB | 14.0s | 96.0µs | 1.9s | 13.0µs |
| brotli | 1 | 278.2 MB | 3.7s | 25.2µs | 2.0s | 14.0µs |

@Kixiron
Member

Kixiron commented May 28, 2020

What's the timing of generating dictionaries and how often does it have to happen?

@Nemo157
Member

Nemo157 commented May 29, 2020

That dictionary took about a minute to train. Though if we can get a dictionary trainer that can handle more than 4 GB of data, so it can load the entire archive @namibj made, I assume it'll take longer.

Preferably we would never generate new dictionaries: as soon as one is used to compress some data, it needs to remain part of docs.rs forever. Maybe if rustdoc completely changes how it generates documentation in the future it'd be worth regenerating the dictionary, but for minor changes the learnt data should hopefully still be relevant (one idea might be to include docs from a range of old rustdoc versions, to reduce overfitting on how the latest version encodes its docs).

@jyn514
Member

jyn514 commented May 29, 2020

I think we should also train on more crates than winapi, which is a little special. Maybe we could add an embedded crate like stm32f0 and a smaller crate like hexponent.

@Nemo157
Member

Nemo157 commented May 29, 2020

winapi isn't actually in the training set I'm using; it was just used to compare the results. But yes, some time should be spent finding a good training set.
