Commit
typos
baszalmstra committed Apr 30, 2024
1 parent eb465c0 commit 0f2dcdf
Showing 1 changed file with 5 additions and 3 deletions.
8 changes: 5 additions & 3 deletions cep-16.md
@@ -1,7 +1,9 @@
# Sharded Repodata
# CEP for Sparse Repodata

We propose a new "repodata" format that can be sparsely fetched. That means, generally, smaller fetches (only fetch what you need) and faster updates of existing repodata (only fetch what has changed).

We also change the encoding from JSON to MSGPACK for faster decoding.

## Motivation

The current repodata format is a JSON file that contains all the packages in a given channel. Unfortunately, that means it grows with the number of packages in the channel. This is a problem for large channels like conda-forge, which has over 150,000 packages. It becomes very slow to fetch, parse and update the repodata.
@@ -31,7 +33,7 @@ Finally, the implementation of JLAP is quite complex which makes it hard to adop

### ZSTD compression

A notable improvement is compressing the `repodata.json` with `zst` and serving that file. In practice this yields a file that is 20% smaller (20-30 Mb for large cases). Although this is still quite a big file its substantially smaller.
A notable improvement is compressing the `repodata.json` with `zst` and serving that file. In practice, this yields a file that is 1/5th the size (20-30 Mb for large cases). Although this is still quite a big file it's substantially smaller.

However, the file still contains all repodata in the channel. This means the file needs to be redownloaded every time anyone adds a single package (even if a user doesn't need that package).

@@ -73,7 +75,7 @@ The contents look like the following (written in JSON for readability):

The index is still updated regularly but the file does not increase in size with every package added, only when new package names are added which happens much less often.

For a large case (conda-forge linux-64) this files is 670kb at the time of writing.
For a large case (conda-forge linux-64) this file is 670kb at the time of writing.

We suggest serving the file with a short-lived `Cache-Control` `max-age` header of 60 seconds to an hour, but we leave it up to the channel administrator to set a value that works for that channel.
