Considering the growth of data and the network accessibility required for downloading the data release bundles, we want to split the data release output file into multiple batches.
Maybe restrict bundle sizes to 500 MB. The compressed bundle size is ~2 GB for 350,000 samples (~10 GB of actual data).
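One way to produce size-capped batches is to greedily pack samples into bundles until the next sample would push the bundle past the limit. This is a hypothetical sketch, not code from the project: the function name and the byte-counting approach are mine, and a real implementation would measure compressed rather than raw sizes.

```python
def batch_records(records, max_bundle_bytes):
    """Greedily group records into batches whose summed raw size stays
    under max_bundle_bytes. A single record larger than the cap still
    gets a batch of its own rather than being dropped."""
    batches, current, current_size = [], [], 0
    for rec in records:
        size = len(rec)
        # Flush the current batch if adding this record would exceed the cap.
        if current and current_size + size > max_bundle_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(rec)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

With a 500-byte cap and five 200-byte records, this yields batches of 2, 2, and 1 records; the same logic applies unchanged with a 500 MB cap.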
When I suggested we could "multipart" the zip files, it was with the notion of increasingly larger data sets. The two big files (1 TSV + 1 FASTA) would be broken into smaller parts while being compressed by the library, and put back together into the same two original big files when the user decompresses them.
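The split-then-reassemble idea above can be sketched at the byte level. The function names here are illustrative, not from the codebase; in practice the parts would be the output of the compression library, streamed to disk rather than held in memory.

```python
def split_bytes(data: bytes, part_size: int) -> list[bytes]:
    """Cut a byte stream into fixed-size parts; the last part may be shorter."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]

def join_parts(parts: list[bytes]) -> bytes:
    """Reassemble parts, in order, into the original stream."""
    return b"".join(parts)
```

Because joining is plain concatenation, users could reassemble the parts with nothing more than `cat part.* > bundle.zip` before unzipping.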
However, the use for the whole dataset seems limited, and this seems to me like a solution for a very specific set of users.
Furthermore, @joneubank's suggestion to generate Delta archives would remove the need to offer multipart archives entirely, which seems more desirable to me, both from infrastructural and operational perspectives, as well as in terms of usefulness.
Actually, Delta archives are the solution for a very specific set of users, so I would put that on the back burner for now. I think splitting the data sets will be our best way forward from here. Thanks.