
Nupkg compression is bad #890

Closed
ericstj opened this issue Jul 5, 2015 · 16 comments
Labels
Type:DCR Design Change Request

@ericstj

ericstj commented Jul 5, 2015

Currently, nupkg compression doesn't seem to de-dup multiple files. It also does a poor job of compressing XML. We doubled the size of our packages when we added localized docs. I haven't done a ton of investigation here, but I bet there are a bunch of things NuGet could do better.
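To illustrate the de-dup point: ZIP compresses each entry independently, so two identical binaries cost roughly twice the space, while a single "solid" stream (as a solid 7z archive or one .xz stream over the concatenated files would use) pays for the repeated bytes only once. This is a minimal sketch using Python's standard zipfile and lzma modules; the file names and the random payload are hypothetical stand-ins for cross-compiled assemblies, and the solid size ignores the file table a real archive would also need.

```python
import io
import lzma
import os
import zipfile
from typing import Mapping


def zip_size(files: Mapping[str, bytes]) -> int:
    """Size of a ZIP where every entry is its own DEFLATE stream (like a .nupkg)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())


def solid_lzma_size(files: Mapping[str, bytes]) -> int:
    """Size of one LZMA (xz) stream over all payloads concatenated, i.e. a 'solid' archive body."""
    return len(lzma.compress(b"".join(files.values())))


# Hypothetical package: the same 256 KiB binary shipped for two target frameworks.
payload = os.urandom(256 * 1024)
files = {
    "lib/net45/Example.dll": payload,
    "lib/netstandard1.0/Example.dll": payload,
}

print("zip (per-file DEFLATE):", zip_size(files))
print("solid LZMA:            ", solid_lzma_size(files))
```

Random bytes are used so that neither codec can shrink a single copy; any savings in the solid stream come purely from noticing the duplicate file.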

@yishaigalatzer

What tools are you using to pack?

@ericstj
Author

ericstj commented Jul 6, 2015

nuget.exe pack on a nuspec

@yishaigalatzer yishaigalatzer added this to the 3.2.0-Beta milestone Jul 6, 2015
@yishaigalatzer yishaigalatzer modified the milestones: Client-VNext, 3.3.0-Beta Oct 1, 2015
@yishaigalatzer

NuGet is currently using the packaging APIs rather than ZipArchive. There is some potential for improvement, but this doesn't seem to be high on the immediate to-do list.

@yishaigalatzer yishaigalatzer added the Type:DCR Design Change Request label Oct 1, 2015
@ericstj
Author

ericstj commented Oct 1, 2015

I tested ZipArchive when I created this bug and didn't find much benefit. I also tried some other zip utilities with different compression quality and didn't see much benefit.

LZMA on the binaries showed some major wins: roughly 2/3 the size of the zip, and much better when files were very similar, as would be the case for cross-compiled implementations. Other compression techniques for XML could also provide bigger wins. I didn't get a chance to try it, but there is a new standard for XML compression, EXI (http://www.w3.org/XML/EXI/).

Suppose the XML is stored in EXI-compressed form within the container, and then the entire container is compressed with LZMA. I think that'd be a significant savings.

EXI might even be something to look at for the docs on disk, assuming we could get VS support.
/cc @davidfowl

@yishaigalatzer
Copy link

The downside of fancier compression is that it breaks compatibility with older clients.

@ericstj
Author

ericstj commented Oct 2, 2015

You could do it on the server. If the client tells you it supports the new format, give it to them. Otherwise give them the old format.

@csharpfritz
Contributor

Not everyone is using NuGet.org as a server. I think this needs more thought, including a migration strategy that keeps older clients working.

@ericstj
Author

ericstj commented Oct 2, 2015

@csharpfritz of course. I'm not suggesting anything breaking here. As I mentioned, it can be opt-in by the client and optional for the server, similar to how Accept-Encoding and Content-Encoding work in HTTP: the client tells the server it can understand the new format, and the server tells the client which format it's sending.
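A minimal sketch of that negotiation, assuming a hypothetical request header in which the client lists the package formats it can read, in preference order. The header and the format tokens (`lzma-zip`, `zip`) are illustrative only and not part of any NuGet protocol; a client that sends nothing always gets the classic ZIP/DEFLATE package.

```python
from typing import Optional, Set

# Hypothetical token for the classic DEFLATE .nupkg that every client understands.
LEGACY_FORMAT = "zip"


def choose_package_format(accept_formats: Optional[str], available: Set[str]) -> str:
    """Pick the first format the client lists that the server has; otherwise fall back to legacy.

    accept_formats is the raw value of a hypothetical request header such as
    "lzma-zip, zip". Old clients never send it, so they get LEGACY_FORMAT.
    """
    if accept_formats:
        for token in accept_formats.split(","):
            fmt = token.strip().lower()
            if fmt in available:
                return fmt
    return LEGACY_FORMAT


available = {"zip", "lzma-zip"}
print(choose_package_format("lzma-zip, zip", available))  # new client -> lzma-zip
print(choose_package_format(None, available))             # old client -> zip
```

In the spirit of Content-Encoding, the server would then echo the chosen token in a response header so the client knows how to unpack what it received. A server that only ever stores the legacy format simply never advertises anything else.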

@harikmenon harikmenon modified the milestones: Client-VNext, Future Apr 19, 2016
@dessant

dessant commented May 21, 2016

+1 for LZMA. I've switched from NSIS, which has LZMA compression, to Squirrel.Windows (uses NuGet packages) and my installer has doubled in size.

Ideally there should be an option for choosing between a handful of compression methods. Faster decompression may also be favored over a smaller package size depending on the use case.

@ericlaw1979

To be clear here, you can significantly improve compression while still using ZIP and remaining compatible with all legacy ZIP (DEFLATE) clients. You can use e.g. Zopfli on the DEFLATE streams, or even just use 7-Zip set to maximum compression to generate the ZIP.

@ericstj
Author

ericstj commented Aug 17, 2016

"you can significantly improve compression while still using ZIP"

I guess that depends on your definition of significant. I only saw gains under 5% by tweaking DEFLATE. A couple of problems are that DEFLATE's window size is too small and ZIP doesn't compress across files. The only way I saw significant gains was cross-file compression with a sufficiently large compression window.
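The window-size point can be checked directly. DEFLATE can only refer back 32 KiB, so a repeat further away than that is invisible to it, while LZMA's default dictionary (8 MiB at Python's default preset 6) still sees it. A minimal sketch using the standard zlib and lzma modules, with random bytes as a hypothetical stand-in for incompressible binary content:

```python
import lzma
import os
import zlib
from typing import Tuple


def ratios(block_size: int) -> Tuple[float, float]:
    """Compress a random block repeated twice; return (deflate_ratio, lzma_ratio)."""
    block = os.urandom(block_size)
    data = block + block  # the second copy starts block_size bytes after the first
    deflate_ratio = len(zlib.compress(data, 9)) / len(data)
    lzma_ratio = len(lzma.compress(data)) / len(data)
    return deflate_ratio, lzma_ratio


# Repeat distance 16 KiB: inside DEFLATE's 32 KiB window, so both codecs find the duplicate.
print("16 KiB apart:", ratios(16 * 1024))
# Repeat distance 40 KiB: beyond DEFLATE's window, so only LZMA finds the duplicate.
print("40 KiB apart:", ratios(40 * 1024))
```

Expect both ratios near 0.5 in the first case, and a DEFLATE ratio near 1.0 next to an LZMA ratio near 0.5 in the second.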

@xen2

xen2 commented Feb 20, 2018

Bump: for Xenko, we reduced our package from 260 MB to 140 MB by using 7z inside the package (automatically decompressed on install). We also reused Microsoft.DotNet.Archive to de-duplicate files.

However, since new NuGet no longer runs init.ps1, we can't rely on that approach anymore...
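Microsoft.DotNet.Archive's actual format isn't described in this thread, so the following is only a hypothetical sketch of the general idea behind de-duplicating package files: identical contents are stored once under their SHA-256 hash, and a manifest maps every package path back to its blob. The resulting unique blobs could then go into a single solid 7z/LZMA stream.

```python
import hashlib
from typing import Dict, Tuple


def dedup(files: Dict[str, bytes]) -> Tuple[Dict[str, bytes], Dict[str, str]]:
    """Split files into unique blobs keyed by SHA-256 and a path -> hash manifest."""
    blobs: Dict[str, bytes] = {}
    manifest: Dict[str, str] = {}
    for path, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        blobs.setdefault(digest, data)  # store each distinct content only once
        manifest[path] = digest
    return blobs, manifest


def restore(blobs: Dict[str, bytes], manifest: Dict[str, str]) -> Dict[str, bytes]:
    """Rebuild the original path -> content mapping, e.g. at install time."""
    return {path: blobs[digest] for path, digest in manifest.items()}


# Hypothetical package with a reference assembly duplicated across frameworks.
files = {
    "ref/net45/Example.dll": b"same reference assembly bytes",
    "ref/netstandard2.0/Example.dll": b"same reference assembly bytes",
    "lib/net45/Example.dll": b"implementation bytes",
}
blobs, manifest = dedup(files)
print(f"{len(files)} files -> {len(blobs)} unique blobs")  # prints "3 files -> 2 unique blobs"
assert restore(blobs, manifest) == files
```

The restore step is exactly what needed a hook such as init.ps1 in Xenko's approach; without client support, something still has to expand the blobs after install.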

@zhili1208 zhili1208 modified the milestones: Future-2, 4.7 Feb 22, 2018
@xen2

xen2 commented Aug 30, 2018

@zhili1208 @rrelyea Is this a definitive close, or might this be reevaluated later?
Comparing 7z to zip, I am sure packages half the size (and even smaller with duplicate files such as ref assemblies) would benefit a lot of people through faster download and deployment.

@lostmsu

lostmsu commented Jan 23, 2020

Yeah, with the current NuGet.org limit I can't even serve the latest TensorFlow, because its largest binary alone is still 260 MB after zip compression. Either increase the limit on NuGet.org, or let us use 7z.

@lostmsu

lostmsu commented Jun 11, 2020

@rrelyea I am facing a combination of poor compression/NuGet.org package size limit issue with TensorFlow binaries (see the issue linked above).

Switching to ZIP+LZMA from ZIP+Deflate reduces the size of packed binaries from ~400MB to ~100MB. I am sure it would save NuGet.org a lot of traffic if adopted for larger packages.
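ZIP+LZMA here means the ZIP format's own LZMA entry method (method 14): the .nupkg container stays a ZIP, but the per-entry codec changes. A minimal sketch with Python's zipfile, which can write and read `ZIP_LZMA` entries; the payload (a random 128 KiB block repeated four times inside one file) and the entry path are hypothetical stand-ins for a large native binary with repeated sections. This is still per-entry compression, so on its own it does not de-duplicate across files.

```python
import io
import os
import zipfile


def zip_bytes(name: str, data: bytes, method: int) -> bytes:
    """Build a single-entry ZIP using the given entry compression method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        zf.writestr(name, data)
    return buf.getvalue()


block = os.urandom(128 * 1024)
data = block * 4  # repeats 128 KiB apart: out of DEFLATE's reach, within LZMA's dictionary
name = "runtimes/win-x64/native/example.dll"  # hypothetical path

deflate_zip = zip_bytes(name, data, zipfile.ZIP_DEFLATED)
lzma_zip = zip_bytes(name, data, zipfile.ZIP_LZMA)
print("ZIP+Deflate:", len(deflate_zip))
print("ZIP+LZMA:   ", len(lzma_zip))

# The LZMA entry round-trips, but only in readers that implement method 14.
with zipfile.ZipFile(io.BytesIO(lzma_zip)) as zf:
    assert zf.getinfo(name).compress_type == zipfile.ZIP_LZMA
    assert zf.read(name) == data
```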

@lostmsu

lostmsu commented Sep 3, 2020

@rrelyea what exactly is the problem here with LZMA? Allowing something that was not allowed previously is not a breaking change. Older clients won't be able to download new packages compressed with LZMA, but those are new packages. Previously existing packages will still work with older clients.
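The compatibility concern can at least be detected mechanically. A minimal sketch, assuming (as this comment does) that older clients only understand Stored and Deflate entries: a publisher or server could inspect each entry's compression method and flag packages a legacy client could not open. The function names and the two-method allow-list are hypothetical, for illustration only.

```python
import io
import zipfile
from typing import BinaryIO, List, Union

# Entry methods assumed readable by every existing client: 0 = Stored, 8 = Deflate.
LEGACY_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}


def non_legacy_entries(package: Union[str, BinaryIO]) -> List[str]:
    """Return names of entries whose compression method a legacy reader may not support."""
    with zipfile.ZipFile(package) as zf:
        return [info.filename for info in zf.infolist() if info.compress_type not in LEGACY_METHODS]


def make_package(method: int) -> io.BytesIO:
    """Build a tiny in-memory package whose single entry uses the given method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        zf.writestr("lib/netstandard2.0/Example.dll", b"\x00" * 1024)
    buf.seek(0)
    return buf


print(non_legacy_entries(make_package(zipfile.ZIP_DEFLATED)))  # []
print(non_legacy_entries(make_package(zipfile.ZIP_LZMA)))      # ['lib/netstandard2.0/Example.dll']
```

Only the central directory is read here, so the check works even on a machine whose reader could not actually decompress the flagged entries.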
