Nupkg compression is bad #890
Comments
What tools are you using to pack? |
nuget.exe pack on a nuspec |
NuGet is currently using the packaging APIs rather than ZipArchive. There is potential for improvement, but this doesn't seem to be high on the immediate to-do list. |
I tested ZipArchive when I created this bug and didn't find much benefit. I also tried some other zip utilities with different compression quality and didn't see much benefit. LZMA on the binaries showed some major wins, roughly 2/3 the size of zip, and much greater when files were very similar, as would be the case for cross-compiled implementations. Other compression tech for XML could potentially provide better wins there too. I didn't get a chance to try it, but there is a new standard for XML compression: http://www.w3.org/XML/EXI/. Suppose the XML is represented in an EXI-compressed format within the container and then the entire container is compressed with LZMA. I think that'd be a significant savings. EXI might even be something to look at for the docs on disk, assuming we could get VS support. |
The downside of fancier compression is that it means breaking compat with older clients. |
You could do it on the server. If the client tells you it supports the new format, give it to them. Otherwise give them the old format. |
Not everyone is using Nuget.org as a server. I think this needs to be thought through further to have a migration strategy with compatibility for older clients considered.
|
@csharpfritz of course. I'm not suggesting anything breaking here. As I mentioned, it can be something opt-in by the client, and optional from the server. Sort of how Accept-Encoding and Content-Encoding work with HTTP: the client tells the server it can understand the new format, and the server tells the client what format it's giving it. |
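To make the opt-in idea concrete, here is a minimal Python sketch of that kind of negotiation. The format names (`zip-lzma`, `zip-deflate`) and the function are hypothetical and exist only for illustration; nothing here is an existing NuGet or NuGet.org API.

```python
from typing import Iterable

# Hypothetical identifier for today's format; not defined by NuGet.
LEGACY_FORMAT = "zip-deflate"


def choose_package_format(client_accepts: Iterable[str],
                          server_offers: Iterable[str]) -> str:
    """Pick the first format the client lists that the server can serve.

    A client that advertises nothing (an older client) always receives
    the legacy format, so existing behavior is unchanged.
    """
    offered = set(server_offers)
    for fmt in client_accepts:
        if fmt in offered:
            return fmt
    return LEGACY_FORMAT


# A newer client lists its preferences in order; an older client sends none.
print(choose_package_format(["zip-lzma", "zip-deflate"], ["zip-deflate", "zip-lzma"]))
print(choose_package_format([], ["zip-deflate", "zip-lzma"]))
```

A server that only has the legacy format simply never matches the new identifier, so both sides can adopt the scheme independently.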
+1 for LZMA. I've switched from NSIS, which has LZMA compression, to Squirrel.Windows (uses NuGet packages) and my installer has doubled in size. Ideally there should be an option for choosing between a handful of compression methods. Faster decompression may also be favored over a smaller package size depending on the use case. |
To be clear here, you can significantly improve compression while still using ZIP and remaining compatible with all legacy Zip (DEFLATE) clients. You can use e.g. Zopfli on the DEFLATE streams or even just use 7Zip to generate the ZIP, set to maximum compression. |
I guess that depends on your definition of significant. I only saw gains under 5% by tweaking DEFLATE. A couple of problems are that the window size for DEFLATE is too small and zip isn't cross-file. The only ways I saw significant gains were using cross-file compression with a significantly larger compression window. |
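The window-size and cross-file points can be reproduced with a small Python sketch. It assumes synthetic data: four 50,000-byte files of random bytes that differ from a shared base in a single byte, standing in for near-identical cross-compiled binaries. Real packages will behave differently, so treat the printed numbers as an illustration of the mechanism only.

```python
import lzma
import random
import zlib


def make_similar_files(count=4, size=50_000, seed=0):
    """Build `count` byte strings that each differ from a shared base in one byte."""
    rng = random.Random(seed)
    base = bytes(rng.getrandbits(8) for _ in range(size))
    files = []
    for i in range(count):
        variant = bytearray(base)
        variant[i] ^= 0xFF  # flip one byte so the files are similar, not identical
        files.append(bytes(variant))
    return files


def compare_sizes(files):
    """Return compressed sizes in bytes for three strategies."""
    joined = b"".join(files)
    return {
        # What a zip does: every entry is compressed on its own.
        "per_file_deflate": sum(len(zlib.compress(f, 9)) for f in files),
        # One DEFLATE stream over all files; its 32 KiB window cannot
        # reach back to the previous 50,000-byte copy.
        "solid_deflate": len(zlib.compress(joined, 9)),
        # One LZMA stream; the default dictionary is far larger than the input.
        "solid_lzma": len(lzma.compress(joined)),
    }


sizes = compare_sizes(make_similar_files())
for name, size in sizes.items():
    print(name, size)
```

With this input, only the cross-file LZMA stream can exploit the repetition, because each repeated copy sits further back than DEFLATE can look.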
Bump: for Xenko, we reduced the package from 260 MB to 140 MB by using 7z inside the package (automatically decompressed on install). We reused Microsoft.DotNet.Archive to deduplicate files too. However, since there is no init.ps1 anymore in new NuGet, we can't rely on that anymore... |
@zhili1208 @rrelyea Is this a definitive close, or might this be reevaluated later? |
Yeah, with the current Nuget.org limit I can't even serve the latest TensorFlow, because its largest binary after zip compression is still 260MB alone. Either increase the limit on Nuget.org, or let us use 7z. |
@rrelyea I am facing a combination of poor compression/NuGet.org package size limit issue with TensorFlow binaries (see the issue linked above). Switching to ZIP+LZMA from ZIP+Deflate reduces the size of packed binaries from ~400MB to ~100MB. I am sure it would save NuGet.org a lot of traffic if adopted for larger packages. |
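For anyone who wants to measure ZIP+LZMA against ZIP+Deflate on their own package, here is a minimal Python sketch that rewrites an archive with a different per-entry compression method using the standard `zipfile` module. The `repack` helper is a hypothetical name for illustration. Whether a given NuGet client can read the result is not established here; readers that only support DEFLATE will not be able to open the LZMA entries.

```python
import io
import zipfile


def repack(source: bytes, method: int = zipfile.ZIP_LZMA) -> bytes:
    """Rewrite every entry of a ZIP archive using the given compression method."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(source)) as src, \
            zipfile.ZipFile(out, "w", compression=method) as dst:
        for info in src.infolist():
            # Entry names and contents are preserved; only the method changes.
            dst.writestr(info.filename, src.read(info), compress_type=method)
    return out.getvalue()


def demo_package() -> bytes:
    """Build a tiny stand-in for a .nupkg (made-up file names and contents)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as z:
        z.writestr("lib/a.dll", b"abc" * 1000)
        z.writestr("pkg.nuspec", b"<package/>")
    return buf.getvalue()


original = demo_package()
repacked = repack(original)
print(len(original), len(repacked))
```

Comparing `len(original)` with `len(repacked)` on a real package gives a quick estimate of the savings before committing to anything.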
@rrelyea what exactly is the problem here with LZMA? It is not a breaking change to allow something, that was not allowed previously. The older clients won't be able to download new packages compressed with LZMA, but those are new packages. Previously existing packages will still work with older clients. |
Currently nupkg compression doesn't seem to de-dup multiple files. It also does a poor job at compressing XML. We doubled the size of our packages when we added localized docs. I haven't done a ton of investigation here but I bet there are a bunch of things nuget could do better.
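As a rough way to check how much duplication a package carries, here is a minimal Python sketch that groups entries by content hash. The helper name and the sample file names are made up for illustration; this only detects byte-identical files, not near-duplicates.

```python
import hashlib
import io
import zipfile
from collections import defaultdict
from typing import Dict, List


def find_duplicate_entries(package: bytes) -> Dict[str, List[str]]:
    """Map SHA-256 digest -> entry names, for contents that occur more than once."""
    groups = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(package)) as z:
        for info in z.infolist():
            if info.is_dir():
                continue
            digest = hashlib.sha256(z.read(info)).hexdigest()
            groups[digest].append(info.filename)
    return {d: names for d, names in groups.items() if len(names) > 1}


def sample_package() -> bytes:
    """Build a tiny archive where two target frameworks ship the same XML doc."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as z:
        z.writestr("lib/net45/Foo.xml", b"<doc>same</doc>")
        z.writestr("lib/netstandard1.0/Foo.xml", b"<doc>same</doc>")
        z.writestr("lib/net45/Foo.dll", b"binary")
    return buf.getvalue()


duplicates = find_duplicate_entries(sample_package())
for names in duplicates.values():
    print(names)
```

Each group printed is a set of entries that a zip stores (and compresses) separately even though their bytes are identical.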