Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seek bound on W in ZipWriter<W> seems overly restrictive? #32

Open
Fuuzetsu opened this issue Apr 23, 2024 · 7 comments
Open

Seek bound on W in ZipWriter<W> seems overly restrictive? #32

Fuuzetsu opened this issue Apr 23, 2024 · 7 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@Fuuzetsu
Copy link

It appears that the W in ZipWriter<W> must always be Write + Seek. This means that for example, one can't write the zip to stdout.

It seems that Seek should only be required on the methods that actually make use of the functionality. As far as I understand, it should be perfectly possible to not have Seek if we're not doing anything exotic. In my use-case, I just want to add a set of files sequentially and then I'm done.

A poor work-around is to use a temporary file or a buffer and write to that, but that obviously wastes space/memory and makes the process non-streaming.

I'm not too privy to zip internals so correct me if I'm wrong. I'm pretty sure that zip program on Linux can write to stdout just by passing - as an argument. You can then send the data over the network directly without an intermediate, so presumably it's possible for the simple case.

@Pr0methean
Copy link
Member

Not all decompression software will correctly handle a ZIP-file entry whose size, compressed size and checksum aren't known at the time the entry's local-file header is transmitted. Therefore, supporting stream output when it isn't at least possible to buffer and flush the individual compressed files will mean a separate ZipWriter type and so on. I'll try to implement this, but it'll probably take a while.

@Fuuzetsu
Copy link
Author

Not all decompression software will correctly handle a ZIP-file entry whose size, compressed size and checksum aren't known at the time the entry's local-file header is transmitted.

Just to clarify, I think you're saying that we need to know things like compressed size up front, which means compressing the file to somewhere. Presumably what happens at the moment is space is left for the header, then compression happens and is appended and then it seeks back to the header and fills in the information it now knows: sizes, checksum etc. Right?

I wonder what tools like zip on Linux do…

Thanks for looking into this.

@Pr0methean
Copy link
Member

Pr0methean commented Apr 23, 2024

What zip does on Linux if the output isn't seekable is to set the "data descriptor" flag, which indicates that the CRC32 checksum, size and compressed size will be in a footer after the file.

Per https://github.com/Pr0methean/zip/blob/ffa7772cc362049308193e133744a0120092f0c2/src/read.rs#L1239, the read_zipfile_from_stream method is one example of software that doesn't support files that use the data descriptor. Per https://github.com/Pr0methean/zip/blob/ffa7772cc362049308193e133744a0120092f0c2/src/write.rs#L691, ZipWriter doesn't currently use data descriptors.

@Pr0methean Pr0methean added the help wanted Extra attention is needed label Apr 24, 2024
@Pr0methean
Copy link
Member

Pr0methean commented Apr 24, 2024

I think the only solution will be to give ZipWriter a new type parameter S: Seek, a new field seeker: Option<S>, and a second constructor for when W: S. S would be Infallible if the output didn't implement Seek. (If using Infallible somehow doesn't work, we can create a custom impossible type, or wait for rust-lang/rust#35121 to be stabilized and then use !.)

@Pr0methean Pr0methean added the enhancement New feature or request label Apr 26, 2024
@chenxiaolong
Copy link

In case it's useful at all, an implementation of this was submitted for zip-old last year here: zip-rs/zip-old#383. It uses the same ZipWriter type, but wraps the inner writer with a new MaybeSeekable type.

@ofek
Copy link
Contributor

ofek commented May 12, 2024

I don't quite understand the current discussion but is this comment still accurate? https://github.com/ofek/pyapp/blob/fc356202f49cb79f5e92294e4ed29d33b50442b0/src/distribution.rs#L217-L225

Essentially, there is an option to embed an archive inside the binary itself but I must save to a temporary file in order to extract. Is it possible to directly extract now given a stream?

@Pr0methean
Copy link
Member

Pr0methean commented May 12, 2024

Extracting without Seek has been possible using the read_zipfile_from_stream method (https://github.com/zip-rs/zip2/blob/master/src/read.rs#L1298) since version 0.4.1 (released 2018-06-20): 38d1699 It still has a few limitations, but should work for almost any unencrypted ZIP file that this crate can produce. Consider forking and finishing #70 if you need to read a ZIP file in streaming mode while it's being written in streaming mode.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants