Adds zip support to the `zlib` module #45651

arcanis · 2022-11-28T00:37:50Z

This PR adds a new ZipArchive class to the zlib module, which can be used to read and write content from zip archives. Its current API looks like this:

const fs = require(`fs`);
const {ZipArchive} = require(`zlib`);

// Creates a new in-memory archive
const zip = new ZipArchive();
zip.addFile(`hello`, fs.readFileSync(__filename));
const data = zip.digest();

fs.writeFileSync(`./archive.zip`, data);

// The data obtained from `digest` can also be reopened
const zip2 = new ZipArchive(data);
console.log(zip2.getEntries());
console.log(zip2.getEntries({withFileTypes: true}));
const content = zip2.readEntry(0);

console.log(content);

Maintenance cost

I kept the feature scope narrow: wide enough to cover most use cases, but without increasing the maintenance or build cost. A few things that libzip would allow have been cut:

- Opening files directly from the filesystem isn't supported, because it would bypass node:fs. The current API only works with in-memory buffers (according to my tests this has no negative impact, even compared to the wasm API, which went through file descriptors).
- Encryption isn't supported, because it's unclear how it should integrate with node:crypto. There's room for a follow-up, but it didn't seem required for a first iteration.

Performance

Keep in mind that raw performance isn't the main reason why zip support is important to have as a native feature. The speedup is nice, the simplified garbage collection is very nice, but the real benefit is having a stable way to bundle files across platforms. It will be useful for cache mechanisms, transfer algorithms, user CLI generation, and more.

Still, I made some reasonable checks to make sure that no use case regressed. Size of the binary before / after:

before 89604801 85.45MB
after  89780257 85.62MB (+171KB)
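The size delta above can be re-derived with a few lines of arithmetic. This is only a sketch: it assumes the two figures are binary sizes in bytes and that "KB"/"MB" are 1024-based units, and the variable names are illustrative.

```javascript
// Binary sizes in bytes, copied from the before/after figures above.
const before = 89604801;
const after = 89780257;

// Assumption: "KB" and "MB" are 1024-based units.
const KiB = 1024;
const MiB = 1024 * 1024;

const deltaBytes = after - before;
const deltaKiB = deltaBytes / KiB;
const growthPercent = (deltaBytes / before) * 100;

console.log(`before: ${(before / MiB).toFixed(2)} MB`);
console.log(`after:  ${(after / MiB).toFixed(2)} MB`);
console.log(`delta:  ${deltaBytes} bytes (~${Math.floor(deltaKiB)} KB, ${growthPercent.toFixed(2)}% growth)`);
```

Under those assumptions, the increase is a fraction of a percent of the total binary size.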

Performance-wise, using Yarn as a benchmark, the results show native being ~2x faster than wasm (keep in mind that the wasm implementation isn't the most popular zip library; projects using jszip will see significantly larger differences):

YARN_EXPERIMENT_NATIVE_ZIPFS=0 PKG=gatsby
➤ YN0000: └ Completed in 15s 806ms
YARN_EXPERIMENT_NATIVE_ZIPFS=1 PKG=gatsby
➤ YN0000: └ Completed in 7s 404ms

YARN_EXPERIMENT_NATIVE_ZIPFS=0 PKG=typescript
➤ YN0000: └ Completed in 13s 676ms
YARN_EXPERIMENT_NATIVE_ZIPFS=1 PKG=typescript
➤ YN0000: └ Completed in 5s 351ms

YARN_EXPERIMENT_NATIVE_ZIPFS=0 PKG=next
➤ YN0000: └ Completed in 5s 923ms
YARN_EXPERIMENT_NATIVE_ZIPFS=1 PKG=next
➤ YN0000: └ Completed in 3s 512ms
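The "~2x" figure can be checked against the timings above. A minimal sketch, assuming `YARN_EXPERIMENT_NATIVE_ZIPFS=0` is the wasm run and `=1` the native run (the output does not say so explicitly); the `runs` object is simply the numbers above converted to milliseconds.

```javascript
// Wall-clock times in milliseconds, transcribed from the Yarn output above.
// Assumption: NATIVE_ZIPFS=0 is the wasm implementation, =1 is the native one.
const runs = {
  gatsby: { wasm: 15806, native: 7404 },
  typescript: { wasm: 13676, native: 5351 },
  next: { wasm: 5923, native: 3512 },
};

// Speedup = wasm time divided by native time, per package.
const speedups = {};
for (const [pkg, { wasm, native }] of Object.entries(runs)) {
  speedups[pkg] = wasm / native;
  console.log(`${pkg}: ${speedups[pkg].toFixed(2)}x`);
}
```

The per-package ratios fall on either side of 2x, with the shortest install (next) showing the smallest gain.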

To Do

- Improve the documentation
- Add more regression tests
- Benchmark against the WASM libzip
- API Bikeshedding

mscdex

-1 I know this is an unpopular opinion, but I'm not convinced this should live in node core. Sure it's a common file format, but then again so are tar, rar, 7z, zstd, xz, and others, which also don't belong in node core.

I feel like adding such modules to node core is further leading to feature creep. I get that other platforms like PHP and such may have zip modules, but they also include a ton of other modules that make them "kitchen sink" platforms, which I would hate to see node.js become.

GeoffreyBooth · 2022-11-28T01:30:04Z

@mscdex Could the discussion of whether we should do this stay in #45434? And this PR discussion can focus on the implementation.

addaleax

> I'd appreciate some early review to let me know places where the code isn't compatible with Node.js' standards.

Might be jumping the gun here since yeah, there should be a discussion about whether this should happen at all, but, sure, gave it a first look. I do concur with @mscdex's concerns, fwiw.

src/node_zip.cc

addaleax · 2022-11-28T01:10:42Z

+}
+
+void ZipArchive::MemoryInfo(MemoryTracker* tracker) const {
+}


(might want to fill this out)

arcanis · Nov 28, 2022

How precise does it have to be? I updated the code to track the size of the input buffer plus the buffer of any file that gets added later, but it doesn't include the small-ish libzip overhead, and it gets confused if the same file is modified multiple times. If the value is only meant to be indicative, it might be fine?

addaleax · Nov 28, 2022

It doesn’t have to be super precise, but it should be usable for debugging. You’ll probably want to track `buf_` here via `tracker->TrackField("buf", buf_);`. If you can’t track or estimate memory owned by libzip (including memory for added entries?) then it’s probably fine to omit it, rather than to give numbers that are potentially very inaccurate (e.g. after repeated `AddEntry()` + `DeleteEntry()` calls).

src/node_zip.cc

addaleax · 2022-11-28T01:24:14Z

+  }
+
+  zip_int64_t file_index = zip_file_add(zip->zip_, *path, file_source, ZIP_FL_OVERWRITE | ZIP_FL_ENC_UTF_8);
+  CHECK_GE(file_index, 0);


What if this call fails? Likely also applies elsewhere.

arcanis · Nov 28, 2022

For now the code is overly strict: if any function fails, the program aborts on the CHECK_GE call. I have to replace most of these calls with something that throws instead.

lib/zip.js

anonrig · 2022-11-28T13:41:24Z

Apart from the comments from other reviewers, my main concern is about the usage of the buffer module. I strongly believe that the public API should consume native buffers (TypedArray) instead of Node.js buffers.

arcanis · 2022-11-28T13:49:47Z

I used Buffer since that's what the other main Node APIs tended to use (fs, crypto, zlib) - wouldn't it be surprising for users to return a regular typed array in just this API?

anonrig · 2022-11-28T13:52:18Z

> I used Buffer since that's what the other main Node APIs tended to use (fs, crypto, zlib) - wouldn't it be surprising for users to return a regular typed array in just this API?

There is an initiative to use native buffers in the new public APIs (referencing my personal talks with @addaleax and @jasnell); a @nodejs/tsc member can clarify whether this is still a thing.

addaleax · 2022-11-28T14:31:16Z

@anonrig I don’t know if that’s the best way forward here, but since this feels like a very broad conversation (“Should new Node.js APIs return Uint8Array or should they return Buffer?”), maybe it’s also best to handle that separately from this specific PR?

jasnell · 2022-11-29T14:13:17Z

In this case, I think Buffer is fine given that it is consistent with the rest of the zlib module. I can see us eventually making a call on avoiding Buffer in the future (or standardizing on it), but this is not the place to decide that.
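For readers weighing the two options, the behaviour of the existing `zlib` one-shot functions can be observed directly. The sketch below uses only `deflateSync` and `inflateSync` from `node:zlib`, not the `ZipArchive` class proposed in this PR; variable names are illustrative.

```javascript
const zlib = require('node:zlib');

// Existing zlib one-shot functions accept a plain Uint8Array as input...
const input = new TextEncoder().encode('hello zip');
const compressed = zlib.deflateSync(input);

// ...and return a Buffer, which is itself a Uint8Array subclass.
const returnsBuffer = Buffer.isBuffer(compressed);
const isTypedArray = compressed instanceof Uint8Array;
console.log({ returnsBuffer, isTypedArray });

// A caller who prefers a plain Uint8Array can create a view over the same
// memory without copying, and pass it straight back into zlib.
const view = new Uint8Array(
  compressed.buffer,
  compressed.byteOffset,
  compressed.byteLength,
);
const text = zlib.inflateSync(view).toString('utf8');
console.log(text);
```

Because `Buffer` extends `Uint8Array`, code written against typed arrays keeps working with a `Buffer` return value; the reverse does not hold for Buffer-only methods such as `toString('utf8')`.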

jasnell · 2022-11-29T14:18:04Z

Is a new top level module what we want here? As opposed to adding this to zlib? I know it's not based on zlib but neither is brotli.
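The brotli precedent is easy to see from user code: both codecs are reached through the same `node:zlib` namespace even though they are backed by different libraries. A minimal sketch using only existing `zlib` functions; the payload and variable names are illustrative.

```javascript
const zlib = require('node:zlib');

const payload = Buffer.from('same namespace, different codecs');

// Deflate (the zlib library proper) and Brotli (a separate library) are both
// exposed through the single `node:zlib` module.
const deflated = zlib.deflateSync(payload);
const brotlied = zlib.brotliCompressSync(payload);

// Each codec round-trips through its own decompression function.
const fromDeflate = zlib.inflateSync(deflated).toString();
const fromBrotli = zlib.brotliDecompressSync(brotlied).toString();

console.log(fromDeflate === fromBrotli);
```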

arcanis · 2022-11-29T14:40:34Z

I think @GeoffreyBooth had the same feedback. I don't have a strong opinion there, perhaps zlib would indeed better match user expectations.

GeoffreyBooth · 2022-11-29T14:46:50Z

I would put it in zlib. In the future we could consider a friendlier name as an alias for zlib, like node:compression or something, but that's for later.

tniessen · 2022-12-05T09:41:19Z

> There is an initiative to use native buffers in the new public APIs

Just leaving this reference here: #41588

doc/api/zip.md

tniessen · 2022-12-05T09:43:44Z

> In the future we could consider a friendlier name as an alias for zlib, like node:compression or something, but that's for later.

That might make some sense for zip, which inherently supports compression, but if we do add other archive formats that don't, it won't fit. Also, zip aside, compression and archive formats are orthogonal topics, even if related.

GeoffreyBooth · 2023-01-06T21:53:22Z

doc/api/zlib.md

+## Compressing multiple files together
+
+<!-- YAML
+added: REPLACEME
+-->
+
+The `zlib` library provides ways to compress individual objects, but not to
+aggregate multiple ones into a single file suitable for redistribution (what
+is often called archival).
+
+To this end, `node:zip` provides the `ZipArchive` class which allows to create,
+read, and modify zip archives:


I assume this section needs updating? Because of references to node:zip etc.

arcanis · Jan 6, 2023

Indeed, it's a typo; the archive is now part of node:zlib (would it make sense to have a check in lint-md that all node:something identifiers must be valid?)

nodejs-github-bot added the build, dependencies, meta, and needs-ci labels Nov 28, 2022

arcanis mentioned this pull request Nov 28, 2022: Native zip support #45434 (Open)

mscdex suggested changes Nov 28, 2022

addaleax reviewed Nov 28, 2022

aduh95 reviewed Nov 28, 2022 (three reviews; comments on lib/zip.js now outdated)

mcollina reviewed Dec 5, 2022 (comment on doc/api/zip.md now outdated)

arcanis changed the title ~~Adds prototype zip module~~ Adds prototype archive module Dec 18, 2022

arcanis mentioned this pull request Dec 18, 2022: Experimental: Implements a ZipFS on top of the native WIP PR yarnpkg/berry#5145 (Open)

arcanis force-pushed the mael/zip branch from e75c0b5 to 396157c January 6, 2023 10:23

arcanis changed the title ~~Adds prototype archive module~~ Adds zip support to the zlib module Jan 6, 2023

zlib: add support for zip archives (66cd39f)

arcanis force-pushed the mael/zip branch from 396157c to 66cd39f January 6, 2023 12:22

Adds a section to the documentation (5cb254d)

arcanis marked this pull request as ready for review January 6, 2023 21:30

GeoffreyBooth reviewed Jan 6, 2023

arcanis mentioned this pull request Mar 5, 2023: Native support for PnP microsoft/TypeScript#35206 (Open)

belgattitude mentioned this pull request May 15, 2023: Ci: improve install time iteration 2 strapi/strapi#16638 (Closed)
