perf: Speed up non-zip rejection by limiting search for EOCDR.
I benchmarked several search sizes across 2 machines
(these benches are using an 8192 END_WINDOW_SIZE):

Machine 1: macOS 15.0.1, aarch64 (apfs /tmp)
whole file:   test parse_large_non_zip              ... bench:   5,773,801 ns/iter (+/- 411,277)
last 128k:    test parse_large_non_zip              ... bench:      54,402 ns/iter (+/- 4,126)
last 66,000:  test parse_large_non_zip              ... bench:      36,152 ns/iter (+/- 4,293)

Machine 2: Debian testing, x86_64 (tmpfs /tmp)
whole file:   test parse_large_non_zip              ... bench:   9,942,306 ns/iter (+/- 1,963,522)
last 128k:    test parse_large_non_zip              ... bench:      73,604 ns/iter (+/- 16,662)
last 66,000:  test parse_large_non_zip              ... bench:      41,349 ns/iter (+/- 16,812)

As you might expect, limiting the search significantly speeds up rejection
of large non-zip files.

66,000 was the number previously used by zip-rs. It was changed to zero in
7a55945.

128K is what Info-Zip uses[1]. This seems like a reasonable (non-zero)
choice for compatibility reasons.

[1] Info-Zip is extremely old and does not have an official git repo to
    link to. However, an unofficial fork can be found here:
    https://github.com/hiirotsuki/infozip/blob/bb0c4755d44f21bda0744a5e1868d25055a543cc/zipfile.c#L4073
caldwell committed Oct 23, 2024
1 parent ea57909 commit ab810da
Showing 1 changed file with 8 additions and 1 deletion.
9 changes: 8 additions & 1 deletion src/spec.rs
@@ -353,7 +353,14 @@ impl Zip32CentralDirectoryEnd {
             return Err(ZipError::InvalidArchive("Invalid zip header"));
         }
 
-        let search_lower_bound = 0;
+        // The End Of Central Directory Record should be the last thing in
+        // the file and so searching the last 65557 bytes of the file should
+        // be enough. However, not all zips are well-formed and other
+        // programs may consume zips with extra junk at the end without
+        // error, so we go back 128K to be compatible with them. 128K is
+        // arbitrary, but it matches what Info-Zip does.
+        const EOCDR_SEARCH_SIZE: u64 = 128 * 1024;
+        let search_lower_bound = file_length.saturating_sub(EOCDR_SEARCH_SIZE);
 
         const END_WINDOW_SIZE: usize = 8192;
         /* TODO: use static_assertions!() */
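
To show the effect of the bounded search in isolation, here is a minimal, self-contained sketch. It is not the zip-rs implementation: `find_eocdr_offset` is a hypothetical helper that reads the whole tail in one go instead of stepping backwards in END_WINDOW_SIZE chunks, and `MAX_WELL_FORMED_EOCDR` is an illustrative name for the 65557 figure from the comment (the 22-byte fixed record plus a comment of up to 65535 bytes).

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// EOCDR signature "PK\x05\x06" as it appears on disk.
const EOCDR_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];

/// Illustrative name (not from zip-rs): 22-byte fixed record plus the
/// largest comment a u16 length field allows, i.e. 65557 bytes.
const MAX_WELL_FORMED_EOCDR: u64 = 22 + u16::MAX as u64;

/// Same value as in the commit: look at most 128K back from the end.
const EOCDR_SEARCH_SIZE: u64 = 128 * 1024;

/// Hypothetical helper: returns the absolute offset of the last EOCDR
/// signature within the final EOCDR_SEARCH_SIZE bytes, or None if the
/// tail does not contain one.
fn find_eocdr_offset<R: Read + Seek>(reader: &mut R) -> io::Result<Option<u64>> {
    let file_length = reader.seek(SeekFrom::End(0))?;

    // saturating_sub clamps to 0 for files shorter than the search size,
    // so small files are still searched in full.
    let search_lower_bound = file_length.saturating_sub(EOCDR_SEARCH_SIZE);

    // Read only the tail; everything before search_lower_bound is ignored.
    reader.seek(SeekFrom::Start(search_lower_bound))?;
    let mut tail = Vec::with_capacity((file_length - search_lower_bound) as usize);
    reader.read_to_end(&mut tail)?;

    // Scan from the end so the last signature in the tail wins.
    Ok(tail
        .windows(EOCDR_SIGNATURE.len())
        .rposition(|window| window == EOCDR_SIGNATURE.as_slice())
        .map(|pos| search_lower_bound + pos as u64))
}
```

With this bound, a large non-zip file is rejected after reading at most 128K, no matter how long the file is. That bounded read is the source of the speedup in the benchmarks above.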