Filesize mismatch when decompressing multi-stream with sizes greater than 2GB (2^31) #61
Thanks for the report! Is there some code to peek at as well? That'd help narrow down where the issue is in this crate.
Hi @alexcrichton -- yes, sorry, the code is in the reproducer attached to the issue:

```rust
use std::fs::File;
use std::io::{Read, Write};

use bzip2::read::{BzDecoder, MultiBzDecoder};

fn main() -> Result<(), std::io::Error> {
    // Decompress the same data compressed two ways (raw.dat.bz2 from bzip2,
    // raw.dat.pbz2 from pbzip2) and compare each result to the expected output.
    println!("Generating expected results");
    let mut expected = Vec::with_capacity(3_000_000_100);
    let data = "0123456789";
    for _ in 0..300_000_000 {
        // write_all rather than write: it never silently drops a partial write.
        expected.write_all(data.as_bytes())?;
    }
    let expected_len = 3_000_000_000;
    assert_eq!(expected.len(), expected_len);

    // This passes (single-stream file produced by bzip2).
    println!("Decompressing stream made with bzip2");
    let mut decompressor = BzDecoder::new(File::open("raw.dat.bz2").expect("raw.dat.bz2 not found"));
    let mut contents = Vec::with_capacity(3_000_000_100);
    let num_read = decompressor.read_to_end(&mut contents).expect("error decompressing bz2 data");
    assert_eq!(num_read, expected_len, "decompressed length mismatch");
    assert_eq!(contents, expected, "data mismatch");

    // This fails (multi-stream file produced by pbzip2) with:
    // ...
    // Decompressing stream made with pbzip2
    // thread 'main' panicked at 'assertion failed: `(left == right)`
    //   left: `405900000`,
    //  right: `3000000000`: decompressed length mismatch', src/main.rs:30:5
    // note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    println!("Decompressing stream made with pbzip2");
    let mut decompressor = MultiBzDecoder::new(File::open("raw.dat.pbz2").expect("raw.dat.pbz2 not found"));
    let mut contents = Vec::with_capacity(3_000_000_100);
    let num_read = decompressor.read_to_end(&mut contents).expect("error decompressing pbz2 data");
    assert_eq!(num_read, expected_len, "decompressed length mismatch");
    assert_eq!(contents, expected, "data mismatch");

    println!("Done");
    Ok(())
}
```
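The reproducer keeps the expected data and both decompressed results in memory at the same time (three buffers of about 3 GB each). A hedged, lower-memory alternative is to stream the decoder output and check it chunk by chunk against the repeating `0123456789` pattern. `check_repeating` is a hypothetical helper, not part of the bzip2 crate, and the 64 KiB chunk size is an arbitrary assumption.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads `reader` to EOF, verifying that its output is
/// `pattern` repeated, without buffering the whole result. Returns the total
/// number of bytes read, or an InvalidData error at the first wrong byte.
fn check_repeating<R: Read>(mut reader: R, pattern: &[u8]) -> io::Result<u64> {
    assert!(!pattern.is_empty(), "pattern must not be empty");
    let mut buf = vec![0u8; 64 * 1024]; // arbitrary chunk size
    let mut total: u64 = 0;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => return Ok(total), // Ok(0) is how Read signals EOF
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        for (i, &byte) in buf[..n].iter().enumerate() {
            let offset = total + i as u64;
            let want = pattern[(offset % pattern.len() as u64) as usize];
            if byte != want {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    format!("mismatch at byte {offset}: got {byte}, expected {want}"),
                ));
            }
        }
        total += n as u64;
    }
}
```

Hypothetical usage with the crate's decoder: `check_repeating(MultiBzDecoder::new(File::open("raw.dat.pbz2")?), b"0123456789")?`, then compare the returned count with `3_000_000_000`. A short count from the pbzip2 file would reproduce the report without the large allocations.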
Oh gah, sorry, I missed the link from before, my bad!
OK, sorry, I don't have a ton of time to look into this right now, but it may be a relatively easy bug to fix in …
No worries @alexcrichton -- if I get a chance, I will look into it too. I figured the reproducer was the most important thing to prepare, so I wanted to get that posted.
I got hit by a similar bug today, and it was a bit tricky to analyze. In short, the code at line 219 (commit 0574528) will signal EOF even for multi-stream input if the input buffer happens to line up so that it starts with the bzip2 end-of-stream marker (as BZ2_bzDecompress will return …).

When the input buffer has an end-of-stream marker right at the beginning, `decompress()` returns `StreamEnd` and `total_in` does not advance. We cannot return `Ok(read)` there: when no output has been produced, that is `Ok(0)`, which any `Read` caller treats as EOF. Instead, we rely on the next loop iteration to return EOF for real, once the input buffer fails to fill again.
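The consequence can be modelled without the bzip2 crate. In this hedged sketch, `ToyMultiDecoder` is a hypothetical stand-in whose "streams" are already-decoded byte vectors. In buggy mode it returns `Ok(0)` when a read call starts exactly at a stream boundary (the unlucky alignment described above, which the toy hits on every boundary), so `read_to_end` stops after the first stream. In fixed mode it moves on to the next stream within the same call and reports `Ok(0)` only when no input is left.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for a multi-stream decoder; it does not use the
/// bzip2 crate. Each inner Vec is the decoded payload of one stream.
struct ToyMultiDecoder {
    streams: Vec<Vec<u8>>,
    idx: usize,  // index of the current stream
    pos: usize,  // offset within the current stream
    buggy: bool, // true: report Ok(0) at a stream boundary (the bug)
}

impl Read for ToyMultiDecoder {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        loop {
            let stream = match self.streams.get(self.idx) {
                Some(s) => s,
                None => return Ok(0), // no input left at all: real EOF
            };
            if self.pos == stream.len() {
                // Stream end reached at the start of this call, nothing produced.
                self.idx += 1;
                self.pos = 0;
                if self.buggy {
                    return Ok(0); // caller takes this as EOF and stops early
                }
                continue; // fixed: try the next stream in this same call
            }
            let n = buf.len().min(stream.len() - self.pos);
            buf[..n].copy_from_slice(&stream[self.pos..self.pos + n]);
            self.pos += n;
            return Ok(n);
        }
    }
}

/// Decodes everything with read_to_end, which stops at the first Ok(0).
fn read_all(streams: Vec<Vec<u8>>, buggy: bool) -> Vec<u8> {
    let mut dec = ToyMultiDecoder { streams, idx: 0, pos: 0, buggy };
    let mut out = Vec::new();
    dec.read_to_end(&mut out).expect("in-memory reads cannot fail");
    out
}
```

With the two streams `first ` and `second`, the buggy mode yields only `first `, while the fixed mode yields `first second`.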
Here is a self-contained reproducer of the problem: bzip_bug.zip
To reproduce:
You will see the following output:
The expected output is that the program completes successfully.
The test has two files:
- raw.dat.bz2, compressed with the `bzip2` tool on a Mac
- raw.dat.pbz2, compressed with the `pbzip2` tool on a Mac

The data was generated using the generate.rs tool in the package, via the following commands:
You can check the output byte counts using `bzcat`:

The issue seems to affect files larger than 2^31 bytes (which smells like a 32-bit integer overflow somewhere).
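A quick arithmetic check of the overflow hunch, as a hedged sketch using the 3,000,000,000-byte length from the reproducer: that length still fits in a `u32` (limit 2^32 = 4,294,967,296) but not in an `i32` (limit 2^31 = 2,147,483,648), so a threshold at 2^31 would point at a signed 32-bit value. A plain wrap at 2^31 would also leave 852,516,352 bytes, not the 405,900,000 reported in the panic message. The helper names are illustrative only.

```rust
// Length from the reproducer; the limits are the standard 32-bit bounds.
const EXPECTED_LEN: u64 = 3_000_000_000;
const I32_LIMIT: u64 = 1 << 31; // 2^31 = 2,147,483,648
const U32_LIMIT: u64 = 1 << 32; // 2^32 = 4,294,967,296

/// True if `len` fits in a u32 without wrapping.
fn fits_u32(len: u64) -> bool {
    len < U32_LIMIT
}

/// True if `len` fits in a non-negative i32 without wrapping.
fn fits_i32(len: u64) -> bool {
    len < I32_LIMIT
}

/// The value a length would show if it silently wrapped modulo 2^31.
fn wrapped_at_2_31(len: u64) -> u64 {
    len % I32_LIMIT
}
```

Under these checks, `EXPECTED_LEN as i32` is -1,294,967,296, while `EXPECTED_LEN as u32` is unchanged.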