Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get MIRI running against parquet crate #614

Open
alamb opened this issue Jul 25, 2021 · 7 comments
Open

Get MIRI running against parquet crate #614

alamb opened this issue Jul 25, 2021 · 7 comments
Labels
enhancement Any new improvement worthy of a entry in the changelog

Comments

@alamb
Copy link
Contributor

alamb commented Jul 25, 2021

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The MIRI tool helps crates ensure they have no undefined behavior https://github.com/rust-lang/miri

To improve the stability / security of the parquet crate, it would be cool to run MIRI against that too.

Describe the solution you'd like

Ideally, we would run same kind of MIRI checks on the parquet crate as we do now on arrow: https://github.com/apache/arrow-rs/blob/master/.github/workflows/miri.yaml#L60

This means a command something like

cargo +nightly miri  test  -p parquet

Sadly, when I run this locally the very first test fails with an alignment issue

(arrow_dev) alamb@MacBook-Pro:~/Software/arrow-rs$ cargo +nightly miri  test  -p parquet
WARNING: Ignoring `RUSTC_WRAPPER` environment variable, Miri does not support wrapping.
   Compiling arrow v6.0.0-SNAPSHOT (/Users/alamb/Software/arrow-rs/arrow)
   Compiling parquet v6.0.0-SNAPSHOT (/Users/alamb/Software/arrow-rs/parquet)
    Finished test [unoptimized + debuginfo] target(s) in 4.53s
     Running unittests (target/miri/x86_64-apple-darwin/debug/deps/parquet-7056b5943fd0017c)

running 456 tests
test arrow::array_reader::tests::test_complex_array_reader_def_and_rep_levels ... error: Undefined Behavior: accessing memory with alignment 1, but alignment 4 is required
    --> parquet/src/util/bit_packing.rs:150:12
     |
150  |     *out = (*in_buf) % (1u32 << 2);
     |            ^^^^^^^^^ accessing memory with alignment 1, but alignment 4 is required
     |
     = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
     = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
             
     = note: inside `util::bit_packing::unpack2_32` at parquet/src/util/bit_packing.rs:150:12
note: inside `util::bit_packing::unpack32` at parquet/src/util/bit_packing.rs:37:14
    --> parquet/src/util/bit_packing.rs:37:14
     |
37   |         2 => unpack2_32(in_ptr, out_ptr),
     |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `util::bit_util::BitReader::get_batch::<i16>` at parquet/src/util/bit_util.rs:568:30
    --> parquet/src/util/bit_util.rs:568:30
     |
568  |                     in_ptr = unpack32(in_ptr, out_ptr, num_bits);
     |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `encodings::rle::RleDecoder::get_batch::<i16>` at parquet/src/encodings/rle.rs:415:30
    --> parquet/src/encodings/rle.rs:415:30
     |
415  |                   num_values = bit_reader.get_batch::<T>(
     |  ______________________________^
416  | |                     &mut buffer[values_read..values_read + num_values],
417  | |                     self.bit_width as usize,
418  | |                 );
     | |_________________^
note: inside `encodings::levels::LevelDecoder::get` at parquet/src/encodings/levels.rs:262:35
    --> parquet/src/encodings/levels.rs:262:35
     |
262  |                 let values_read = decoder.get_batch::<i16>(&mut buffer[0..len])?;
     |                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `column::reader::ColumnReaderImpl::<data_type::ByteArrayType>::read_def_levels` at parquet/src/column/reader.rs:459:9
    --> parquet/src/column/reader.rs:459:9
     |
459  |         level_decoder.get(buffer)
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `column::reader::ColumnReaderImpl::<data_type::ByteArrayType>::read_batch` at parquet/src/column/reader.rs:210:38
    --> parquet/src/column/reader.rs:210:38
     |
210  |                       num_def_levels = self.read_def_levels(
     |  ______________________________________^
211  | |                         &mut levels[levels_read..levels_read + iter_batch_size],
212  | |                     )?;
     | |_____________________^
note: inside `<arrow::array_reader::ComplexObjectArrayReader<data_type::ByteArrayType, arrow::converter::ArrayRefConverter<std::vec::Vec<std::option::Option<data_type::ByteArray>>, arrow::array::GenericStringArray<i32>, arrow::converter::Utf8ArrayConverter>> as arrow::array_reader::ArrayReader>::next_batch` at parquet/src/arrow/array_reader.rs:490:17
    --> parquet/src/arrow/array_reader.rs:490:17
     |
490  | /                 self.column_reader.as_mut().unwrap().read_batch(
491  | |                     num_to_read,
492  | |                     cur_def_levels_buf,
493  | |                     cur_rep_levels_buf,
494  | |                     cur_data_buf,
495  | |                 )?;
     | |_________________^
note: inside `arrow::array_reader::tests::test_complex_array_reader_def_and_rep_levels` at parquet/src/arrow/array_reader.rs:2229:21
    --> parquet/src/arrow/array_reader.rs:2229:21
     |
2229 |         let array = array_reader.next_batch(values_per_page / 2).unwrap();
     |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure at parquet/src/arrow/array_reader.rs:2154:5
    --> parquet/src/arrow/array_reader.rs:2154:5
     |
2153 |       #[test]
     |       ------- in this procedural macro expansion
2154 | /     fn test_complex_array_reader_def_and_rep_levels() {
2155 | |         // Construct column schema
2156 | |         let message_type = "
2157 | |         message test_schema {
...    |
2276 | |         );
2277 | |     }
     | |_____^
     = note: inside `<[closure@parquet/src/arrow/array_reader.rs:2154:5: 2277:6] as std::ops::FnOnce<()>>::call_once - shim` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
     = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
     = note: inside `test::__rust_begin_short_backtrace::<fn()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:578:5
     = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:569:30
     = note: inside `<[closure@test::run_test::{closure#2}] as std::ops::FnOnce<()>>::call_once - shim(vtable)` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
     = note: inside `<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send> as std::ops::FnOnce<()>>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1572:9
     = note: inside `<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>> as std::ops::FnOnce<()>>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:347:9
     = note: inside `std::panicking::r#try::do_call::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
     = note: inside `std::panicking::r#try::<(), std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
     = note: inside `std::panic::catch_unwind::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
     = note: inside `test::run_test_in_process` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:601:18
     = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:493:39
     = note: inside `test::run_test::run_test_inner` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:531:13
     = note: inside `test::run_test` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:565:28
     = note: inside `test::run_tests::<[closure@test::run_tests_console::{closure#2}]>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:306:17
     = note: inside `test::run_tests_console` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/console.rs:290:5
     = note: inside `test::test_main` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:123:15
     = note: inside `test::test_main_static` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:142:5
     = note: inside `main`
     = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
     = note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125:18
     = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:63:18
     = note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
     = note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
     = note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
     = note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
     = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:45:48
     = note: inside `std::panicking::r#try::do_call::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
     = note: inside `std::panicking::r#try::<isize, [closure@std::rt::lang_start_internal::{closure#2}]>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
     = note: inside `std::panic::catch_unwind::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
     = note: inside `std::rt::lang_start_internal` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:45:20
     = note: inside `std::rt::lang_start::<()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:62:5
     = note: this error originates in the attribute macro `test` (in Nightly builds, run with -Z macro-backtrace for more info)

Describe alternatives you've considered
The parquet2 crate in https://github.com/jorgecarleitao/parquet2 from @jorgecarleitao apparently already has parquet running under MIR this already running, so depending on if that gets integrated into the main arrow-rs repo, that might change how we go about getting a clean run

Additional context
Thanks to help from @roee88 , @jhorstmann and others who I probably missed, we now have MIRI checks validated for the arrow crate on each PR: https://github.com/apache/arrow-rs/runs/3154551440

Which runs a command such as:

  cargo miri test -p arrow -- --skip csv --skip ipc --skip json
@alamb alamb added the enhancement Any new improvement worthy of a entry in the changelog label Jul 25, 2021
@alamb
Copy link
Contributor Author

alamb commented Jul 25, 2021

As noted by @houqp -- I originally filed this in the wrong repo by mistake: apache/datafusion#773

@alamb
Copy link
Contributor Author

alamb commented Jul 25, 2021

🤔 though I tried to get a clean MIRI run against the parquet2 crate

(arrow_dev) alamb@MacBook-Pro:~/Software/parquet2$ MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri test -- --skip io

And I got some errors; I am probably running it incorrectly 😢

test read::levels::tests::test_get_bit_width ... ok
test read::metadata::tests::test_basics ... error: unsupported operation: can't call foreign function: strerror_r
   --> /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:122:12
    |
122 |         if strerror_r(errno as c_int, p, buf.len()) < 0 {
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't call foreign function: strerror_r
    |
    = help: this is likely not a bug in the program; it indicates that the program performed an operation that the interpreter does not support
            
    = note: inside `std::sys::unix::os::error_string` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:122:12
    = note: inside `<std::io::error::Repr as std::fmt::Debug>::fmt` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/io/error.rs:699:36
    = note: inside `<std::io::Error as std::fmt::Debug>::fmt` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/io/error.rs:65:9
    = note: inside `<&dyn std::fmt::Debug as std::fmt::Debug>::fmt` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/fmt/mod.rs:2036:62
    = note: inside `std::fmt::write` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/fmt/mod.rs:1115:17
    = note: inside `<std::string::String as std::fmt::Write>::write_fmt` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/fmt/mod.rs:187:9
    = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:481:22
    = note: inside `std::option::Option::<std::string::String>::get_or_insert_with::<[closure@std::panicking::begin_panic_handler::PanicPayload::fill::{closure#0}]>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/option.rs:1270:26
    = note: inside `std::panicking::begin_panic_handler::PanicPayload::fill` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:479:13
    = note: inside `<std::panicking::begin_panic_handler::PanicPayload as core::panic::BoxMeUp>::get` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:497:13
    = note: inside `std::panicking::rust_panic_with_hook` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:621:34
    = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:519:13
    = note: inside `std::sys_common::backtrace::__rust_end_short_backtrace::<[closure@std::panicking::begin_panic_handler::{closure#0}], !>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:141:18
    = note: inside `core::panicking::panic_fmt::panic_impl` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:515:5
    = note: inside `core::panicking::panic_fmt` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/panicking.rs:92:14
    = note: inside `std::result::unwrap_failed` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/panic.rs:24:9
    = note: inside `std::result::Result::<std::fs::File, std::io::Error>::unwrap` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/result.rs:1281:23
note: inside `read::metadata::tests::test_basics` at src/read/metadata.rs:188:24

@jhorstmann
Copy link
Contributor

Not really familiar with that code, but there is a FIXME comment about the alignment after pointer casts here:

// FIXME assert!(memory::is_ptr_aligned(in_ptr));

If the alignment can't be guaranteed to be valid for u32, one possible solution could be to replace all *in_buf in bit_packing.rs with std::ptr::read_unaligned(in_buf). On x86 that would not have any performance impact.

@Jefffrey
Copy link
Contributor

Jefffrey commented Dec 2, 2023

Small update, I tried running MIRI:

arrow-rs$ MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri  test  -p parquet

Got error:

test arrow::array_reader::list_array::tests::test_nested_lists ... error: unsupported operation: non-default mode 0o600 is not supported
    --> /home/jeffrey/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/unix/fs.rs:1133:36
     |
1133 |         let fd = cvt_r(|| unsafe { open64(path.as_ptr(), flags, opts.mode as c_int) })?;
     |                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ non-default mode 0o600 is not supported
     |
     = help: this is likely not a bug in the program; it indicates that the program performed an operation that the interpreter does not support
     = note: BACKTRACE:
     = note: inside closure at /home/jeffrey/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/unix/fs.rs:1133:36: 1133:84
     = note: inside `std::sys::unix::cvt_r::<i32, {closure@std::sys::unix::fs::File::open_c::{closure#0}}>` at /home/jeffrey/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/unix/mod.rs:328:19: 328:22
     = note: inside `std::sys::unix::fs::File::open_c` at /home/jeffrey/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/unix/fs.rs:1133:18: 1133:87
     = note: inside closure at /home/jeffrey/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/unix/fs.rs:1121:41: 1121:65
     = note: inside `std::sys::common::small_c_string::run_with_cstr::<std::sys::unix::fs::File, {closure@std::sys::unix::fs::File::open::{closure#0}}>` at /home/jeffrey/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/common/small_c_string.rs:43:18: 43:22
     = note: inside `std::sys::common::small_c_string::run_path_with_cstr::<std::sys::unix::fs::File, {closure@std::sys::unix::fs::File::open::{closure#0}}>` at /home/jeffrey/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/common/small_c_string.rs:22:5: 22:58
     = note: inside `std::sys::unix::fs::File::open` at /home/jeffrey/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/unix/fs.rs:1121:9: 1121:66
     = note: inside `std::fs::OpenOptions::_open` at /home/jeffrey/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/fs.rs:1128:9: 1128:42
     = note: inside `std::fs::OpenOptions::open::<&std::path::Path>` at /home/jeffrey/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/fs.rs:1124:9: 1124:34
     = note: inside `tempfile::file::imp::unix::create_named` at /home/jeffrey/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tempfile-3.8.0/src/file/imp/unix.rs:30:5: 30:28
     = note: inside `tempfile::file::imp::unix::create_unlinked` at /home/jeffrey/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tempfile-3.8.0/src/file/imp/unix.rs:43:13: 43:56
     = note: inside closure at /home/jeffrey/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tempfile-3.8.0/src/file/imp/unix.rs:80:16: 80:38
     = note: inside `tempfile::util::create_helper::<std::fs::File, {closure@tempfile::file::imp::unix::create_unix::{closure#0}}>` at /home/jeffrey/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tempfile-3.8.0/src/util.rs:33:22: 33:29
     = note: inside `tempfile::file::imp::unix::create_unix` at /home/jeffrey/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tempfile-3.8.0/src/file/imp/unix.rs:75:5: 81:6
     = note: inside closure at /home/jeffrey/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tempfile-3.8.0/src/file/imp/unix.rs:62:21: 62:37
     = note: inside `std::result::Result::<std::fs::File, std::io::Error>::or_else::<std::io::Error, {closure@tempfile::file::imp::unix::create::{closure#0}}>` at /home/jeffrey/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/result.rs:1378:23: 1378:28
     = note: inside `tempfile::file::imp::unix::create` at /home/jeffrey/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tempfile-3.8.0/src/file/imp/unix.rs:53:5: 66:11
     = note: inside `tempfile::tempfile_in::<std::path::PathBuf>` at /home/jeffrey/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tempfile-3.8.0/src/file/mod.rs:102:5: 102:30
     = note: inside `tempfile::tempfile` at /home/jeffrey/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tempfile-3.8.0/src/file/mod.rs:61:5: 61:33
note: inside `arrow::array_reader::list_array::tests::test_nested_lists`
    --> parquet/src/arrow/array_reader/list_array.rs:550:20
     |
550  |         let file = tempfile::tempfile().unwrap();
     |                    ^^^^^^^^^^^^^^^^^^^^
note: inside closure
    --> parquet/src/arrow/array_reader/list_array.rs:524:27
     |
523  |     #[test]
     |     ------- in this procedural macro expansion
524  |     fn test_nested_lists() {
     |                           ^
     = note: this error originates in the attribute macro `test` (in Nightly builds, run with -Z macro-backtrace for more info)

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to 1 previous error; 3 warnings emitted

Seems like MIRI doesn't support tempfile, see rust-lang/miri#2720

@Jefffrey
Copy link
Contributor

Jefffrey commented Jan 6, 2024

Miri now supports tempfile on latest nightly.

Running command:

MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri test -p parquet -- -Z unstable-options --report-time
  • Also timing execution of tests

Certain tests took an extremely long time to execute:

test arrow::arrow_reader::tests::test_fixed_length_binary_column_reader ... ok <3222.593s>
test arrow::arrow_reader::tests::test_int96_single_column_reader_test ... ok <4466.186s>
test arrow::arrow_reader::tests::test_interval_day_time_column_reader ... ok <6590.588s>
test arrow::arrow_reader::tests::test_list_selection ... ok <486.963s>
test arrow::arrow_reader::tests::test_list_selection_fuzz ... ok <11456.272s>
test arrow::arrow_reader::tests::test_primitive_single_column_reader_test ... ok <41011.028s>
test arrow::arrow_reader::tests::test_read_lz4_hadoop_large ... ok <1612.830s>

And finally failed at test arrow::arrow_reader::tests::test_read_structs with error message:

test arrow::arrow_reader::tests::test_read_structs ... error: unsupported operation: can't call foreign function `ZSTD_DStreamInSize` on OS `linux`
    --> /home/jeffrey/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zstd-safe-7.0.0/src/lib.rs:1146:18
     |
1146 |         unsafe { zstd_sys::ZSTD_DStreamInSize() }
     |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't call foreign function `ZSTD_DStreamInSize` on OS `linux`
     |
     = help: this is likely not a bug in the program; it indicates that the program performed an operation that the interpreter does not support
     = note: BACKTRACE:
     = note: inside `zstd::zstd_safe::DCtx::<'_>::in_size` at /home/jeffrey/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zstd-safe-7.0.0/src/lib.rs:1146:18: 1146:48
     = note: inside `zstd::Decoder::<'static, std::io::BufReader<&[u8]>>::new` at /home/jeffrey/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zstd-0.13.0/src/stream/read/mod.rs:27:27: 27:53
note: inside `<compression::zstd_codec::ZSTDCodec as compression::Codec>::decompress`
    --> parquet/src/compression.rs:473:31
     |
473  |             let mut decoder = zstd::Decoder::new(input_buf)?;
     |                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `file::serialized_reader::decode_page`
    --> parquet/src/file/serialized_reader.rs:417:13
     |
417  | /             decompressor.decompress(
418  | |                 compressed,
419  | |                 &mut decompressed,
420  | |                 Some(uncompressed_size - offset),
421  | |             )?;
     | |_____________^
note: inside `<file::serialized_reader::SerializedPageReader<std::fs::File> as column::page::PageReader>::get_next_page`
    --> parquet/src/file/serialized_reader.rs:628:21
     |
628  | /                     decode_page(
629  | |                         header,
630  | |                         Bytes::from(buffer),
631  | |                         self.physical_type,
632  | |                         self.decompressor.as_mut(),
633  | |                     )?
     | |_____________________^
note: inside `column::reader::GenericColumnReader::<column::reader::decoder::RepetitionLevelDecoderImpl, arrow::record_reader::definition_levels::DefinitionLevelBufferDecoder, column::reader::decoder::ColumnValueDecoderImpl<data_type::Int64Type>>::read_new_page`
    --> parquet/src/column/reader.rs:415:19
     |
415  |             match self.page_reader.get_next_page()? {
     |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `column::reader::GenericColumnReader::<column::reader::decoder::RepetitionLevelDecoderImpl, arrow::record_reader::definition_levels::DefinitionLevelBufferDecoder, column::reader::decoder::ColumnValueDecoderImpl<data_type::Int64Type>>::has_next`
    --> parquet/src/column/reader.rs:557:17
     |
557  |             if !self.read_new_page()? {
     |                 ^^^^^^^^^^^^^^^^^^^^
note: inside `column::reader::GenericColumnReader::<column::reader::decoder::RepetitionLevelDecoderImpl, arrow::record_reader::definition_levels::DefinitionLevelBufferDecoder, column::reader::decoder::ColumnValueDecoderImpl<data_type::Int64Type>>::read_records`
    --> parquet/src/column/reader.rs:230:51
     |
230  |         while total_records_read < max_records && self.has_next()? {
     |                                                   ^^^^^^^^^^^^^^^
note: inside `arrow::record_reader::GenericRecordReader::<std::vec::Vec<i64>, column::reader::decoder::ColumnValueDecoderImpl<data_type::Int64Type>>::read_one_batch`
    --> parquet/src/arrow/record_reader/mod.rs:199:13
     |
199  | /             self.column_reader.as_mut().unwrap().read_records(
200  | |                 batch_size,
201  | |                 self.def_levels.as_mut(),
202  | |                 self.rep_levels.as_mut(),
203  | |                 &mut self.values,
204  | |             )?;
     | |_____________^
note: inside `arrow::record_reader::GenericRecordReader::<std::vec::Vec<i64>, column::reader::decoder::ColumnValueDecoderImpl<data_type::Int64Type>>::read_records`
    --> parquet/src/arrow/record_reader/mod.rs:122:29
     |
122  |             records_read += self.read_one_batch(records_to_read)?;
     |                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `arrow::array_reader::read_records::<std::vec::Vec<i64>, column::reader::decoder::ColumnValueDecoderImpl<data_type::Int64Type>>`
    --> parquet/src/arrow/array_reader/mod.rs:138:33
     |
138  |         let records_read_once = record_reader.read_records(records_to_read)?;
     |                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `<arrow::array_reader::primitive_array::PrimitiveArrayReader<data_type::Int64Type> as arrow::array_reader::ArrayReader>::read_records`
    --> parquet/src/arrow/array_reader/primitive_array.rs:134:9
     |
134  |         read_records(&mut self.record_reader, self.pages.as_mut(), batch_size)
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `<arrow::array_reader::struct_array::StructArrayReader as arrow::array_reader::ArrayReader>::read_records`
    --> parquet/src/arrow/array_reader/struct_array.rs:68:30
     |
68   |             let child_read = child.read_records(batch_size)?;
     |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `<arrow::array_reader::struct_array::StructArrayReader as arrow::array_reader::ArrayReader>::read_records`
    --> parquet/src/arrow/array_reader/struct_array.rs:68:30
     |
68   |             let child_read = child.read_records(batch_size)?;
     |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `<arrow::arrow_reader::ParquetRecordBatchReader as std::iter::Iterator>::next`
    --> parquet/src/arrow/arrow_reader/mod.rs:553:37
     |
553  |                 if let Err(error) = self.array_reader.read_records(self.batch_size) {
     |                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `arrow::arrow_reader::tests::test_read_structs`
    --> parquet/src/arrow/arrow_reader/mod.rs:1991:22
     |
1991 |         for batch in record_batch_reader {
     |                      ^^^^^^^^^^^^^^^^^^^
note: inside closure
    --> parquet/src/arrow/arrow_reader/mod.rs:1982:27
     |
1981 |     #[test]
     |     ------- in this procedural macro expansion
1982 |     fn test_read_structs() {
     |                           ^
     = note: this error originates in the attribute macro `test` (in Nightly builds, run with -Z macro-backtrace for more info)

@alamb
Copy link
Contributor Author

alamb commented Jan 6, 2024

THanks for trying @Jefffrey -- I agree I don't know what that error means,

@jhorstmann
Copy link
Contributor

I started looking into this again and think we can replace tempfile with in-memory parquet data in most tests. Will open a PR soon. There is at least one test that takes forever to run with miri and I might exclude that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

No branches or pull requests

3 participants