Panic when read an empty parquet file #5304

Closed
Liyixin95 opened this issue Jan 16, 2024 · 2 comments · Fixed by #5322
Labels
bug, good first issue, help wanted, parquet

Comments

@Liyixin95
Contributor

Describe the bug

To Reproduce

First, create an empty Parquet file using pandas:

  import pandas as pd

  df = pd.DataFrame().reset_index(drop=True)
  df.to_parquet("./test.parquet")

Then read the file using Rust:

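  use parquet::file::reader::{FileReader, SerializedFileReader};
  use std::fs::File;

  fn main() -> Result<(), Box<dyn std::error::Error>> {
      let file = File::open("./test.parquet")?;
      // With parquet 50.0.0 the panic occurs here, while the footer schema is decoded.
      let reader = SerializedFileReader::new(file)?;

      let iter = reader.get_row_iter(None)?;
      for record in iter {
          println!("{:?}", record);
      }
      Ok(())
  }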

The program then panicked:

thread 'main' panicked at D:\env\rust\.cargo\registry\src\rsproxy.cn-8f6827c7555bfaf8\parquet-50.0.0\src\schema\types.rs:1088:78:
called `Option::unwrap()` on a `None` value
stack backtrace:
   0: std::panicking::begin_panic_handler
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library\std\src\panicking.rs:645
   1: core::panicking::panic_fmt
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library\core\src\panicking.rs:72
   2: core::panicking::panic
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library\core\src\panicking.rs:127
   3: enum2$<core::option::Option<parquet::format::Type> >::unwrap<parquet::format::Type>
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\core\src\option.rs:931
   4: parquet::schema::types::from_thrift_helper
             at D:\env\rust\.cargo\registry\src\rsproxy.cn-8f6827c7555bfaf8\parquet-50.0.0\src\schema\types.rs:1088
   5: parquet::schema::types::from_thrift
             at D:\env\rust\.cargo\registry\src\rsproxy.cn-8f6827c7555bfaf8\parquet-50.0.0\src\schema\types.rs:1035
   6: parquet::file::footer::decode_metadata
             at D:\env\rust\.cargo\registry\src\rsproxy.cn-8f6827c7555bfaf8\parquet-50.0.0\src\file\footer.rs:74
   7: parquet::file::footer::parse_metadata<std::fs::File>
             at D:\env\rust\.cargo\registry\src\rsproxy.cn-8f6827c7555bfaf8\parquet-50.0.0\src\file\footer.rs:65
   8: parquet::file::serialized_reader::SerializedFileReader<std::fs::File>::new<std::fs::File>
             at D:\env\rust\.cargo\registry\src\rsproxy.cn-8f6827c7555bfaf8\parquet-50.0.0\src\file\serialized_reader.rs:182
   9: parquet_test::read_parquet_file<ref$<str$> >
             at .\src\main.rs:9
  10: parquet_test::main
             at .\src\main.rs:29
  11: core::ops::function::FnOnce::call_once<void (*)(),tuple$<> >
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\core\src\ops\function.rs:250
  12: core::hint::black_box
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\core\src\hint.rs:286
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
error: process didn't exit successfully: `target\debug\parquet_test.exe` (exit code: 101)

Expected behavior

Additional context

@tustvold
Contributor

The logic currently appears to assume that any type with no children is a primitive type; it should probably also check that the type_ field is set.
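
The check described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the parquet crate's code: SchemaElement, PhysicalType, Node, and classify are hypothetical stand-ins that only mirror the type_ and num_children fields discussed in this thread. Treating a childless element without type_ as an empty group (rather than returning an error) is an assumption and may differ from the fix in #5322.

```rust
/// Hypothetical stand-in for a physical (primitive) type tag.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PhysicalType {
    Int32,
    ByteArray,
}

/// Hypothetical, simplified schema element carrying only the two optional
/// fields discussed in this issue.
#[derive(Debug)]
struct SchemaElement {
    name: String,
    type_: Option<PhysicalType>,
    num_children: Option<i32>,
}

/// Result of classifying a schema element.
#[derive(Debug, PartialEq)]
enum Node {
    Primitive { name: String, physical_type: PhysicalType },
    Group { name: String, num_children: usize },
}

/// Classify an element without ever calling `unwrap()` on `type_`.
fn classify(element: &SchemaElement) -> Result<Node, String> {
    let name = element.name.clone();
    match (element.num_children, element.type_) {
        // No children and a physical type set: a genuine primitive leaf.
        (None | Some(0), Some(physical_type)) => Ok(Node::Primitive { name, physical_type }),
        // No children and no physical type: assumed here to be an empty group,
        // such as the root of a file with no columns.
        (None | Some(0), None) => Ok(Node::Group { name, num_children: 0 }),
        // A positive child count always denotes a group.
        (Some(n), _) if n > 0 => Ok(Node::Group { name, num_children: n as usize }),
        // Remaining case: a negative child count, which is malformed input.
        (Some(n), _) => Err(format!("invalid child count {} for element {}", n, name)),
    }
}
```

With this shape, an element such as the root of the empty file above (no children, no type_) yields an empty group instead of reaching an unwrap on None, and malformed input surfaces as an Err rather than a panic.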

@tustvold added the parquet label Mar 1, 2024
@tustvold
Contributor

tustvold commented Mar 1, 2024

label_issue.py automatically added labels {'parquet'} from #5322
