Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fieldless unions broken on main #57

Closed
chmp opened this issue May 27, 2023 · 5 comments
Closed

Fieldless unions broken on main #57

chmp opened this issue May 27, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@chmp
Copy link
Owner

chmp commented May 27, 2023

Original report:

Not sure whether this should be its own issue or under this one, but I am currently unable to serialize field-less enums - they all panic with a backtrace like this:

thread '<unnamed>' panicked at 'index out of bounds: the len is 2 but the index is 3', /Users/davidroher/.cargo/registry/src/github.com-1ecc6299db9ec823/serde_arrow-0.7.0/src/internal/generic_sinks/union.rs:85:45
stack backtrace:
   0: rust_begin_unwind
             at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:579:5
   1: core::panicking::panic_fmt
             at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/panicking.rs:64:14
   2: core::panicking::panic_bounds_check
             at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/panicking.rs:159:5
   3: <serde_arrow::internal::generic_sinks::union::UnionArrayBuilder<B> as serde_arrow::internal::sink::EventSink>::accept_variant
   4: <serde_arrow::internal::generic_sinks::struct::StructArrayBuilder<B> as serde_arrow::internal::sink::EventSink>::accept_variant
   5: <serde_arrow::internal::generic_sinks::struct::StructArrayBuilder<B> as serde_arrow::internal::sink::EventSink>::accept_variant
   6: <serde_arrow::internal::sink::EventSerializer<S> as serde::ser::Serializer>::serialize_unit_variant
   7: boxball_rs::event_file::game_state::_::<impl serde::ser::Serialize for boxball_rs::event_file::game_state::GameMetadata>::serialize

As soon as it gets to any struct which has a fieldless enum as one of its fields (when I remove all enum fields from the struct, i no longer get the error).

@chmp chmp added the bug Something isn't working label May 27, 2023
@chmp
Copy link
Owner Author

chmp commented May 27, 2023

@droher Thanks for posting the issue. Do you have a minmal example by any chance you can post? I added some tests on main, and for me everything seems to work fine (even if it requires a bit of tweaking of the default settings). See here

@chmp chmp closed this as completed May 27, 2023
@chmp chmp reopened this May 27, 2023
@droher
Copy link

droher commented May 27, 2023

Here's one that fails with the given enums:

use serde::Serialize;
use serde_arrow::{
    schema::TracingOptions,
    arrow2::{serialize_into_fields, serialize_into_arrays}
};

#[derive(Debug, Eq, PartialEq, Clone, Copy, Serialize)]
pub enum AccountType {
    PlayByPlay,
    Deduced,
    BoxScore,
}

#[derive(Ord, PartialOrd, Debug, Eq, PartialEq, Copy, Clone, Hash, Serialize, Deserialize)]
pub enum GameType {
    SpringTraining,
    RegularSeason,
    AllStarGame,
    WildCardSeries,
    DivisionSeries,
    LeagueChampionshipSeries,
    WorldSeries,
    NegroLeagues,
    Other,
}

#[derive(Debug, Eq, PartialEq, Clone, Copy, Serialize)]
pub struct FileInfo {
    pub filename: String,
    pub game_type: GameType,
    pub account_type: AccountType,
    pub file_index: usize,
}

fn example() {
        let infos = vec![
            FileInfo { 
                filename: String::from("test"),
                game_type: GameType::RegularSeason,
                account_type: AccountType::Deduced,
                file_index: 0 
            }
        ];
        let opts = TracingOptions::default().allow_null_fields(true);
        let fields = serialize_into_fields(&infos, opts).unwrap();
        let arrays = serialize_into_arrays(&fields, &infos).unwrap();
}

@chmp
Copy link
Owner Author

chmp commented May 27, 2023

Thanks for the example! This was super helpful. The root cause for the error you see is, that you do not specify all possible values of the unions when you trace the fields.

In principle this usage should be supported by raising errors for variants that are encountered during serialization, but were not seen during tracing. However, there are a couple of issues with the current serde_arrow implementation:

  • the logic that tries to work around missing union variants was not very well designed (it would just assume that the union variants do not have fields)
  • also there is a bug in that case that shuffles the variant names (and messes up the mapping variant index -> variant name)
  • in any case it should not panic

I will fix these issues, but have to think about, how to balance complexity and user friendliness :)

In the mean time you can simply make sure that there are no "holes" in the unions you use during tracing by including all variants up the maximum variant you would like to use (the order in which the variants are encountered by serde-arrow does not matter):

let infos = vec![
    FileInfo { 
        filename: String::from("test"),
        game_type: GameType::RegularSeason,
        account_type: AccountType::Deduced,
        file_index: 0 
    },
    FileInfo { 
        filename: String::from("test"),
        game_type: GameType::SpringTraining,
        account_type: AccountType::PlayByPlay,
        file_index: 0 
    },
];
let opts = TracingOptions::default().allow_null_fields(true);
let fields = serialize_into_fields(&infos, opts).unwrap();

@droher
Copy link

droher commented May 27, 2023

Thanks - using a dummy struct like the one above, I was able to get it working for subsets of my serialization, but still got index errors on the real-world data.

I think I am going to wait until this gets resolved to put more work into it on my end, but I am really looking forward to using it in my project; it'll be such a big help. Thanks for your help here and your work on the library in general!

@chmp
Copy link
Owner Author

chmp commented May 29, 2023

Thanks for the feedback and for reporting this issue!

I pushed a new release 0.7.1 that fixes the issue. Note: you still need to make sure that all enum variants that will be seen by serialize_into_arrays are also seen by serialize_into_fields, but that should be the only restriction.

@chmp chmp closed this as completed May 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants