Reading json map with non-nullable value schema doesn't error if values are actually null #6391

Open
nicklan opened this issue Sep 12, 2024 · 1 comment · Fixed by delta-incubator/delta-kernel-rs#342
nicklan commented Sep 12, 2024

Describe the bug

If you use an arrow_json::ReaderBuilder to read a JSON file and specify a schema containing a map whose values are declared non-nullable, the reader still accepts files in which the map values are actually null, and returns a batch instead of an error.

To Reproduce

use std::{fs::File, io::BufReader, sync::Arc};

use arrow::datatypes::{DataType, Field, Schema};

fn main() {
    let schema = Arc::new(Schema::new(vec![
        Field::new("str", DataType::Utf8, false),
        Field::new_map(
            "map",
            "entries",
            Field::new("key", DataType::Utf8, false),
            Field::new("value", DataType::Utf8, false), // value is not nullable
            false, // keys are not sorted
            false, // the map itself is not nullable
        )
    ]));

    let file = File::open("test.json").unwrap();

    let mut json = arrow_json::ReaderBuilder::new(schema).build(BufReader::new(file)).unwrap();
    let batch = json.next().unwrap().unwrap();
    println!("Batch: {batch:#?}");
}

And use this JSON file (test.json):

{
  "str": "s",
  "map":  {
    "key": null
  }
}

Running produces:

Batch: RecordBatch {
    schema: Schema {
        fields: [
            Field {
                name: "str",
                data_type: Utf8,
                nullable: false,
                dict_id: 0,
                dict_is_ordered: false,
                metadata: {},
            },
            Field {
                name: "map",
                data_type: Map(
                    Field {
                        name: "entries",
                        data_type: Struct(
                            [
                                Field {
                                    name: "key",
                                    data_type: Utf8,
                                    nullable: false,
                                    dict_id: 0,
                                    dict_is_ordered: false,
                                    metadata: {},
                                },
                                Field {
                                    name: "value",
                                    data_type: Utf8,
                                    nullable: false,
                                    dict_id: 0,
                                    dict_is_ordered: false,
                                    metadata: {},
                                },
                            ],
                        ),
                        nullable: false,
                        dict_id: 0,
                        dict_is_ordered: false,
                        metadata: {},
                    },
                    false,
                ),
                nullable: false,
                dict_id: 0,
                dict_is_ordered: false,
                metadata: {},
            },
        ],
        metadata: {},
    },
    columns: [
        StringArray
        [
          "s",
        ],
        MapArray
        [
          StructArray
        [
        -- child 0: "key" (Utf8)
        StringArray
        [
          "key",
        ]
        -- child 1: "value" (Utf8)
        StringArray
        [
          null,
        ]
        ],
        ],
    ],
    row_count: 1,
}

Note that I've included the str field so you can easily see that the right thing happens for a top-level field if you change your .json file to:

{
  "str": null,
  "map":  {
    "key": null
  }
}

You will get:

called `Result::unwrap()` on an `Err` value: JsonError("Encountered unmasked nulls in non-nullable StructArray child: Field { name: \"str\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }")

Expected behavior

I expected an error similar to the one raised when the str field is set to null.
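To make the expected behavior concrete, here is a minimal std-only sketch of a nullability check that recurses into nested children, so a null under map -> entries -> value is rejected the same way as a null in a top-level field. This is not arrow_json's implementation: `Column`, `check_nulls`, and `leaf` are hypothetical names for illustration, and the model ignores offsets and parent-level null masking, which the real check accounts for (hence "unmasked nulls" in the error above).

```rust
/// Simplified stand-in for an Arrow column (hypothetical type): a field name,
/// its nullability flag, per-value validity, and any nested children.
struct Column {
    name: String,
    nullable: bool,
    validity: Vec<bool>, // true = value present, false = null
    children: Vec<Column>,
}

/// Walk the column tree depth-first and return an error naming the first
/// non-nullable column that contains a null, at any nesting depth.
fn check_nulls(col: &Column) -> Result<(), String> {
    if !col.nullable && col.validity.iter().any(|valid| !valid) {
        return Err(format!(
            "Encountered nulls in non-nullable field: {}",
            col.name
        ));
    }
    col.children.iter().try_for_each(check_nulls)
}

/// Convenience constructor for a column with no children.
fn leaf(name: &str, nullable: bool, validity: Vec<bool>) -> Column {
    Column {
        name: name.to_string(),
        nullable,
        validity,
        children: vec![],
    }
}

fn main() {
    // Mirrors the report: map -> entries -> {key, value}; value holds one null.
    let map = Column {
        name: "map".to_string(),
        nullable: false,
        validity: vec![true],
        children: vec![Column {
            name: "entries".to_string(),
            nullable: false,
            validity: vec![true],
            children: vec![
                leaf("key", false, vec![true]),
                leaf("value", false, vec![false]),
            ],
        }],
    };
    // The nested null is reported instead of being silently accepted.
    println!("{:?}", check_nulls(&map));
}
```

The point of the sketch is only that the check has to descend through the map's entries struct rather than stop at the top-level fields.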

Additional context

nicklan commented Oct 16, 2024

Ohh jeez, github automatically closed this due to a PR I made that just mentions it. This is not fixed!
