Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dictionary columns (with a Boolean values) can't be used directly as filters #12380

Closed
samuelcolvin opened this issue Sep 7, 2024 · 2 comments · Fixed by #12382
Closed

Dictionary columns (with a Boolean values) can't be used directly as filters #12380

samuelcolvin opened this issue Sep 7, 2024 · 2 comments · Fixed by #12382
Labels
bug Something isn't working

Comments

@samuelcolvin
Copy link
Contributor

Describe the bug

Running

select count(*) v from dicts where json_contains(json_data, str_key2)

Where json_contains returns a Dictionary(UInt32, Boolean) (as of datafusion-contrib/datafusion-functions-json#39), results in:

called `Result::unwrap()` on an `Err` value: Plan("Cannot create filter with non-boolean predicate 'json_contains(dicts.json_data, dicts.str_key2)' returning Dictionary(UInt32, Boolean)")

If I change the sql to

select count(*) v from dicts where json_contains(json_data, str_key2) is true

Works fine.

Is this intended or a mistake?

To Reproduce

No response

Expected behavior

I would have assumed it should just work, but perhaps I'm mistaken?

Additional context

No response

@samuelcolvin
Copy link
Contributor Author

Same seems to be the case for is not null, e.g. it never evaluates to true, is that intentional?

@adriangb
Copy link
Contributor

adriangb commented Sep 8, 2024

I can reproduce with:

    let values = vec![Some(true), Some(false), None, Some(true)];
    let keys = vec![0, 0, 1, 2, 1, 3, 1];
    let values_array = BooleanArray::from(values);
    let keys_array = Int8Array::from(keys);
    let array = DictionaryArray::new(
        keys_array,
        Arc::new(values_array) as Arc<dyn Array>,
    );

    let field = Field::new(
        "my_dict",
        DataType::Dictionary(Box::new(DataType::Int8), Box::new(DataType::Boolean)),
        true,
    );
    let schema = Arc::new(Schema::new(vec![field]));

    let batch = RecordBatch::try_new(schema, vec![Arc::new(array)]).unwrap();

    let ctx = SessionContext::new();

    ctx.register_batch("dict_batch", batch).unwrap();

    let df = ctx.table("dict_batch").await.unwrap();

    // view_all
    let expected = [
        "+---------+",
        "| my_dict |",
        "+---------+",
        "| true    |",
        "| true    |",
        "| false   |",
        "|         |",
        "| false   |",
        "| true    |",
        "| false   |",
        "+---------+",
    ];
    assert_batches_eq!(expected, &df.clone().collect().await.unwrap());

    // filter where is null
    let result_df = df.clone().filter(col("my_dict")).unwrap();
    let expected = [
        "+---------+",
        "| my_dict |",
        "+---------+",
        "| true    |",
        "| true    |",
        "| false   |",
        "| false   |",
        "| true    |",
        "| false   |",
        "+---------+",
    ];
    assert_batches_eq!(expected, &result_df.collect().await.unwrap()); 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants