
[BUG] test_in_set fails when DATAGEN_SEED=1698940723 #9687

Closed
abellina opened this issue Nov 13, 2023 · 6 comments
abellina commented Nov 13, 2023

I cannot reproduce this locally on my RTX 6000 or on a T4; so far it only fails on an A30.

What CI executed (this should reproduce locally, but I haven't managed to):

SPARK_RAPIDS_TEST_DATAGEN_SEED=1698940723 ./run_pyspark_from_build.sh -k test_in_set\ and\ Double
FAILED ../../src/main/python/cmp_test.py::test_in_set[Double][DATAGEN_SEED=1698940723]
=================================== FAILURES ===================================
_____________________________ test_in_set[Double] ______________________________
[gw4] linux -- Python 3.8.10 /usr/bin/python

data_gen = Double

    @pytest.mark.parametrize('data_gen', eq_gens_with_decimal_gen, ids=idfn)
    def test_in_set(data_gen):
        # nulls are not supported for in on the GPU yet
        num_entries = int(with_cpu_session(lambda spark: spark.conf.get('spark.sql.optimizer.inSetConversionThreshold'))) + 1
        # we have to make the scalars in a session so negative scales in decimals are supported
        scalars = with_cpu_session(lambda spark: list(gen_scalars(data_gen, num_entries, force_no_nulls=not isinstance(data_gen, NullGen))))
>       assert_gpu_and_cpu_are_equal_collect(
            lambda spark : unary_op_df(spark, data_gen).select(f.col('a').isin(scalars)))

../../src/main/python/cmp_test.py:338:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../src/main/python/asserts.py:581: in assert_gpu_and_cpu_are_equal_collect
    _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first, result_canonicalize_func_before_compare=result_canonicalize_func_before_compare)
../../src/main/python/asserts.py:502: in _assert_gpu_and_cpu_are_equal
    assert_equal(from_cpu, from_gpu)
../../src/main/python/asserts.py:107: in assert_equal
    _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
../../src/main/python/asserts.py:43: in _assert_equal
    _assert_equal(cpu[index], gpu[index], float_check, path + [index])
../../src/main/python/asserts.py:36: in _assert_equal
    _assert_equal(cpu[field], gpu[field], float_check, path + [field])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

cpu = False, gpu = True
float_check = <function get_float_check.<locals>.<lambda> at 0x7f283618e940>
path = [2, '(a IN (CAST(-1.8267391899860874E80 AS DOUBLE), CAST(-4.1675090531674384E-103 AS DOUBLE), CAST(NaN AS DOUBLE), CAS...OUBLE), CAST(3.4452000335727635E-205 AS DOUBLE), CAST(-Infinity AS DOUBLE), CAST(-2.0147996288524902E268 AS DOUBLE)))']

@abellina abellina added bug Something isn't working ? - Needs Triage Need team to review and classify labels Nov 13, 2023
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Nov 14, 2023
ttnghia commented Nov 16, 2023

You probably ran the test locally after @datagen_overrides had been added. I can still reproduce it on my local Quadro RTX 6000 (after removing the @datagen_overrides line).

ttnghia commented Nov 16, 2023

This looks like a bug in our isin operator that incorrectly handles NaN.

@ttnghia ttnghia self-assigned this Nov 16, 2023
ttnghia commented Nov 16, 2023

It turns out our code is correct. This is a bug in Spark: https://issues.apache.org/jira/browse/SPARK-36792.

ttnghia commented Nov 16, 2023

When checking a isin (list...) with a list of more than 10 elements (so the InSet operator is used), Spark on the CPU cannot find NaN in the given list and produces incorrect output. If the list has 10 elements or fewer, the In operator is used instead and produces the correct answer.

scala> df1.show(false)
+------------------------+
|a                       |
+------------------------+
|-1.8267391899860874E80  |
|-4.1675090531674384E-103|
|NaN                     |
+------------------------+


scala> spark.sql("select * from df1 where a in (CAST(-1.8267391899860874E80 AS DOUBLE), CAST(-4.1675090531674384E-103 AS DOUBLE), CAST('NaN' AS DOUBLE), CAST(-8.559820088589135E179 AS DOUBLE), CAST(1.1176123717679094E-166 AS DOUBLE), CAST(3.2567422266986805E-294 AS DOUBLE), CAST(-4.142554702629836E-92 AS DOUBLE), CAST(2.813265202703976E56 AS DOUBLE), CAST(3.4452000335727635E-205 AS DOUBLE), CAST('-Infinity' AS DOUBLE), CAST(-2.0147996288524902E268 AS DOUBLE))").show(false)
+------------------------+
|a                       |
+------------------------+
|-1.8267391899860874E80  |
|-4.1675090531674384E-103|
+------------------------+


scala> spark.sql("select * from df1 where a in (CAST(-1.8267391899860874E80 AS DOUBLE), CAST(-4.1675090531674384E-103 AS DOUBLE), CAST('NaN' AS DOUBLE), CAST(-8.559820088589135E179 AS DOUBLE), CAST(1.1176123717679094E-166 AS DOUBLE), CAST(3.2567422266986805E-294 AS DOUBLE), CAST(-4.142554702629836E-92 AS DOUBLE), CAST(2.813265202703976E56 AS DOUBLE), CAST(3.4452000335727635E-205 AS DOUBLE), CAST('-Infinity' AS DOUBLE))").show(false)
+------------------------+
|a                       |
+------------------------+
|-1.8267391899860874E80  |
|-4.1675090531674384E-103|
|NaN                     |
+------------------------+
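The discrepancy between the two Spark expressions can be modeled outside Spark. The sketch below is plain Python, not Spark source; `in_style` and `inset_style` are hypothetical names contrasting a NaN-aware element-wise scan (how In behaves above) with a plain hash-set lookup, where IEEE 754 equality never matches NaN (how the buggy InSet behaves):

```python
import math

# Values from the failing query above; the last entry is NaN.
targets = [-1.8267391899860874e80, -4.1675090531674384e-103, float('nan')]

def in_style(v, targets):
    # Element-wise scan with NaN-aware equality (NaN matches NaN),
    # mirroring the correct In output shown in the first transcript.
    return any((math.isnan(v) and math.isnan(t)) or v == t for t in targets)

def inset_style(v, targets):
    # Hash-set lookup keyed on ==: IEEE 754 says NaN != NaN, so a NaN
    # probe misses the NaN entry, mirroring the buggy InSet output.
    return v in set(targets)

print(in_style(float('nan'), targets))     # NaN row is kept
print(inset_style(float('nan'), targets))  # NaN row is dropped
```

Non-NaN values behave identically under both strategies, which is why the bug only surfaces when the data and the IN-list both contain NaN.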

ttnghia commented Nov 16, 2023

Note that df1 above is read from the parquet file generated by the failed test. When I created the data manually, InSet produced correct output, so I suspect the NaN value read from parquet here is a negative NaN (a NaN with the sign bit set).
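For reference, a negative NaN differs from the canonical NaN only in its sign bit, and neither compares equal to anything under IEEE 754. A quick illustration in plain Python (independent of Spark; assumes a standard IEEE 754 double representation):

```python
import math
import struct

def bits(x: float) -> int:
    # Raw IEEE 754 bit pattern of a double.
    return struct.unpack('<Q', struct.pack('<d', x))[0]

nan = float('nan')
neg_nan = -nan  # unary minus flips only the sign bit

# Both are NaN, yet their bit patterns differ in the sign bit.
assert math.isnan(nan) and math.isnan(neg_nan)
assert bits(nan) ^ bits(neg_nan) == 1 << 63

# IEEE 754 equality never matches NaN, not even against itself, so any
# check keyed on == (or on raw bits against the canonical NaN) can miss
# a negative NaN read back from a file.
assert nan != nan
assert neg_nan != nan
```

This is consistent with a manually built DataFrame (canonical NaN) behaving differently from one read back from parquet.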

@ttnghia ttnghia changed the title [BUG] test_in_set fails on A30 when DATAGEN_SEED=1698940723 [BUG] test_in_set fails when DATAGEN_SEED=1698940723 Nov 17, 2023
pxLi commented Dec 11, 2023

Should be closed by #9928.

@pxLi pxLi closed this as completed Dec 11, 2023