
[BUG] test_in_set fails when DATAGEN_SEED=1698940723 #9687

Closed
abellina opened this issue Nov 13, 2023 · 6 comments
abellina commented Nov 13, 2023

I cannot reproduce this locally on my RTX 6000 or on a T4; so far it only fails on an A30.

What CI executed (this should reproduce locally, but I haven't managed to):

SPARK_RAPIDS_TEST_DATAGEN_SEED=1698940723 ./run_pyspark_from_build.sh -k test_in_set\ and\ Double
FAILED ../../src/main/python/cmp_test.py::test_in_set[Double][DATAGEN_SEED=1698940723]
=================================== FAILURES ===================================
_____________________________ test_in_set[Double] ______________________________
[gw4] linux -- Python 3.8.10 /usr/bin/python

data_gen = Double

    @pytest.mark.parametrize('data_gen', eq_gens_with_decimal_gen, ids=idfn)
    def test_in_set(data_gen):
        # nulls are not supported for in on the GPU yet
        num_entries = int(with_cpu_session(lambda spark: spark.conf.get('spark.sql.optimizer.inSetConversionThreshold'))) + 1
        # we have to make the scalars in a session so negative scales in decimals are supported
        scalars = with_cpu_session(lambda spark: list(gen_scalars(data_gen, num_entries, force_no_nulls=not isinstance(data_gen, NullGen))))
>       assert_gpu_and_cpu_are_equal_collect(
            lambda spark : unary_op_df(spark, data_gen).select(f.col('a').isin(scalars)))

../../src/main/python/cmp_test.py:338:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../src/main/python/asserts.py:581: in assert_gpu_and_cpu_are_equal_collect
    _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first, result_canonicalize_func_before_compare=result_canonicalize_func_before_compare)
../../src/main/python/asserts.py:502: in _assert_gpu_and_cpu_are_equal
    assert_equal(from_cpu, from_gpu)
../../src/main/python/asserts.py:107: in assert_equal
    _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
../../src/main/python/asserts.py:43: in _assert_equal
    _assert_equal(cpu[index], gpu[index], float_check, path + [index])
../../src/main/python/asserts.py:36: in _assert_equal
    _assert_equal(cpu[field], gpu[field], float_check, path + [field])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

cpu = False, gpu = True
float_check = <function get_float_check.<locals>.<lambda> at 0x7f283618e940>
path = [2, '(a IN (CAST(-1.8267391899860874E80 AS DOUBLE), CAST(-4.1675090531674384E-103 AS DOUBLE), CAST(NaN AS DOUBLE), CAS...OUBLE), CAST(3.4452000335727635E-205 AS DOUBLE), CAST(-Infinity AS DOUBLE), CAST(-2.0147996288524902E268 AS DOUBLE)))']

@abellina abellina added bug Something isn't working ? - Needs Triage Need team to review and classify labels Nov 13, 2023
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Nov 14, 2023
ttnghia commented Nov 16, 2023

You probably ran the test locally after @datagen_overrides had been added. I can still reproduce it on my local Quadro RTX 6000 (after removing the @datagen_overrides line).

ttnghia commented Nov 16, 2023

This looks like a bug in our isin operator that incorrectly handles NaN.

@ttnghia ttnghia self-assigned this Nov 16, 2023
ttnghia commented Nov 16, 2023

It turns out our code is correct. This is a bug in Spark: https://issues.apache.org/jira/browse/SPARK-36792.

ttnghia commented Nov 16, 2023

When checking a isin (list...) with a list of more than 10 elements (so the InSet operator is used), Spark on the CPU cannot find NaN in the given list and produces incorrect output. If the list has 10 elements or fewer, the In operator is used instead and produces the correct answer.

scala> df1.show(false)
+------------------------+
|a                       |
+------------------------+
|-1.8267391899860874E80  |
|-4.1675090531674384E-103|
|NaN                     |
+------------------------+


scala> spark.sql("select * from df1 where a in (CAST(-1.8267391899860874E80 AS DOUBLE), CAST(-4.1675090531674384E-103 AS DOUBLE), CAST('NaN' AS DOUBLE), CAST(-8.559820088589135E179 AS DOUBLE), CAST(1.1176123717679094E-166 AS DOUBLE), CAST(3.2567422266986805E-294 AS DOUBLE), CAST(-4.142554702629836E-92 AS DOUBLE), CAST(2.813265202703976E56 AS DOUBLE), CAST(3.4452000335727635E-205 AS DOUBLE), CAST('-Infinity' AS DOUBLE), CAST(-2.0147996288524902E268 AS DOUBLE))").show(false)
+------------------------+
|a                       |
+------------------------+
|-1.8267391899860874E80  |
|-4.1675090531674384E-103|
+------------------------+


scala> spark.sql("select * from df1 where a in (CAST(-1.8267391899860874E80 AS DOUBLE), CAST(-4.1675090531674384E-103 AS DOUBLE), CAST('NaN' AS DOUBLE), CAST(-8.559820088589135E179 AS DOUBLE), CAST(1.1176123717679094E-166 AS DOUBLE), CAST(3.2567422266986805E-294 AS DOUBLE), CAST(-4.142554702629836E-92 AS DOUBLE), CAST(2.813265202703976E56 AS DOUBLE), CAST(3.4452000335727635E-205 AS DOUBLE), CAST('-Infinity' AS DOUBLE))").show(false)
+------------------------+
|a                       |
+------------------------+
|-1.8267391899860874E80  |
|-4.1675090531674384E-103|
|NaN                     |
+------------------------+
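The discrepancy between the two Spark expressions can be modeled outside Spark. The sketch below is plain Python, not Spark source; `in_style` and `inset_style` are hypothetical names contrasting a NaN-aware element-wise scan (how In behaves above) with a plain hash-set lookup, where IEEE 754 equality never matches NaN (how the buggy InSet behaves):

```python
import math

# Values from the failing query above; the last entry is NaN.
targets = [-1.8267391899860874e80, -4.1675090531674384e-103, float('nan')]

def in_style(v, targets):
    # Element-wise scan with NaN-aware equality (NaN matches NaN),
    # mirroring the correct In output shown in the first transcript.
    return any((math.isnan(v) and math.isnan(t)) or v == t for t in targets)

def inset_style(v, targets):
    # Hash-set lookup keyed on ==: IEEE 754 says NaN != NaN, so a NaN
    # probe misses the NaN entry, mirroring the buggy InSet output.
    return v in set(targets)

print(in_style(float('nan'), targets))     # NaN row is kept
print(inset_style(float('nan'), targets))  # NaN row is dropped
```

Non-NaN values behave identically under both strategies, which is why the bug only surfaces when the data and the IN-list both contain NaN.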

ttnghia commented Nov 16, 2023

Note that df1 above is read from the parquet file generated by the failed test. When I created the data manually, InSet produced correct output, so I suspect the NaN value read from parquet here is a negative NaN (a NaN with the sign bit set).
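For reference, a negative NaN differs from the canonical NaN only in its sign bit, and neither compares equal to anything under IEEE 754. A quick illustration in plain Python (independent of Spark; assumes a standard IEEE 754 double representation):

```python
import math
import struct

def bits(x: float) -> int:
    # Raw IEEE 754 bit pattern of a double.
    return struct.unpack('<Q', struct.pack('<d', x))[0]

nan = float('nan')
neg_nan = -nan  # unary minus flips only the sign bit

# Both are NaN, yet their bit patterns differ in the sign bit.
assert math.isnan(nan) and math.isnan(neg_nan)
assert bits(nan) ^ bits(neg_nan) == 1 << 63

# IEEE 754 equality never matches NaN, not even against itself, so any
# check keyed on == (or on raw bits against the canonical NaN) can miss
# a negative NaN read back from a file.
assert nan != nan
assert neg_nan != nan
```

This is consistent with a manually built DataFrame (canonical NaN) behaving differently from one read back from parquet.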

@ttnghia ttnghia changed the title [BUG] test_in_set fails on A30 when DATAGEN_SEED=1698940723 [BUG] test_in_set fails when DATAGEN_SEED=1698940723 Nov 17, 2023
pxLi commented Dec 11, 2023

Should be closed by #9928.

@pxLi pxLi closed this as completed Dec 11, 2023