[BUG] conditionals_test.py::test_conditional_with_side_effects_cast[String] failed with DATAGEN_SEED=1701976979 #9992

revans2 · 2023-12-07T21:43:19Z

Describe the bug

TEST_PARALLEL=0 DATAGEN_SEED=1701976979 TZ=UTC ./run_pyspark_from_build.sh --test_oom_injection_mode=always -k 'test_conditional_with_side_effects_cast'

Running the command above makes the test fail every time. It looks like a test error, because the failure happens on the CPU with no GPU code running. I think it is related to the RLIKE that runs, but I am not 100% sure. Probably the \z is not being escaped properly.


revans2 · 2023-12-07T21:53:52Z

Yup, if I update the test to escape the z more, we get an error because we cannot support \z on the GPU. Using a $ works, but I don't think that is the right solution, because we might have input like 100\nBAD STUFF, which would cause the same kind of problem. I think the ^ is also not correct.

We probably want to update test_conditional_with_side_effects_case_when too, since it does something similar. We also want to update the generated data so that it actually hits the various parts, and verify that the input/output data looks correct.
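The difference between the `$` and end-of-input anchors discussed above can be sketched with Python's `re` module. This is only an illustration under stated assumptions: Python's `\Z` is used as a stand-in for the strict end-of-input anchor that Java spells `\z`, and whether the CPU or GPU regex engine behind RLIKE treats `$` like the default or the multiline case here is not confirmed by this thread.

```python
import re

# Candidate patterns for "the whole string is 1 to 5 digits".
pattern_dollar = r"^[0-9]{1,5}$"
# Assumption: Python's \Z (strict end of string) stands in for Java's \z.
pattern_strict = r"^[0-9]{1,5}\Z"

samples = ["100", "100\n", "100\nBAD STUFF"]


def matches(pattern, text, flags=0):
    """Return True if the pattern is found anywhere in text (find semantics)."""
    return re.search(pattern, text, flags) is not None


# For each sample: ($ default mode, $ multiline mode, strict end anchor).
results = {
    s: (
        matches(pattern_dollar, s),
        matches(pattern_dollar, s, re.MULTILINE),
        matches(pattern_strict, s),
    )
    for s in samples
}

for text, outcome in results.items():
    print(repr(text), outcome)
```

In this sketch `$` accepts a trailing newline, and in multiline mode it also accepts digits followed by further lines, while the strict anchor accepts only the pure digit string.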

gerashegalov · 2023-12-21T08:12:49Z

This is an instance where the test does not account for the need to double-escape the backslash (\\\\), which can be avoided by using raw strings at either the Python or the SQL level (#8289).

Currently the test ends up using the regex ^[0-9]{1,5}z instead of ^[0-9]{1,5}\z.

The seed produces a string starting with 5z, which makes it a match for RLIKE. Thus both the CPU and the GPU fail.

spark.sql("SELECT a, a RLIKE '^[0-9]{1,5}z' FROM df WHERE a RLIKE '^[0-9]{1,5}z'").show()
+----------+--------------------+
|         a|a RLIKE ^[0-9]{1,5}z|
+----------+--------------------+
|5z.Q��ô�Ù�|                true|
+----------+--------------------+
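The lost backslash described above can be reproduced without Spark. The sketch below assumes a simplified model of SQL string-literal unescaping in which a backslash followed by an ordinary character is dropped; `sql_unescape` is a hypothetical helper written for illustration, not Spark's parser, and the sample input is just the printable prefix of the generated string.

```python
import re


def sql_unescape(literal: str) -> str:
    """Hypothetical, simplified model of SQL string-literal unescaping.

    A backslash followed by a character with no special meaning is
    dropped, so the two characters backslash + z collapse to plain z.
    """
    special = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", "'": "'"}
    out = []
    i = 0
    while i < len(literal):
        ch = literal[i]
        if ch == "\\" and i + 1 < len(literal):
            nxt = literal[i + 1]
            out.append(special.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# The regex the test intended the engine to see (raw string: no Python escaping).
intended = r"^[0-9]{1,5}\z"

# One backslash in the SQL literal: the unescaping step consumes it.
seen_single = sql_unescape(r"^[0-9]{1,5}\z")

# Two backslashes in the SQL literal: one survives unescaping.
seen_double = sql_unescape(r"^[0-9]{1,5}\\z")

print(seen_single)
print(seen_double)

# With the backslash gone, a value starting with "5z" matches (find semantics).
print(re.search(seen_single, "5z.Q") is not None)
```

Writing the pattern as a Python raw string removes one escaping layer, which is why a non-raw Python string would need four backslashes to deliver a single one to the regex engine under this model.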

abellina · 2023-12-28T18:34:07Z

@gerashegalov for this test, the @datagen_override is still there, yet the issue is closed (https://github.com/NVIDIA/spark-rapids/blob/branch-24.02/integration_tests/src/main/python/conditionals_test.py#L211).

Is that intended? I am going through various places where we'd like to mark the override permanent, and this one stands out.

gerashegalov · 2023-12-28T18:45:28Z

Good catch @abellina. #10090 should have deleted the override


abellina · 2023-12-28T18:49:54Z

> Good catch @abellina. #10090 should have deleted the override

Thanks for confirming! I can remove it in the test PR I have open.

gerashegalov · 2023-12-28T18:54:55Z

> > Good catch @abellina. #10090 should have deleted the override
>
> thanks for confirming! I can remove in the test pr I have open.

I opened #10112. Let us keep it out of the bigger-scope PR you are working on in case it gets reverted :)


revans2 added the bug, ? - Needs Triage, and test labels Dec 7, 2023

res-life mentioned this issue Dec 11, 2023

Using fix seed to unblock 23.12 release; Move the blocked issues to 24.02 #10009

Merged

mattahrens removed the ? - Needs Triage label Dec 12, 2023

mattahrens assigned razajafri and gerashegalov and unassigned razajafri Dec 14, 2023

gerashegalov mentioned this issue Dec 21, 2023

[BUG] DATAGEN_SEED=<seed> environment does not override the marker datagen_overrides #10089

Closed

gerashegalov added a commit to gerashegalov/spark-rapids that referenced this issue Dec 21, 2023

fixes NVIDIA#9992

b0d14ff

gerashegalov mentioned this issue Dec 21, 2023

Replace GPU-unsupported \z with an alternative RLIKE expression #10090

Merged

gerashegalov added a commit to gerashegalov/spark-rapids that referenced this issue Dec 21, 2023

fixes NVIDIA#9992

12aefc0

revans2 closed this as completed in #10090 Dec 26, 2023

gerashegalov mentioned this issue Dec 28, 2023

Fixes a bug where datagen seed overrides were sticky and adds datagen_seed_override_disabled #10109

Merged

gerashegalov added a commit to gerashegalov/spark-rapids that referenced this issue Dec 28, 2023

Remove datagen seed override for test_conditional_with_side_effects_cast

ee0b329

Fixes NVIDIA#9992 Addendum to NVIDIA#10090 Signed-off-by: Gera Shegalov <[email protected]>

gerashegalov mentioned this issue Dec 28, 2023

Remove datagen seed override for test_conditional_with_side_effects_cast #10112

Merged

gerashegalov added a commit that referenced this issue Dec 28, 2023

Remove datagen seed override for test_conditional_with_side_effects_c…

7ed4a69

…ast (#10112) Fixes #9992 Addendum to #10090 Signed-off-by: Gera Shegalov <[email protected]>
