[BUG] test_cast_string_ts_valid_format fails with DATAGEN_SEED=1699978422 #9708

Closed
abellina opened this issue Nov 14, 2023 · 2 comments · Fixed by #9889
Labels
bug Something isn't working

Comments

@abellina
Collaborator

Repro:

SPARK_RAPIDS_TEST_DATAGEN_SEED=1699978422 ./run_pyspark_from_build.sh -k test_cast_string_ts_valid_format
FAILED ../../../../integration_tests/src/main/python/cast_test.py::test_cast_string_ts_valid_format[String2][DATAGEN_SEED=1699978422, INJECT_OOM] - AssertionError: GPU and CPU timestamp values are different at [1614, 'a']
abellina added the "bug" (Something isn't working) and "? - Needs Triage" (Need team to review and classify) labels on Nov 14, 2023
mattahrens removed the "? - Needs Triage" label on Nov 14, 2023
@jlowe
Member

jlowe commented Nov 21, 2023

Looked into this a bit, and I think there are two problems here. First, the test generates strings that are almost always invalid timestamps, so the test is not very useful in practice. However, with this specific datagen seed, it happens to generate a valid timestamp in the third row, specifically 7141-09-13 08:15:02+121024, which parses to 7141-09-12 15:04:38 on the CPU but to null on the GPU.

@thirtiseven
Collaborator

thirtiseven commented Nov 29, 2023

The failing case is: StringGen('[0-9]{1,4}-[0-3][0-9]-[0-5][0-9][ |T][0-3][0-9]:[0-6][0-9]:[0-6][0-9].[0-9]{0,6}Z?')

The . before the fractional seconds is meant to be a literal ., but an unescaped . is a wildcard in regex, so the generator can emit any character in that position, which also leads to many invalid timestamps.

In the failing case it generated '+' in that position, producing 7141-09-13 08:15:02+121024, and Spark somehow supports this format.
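To illustrate the wildcard issue described above, here is a minimal sketch using Python's standard `re` module rather than the project's StringGen. It checks the quoted pattern against the string from the failing row, then against a variant with the dot escaped. The ESCAPED pattern and the `matches` helper are assumptions made for illustration only; they are not necessarily what the fix in #9889 does.

```python
import re

# Pattern quoted in the comment above; the '.' before the fraction is unescaped.
ORIGINAL = (r'[0-9]{1,4}-[0-3][0-9]-[0-5][0-9][ |T]'
            r'[0-3][0-9]:[0-6][0-9]:[0-6][0-9].[0-9]{0,6}Z?')

# Hypothetical variant (an assumption, not taken from the issue):
# the same pattern with the dot escaped so it only matches a literal '.'.
ESCAPED = (r'[0-9]{1,4}-[0-3][0-9]-[0-5][0-9][ |T]'
           r'[0-3][0-9]:[0-6][0-9]:[0-6][0-9]\.[0-9]{0,6}Z?')

# String reported for the failing seed.
FAILING_ROW = '7141-09-13 08:15:02+121024'


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole string conforms to the pattern."""
    return re.fullmatch(pattern, text) is not None


# The unescaped '.' accepts '+', so the original pattern can produce this string.
print(matches(ORIGINAL, FAILING_ROW))                    # True
# With the dot escaped, '+' is no longer allowed in that position.
print(matches(ESCAPED, FAILING_ROW))                     # False
# A literal '.' before the fraction still conforms to the escaped pattern.
print(matches(ESCAPED, '7141-09-13 08:15:02.121024'))    # True
```

Escaping the dot only narrows that one position. The other character classes in the pattern still permit out-of-range months, days, and times, so most generated strings would presumably remain invalid timestamps.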
