[BUG] test_sortmerge_join_ridealong fails on DB 13.3 #11558

Open
amahussein opened this issue Oct 2, 2024 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@amahussein
Collaborator

Describe the bug

rapids-it-azure-databricks-13.3 job 102 failed with the following error:

"SPARK_VER": "3.3.2db",
"PLUGIN_VER": "24.10.0-SNAPSHOT"

[2024-10-02T13:11:13.010Z] FAILED ../../src/main/python/join_test.py::test_sortmerge_join_ridealong[RightOuter-Array(Long)][DATAGEN_SEED=1727872690, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: GPU and CPU int values are different at [179, 'r_b', 0]
[2024-10-02T13:11:13.010Z] = 1 failed, 2138 passed, 26 skipped, 31469 deselected, 12 xpassed, 1286 warnings in 1967.60s (0:32:47) =
[2024-10-02T13:11:13.010Z] --- Logging error ---
[2024-10-02T13:11:13.010Z] Traceback (most recent call last):
[2024-10-02T13:11:13.010Z]   File "/usr/lib/python3.10/logging/__init__.py", line 1103, in emit
[2024-10-02T13:11:13.010Z]     stream.write(msg + self.terminator)
[2024-10-02T13:11:13.010Z] ValueError: I/O operation on closed file.
[2024-10-02T13:11:13.010Z] Call stack:
[2024-10-02T13:11:13.010Z]   File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/clientserver.py", line 671, in __del__
[2024-10-02T13:11:13.010Z]     self.close()
[2024-10-02T13:11:13.010Z]   File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/clientserver.py", line 568, in close
[2024-10-02T13:11:13.010Z]     logger.info("Closing down clientserver connection")
[2024-10-02T13:11:13.010Z] Message: 'Closing down clientserver connection'
[2024-10-02T13:11:13.010Z] Arguments: ()
[2024-10-02T13:11:17.196Z] + ret=1
[2024-10-02T13:11:17.196Z] + set -e
[2024-10-02T13:11:17.196Z] + '[' 1 = 5 ']'
[2024-10-02T13:11:17.196Z] + exit 1
[2024-10-02T13:11:17.196Z] Exception: run command failed: CompletedProcess(args="ssh -o StrictHostKeyChecking=no -o TCPKeepAlive=yes -o ServerAliveInterval=10 -p ***** -i **** user@******** -- 'TEST=join_test.py' 'TEST_TAGS=' 'DB_PORT=****' 'LOCAL_JAR_PATH=/home/ubuntu' './run_it.sh'", returncode=1)
@amahussein added the "? - Needs Triage" (Need team to review and classify) and "bug" (Something isn't working) labels on Oct 2, 2024
@mattahrens
Collaborator

Subsequent jobs have succeeded (with different seeds). We need to reproduce this with the exact datagen seed from this failure.
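
For anyone attempting the repro, here is a minimal sketch of pulling the seed and test parameters out of the pytest `FAILED` line so they can be exported before re-running. The helper name `repro_settings` and the returned keys are hypothetical, not part of the test framework. `TEST` and `DATAGEN_SEED` are the variable names that appear in this thread, but how `run_it.sh` consumes them is not shown in the log, so treat that mapping as an assumption.

```python
import os
import re

# Strips ANSI color codes, in case the line is copied straight from the CI log.
ANSI = re.compile(r"\x1b\[[0-9;]*m")

# Matches a pytest failure ID of the form
#   <path>.py::<test>[<params>][DATAGEN_SEED=<int>, TZ=<tz>, ...]
FAILED = re.compile(
    r"(?P<file>[\w./-]+\.py)::(?P<test>\w+)\[(?P<params>[^\]]*)\]"
    r"\[DATAGEN_SEED=(?P<seed>\d+), TZ=(?P<tz>[^,\]]+)"
)


def repro_settings(log_line):
    """Return the test file, test name, parameters, seed and time zone
    found in a single pytest FAILED line (hypothetical helper)."""
    m = FAILED.search(ANSI.sub("", log_line))
    if m is None:
        raise ValueError("no pytest failure ID with a DATAGEN_SEED found")
    return {
        "TEST": os.path.basename(m["file"]),
        "test_name": m["test"],
        "params": m["params"],
        "DATAGEN_SEED": int(m["seed"]),
        "TZ": m["tz"],
    }


if __name__ == "__main__":
    line = (
        "FAILED ../../src/main/python/join_test.py::"
        "test_sortmerge_join_ridealong[RightOuter-Array(Long)]"
        "[DATAGEN_SEED=1727872690, TZ=UTC, INJECT_OOM, "
        "IGNORE_ORDER({'local': True})] - AssertionError"
    )
    for key, value in repro_settings(line).items():
        print(f"{key}={value}")
```

Pinning the seed and time zone from the failing run, rather than letting the job pick new ones, is what should make the generated data match the failing case.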

@mattahrens
Collaborator

Note that the job had multiple test failures beyond the one reported in this issue.

@mattahrens removed the "? - Needs Triage" (Need team to review and classify) label on Oct 8, 2024
@razajafri
Collaborator

I ran every failing test on Azure Databricks, and a few of them on AWS Databricks, with the given DATAGEN_SEED set, but was unable to reproduce the error.
