{devel}[foss/2022a] PyTorch v1.12.1 w/ Python 3.10.4 (+ CUDA 11.7.0) #16484

Merged


Flamefire
Contributor

(created using eb --new-pr)

@smoors
Contributor

smoors commented Oct 25, 2022

@boegelbot: please test @ generoso

@boegelbot
Collaborator

@smoors: Request for testing this PR well received on login1

PR test command 'EB_PR=16484 EB_ARGS= /opt/software/slurm/bin/sbatch --job-name test_PR_16484 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 9376

Test results coming soon (I hope)...

- notification for comment with ID 1291050768 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 0 out of 2 (2 easyconfigs in total)
cns5 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/bd5c3f152d9e2eb6af79d28a50d04532 for a full test report.

@Flamefire
Contributor Author

Flamefire commented Oct 27, 2022

The test failures are actually expected: PyTorch 1.12 is NOT fully compatible with Python 3.10 yet! This is one of the reasons I was against the new feature that allows any test to fail, limited only by the number of failing tests.

Edit: Will look at the remaining failures later.

@Flamefire Flamefire force-pushed the 20221025125938_new_pr_PyTorch1121 branch 2 times, most recently from 16aedf5 to dc8d960 Compare November 22, 2022 12:20
@smoors
Contributor

smoors commented Nov 22, 2022

@boegelbot: please test @ generoso

@boegelbot
Collaborator

@smoors: Request for testing this PR well received on login1

PR test command 'EB_PR=16484 EB_ARGS= EB_CONTAINER= /opt/software/slurm/bin/sbatch --job-name test_PR_16484 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 9622

Test results coming soon (I hope)...

- notification for comment with ID 1323666191 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@Flamefire Flamefire force-pushed the 20221025125938_new_pr_PyTorch1121 branch from 47ed021 to 61bf727 Compare November 22, 2022 14:59
@Flamefire Flamefire marked this pull request as draft November 22, 2022 15:10
@Flamefire
Contributor Author

Still working on a test failure that looks real, so I have marked this PR as a draft.

@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 0 out of 2 (2 easyconfigs in total)
cns3 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/e5b4ba49bff2463298a461b3b4d2dfb5 for a full test report.

@Flamefire
Contributor Author

Looks like this is blocked by an incompatibility with CUDA 11.7 (possibly already with 11.6): pytorch/pytorch#89684
Not sure how to handle this.

@Flamefire Flamefire force-pushed the 20221025125938_new_pr_PyTorch1121 branch 2 times, most recently from 0e8fc9c to 10ac6cd Compare November 28, 2022 13:09
@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
taurusml24 - Linux RHEL 7.6, POWER, 8335-GTX (power9le), 6 x NVIDIA Tesla V100-SXM2-32GB, 440.64.00, Python 2.7.5
See https://gist.github.com/c70d58d433b5dd2256a83283e55c5606 for a full test report.

@Flamefire Flamefire marked this pull request as ready for review November 29, 2022 08:13
@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
taurusa12 - Linux CentOS Linux 7.7.1908, x86_64, Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz (broadwell), 3 x NVIDIA GeForce GTX 1080 Ti, 460.32.03, Python 2.7.5
See https://gist.github.com/ed49d46b8068b07874ce815011648c6c for a full test report.

…-1.12.1-foss-2022a.eb and patches: PyTorch-1.12.1_fix-test_wishart_log_prob.patch, PyTorch-1.12.1_python-3.10-annotation-fix.patch, PyTorch-1.12.1_python-3.10-compat.patch, PyTorch-1.12.1_remove-flaky-test-in-testnn.patch
@Flamefire Flamefire force-pushed the 20221025125938_new_pr_PyTorch1121 branch from 10ac6cd to 544b780 Compare November 30, 2022 08:30
@Flamefire
Contributor Author

@branfosj Rebased on #16453 now that it has been merged, to clean up the diff. No changes to the files themselves.

@branfosj branfosj added this to the next release (4.7.0) milestone Nov 30, 2022
@branfosj
Member

Test report by @branfosj
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
bear-pg0103u14a.bear.cluster - Linux RHEL 8.5, x86_64, Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz (icelake), 2 x NVIDIA NVIDIA A30, 470.57.02, Python 3.6.8
See https://gist.github.com/139b279cb831560ebb5cdd1cfae003f3 for a full test report.

@branfosj
Member

Going in, thanks @Flamefire!

@branfosj branfosj merged commit 2f1db6e into easybuilders:develop Nov 30, 2022
@Flamefire Flamefire deleted the 20221025125938_new_pr_PyTorch1121 branch December 1, 2022 12:49
@Flamefire
Contributor Author

Test report by @Flamefire
FAILED
Build succeeded for 1 out of 2 (2 easyconfigs in total)
taurusi8018 - Linux CentOS Linux 7.9.2009, x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 470.57.02, Python 2.7.5
See https://gist.github.com/028d2a34f8d125790582a05c125fd718 for a full test report.

@boegel boegel changed the title from "{devel}[foss/2022a] PyTorch v1.12.1 w/ Python 3.10.4" to "{devel}[foss/2022a] PyTorch v1.12.1 w/ Python 3.10.4 (+ CUDA 11.7.0)" Dec 7, 2022
@boegel
Member

boegel commented Dec 7, 2022

Test report by @boegel
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
node3103.skitty.os - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz (skylake_avx512), Python 3.6.8
See https://gist.github.com/19396bc77c5a3da529ddc21b2883f51c for a full test report.

@boegel
Member

boegel commented Dec 7, 2022

Test report by @boegel
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
node3303.joltik.os - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz (cascadelake), 1 x NVIDIA Tesla V100-SXM2-32GB, 520.61.05, Python 3.6.8
See https://gist.github.com/0ddbcef4450428d0c674e2f7b21dffe0 for a full test report.

5 participants