Multiple JTReg JVM Failures on large heap builds #8637
Comments
@r30shah this is an assert out of |
Also fails on s390x_linux_xl for
Dumps: https://ibm.box.com/s/fwhhfnacqphjcv9aq1g4opnwzcxzf37y |
The code where this failure occurs is looking for a register to use for a node commoned across a split point, so that it can uncommon it. When it gathers information about the registers available for use, it scans all the exit edges in the extended basic block to see whether it can use any of that information. It throws an assert when it sees a node under @M-Davies Would you be able to share the job link for these failures? I quickly tried running |
Note jdk14 builds from Adopt are available, just not from the website. |
|
@M-Davies I have been trying to reproduce this failure locally on both Linux on Z (Ubuntu 18 on a z13) and macOS 10.15, without success so far. I do not know whether I have access to launch a grinder. Would it be possible for you to launch one (30 runs should be sufficient if you are seeing a failure in 1 of 4 runs) and collect a log file for the method that was being compiled when the failure occurred? |
@r30shah |
@M-Davies Thanks a lot for launching the grinder. The Mac one failed and I downloaded the files, but it seems the launcher didn't like the options. I am seeing the following in the javacore.
So the signature of the method is messed up. Could you launch another one on macOS with the following option? |
@r30shah https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/2360/ |
@M-Davies Sorry to bother you again, but it seems the grinder did not work. I am seeing the following failure,
|
Apologies for the delay. It was using a binary that had already been deleted, and I couldn't create a new grinder on Friday with Jenkins down. Have a rerun going at |
The grinder in the previous comment didn't fail. I have tried to reproduce this on internal machines with no success. @AdamBrousseau @pshipton Is it possible to get access to one of the macOS machines where this fails fairly often, such as the machine @M-Davies has seen this failure on? Given that the fix is targeted at the 0.19 release, this would help speed up the investigation and the fix. |
These are machines at AdoptOpenJDK.net, @smlambert can you help with the request in the previous comment. |
Process to request access to a machine at AdoptOpenJDK used to be to raise a request in the openjdk-infrastructure repo, assume that still holds (@sxa555 can correct me if that has changed). Search for "should have a regStore pre split point" in TRSS in recent jdk14 pipeline, results show that it happens on 3 platforms (3 machines test-godaddy-centos7-x64-1, test-marist-ubuntu1604-s390x-2, test-macincloud-macos1010-x64-2). Also suggest trying to reproduce on the 2 non-mac platforms internally if that has not been tried, in case those platforms match external configuration more closely than mac platform. Looking at Deep History in TRSS on xLinux, shows that reproduces most frequently on test-godaddy-centos7-x64-1, but did also occur on test-scaleway-ubuntu1604-x64-1. |
I searched the last several jdk14 pipelines and see it occur intermittently in ppc64le_linux_xl, x86-64_mac_xl and s390x_linux_xl test runs, and seen once in a (Passing) windows_xl run. Noting I have not found it occurring in compressedrefs runs. Also noting, seen all the way back to the first jdk14 test runs at AdoptOpenJDK launched from jdk14 pipeline 37 (do not have history before that), whose java -version info is: OpenJDK Runtime Environment AdoptOpenJDK (build 14+36-202002192048) |
Thanks @smlambert !
|
@r30shah https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/2402/ (CloseDuringConnect.java) failed 5 times. You should be able to see any javacore files that it created by downloading the openjdk_test_output.tar.gz (IIRC, they're at I've kicked off two more grinders with the test targets swapped over to see if I can produce a failure output with them: |
Nope that's generally the right way to do it (although I don't think I have full access to all of the mac machines so I'd need to engage someone - probably at MSFT - who can grant it) |
Occurred on last night's nightly build on test-osuosl-ubuntu1604-ppc64le-4: https://ci.adoptopenjdk.net/job/Test_openjdk14_j9_sanity.openjdk_ppc64le_linux_xl/16/console |
Thanks @smlambert @M-Davies for all the pointers. I have been able to make progress in investigating the failures by analyzing the log file from https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/2402/. The failure happens because of the following trees.
To add a profiling trees while lowering trees for call, |
Moving forward since I don't see any fix in hand and it's becoming too late to update the 0.19 release. Note the branch for the 0.20 release occurs on March 8. |
Happening regularly on jdk14 nightly builds: For builds of 4th Mar 2020, occurred on For builds of 3rd Mar 2020, occurred on For builds of 2nd Mar 2020, occurred on |
Have kicked off another grinder in case the one above expires before @r30shah has a chance to look at it https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/2482/ Have also kicked one off for |
I have opened a PR with the fix (eclipse/omr#4926) and am waiting for personal testing to finish before asking for review. |
At optimization level hot and above, localCSE runs two passes: the first for volatiles only and the second for non-volatiles only. In the volatiles-only pass, to make sure we common volatiles that are based on an indirection through non-volatile finals or non-global autos/parms, it marks such nodes as available for commoning in the volatile pass. The issue occurs when it encounters the first reference to such a non-volatile node and just adds it to the list of replaced nodes without replacing it. This causes a problem for subsequent references to such non-volatiles, which will be replaced with the available expression because the node is already in the list of replaced nodes. This commit fixes the issue so that if it encounters finals, autos, or parms in the volatile pass that are candidates for commoning, it replaces them with the available expression and then adds them to the replaced list. Fixes: eclipse-openj9/openj9#8637 Signed-off-by: Rahil Shah <[email protected]>
@r30shah is there any outlook for getting the OMR change merged? |
I am verifying some of the concerns @andrewcraik had on the PR and am currently testing the changes (performance and functional) to make sure they do not regress performance. I expect the testing, and addressing the concerns about the changes, to be finished this week. |
Ok thanks. Note the milestone 2 build is coming up this weekend and we generally don't allow any more change after that. Although we set M2 earlier this time and can evaluate the risk of the change once it's ready. |
We're past Milestone 2 for the 0.20.0 release, moving this forward to the next release. |
At optimization level hot and above, localCSE runs two passes: the first for volatiles only and the second for non-volatiles only. In the volatiles-only pass, it was also commoning nodes that are final fields/global autos/parms, on the assumption that such nodes can be relied on not to change. The issue occurs when it encounters the first reference to such a non-volatile node and just adds it to the list of replaced nodes without replacing it. This causes a problem for subsequent references to such non-volatiles, which will be replaced with the available expression because the node is already in the list of replaced nodes. This commit fixes the issue so that a node is put on the replaced list only if commoning is actually done. Fixes: eclipse-openj9/openj9#8637 Signed-off-by: Rahil Shah <[email protected]>
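The bookkeeping problem described in this commit message can be illustrated with a toy model. This is only a sketch under stated assumptions and is not the OMR localCSE code: `Node`, `run_pass`, `stale_entries`, and the `record_only_if_commoned` switch are hypothetical names invented for the illustration, and the real pass tracks far more state. It shows how recording a first reference as "replaced" without commoning it leaves entries on the replaced list that have no replacement, and how recording only after commoning avoids that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: int   # hypothetical identifier for a tree node
    expr: str      # key for the value the node computes
    candidate: bool  # hypothetical flag: eligible for commoning in this pass


def run_pass(nodes, record_only_if_commoned):
    """Toy commoning pass; returns (replaced list, replacement map)."""
    available = {}    # expr -> first node seen computing it
    replacement = {}  # node_id -> node_id it was commoned with
    replaced = []     # bookkeeping list of nodes considered replaced
    for node in nodes:
        if not node.candidate:
            continue
        earlier = available.get(node.expr)
        if earlier is not None:
            # A real commoning: the node is replaced, then recorded.
            replacement[node.node_id] = earlier.node_id
            replaced.append(node.node_id)
        else:
            available[node.expr] = node
            if not record_only_if_commoned:
                # Modelled bug: first reference is recorded as replaced
                # even though nothing replaced it.
                replaced.append(node.node_id)
    return replaced, replacement


def stale_entries(replaced, replacement):
    """Nodes on the replaced list that were never actually replaced."""
    return [n for n in replaced if n not in replacement]


nodes = [Node(1, "load x", True), Node(2, "load x", True), Node(3, "load y", True)]
buggy_stale = stale_entries(*run_pass(nodes, record_only_if_commoned=False))
fixed_stale = stale_entries(*run_pass(nodes, record_only_if_commoned=True))
```

In this model the buggy variant leaves nodes 1 and 3 on the replaced list with no replacement, while the fixed variant leaves none, which mirrors the "put the node on the replaced list only if commoning is done" rule stated above.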
Thanks for all the help on this everyone :) |
Failure link
https://github.com/ibmruntimes/openj9-openjdk-jdk14/blob/2baf17d18e36af5750446aff79407efdc3eb97be/test/jdk/java/nio/channels/SocketChannel/CloseDuringConnect.java#L1 fails on JDK14-j9
Optional info
java/nio/channels/SocketChannel/Hangup.java for x86-64_mac_xl
java/nio/channels/SocketChannel/CloseDuringConnect.java for x86-64_mac_xl
java/nio/file/WatchService/UpdateInterference.java for s390x_linux_xl
Failure output (captured from console output)
Dumps: javacores.zip (from java/nio/channels/SocketChannel/Hangup.java)