AIX test machines at OSUOSL not available #1644
Comments
-2 is now back up and running. The ssh keys weren't up to date on either of them, but that has now been resolved by refreshing. It was trying to connect to |
I'll run a sanity system and openjdk test on both to begin with; I think these would trigger an error if either machine is exhibiting CPAN issues. |
I ran both system and openjdk sanity tests on both machines. The tests were able to run without error. The following test cases, from openjdk sanity, failed on both machines
None of the system tests failed. Where were you notified that CPAN was not working on either machine? |
I'm also running an openjdk sanity test on test-osuosl-aix72-ppc64-1 via Grinder, in case the CPAN issues occur only via Grinder. |
Those three suites failing is a concern: JDK11/J9 sanity.openjdk appears to pass on the other machines, so we have something that needs to be fixed: https://ci.adoptopenjdk.net/view/Test_openjdk/job/Test_openjdk11_j9_sanity.openjdk_ppc64_aix/211 |
@smlambert ref the discussion we had in the team meeting. In terms of machine dependencies and configuration, would you know why |
It may be helpful to look at what the test itself does (and whether it's doing anything special on AIX). If it is behaving well on one machine but not another, can you compare what LIBPATH is on the machines you are trying to compare? (If you search for AIX in the test source, you will see several places where there is AIX-specific handling of args and such, starting with: |
@Haroon-Khel Have you looked more into this? Would be good to get these two machines live again if possible. We are restricted on AIX testing capacity. |
The test failure is caused by https://github.com/ibmruntimes/openj9-openjdk-jdk11/blob/29d8a1d89c10cfd0cf86075b292bb4be6b196e29/test/jdk/java/lang/ProcessBuilder/Basic.java#L1794, and the 3 lines that follow it. |
Weirdly, the test has just passed on test-osuosl-aix72-ppc64-1. It appears to be intermittent, as it failed again on a subsequent run.
|
Look at ojdk01 and ojdk02 (the AIX 7.1 systems) and ojdk03 and ojdk04 (the AIX 7.2 pair). The default perl used on the AIX 7.1 ones is the AIX perl, which is ancient (5.10); that matters as this is about CPAN. ojdk03 does not have all the ssh keys it is supposed to have to allow automated login from OSUNIM. ojdk04, for the 3rd time at least, no longer has either the OSUNIM or my admin authorized keys. IMHO: there are systems outside these systems making unauthorized changes, because my PKI keys keep getting restored, and keep getting removed. What else is being modified? |
OSUNIM key added to the set of machines that you have access to, and your key has also been reinstated on 03/04. Hopefully it won't disappear this time, as it was deployed properly through our automation. |
I did some digging. This same test failure affected aarch64, eclipse-openj9/openj9#9032. The solution there was to exclude the test case for that platform, https://github.com/AdoptOpenJDK/openjdk-tests/pull/1716/files. This test used to be excluded on AIX due to adoptium/aqa-tests#1397 but has since been re-included, adoptium/aqa-tests#1788, due to an upstream fix. For the sake of re-adding the ci.role.test label to test-osuosl-aix72-ppc64-1 and test-osuosl-aix72-ppc64-2, could this test be excluded for AIX? Thoughts @smlambert @sxa |
Yes to re-excluding, but will want someone to chase down the reason we thought the upstream fix would/did fix the issue. |
If I've understood it correctly, I think the upstream fix was for a different issue related to the same test. |
@adamfarley, given it was your upstream fix, can you check whether the test failure that is happening is different from what was fixed via: https://bugs.openjdk.java.net/browse/JDK-8239365 ? |
No, I don't think so. My issue wasn't an OOM, and the bug I fixed wasn't checking against the error class. It was checking against an error message supplied by the OS, derived from an error message "set" that could change depending on what sets you'd installed. If you weren't referring to the OOM, please include a job link, trss link, or a copy of the error output. |
The key for 03/04 has been removed - again. 01/02 is working fine.
Using my desktop I can access 01/02, but not 03/04 when using the hostname (although I can when using the IP address??)
No idea what is causing this - but not a warm and cozy feeling. |
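One way to narrow down the "works by IP address but not by hostname" symptom described above is to check what the hostname actually resolves to from the client. The following is only a minimal sketch under stated assumptions: the helper names `resolve_host` and `resolves_to` are hypothetical, port 22 is assumed because the access in question is ssh, and the host name and address in the usage note are placeholders, not the real OSUOSL values.

```python
import socket


def resolve_host(name, port=22):
    """Return the sorted, de-duplicated addresses that `name` resolves to.

    An empty list means the local resolver could not resolve the name at all.
    """
    try:
        infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def resolves_to(name, expected_ip, port=22):
    """True if `expected_ip` is among the addresses `name` resolves to."""
    return expected_ip in resolve_host(name, port)
```

For example, `resolves_to("ojdk04.example.org", "192.0.2.10")` (both values are placeholders) returning `False` would suggest a stale DNS or hosts-file entry on the client rather than a key problem on the server.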
My idea now is that there is, perhaps, an unknown second agent or program that is updating the authorized file. Again, I cannot access ojdk04, neither as myself nor as the nim admin account; both internal and external IP addresses were attempted.
This is getting tiresome. Somewhere there is a bug - and it should not be this host - but I have no clue. When I get access again, I'll try to remember to create an audit record to at least see when the authorized file is being updated. Maybe from that we can locate the source. |
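The audit idea mentioned above could be approximated by taking periodic snapshots of the authorized keys file and recording which key lines appear or disappear between them. This is a minimal sketch, not what was actually deployed on these machines: the function names are hypothetical, the file location is left to the caller, and AIX's own audit subsystem would be the more thorough option.

```python
import hashlib
from pathlib import Path


def key_fingerprints(text):
    """Return a set of short SHA-256 digests, one per non-blank, non-comment line."""
    prints = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        prints.add(hashlib.sha256(line.encode("utf-8")).hexdigest()[:16])
    return prints


def diff_keys(before_text, after_text):
    """Report which key lines were added or removed between two snapshots."""
    before = key_fingerprints(before_text)
    after = key_fingerprints(after_text)
    return {"added": sorted(after - before), "removed": sorted(before - after)}


def snapshot(path):
    """Read the file's modification time and contents (path chosen by the caller)."""
    p = Path(path)
    return p.stat().st_mtime, p.read_text()
```

Calling `snapshot` from a scheduled job and logging the result of `diff_keys` together with the modification time would at least show when the file is rewritten, which can then be matched against the schedule of whatever tool manages the keys.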
Nothing unknown about it: we use Bastillion to manage access. That machine (and 9.28) had duplicate entries in the system, so it was updating the keys file twice, once for the full admin user set and once for the AIX set. I've removed the duplicate so it won't happen again. |
On the basis that the problematic tests have been excluded, I'm going to re-enable those two test machines, as we have a significant backlog on AIX testing just now. Added FYI @andrew-m-leonard: both are now running test jobs, starting with these two: |
Seeing as the failing test was excluded, can this issue be closed? |
Yep, the machines are running the tests on a regular basis now, so this can be closed :-) |
test-osuosl-aix72-ppc64-1 is marked as allegedly having CPAN not working on it (need to verify the current issue via Grinder)
test-osuosl-aix72-ppc64-2 is currently offline - raising with OSUOSL.