Move Marist machines to the self-service provisioning #2673
Comments
I'm going to use this as a conclusive verification of a number of other infrastructure PRs that we have in flight just now, so I won't run the playbooks until after they are merged: |
For future reference: before syncing inventories in AWX, you have to update the project source first so that AWX has the latest inventory file. I had assumed the inventory sync automatically pulled the latest inventory file. Running https://awx2.adoptopenjdk.net/#/jobs/playbook/137?job_search=page_size:20;order_by:-finished;not__launch_type:sync on test-marist-rhel8-s390x-2 as a preliminary playbook run. |
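The ordering described above (update the project first, then sync the inventory) could be automated roughly as follows. This is only a sketch, assuming the `awx.awx` Ansible collection is installed and that AWX credentials are supplied through its usual environment variables or module options. The project, inventory, and inventory source names are hypothetical placeholders, not the names used on awx2.adoptopenjdk.net.

```yaml
---
# Sketch only: refresh the AWX project before syncing the inventory,
# so that the sync reads the latest inventory file from the repository.
- name: Refresh AWX project, then sync inventory
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Update the project source so AWX pulls the latest inventory file
      awx.awx.project_update:
        name: "openjdk-infrastructure"   # hypothetical project name
        wait: true                       # block until the update finishes

    - name: Sync the inventory source from the freshly updated project
      awx.awx.inventory_source_update:
        name: "inventory-from-project"   # hypothetical inventory source name
        inventory: "adoptium"            # hypothetical inventory name
        wait: true
```

Waiting on the project update before starting the sync is the important part; the same two steps can also be done by hand in the AWX UI in that order.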
Failed at the installation of I've created a new job in awx which I can use for debugging/testing. It deploys my own branch, https://github.com/Haroon-Khel/openjdk-infrastructure/tree/awx.debug, which so far the only change is |
test-marist-rhel8-s390x-2 is actually a SLES15 machine
And test-marist-sles15-s390x-2 is Rhel 8
|
Failed at downloading Ant
|
Presumably that's only on a subset of the OSs? |
Tried deploying to just the RHEL79 build machines - hit #2700
Failures in Ubuntu 22.04 (Will be gcc-7 - PR ready), Ubuntu 18, the new SLES15 and the old SLES12, and RHEL8. Those will need further investigation. I'm pausing for now so someone else can take over, as it's the build machines I really needed :-) |
This is specific to SLES15. It is installed on the
|
This was preventing them from updating themselves - presumably implemented to bypass a temporary problem at some point - the date stamp on the file was:
I've commented those lines out of both machines now which should avoid this problem:
|
|
@sxa want me to pick up the systemtap-sdt-devel on test-marist-sles15-s390x-2 ? |
Sure - please co-ordinate with Haroon in slack. |
That would be helpful @steelhead31 Thanks |
Ubuntu 22.04 looking happier now that #2691 is merged. |
The SLES15 playbooks run better with Python 3 as the ansible_python_interpreter (which can be specified in the inventory). An issue with the IPv6 configuration on test-marist-sles15-s390x-2 has also been resolved by disabling IPv6.
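As an illustration of the two settings mentioned here, a minimal sketch follows. It uses the standard Ansible YAML inventory layout rather than this repository's own inventory format, and the interpreter path `/usr/bin/python3` is an assumption about where Python 3 lives on the SLES15 host.

```yaml
# Sketch only: pin the Python interpreter for one host in a
# standard Ansible YAML inventory.
all:
  hosts:
    test-marist-sles15-s390x-2:
      ansible_python_interpreter: /usr/bin/python3   # assumed path
```

The exact method used to disable IPv6 on the machine is not recorded in this thread. One common approach, shown here purely as an assumption, is a sysctl setting applied through the `ansible.posix.sysctl` module:

```yaml
# Sketch only: one possible way to disable IPv6, not necessarily
# the change that was made on the machine.
- name: Disable IPv6 on all interfaces
  ansible.posix.sysctl:
    name: net.ipv6.conf.all.disable_ipv6
    value: "1"
    state: present
    reload: true
```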
|
From Marist: "Let me know when fully migrated and I can remove the old servers as we are targeting end of September to power off the old storage servers." |
@Haroon-Khel Looks like there may be some problems that need addressing: https://ci.adoptopenjdk.net/view/Test_openjdk/job/Test_openjdk11_hs_sanity.openjdk_s390x_linux/651 Certainly a subset of them are in the compression code (we've seen issues there elsewhere - at least on Ubuntu 20.04 - that run was on 22.04) and if all the failures are related to that it will be good to confirm which distributions and versions it happens on, as there will be implications elsewhere. |
Nagios should be working on all of the new marist machines except for test-marist-rhel8-s390x-2 due to |
Added |
Request for Eclipse to set up two machines for Temurin Compliance: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/1917 |
NOTE: I've brought docker-marist-ubuntu1604-s390x-1 back online in jenkins for now since that one (why not others?) was causing 'temporarily offline in jenkins' messages to appear in the bot channel, but I've switched the We'll need to understand as part of #1716 why the other marist machines which we have disabled (marked offline in jenkins) are not giving the same notifications e.g. https://ci.adoptopenjdk.net/computer/build%2Dmarist%2Drhel77%2Ds390x%2D1/ and https://ci.adoptopenjdk.net/computer/test%2Dmarist%2Dubuntu1804%2Ds390x%2D1/ (and all the other "old" ones) |
Temurin Compliance systems still awaiting setup, but otherwise this is complete. Old machines will need to be deprovisioned, but that is due to be done later. |
@Haroon-Khel @steelhead31 Can we remove the old machines from Nagios, Jenkins and the inventory files please as they have now been deprovisioned. Full list as follows (Some of these were temporary systems so if you can't find them, that's not a problem):
|
Will do. Has the ansible inventory been updated with the new IPs / hostnames? I'm starting work on fixing the discrepancies between nagios and ansible today. |
Yep the new ones have been live for a few weeks: https://github.com/adoptium/infrastructure/pull/2690/files In theory removing the ones listed above should only leave the s390x ones added in that PR. |
All have now been removed from nagios. |
That'll clear up the slack channel a bit then ;-) |
The old machines have all been relieved of their duties and returned to Marist. Some work is still required to fix issues that have shown up during this release cycle, but that can be covered under #2807. The old TCK machines will be decommissioned this week too. |
Removing the following machines from inventory.yml and jenkins as they've been decommissioned
|
To avoid having to go through support for any requests on our Marist systems, they have been trialling a self-service interface for their machines and it is ready to be used as the primary method for provisioning our machines. #2267 has machines which have been provisioned through the new interface and we should start migrating our existing systems across to this too.
The first step will be to ensure we have capacity in the system (at the moment the account I'm using only has 4 machine slots available) and then start duplicating the existing machines in it, followed by decommissioning the existing ones. We will likely look at having at least one dockerhost system in order to have a wider range of distributions tested for Linux/s390x (subject to availability...).
Systems ready for installation: