
Proposal - increase proportion of testing performed in docker containers #1809

Closed
sxa opened this issue Jan 6, 2021 · 11 comments

sxa commented Jan 6, 2021

testdocker.tar.gz (NOTE: Needs hostname and which added to install list on Fedora33)

We are currently losing some of our capacity as a result of #1781. The biggest impact of this is on the Linux/x64 systems where we currently have 12 test systems (4 each of CentOS7, Debian8, and Ubuntu16.04). Losing this capacity will have a significant impact on our ability to run testing.

In order to mitigate this, I am proposing running a subset of the tests (not all, since we need to test some against the real kernel for each operating system) in Docker containers. For prototyping purposes I am using one large system with static docker images of different OSs to give us a mix, but going forward we could probably spin them up on the fly in a comparable way to how we build the Linux/x64 and Linux/aarch64 builds using the dockerBuild nodes today.

I have configured a 24-core (48 thread) system docker-packet-ubuntu2004-x64-1 with six static docker containers (one each of Ubuntu 16.04, 18.04, 20.04, and 20.10, plus two Fedora 33), and these are in Jenkins with the name of the host system and a suffix to indicate the OS in the container. While this doesn't quite match the 12-machine set at GoDaddy, it will still be adequate to prevent xLinux capacity from being a limiting factor.

The important thing about this is that we need to understand how many tests can be performed within docker containers as (external tests excepted) it is not something we have tried before. The containers on this host are restricted to 4 CPUs each and 6GB of RAM. This limitation prevents the system test suites in particular from consuming all the resources on the machine. Initial testing from last night shows that the load on the machine was around the 25-30 mark, which is likely OK.
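For reference, the per-container limits described above correspond to a `docker run` invocation along these lines (a sketch only; the image and container names here are illustrative placeholders, not the actual ones used on this host):

```shell
# Construct the docker run command for one resource-capped test container.
# IMAGE and NAME are illustrative placeholders, not the real values.
IMAGE="aqa-test-ubuntu2004"
NAME="test-container-ubuntu2004-1"

# --cpus=4 and --memory=6g cap each container so that the system test
# suites cannot consume all 48 threads / all the RAM on the shared host.
CMD="docker run -d --name $NAME --cpus=4 --memory=6g $IMAGE"
echo "$CMD"
```

Capping at the `docker run` level means the limits apply uniformly regardless of what the test suites inside the container attempt to do.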

Issues relating to running tests in docker specifically can be captured in adoptium/aqa-tests#2138

Dockerfiles used to create these images (not the full playbooks) are in the attachment.
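As an illustration of the rough shape of those Dockerfiles (a sketch only, not the attached files; the package list is a guess apart from `hostname` and `which`, which the note at the top says need adding on Fedora 33):

```dockerfile
# Illustrative sketch of a Fedora 33 test-container image; the real
# Dockerfiles are in the testdocker.tar.gz attachment.
FROM fedora:33

# hostname and which must be added explicitly on Fedora 33 (see note
# above); sshd lets Jenkins attach to the container as an agent.
RUN dnf install -y hostname which openssh-server && dnf clean all

# Jenkins connects over SSH as this user; ssh-keygen -A generates the
# host keys sshd needs at startup.
RUN useradd -m -d /home/jenkins jenkins && ssh-keygen -A

EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
```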

If you have any concerns about this approach, feel free to add them into the comments :-)

@sxa sxa added the question label Jan 6, 2021
@sxa sxa added this to the January 2021 milestone Jan 6, 2021
aahlenst commented Jan 6, 2021

I'm curious about the general thinking behind "Let's run tests in containers". What problem should be solved by moving the tests into containers? And why did you choose containers as the tool to partition a beefy server?

sxa commented Jan 6, 2021

The problem (and I accept that wasn't clear from the title of this) is lack of xLinux capacity and an immediate need to resolve it, regardless of whether it's the final solution. This was the easiest setup to get going in a way that didn't require large numbers of systems and did not introduce extra complexity in the way things are running via jenkins.

There is no explicit problem being solved by "moving the tests into containers". (Bear in mind that I explicitly said in the description that I don't want all testing in containers, but a small amount seems prudent to catch potential customer scenarios.)

Regarding the choice of Docker, I would suggest that with Docker being an increasingly popular software execution environment, it is prudent to perform some testing in such an environment.

aahlenst commented Jan 6, 2021

The problem (and I accept that wasn't clear from the title of this) is lack of xLinux capacity and an immediate need to resolve it, regardless of whether it's the final solution.

It was clear to me, but I want to capture the thinking behind it. You don't have to convince me. Historically, we've done a lot of things where it wasn't clear why we did it this way afterwards. Normally, we would have created half a dozen VMs somewhere. Why not this time?

sxa commented Jan 6, 2021

Normally, we would have created half a dozen VMs somewhere. Why not this time?

Lack of sponsored providers with known capacity

@sxa sxa mentioned this issue Jan 6, 2021
aahlenst commented Jan 6, 2021

Summary: Because of a lack of sponsors of VMs we have to resort to using bare metal servers. To partition those (tests do not scale linearly with core count and available RAM), we decided to use containers instead of ESX and KVM to have more variation. Would you agree with that, @sxa?

sxa commented Jan 6, 2021

At least as an interim solution, yes, other than noting that a large VM could host the docker images as well; it doesn't explicitly require bare metal, unlike some other virtualisation options. The system tests will scale out with higher core counts, but the others generally do not.

sxa commented Jan 9, 2021

The proposal received no objections during the last TSC meeting, so I will proceed with getting this "production ready".

CentOS8 images are going to need a little additional work but subject to the concerns listed in adoptium/aqa-tests#2138 this is going well.

Next steps

  • Resolve all test case issues associated with running in docker as per the referenced test issue
  • Increase distribution coverage (CentOS8 as a minimum - default container appears to block remote logins via PAM)
  • Formalise the process for setting up a "docker-only" host (`apt update; apt install -y docker.io; useradd -m -d /home/jenkins jenkins`; add a JRE and ssh key for jenkins on the host if required for running explicit docker builds/tests, and set a crontab to auto-patch) or run the playbook with `--tags crontab,docker,adoptopenjdk`
  • Formalise the creation process for the test images (I'm coming round to the minimal Dockerfile approach as it lets us see if we ever introduce other prereqs that we might otherwise miss, but we could also set up via Ansible, which would make more sense from a consistency perspective)
  • Continue to work to implement this on other platforms (aarch64 and x64 are now live, pLinux is being tested now)
  • Define a sensible naming convention for the images
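The "docker-only" host bootstrap from the steps above could be sketched as follows (Debian/Ubuntu host assumed; the commands are collected into a variable for review rather than executed here, since they need root):

```shell
# Steps for a "docker-only" host, gathered as a list; on a real host
# these would be run as root (or via the Ansible playbook with
# --tags crontab,docker,adoptopenjdk).
BOOTSTRAP="apt update
apt install -y docker.io
useradd -m -d /home/jenkins jenkins"

# A JRE and the jenkins SSH key would also be added on the host if it
# needs to run explicit docker builds/tests, plus a crontab entry to
# auto-patch the host.
echo "$BOOTSTRAP"
```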

I'm open to suggestions on the last one of these ... I've tried a few options:

  • Suffix the host system with a shortened version of the container OS e.g. docker-ubuntu2004-ppc64le-1u18 for an Ubuntu 18 machine, but that obviously may lead to confusion about the OS type if it's significant for the test
  • Use a testc- prefix on the name e.g. testc-fedora33-x64-1 which makes it a little harder for admins to tell which host it's on (I anticipate that being in the description field) but is certainly clearer to people running test suites
  • I tried with an sxad provider (could be docker though) so we could have e.g. test-docker-fedora33-aarch64-1 which might be preferable to a testc- prefix as it's clearer about what it is.

Personally I'm coming round to the test-docker- format ...

Slack thread for reference in case it contains more views :-)

@sxa sxa pinned this issue Jan 9, 2021
@sxa sxa self-assigned this Jan 9, 2021
sxa commented Jan 9, 2021

For my own notes. I had a try on ppc64le using containers restricted to 2 core/6Gb and am getting quite a lot of slowdown between that and running on the host p9 system. (2h45-ish in jobs 470/480 vs 53m in 478). Also my standalone Ubuntu 20.04 pLinux machine (Not p9) is running at 1h28 in job 477 .. Need to understand why that took longer than the existing CentOS7 test machines which run JDK11/J9/sanity.openjdk in just under an hour. extended.perf job 3 is running on the standalone machine so will provide another comparison point.


smlambert commented:

For my 2 cents, it is nice to have test-docker in the name as then it's very obvious. I like obvious! (exclamation mark!)

sxa commented Jan 15, 2021

I'm inclined to agree so I'll be renaming things to match this convention (after next week's release to avoid any confusion) in the absence of any major objections showing up in the meantime.

Renames that need to occur to meet that are as follows (the first column is the name of the existing docker images):

| Old name | New name | Host system/provider | dockerBld? |
|---|---|---|---|
| test-sxad*aarch64 | test-docker-*-aarch64-n | docker-packet-ubuntu1604-armv8-1 | Y |
| build-packet-ubuntu1804-armv8l-1* | test-docker-*-aarch64-n | build-packet-ubuntu1804-armv8l-1 | N |
| docker-osuosl-ubuntu2004-ppc64le-1* | test-docker-*-ppc64le-n | docker-osuosl-ubuntu2004-ppc64le-1 | N |
| testc-linaro-* | test-docker-*-aarch64-n | docker-linaro-ubuntu2004-aarch64-1 | N |
| testc-packet-* | test-docker-*-x64-n | docker-packet-ubuntu2004-amd-1 | Y |
| test-sxad-*-x64-1 | [Decommission] | test-ibmcloud-ubuntu1604-x64-1 | N |

I do also want to try and have some more OSs if possible. So far I have Fedora 33 and Ubuntu 16.04/18.04/20.04/20.10, but we should have at least one RHEL derivative too (possibly including CentOS Stream or UBI).

sxa commented Feb 2, 2021

Comment for completed moves:

On test-packet-ubuntu1604-armv8-1 (147.75.74.50 - ThunderX):

  • test-sxad-centos8-armv8l-5 -> test-docker-centos8-armv8-1
  • test-sxad-f33-armv8l-6 -> test-docker-fedora33-armv8-1
  • test-sxad-ubuntu1604-armv8l-1 -> test-docker-ubuntu1604-armv8-1
  • test-sxad-ubuntu1804-armv8l-2 -> test-docker-ubuntu1804-armv8-1
  • test-sxad-ubuntu2004-armv8l-3 -> test-docker-ubuntu2004-armv8-1
  • test-sxad-ubuntu2010-armv8l-4 -> test-docker-ubuntu2010-armv8-1

On build-packet-ubuntu1804-armv8-1 (139.178.82.234 - D05):

  • build-packet-ubuntu1804-armv8l-1a -> test-docker-ubuntu1804-armv8-2
  • build-packet-ubuntu1804-armv8l-1b -> test-docker-ubuntu1804-armv8-3
  • build-packet-ubuntu1804-armv8l-1c -> test-docker-ubuntu1804-armv8-4
  • build-packet-ubuntu1804-armv8l-1d -> test-docker-ubuntu1804-armv8-5
  • build-packet-ubuntu1804-armv8l-1e -> test-docker-ubuntu1804-armv8-6
  • build-packet-ubuntu1804-armv8l-1f1 -> test-docker-fedora33-armv8-2
  • build-packet-ubuntu1804-armv8l-1f2 -> test-docker-fedora33-armv8-3
  • build-packet-ubuntu1804-armv8l-1f3 -> test-docker-fedora33-armv8-4
  • build-packet-ubuntu1804-armv8l-1f4 -> test-docker-fedora33-armv8-5

On test-packet-ubuntu2004-amd-1 (139.179.85.251 - EPYC):

  • testc-packet-ubuntu1604-amd-1 -> test-docker-ubuntu1604-x64-1
  • testc-packet-ubuntu1804-amd-1 -> test-docker-ubuntu1804-x64-1
  • testc-packet-ubuntu2004-amd-1 -> test-docker-ubuntu2004-x64-1
  • testc-packet-ubuntu2010-amd-1 -> test-docker-ubuntu2010-x64-1
  • testc-packet-fedora33-amd-1 -> test-docker-fedora33-x64-1
  • testc-packet-fedora33-amd-2 -> test-docker-fedora33-x64-2

On test-packet-ubuntu2004-amd-1 (147.75.79.143 - Intel GOLD):

  • testc-packet-ubuntu2010-intel-1 -> test-docker-ubuntu2010-x64-2
  • testc-packet-fedora33-intel-1 -> test-docker-fedora33-x64-3

I've left the osuosl and linaro ones as-is for now as they're not fully up and running yet.
