WIP: RFC: Rework vmcheck to use STI qcow2 inventory #1362

cgwalters · 2018-05-08T01:10:44Z

While we're having reboot+ansible issues in ostree related
to this, I really like the ability to pass it a qcow2 rather
than the "BYO ssh-config" model. Further, the vmcheck code
was full of workarounds for trying to reuse VMs between tests.

At a high level, with this change you can now do the following locally in development:
`export TEST_SUBJECTS=/srv/libvirt/images-gold/Fedora-Atomic-27-20180326.1.x86_64.qcow2`
or whatever. Then:
`make && make vmcheck TESTS="misc-1 misc-2 layering-relayer"`
will run those tests, each in a clean VM. A much bigger benefit
is that I reworked the tests to use `parallel` like the others,
and now if you set `VMCHECK_PARALLEL=4` you'll get 4 parallel VMs.
Since each VM just has 512MB of RAM today I have it set to 8 locally.

Requires: https://pagure.io/standard-test-roles/pull-request/188

rh-atomic-bot · 2018-05-08T01:13:16Z

💥 Invalid .papr.yml: failed to parse 1st testsuite: Schema validation failed:

Value '6' is not of type 'str'. Path: '/env/VMCHECK_PARALLEL'..

cgwalters · 2018-05-08T14:20:26Z

Oh right, need to make this work for the C7 build too.

Prep for reworking the primary test to do vm-in-container, which will temporarily be vm-in-container-in-vm. See coreos#1362

Prep for reworking the primary test to do vm-in-container, which will temporarily be vm-in-container-in-vm. See #1362 Closes: #1364 Approved by: jlebon

jlebon

I definitely like the benefits of one VM per test. And I also like that we're playing to the strengths of STR and letting it boil down to just an SSH config.

One thing I'm unsure of is getting rid of all the "undo" logic. How do you see local development in this model? E.g. I have my one pet VM that I use for all testing. And the fact that I can run `make vmcheck TESTS='foobar'` and not have to undo things after is nice. I guess we could keep it around but only actually run it if an SSH config was provided?

jlebon · 2018-05-11T13:11:05Z

.papr.yml

-    image: registry.fedoraproject.org/fedora:27
+tests:
+  - cd /etc/yum.repos.d/ && curl -L -O https://copr.fedorainfracloud.org/coprs/walters/oci-kvm-hook/repo/fedora-27/walters-oci-kvm-hook-fedora-27.repo
+  - rpm-ostree install oci-kvm-hook && rpm-ostree ex livefs


If we're passing /dev/kvm anyway, then there's no point in installing oci-kvm-hook, right?

jlebon · 2018-05-11T13:20:13Z

tests/vmcheck/vmcheck-run.sh

+dn=$(cd $(dirname $0) && pwd)
+
+# Preparatory work; we have a helper binary
+make inject-pkglist


This should already have been roped in by the make target, right?

Yeah, but it's convenient to run the script directly too.

jlebon · 2018-05-11T13:23:04Z

tests/vmcheck/vmcheck-run.sh

+# in separating them.
+(for tf in ${tests}; do echo $tf; done) | \
+    parallel -v -j ${VMCHECK_PARALLEL:-1} --progress --halt soon,fail=1 \
+             --results ${LOGDIR} --quote /bin/sh -c "${dn}/run-one-test.sh {} 2>&1" |& tail


Why do we need |& tail here?

jlebon · 2018-05-11T13:30:52Z

tests/common/libvm.sh

+EOF
+        exit 1
+    fi
+    for subj in ${TEST_SUBJECTS}; do


What does it even mean for us if TEST_SUBJECTS includes multiple qcow2s? Should we check for that and error out if so?

Feels like a bit of a "doctor it hurts when I..." situation? I started reading a stackoverflow thing on bash arrays but then my eyes started glazing over...
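
For what it's worth, a check for multiple images does not need bash arrays: plain word splitting can count the entries. A minimal sketch, assuming image paths contain no whitespace or glob characters; `count_subjects` and `require_single_subject` are hypothetical helper names, not functions that exist in libvm.sh:

```shell
# Hypothetical helpers; not part of tests/common/libvm.sh.

# Print how many whitespace-separated entries the argument contains.
# Assumes paths contain no spaces or glob characters.
count_subjects() {
    # Intentionally unquoted: rely on word splitting to separate entries.
    set -- $1
    echo $#
}

# Return non-zero with a message unless exactly one subject is given.
require_single_subject() {
    n=$(count_subjects "${1:-}")
    if [ "$n" -ne 1 ]; then
        echo "error: expected exactly one entry in TEST_SUBJECTS, got $n" >&2
        return 1
    fi
}

count_subjects "f27.qcow2 f28.qcow2"   # prints 2
```

A caller could then run `require_single_subject "${TEST_SUBJECTS:-}" || exit 1` before entering the loop. Whether erroring out is the right policy is still the open question here.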

jlebon · 2018-05-11T13:32:32Z

tests/vmcheck/run-one-test.sh

+    echo "FAILED: ${tf}"
+    vm_cmd 'journalctl --no-pager || true' > ${JOURNAL_LOG} || true
+    if test -z "${TEST_DEBUG:-}" &&
+       test -n "${VMCHECK_TMPD:-}" &&


Is `VMCHECK_TMPD` something set by a human, or supposed to be set in the STI path? I don't see it set anywhere.

jlebon · 2018-05-11T13:34:01Z

.papr.yml


 env:
-  HOSTS: vmcheck1 vmcheck2 vmcheck3
+  # each VM is 1024MB, so this is 3072MB, leaving 1G for the OS


The commit description says 512M, and here it says 1024M. How can we force it to 512M? If we can get away with that, that'd be awesome!

We can't currently, but we should investigate configurability there.
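
The memory budget in that `.papr.yml` comment is simple enough to sketch as arithmetic. This is only an illustration: `max_parallel_vms` is a hypothetical helper, and the 4096MB host size is inferred from "3072MB, leaving 1G" rather than stated anywhere in the PR.

```shell
# Hypothetical helper: how many VMs of a given size fit in a memory budget.
# All arguments are in MB.
max_parallel_vms() {
    total_mb=$1
    reserve_mb=$2   # memory left for the host OS
    per_vm_mb=$3
    echo $(( (total_mb - reserve_mb) / per_vm_mb ))
}

# 4096MB host (inferred, see above) with 1024MB reserved for the OS:
max_parallel_vms 4096 1024 1024   # prints 3
max_parallel_vms 4096 1024 512    # prints 6
```

So if the per-VM size could be dropped to 512MB, the same host would fit twice as many parallel VMs.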

jlebon · 2018-05-11T13:36:54Z

tests/common/libvm.sh


  export VM=${VM:-vmcheck}
-  export SSH_CONFIG=${SSH_CONFIG:-${topsrcdir}/ssh-config}
+  # then use the standard test interface to boot one.


This comment looks out of place.

cgwalters · 2018-05-14T15:21:24Z

> One thing I'm unsure of is getting rid of all the "undo" logic. How do you see local development in this model? E.g. I have my one pet VM that I use for all testing. And the fact that I can run `make vmcheck TESTS='foobar'` and not have to undo things after is nice. I guess we could keep it around but only actually run it if an SSH config was provided?

Why have the pet though in this model? Download the qcow2 and spawn a fresh VM each time. If you want to debug, set TEST_DEBUG=1 to have the VM persist after.
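
The fresh-VM debugging workflow described here can be wrapped in a small shell function. A minimal sketch: `vmcheck_debug` is a hypothetical name, and the `MAKE` override exists only so the command can be dry-run with `MAKE=echo`. `TEST_SUBJECTS`, `TEST_DEBUG`, and `make vmcheck TESTS=...` are the names used in this thread.

```shell
# Hypothetical wrapper; not part of the rpm-ostree tree.
# Usage: vmcheck_debug <image.qcow2> <test> [<test>...]
vmcheck_debug() {
    image=$1
    shift
    # TEST_DEBUG=1 asks for the VM to persist after the run (per this thread).
    # MAKE can be overridden, e.g. MAKE=echo to print the make arguments.
    TEST_SUBJECTS="$image" TEST_DEBUG=1 ${MAKE:-make} vmcheck TESTS="$*"
}

# Example (dry run):
#   MAKE=echo vmcheck_debug /path/to/image.qcow2 misc-1
```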

jlebon · 2018-05-14T17:38:23Z

Yeah, I could get used to that. Just contrasting it to my current workflow. I do customize the VM to make debugging easier (e.g. set up some dotfiles and a gdb container), though that stuff can easily be (and should be) streamlined.

cgwalters · 2018-05-14T18:02:47Z

Ah, yeah. Well, one could do the "customize gold image, then save" model. Though for core dumps today I end up extracting them back to my dev container.

rh-atomic-bot · 2018-05-14T19:02:06Z

☔ The latest upstream changes (presumably 38b11d3) made this pull request unmergeable. Please resolve the merge conflicts.

rh-atomic-bot · 2018-05-23T15:15:53Z

☔ The latest upstream changes (presumably 592d605) made this pull request unmergeable. Please resolve the merge conflicts.

cgwalters · 2018-05-30T15:52:40Z

One immediate downside here is that the singleton contexts are quite slow. It feels like this circles back somewhat to projectatomic/papr#62 - ideally we'd do a pod per vmcheck test.

An interesting thing that https://github.com/openshift/ci-operator does is set up a per-PR kube namespace - this makes it possible to more safely have the per-repository code create pods dynamically as well.

jlebon · 2018-06-04T14:03:25Z

bot, retest this please

cgwalters · 2018-06-06T13:13:38Z

bot, retest this please

cgwalters · 2018-06-06T14:07:11Z

Ugh, the rpm-md repo flakes...

bot, retest this please

cgwalters · 2018-06-06T14:08:11Z

bot, retest this please

jlebon · 2018-06-06T14:51:44Z

bot, retest this please

cgwalters · 2018-06-08T23:25:19Z

bot, retest this please

cgwalters · 2018-06-18T16:38:14Z

OK, so this is blocked by the same perf issue. Locally (4 cores, NVMe), running with VMCHECK_PARALLEL=8:

Mon Jun 18 13:03:44 UTC 2018 overlay: Starting
Mon Jun 18 13:03:56 UTC 2018 overlay: Checkout complete
Mon Jun 18 13:04:33 UTC 2018 overlay: Commit complete

But in CI:

Sat Jun 16 12:59:50 UTC 2018 overlay: Starting
Sat Jun 16 13:01:30 UTC 2018 overlay: Checkout complete

And we don't even get to the commit phase. The checkout is 10x slower. And given that the commit takes ~100s locally, that means we're looking at an estimated ~16 minutes just to prepare the VM; that's pretty nuts. I'm going to need to do some perf investigation - whether this is us doing qemu wrong, unexpected nested virt overhead, etc.
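
The timestamps in the two logs above can be turned into durations with a little shell arithmetic. A minimal sketch; `hms_to_seconds` and `elapsed` are hypothetical helpers, and it assumes both timestamps fall on the same day:

```shell
# Hypothetical helpers for comparing the log timestamps quoted above.

# Convert HH:MM:SS to seconds since midnight.
# Runs in a subshell so the IFS change does not leak.
hms_to_seconds() (
    IFS=:
    set -- $1
    # Strip one leading zero so values like 08 are not parsed as octal.
    echo $(( ${1#0} * 3600 + ${2#0} * 60 + ${3#0} ))
)

# Seconds elapsed between two same-day HH:MM:SS timestamps.
elapsed() {
    echo $(( $(hms_to_seconds "$2") - $(hms_to_seconds "$1") ))
}

echo "local checkout: $(elapsed 13:03:44 13:03:56)s"   # 12s
echo "local commit:   $(elapsed 13:03:56 13:04:33)s"   # 37s
echo "CI checkout:    $(elapsed 12:59:50 13:01:30)s"   # 100s
```

By these timestamps the CI checkout (100s) takes roughly 8x as long as the local one (12s), which is the same order of magnitude as the 10x estimate.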

cgwalters · 2018-06-18T19:11:53Z

Ah hah, https://pagure.io/standard-test-roles/pull-request/223

cgwalters · 2018-06-20T14:19:49Z

bot, retest this please

jlebon · 2018-06-20T14:27:18Z

bot, retest this please

cgwalters · 2018-06-20T19:45:30Z

bot, retest this please

cgwalters · 2018-06-20T19:56:29Z

bot, retest this please

cgwalters · 2018-06-23T16:12:31Z

Hm, we definitely have nested KVM; it's like it's still not being accelerated for some reason, though.

rh-atomic-bot · 2018-07-11T14:44:35Z

☔ The latest upstream changes (presumably caf66d6) made this pull request unmergeable. Please resolve the merge conflicts.

While we're having reboot+ansible issues in ostree related to this, I really like the ability to pass it a qcow2 rather than the "BYO ssh-config" model. Further, the vmcheck code was full of workarounds for trying to reuse VMs between tests. The high level of this code is you can now do locally in development: `export TEST_SUBJECTS=/srv/libvirt/images-gold/Fedora-Atomic-27-20180326.1.x86_64.qcow2` or whatever. Then: `make && make vmcheck TESTS="misc-1 misc-2 layering-relayer"` will spawn those tests, each in a clean VM. A much bigger benefit is that I reworked the tests to use `parallel` like the others, and now if you set `VMCHECK_PARALLEL=4` you'll get 4 parallel VMs. Since each VM has 1024MB of RAM today I have it set to 8 locally. Requires: https://pagure.io/standard-test-roles/pull-request/188

rh-atomic-bot · 2018-08-20T20:32:10Z

☔ The latest upstream changes (presumably b6d0748) made this pull request unmergeable. Please resolve the merge conflicts.

cgwalters · 2020-09-24T15:29:56Z

This is obsoleted by kola.

jlebon added the WIP label May 8, 2018

cgwalters added a commit to cgwalters/rpm-ostree that referenced this pull request May 8, 2018

ci: Split codestyle checks into separate context

c86f94f

Prep for reworking the primary test to do vm-in-container, which will temporarily be vm-in-container-in-vm. See coreos#1362

cgwalters mentioned this pull request May 8, 2018

ci: Split codestyle checks into separate context #1364

Closed

rh-atomic-bot pushed a commit that referenced this pull request May 9, 2018

ci: Split codestyle checks into separate context

99bb369

Prep for reworking the primary test to do vm-in-container, which will temporarily be vm-in-container-in-vm. See #1362 Closes: #1364 Approved by: jlebon

rh-atomic-bot pushed a commit that referenced this pull request May 9, 2018

ci: Split codestyle checks into separate context

06dcb6c

rh-atomic-bot pushed a commit that referenced this pull request May 10, 2018

ci: Split codestyle checks into separate context

503814a

rh-atomic-bot pushed a commit that referenced this pull request May 10, 2018

ci: Split codestyle checks into separate context

d968d98

rh-atomic-bot pushed a commit that referenced this pull request May 10, 2018

ci: Split codestyle checks into separate context

9782901

rh-atomic-bot pushed a commit that referenced this pull request May 10, 2018

ci: Split codestyle checks into separate context

6cb9305

rh-atomic-bot pushed a commit that referenced this pull request May 10, 2018

ci: Split codestyle checks into separate context

465b7cb

rh-atomic-bot pushed a commit that referenced this pull request May 11, 2018

ci: Split codestyle checks into separate context

f256a9e

cgwalters force-pushed the vmcheck-sti branch from df1d808 to aefbdc9 May 11, 2018 04:24

cgwalters added a commit to cgwalters/rpm-ostree that referenced this pull request May 11, 2018

ci: Split codestyle checks into separate context

687d5ab

rh-atomic-bot pushed a commit that referenced this pull request May 11, 2018

ci: Split codestyle checks into separate context

fff563d

jlebon reviewed May 11, 2018

rh-atomic-bot pushed a commit that referenced this pull request May 11, 2018

ci: Split codestyle checks into separate context

19567c4

rh-atomic-bot pushed a commit that referenced this pull request May 11, 2018

ci: Split codestyle checks into separate context

7204355

rh-atomic-bot pushed a commit that referenced this pull request May 12, 2018

ci: Split codestyle checks into separate context

31d23ad

rh-atomic-bot pushed a commit that referenced this pull request May 14, 2018

ci: Split codestyle checks into separate context

b0b5508

rh-atomic-bot pushed a commit that referenced this pull request May 14, 2018

ci: Split codestyle checks into separate context

a68a431

rh-atomic-bot pushed a commit that referenced this pull request May 14, 2018

ci: Split codestyle checks into separate context

2e56a0c

rh-atomic-bot pushed a commit that referenced this pull request May 14, 2018

ci: Split codestyle checks into separate context

e57504a

cgwalters force-pushed the vmcheck-sti branch 2 times, most recently from 42301a3 to ffb079a May 15, 2018 18:15

cgwalters force-pushed the vmcheck-sti branch 5 times, most recently from 2009e5f to da334eb May 30, 2018 15:24

cgwalters force-pushed the vmcheck-sti branch 2 times, most recently from d6cae83 to c753617 June 8, 2018 22:13

cgwalters mentioned this pull request Jun 20, 2018

performance issue blocking https://github.com/ostreedev/ostree/pull/1513 projectatomic/papr#93

Open

cgwalters mentioned this pull request Jul 13, 2018

livefs: Require deployment staging #1456

Closed

cgwalters force-pushed the vmcheck-sti branch from f324572 to 40262f5 July 31, 2018 14:10

cgwalters closed this Sep 24, 2020
