-
Notifications
You must be signed in to change notification settings - Fork 195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: RFC: Rework vmcheck to use STI qcow2 inventory #1362
Conversation
💥 Invalid
|
Oh right, need to make this work for the C7 build too. |
Prep for reworking the primary test to do vm-in-container, which will temporarily be vm-in-container-in-vm. See coreos#1362
Prep for reworking the primary test to do vm-in-container, which will temporarily be vm-in-container-in-vm. See coreos#1362
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely like the benefits of one VM per test. And I also like that we're playing to the strength of STR and let it boil down to just an SSH config.
One thing I'm unsure of is getting rid of all the "undo" logic. How do you see local development in this model? E.g. I have my one pet VM that I use for all testing. And the fact that I can make vmcheck TESTS='foobar'
and not have to undo things after is nice. I guess we could keep it around but only actually run it if an SSH config was provided?
.papr.yml
Outdated
image: registry.fedoraproject.org/fedora:27 | ||
tests: | ||
- cd /etc/yum.repos.d/ && curl -L -O https://copr.fedorainfracloud.org/coprs/walters/oci-kvm-hook/repo/fedora-27/walters-oci-kvm-hook-fedora-27.repo | ||
- rpm-ostree install oci-kvm-hook && rpm-ostree ex livefs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're passing /dev/kvm
anyway, then there's no point in installing oci-kvm-hook
, right?
dn=$(cd $(dirname $0) && pwd) | ||
|
||
# Preparatory work; we have a helper binary | ||
make inject-pkglist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should already have been roped in by the make target, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but it's convenient to run the script directly too.
tests/vmcheck/vmcheck-run.sh
Outdated
# in separating them. | ||
(for tf in ${tests}; do echo $tf; done) | \ | ||
parallel -v -j ${VMCHECK_PARALLEL:-1} --progress --halt soon,fail=1 \ | ||
--results ${LOGDIR} --quote /bin/sh -c "${dn}/run-one-test.sh {} 2>&1" |& tail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need |& tail
here?
EOF | ||
exit 1 | ||
fi | ||
for subj in ${TEST_SUBJECTS}; do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it even mean for us if TEST_SUBJECTS
includes multiple qcow2s? Should we check for that and error out if so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like a bit of a "doctor it hurts when I..." situation? I started reading a stackoverflow thing on bash arrays but then my eyes started glazing over...
tests/vmcheck/run-one-test.sh
Outdated
echo "FAILED: ${tf}" | ||
vm_cmd 'journalctl --no-pager || true' > ${JOURNAL_LOG} || true | ||
if test -z "${TEST_DEBUG:-}" && | ||
test -n "${VMCHECK_TMPD:-}" && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something set by a human, or supposed to be set in the STI path? I don't see it set anywhere.
.papr.yml
Outdated
|
||
env: | ||
HOSTS: vmcheck1 vmcheck2 vmcheck3 | ||
# each VM is 1024MB, so this is 3072MB, leaving 1G for the OS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commit description says 512M, and here it says 1024M. How can we force it to 512M? If we can get away with that, that'd be awesome!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't currently, but we should investigate configurability there.
tests/common/libvm.sh
Outdated
|
||
export VM=${VM:-vmcheck} | ||
export SSH_CONFIG=${SSH_CONFIG:-${topsrcdir}/ssh-config} | ||
# then use the standard test interface to boot one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment looks out of place.
Why have the pet though in this model? Download the |
Yeah, I could get used to that. Just contrasting it to my current workflow. I do customize the VM to make debugging easier (e.g. set up some dotfiles and a gdb container), though that stuff can easily be (and should be) streamlined. |
Ah, yeah. Well, one could do the "customize gold image, then save" model. Though for core dumps today I end up extracting them back to my dev container. |
☔ The latest upstream changes (presumably 38b11d3) made this pull request unmergeable. Please resolve the merge conflicts. |
42301a3
to
ffb079a
Compare
☔ The latest upstream changes (presumably 592d605) made this pull request unmergeable. Please resolve the merge conflicts. |
2009e5f
to
da334eb
Compare
One immediate downside here is that the singleton contexts are quite slow. It feels like this circles back somewhat to projectatomic/papr#62 - ideally we'd do a pod per vmcheck test. An interesting thing the https://github.com/openshift/ci-operator does is set up a per-PR kube namespace - this makes it possible to more safely have the per-repository code create pods dynamically as well. |
bot, retest this please |
1 similar comment
bot, retest this please |
Ugh, the rpm-md repo flakes... bot, retest this please |
bot, retest this please |
1 similar comment
bot, retest this please |
d6cae83
to
c753617
Compare
bot, retest this please |
OK, so this is blocked by the same perf issue. Locally (4 cores, NVMe), running with
But in CI:
And we don't even get to the commit phase. The checkout is 10x slower. And given doing the commit is ~100s locally, that means we're estimated to be looking at ~16minutes just to prepare the VM; that's pretty nuts. I'm going to need to do some perf investigation - whether this is us doing qemu wrong, unexpected nested virt overhead, etc. |
bot, retest this please |
3 similar comments
bot, retest this please |
bot, retest this please |
bot, retest this please |
Hm, we definitely have KVM nested, it's like it's still not being accelerated though for some reason. |
☔ The latest upstream changes (presumably caf66d6) made this pull request unmergeable. Please resolve the merge conflicts. |
While we're having reboot+ansible issues in ostree related to this, I really like the ability to pass it a qcow2 rather than the "BYO ssh-config" model. Further, the vmcheck code was full of workarounds for trying to reuse VMs between tests. The high level of this code is you can now do locally in development: `export TEST_SUBJECTS=/srv/libvirt/images-gold/Fedora-Atomic-27-20180326.1.x86_64.qcow2` or whatever. Then: `make && make vmcheck TESTS="misc-1 misc-2 layering-relayer"` will spawn those tests, each in a clean VM. A much bigger benefit is that I reworked the tests to use `parallel` like the others, and now if you set `VMCHECK_PARALLEL=4` you'll get 4 parallel VMs. Since each VM has 1024MB of RAM today I have it set to 8 locally. Requires: https://pagure.io/standard-test-roles/pull-request/188
☔ The latest upstream changes (presumably b6d0748) made this pull request unmergeable. Please resolve the merge conflicts. |
This is obsoleted by kola. |
While we're having reboot+ansible issues in ostree related
to this, I really like the ability to pass it a qcow2 rather
than the "BYO ssh-config" model. Further, the vmcheck code
was full of workarounds for trying to reuse VMs between tests.
The high level of this code is you can now do locally in development:
export TEST_SUBJECTS=/srv/libvirt/images-gold/Fedora-Atomic-27-20180326.1.x86_64.qcow2
or whatever. Then:
make && make vmcheck TESTS="misc-1 misc-2 layering-relayer"
will spawn those tests, each in a clean VM. A much bigger benefit
is that I reworked the tests to use
parallel
like the others,and now if you set
VMCHECK_PARALLEL=4
you'll get 4 parallel VMs.Since each VM just has 512MB of RAM today I have it set to 8 locally.
Requires: https://pagure.io/standard-test-roles/pull-request/188