Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🌱 Add e2e external inspection test #1382

Merged

Conversation

lentzi90
Copy link
Member

@lentzi90 lentzi90 commented Oct 9, 2023

What this PR does / why we need it:

Adds an e2e test for external inspection.
The test simply creates a BMH with annotations for disabling inspection and for adding hardware details. Then it checks that the BMH becomes available without going through inspection.

This also adds cert-manager images to the e2e config so that if they happen to be available locally, then they can be loaded into the kind cluster (if used). It helps save some time when running the test repeatedly.

Minor details:

  • Don't exit on e2e test failure, gather artifacts/logs first
  • Destroy the network before trying to undefine it
  • Wait for namespace deletion between e2e tests since we are using the
    same VM for the BMHs in multiple tests

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1370

@metal3-io-bot metal3-io-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 9, 2023
metal3api "github.com/metal3-io/baremetal-operator/apis/metal3.io/v1alpha1"
)

const hardwareDetails = `{"cpu":{"arch":"x86_64","count":2,"flags":["3dnowprefetch","abm","adx","aes","apic","arat","arch_capabilities","avx","avx2","avx_vnni","bmi1","bmi2","clflush","clflushopt","clwb","cmov","constant_tsc","cpuid","cpuid_fault","cx16","cx8","de","ept","ept_ad","erms","f16c","flexpriority","fma","fpu","fsgsbase","fsrm","fxsr","gfni","hypervisor","ibpb","ibrs","ibrs_enhanced","invpcid","lahf_lm","lm","mca","mce","md_clear","mmx","movbe","movdir64b","movdiri","msr","mtrr","nopl","nx","ospke","pae","pat","pclmulqdq","pdpe1gb","pge","pku","pni","popcnt","pse","pse36","rdpid","rdrand","rdseed","rdtscp","rep_good","sep","serialize","sha_ni","smap","smep","ss","ssbd","sse","sse2","sse4_1","sse4_2","ssse3","stibp","syscall","tpr_shadow","tsc","tsc_adjust","tsc_deadline_timer","tsc_known_freq","umip","vaes","vme","vmx","vnmi","vpclmulqdq","vpid","waitpkg","x2apic","xgetbv1","xsave","xsavec","xsaveopt","xsaves","xtopology"],"model":"12th Gen Intel(R) Core(TM) i9-12900H"},"firmware":{"bios":{"date":"04/01/2014","vendor":"SeaBIOS","version":"1.15.0-1"}},"hostname":"localhost.localdomain","nics":[{"ip":"192.168.222.122","mac":"00:60:2f:31:81:01","model":"0x1af4 0x0001","name":"enp1s0","pxe":true},{"ip":"fe80::570a:edf2:a3a7:4eb8%enp1s0","mac":"00:60:2f:31:81:01","model":"0x1af4 0x0001","name":"enp1s0","pxe":true}],"ramMebibytes":4096,"storage":[{"alternateNames":["/dev/vda","/dev/disk/by-path/pci-0000:04:00.0"],"name":"/dev/disk/by-path/pci-0000:04:00.0","rotational":true,"sizeBytes":21474836480,"type":"HDD","vendor":"0x1af4"}],"systemVendor":{"manufacturer":"QEMU","productName":"Standard PC (Q35 + ICH9, 2009)"}}`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was extracted from a BMH that I created manually, by running:

kubectl get bmh test -o jsonpath="{.status.hardware}"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure it could be formatted properly still?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean like this?

Suggested change
const hardwareDetails = `{"cpu":{"arch":"x86_64","count":2,"flags":["3dnowprefetch","abm","adx","aes","apic","arat","arch_capabilities","avx","avx2","avx_vnni","bmi1","bmi2","clflush","clflushopt","clwb","cmov","constant_tsc","cpuid","cpuid_fault","cx16","cx8","de","ept","ept_ad","erms","f16c","flexpriority","fma","fpu","fsgsbase","fsrm","fxsr","gfni","hypervisor","ibpb","ibrs","ibrs_enhanced","invpcid","lahf_lm","lm","mca","mce","md_clear","mmx","movbe","movdir64b","movdiri","msr","mtrr","nopl","nx","ospke","pae","pat","pclmulqdq","pdpe1gb","pge","pku","pni","popcnt","pse","pse36","rdpid","rdrand","rdseed","rdtscp","rep_good","sep","serialize","sha_ni","smap","smep","ss","ssbd","sse","sse2","sse4_1","sse4_2","ssse3","stibp","syscall","tpr_shadow","tsc","tsc_adjust","tsc_deadline_timer","tsc_known_freq","umip","vaes","vme","vmx","vnmi","vpclmulqdq","vpid","waitpkg","x2apic","xgetbv1","xsave","xsavec","xsaveopt","xsaves","xtopology"],"model":"12th Gen Intel(R) Core(TM) i9-12900H"},"firmware":{"bios":{"date":"04/01/2014","vendor":"SeaBIOS","version":"1.15.0-1"}},"hostname":"localhost.localdomain","nics":[{"ip":"192.168.222.122","mac":"00:60:2f:31:81:01","model":"0x1af4 0x0001","name":"enp1s0","pxe":true},{"ip":"fe80::570a:edf2:a3a7:4eb8%enp1s0","mac":"00:60:2f:31:81:01","model":"0x1af4 0x0001","name":"enp1s0","pxe":true}],"ramMebibytes":4096,"storage":[{"alternateNames":["/dev/vda","/dev/disk/by-path/pci-0000:04:00.0"],"name":"/dev/disk/by-path/pci-0000:04:00.0","rotational":true,"sizeBytes":21474836480,"type":"HDD","vendor":"0x1af4"}],"systemVendor":{"manufacturer":"QEMU","productName":"Standard PC (Q35 + ICH9, 2009)"}}`
const hardwareDetails = `
{
"cpu": {
"arch": "x86_64",
"count": 2,
"flags": [
"3dnowprefetch",
"abm",
"adx",
"aes",
"apic",
"arat",
"arch_capabilities",
"avx",
"avx2",
"avx_vnni",
"bmi1",
"bmi2",
"clflush",
"clflushopt",
"clwb",
"cmov",
"constant_tsc",
"cpuid",
"cpuid_fault",
"cx16",
"cx8",
"de",
"ept",
"ept_ad",
"erms",
"f16c",
"flexpriority",
"fma",
"fpu",
"fsgsbase",
"fsrm",
"fxsr",
"gfni",
"hypervisor",
"ibpb",
"ibrs",
"ibrs_enhanced",
"invpcid",
"lahf_lm",
"lm",
"mca",
"mce",
"md_clear",
"mmx",
"movbe",
"movdir64b",
"movdiri",
"msr",
"mtrr",
"nopl",
"nx",
"ospke",
"pae",
"pat",
"pclmulqdq",
"pdpe1gb",
"pge",
"pku",
"pni",
"popcnt",
"pse",
"pse36",
"rdpid",
"rdrand",
"rdseed",
"rdtscp",
"rep_good",
"sep",
"serialize",
"sha_ni",
"smap",
"smep",
"ss",
"ssbd",
"sse",
"sse2",
"sse4_1",
"sse4_2",
"ssse3",
"stibp",
"syscall",
"tpr_shadow",
"tsc",
"tsc_adjust",
"tsc_deadline_timer",
"tsc_known_freq",
"umip",
"vaes",
"vme",
"vmx",
"vnmi",
"vpclmulqdq",
"vpid",
"waitpkg",
"x2apic",
"xgetbv1",
"xsave",
"xsavec",
"xsaveopt",
"xsaves",
"xtopology"
],
"model": "12th Gen Intel(R) Core(TM) i9-12900H"
},
"firmware": {
"bios": {
"date": "04/01/2014",
"vendor": "SeaBIOS",
"version": "1.15.0-1"
}
},
"hostname": "localhost.localdomain",
"nics": [
{
"ip": "192.168.222.122",
"mac": "00:60:2f:31:81:01",
"model": "0x1af4 0x0001",
"name": "enp1s0",
"pxe": true
},
{
"ip": "fe80::570a:edf2:a3a7:4eb8%enp1s0",
"mac": "00:60:2f:31:81:01",
"model": "0x1af4 0x0001",
"name": "enp1s0",
"pxe": true
}
],
"ramMebibytes": 4096,
"storage": [
{
"alternateNames": [
"/dev/vda",
"/dev/disk/by-path/pci-0000:04:00.0"
],
"name": "/dev/disk/by-path/pci-0000:04:00.0",
"rotational": true,
"sizeBytes": 21474836480,
"type": "HDD",
"vendor": "0x1af4"
}
],
"systemVendor": {
"manufacturer": "QEMU",
"productName": "Standard PC (Q35 + ICH9, 2009)"
}
}
`

😬
I'm not really a fan of this but I can accept it.

Copy link
Member

@tuminoid tuminoid Oct 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does get really long. Could it be put into separate file so it isn't so invasive? It would also make it easier to update if needed.

edit: I see you changed it already. Its fine as is :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean keep it formatted like it is now, or revert to single line?
I'm now wondering also if we should put it in a separate file 🤔
Perhaps that would be the best

@lentzi90
Copy link
Member Author

lentzi90 commented Oct 9, 2023

/metal3-bmo-e2e-test

@lentzi90 lentzi90 force-pushed the lentzi90/e2e-external-inspection branch from 9ceb8d7 to f97254f Compare October 9, 2023 11:13
@@ -128,3 +129,5 @@ require (
replace sigs.k8s.io/cluster-api => sigs.k8s.io/cluster-api v1.5.0

replace github.com/metal3-io/baremetal-operator => ../

replace github.com/metal3-io/baremetal-operator/apis => ../apis
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would love input on this. We do the same in the root go.mod so I think this is the correct way.
At the top of the file we require BMO v0.3.1. This old version did not have the annotations that I used in the test. By adding this replacement, we simply use the latest BMO API, not a specific release.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, correct but IMO we should also keep the go.mod up to date as well with latest release versions so as not to confuse the user. We should perhaps do that in all the go modules.

@lentzi90
Copy link
Member Author

lentzi90 commented Oct 9, 2023

/metal3-bmo-e2e-test

@lentzi90
Copy link
Member Author

lentzi90 commented Oct 9, 2023

Ok we got a timeout for BMH going from registering to inspecting. I will increase the timeout for it since it seems to be a flake, and increasing it would avoid triggering the failure.

@lentzi90 lentzi90 force-pushed the lentzi90/e2e-external-inspection branch from f97254f to 9865680 Compare October 9, 2023 11:34
@lentzi90
Copy link
Member Author

lentzi90 commented Oct 9, 2023

/metal3-bmo-e2e-test

@lentzi90
Copy link
Member Author

lentzi90 commented Oct 9, 2023

Hmm same timeout again... now I suspect there must be something wrong and probably depending on what order the tests are run in. I will investigate.
/hold

@metal3-io-bot metal3-io-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 9, 2023
@hardys
Copy link
Member

hardys commented Oct 10, 2023

Hmm same timeout again... now I suspect there must be something wrong and probably depending on what order the tests are run in. I will investigate.
/hold

Can we view the BMO logs from the CI run? I took a look at the console output but couldn't see the logs in the Jenkins console

@lentzi90
Copy link
Member Author

Hmm same timeout again... now I suspect there must be something wrong and probably depending on what order the tests are run in. I will investigate.
/hold

Can we view the BMO logs from the CI run? I took a look at the console output but couldn't see the logs in the Jenkins console

Normally there should be a artifacts.tar.gz that can be downloaded to find the logs. For some reason this is not available on this run. I will investigate this also.

What I have found so far is that the issue is probably that we do not wait properly for the previous BMH to be deleted before we start on the next test and create a new. This is causing registration errors since there is already a node with conflicting MAC.

@lentzi90 lentzi90 force-pushed the lentzi90/e2e-external-inspection branch from 9865680 to a7d4f27 Compare October 10, 2023 07:57
@lentzi90
Copy link
Member Author

/metal3-bmo-e2e-test

@kashifest
Copy link
Member

/test-ubuntu-integration-main
/test-centos-e2e-integration-main

@lentzi90
Copy link
Member Author

Lost connection to the jenkins worker...
/metal3-bmo-e2e-test

@kashifest
Copy link
Member

Following tests are unnrelated to the PR , but for testing the trigger itself, sorry for spamming
/test-ubuntu-integration-release-1.5
/test-centos-e2e-integration-release-1.5

@kashifest
Copy link
Member

/test-ubuntu-integration-release-0.4
/test-centos-e2e-integration-release-0.4

@kashifest
Copy link
Member

/test-ubuntu-integration-release-1-5
/test-centos-e2e-integration-release-1-5

@lentzi90
Copy link
Member Author

Artifacts are now properly collected also after test failures 🎉
I will need to adjust the waiting deadlines. Just waiting for @kashifest 's tests to finish before I push

@kashifest
Copy link
Member

Artifacts are now properly collected also after test failures 🎉 I will need to adjust the waiting deadlines. Just waiting for @kashifest 's tests to finish before I push

No need to wait, I have got what I wanted.

@lentzi90 lentzi90 force-pushed the lentzi90/e2e-external-inspection branch from a7d4f27 to 2024ae7 Compare October 10, 2023 09:33
@lentzi90
Copy link
Member Author

/metal3-bmo-e2e-test

@lentzi90
Copy link
Member Author

Successful job!
https://jenkins.nordix.org/job/metal3-bmo-e2e-test-pull/11/
I'm going to trigger it one more time to reduce risk of flakiness.
/metal3-bmo-e2e-test

@lentzi90
Copy link
Member Author

Another success: https://jenkins.nordix.org/job/metal3-bmo-e2e-test-pull/12
Just noticed that I have a typo in a comment... will fix that and push again. With that I think this is good to go 🙂

@lentzi90 lentzi90 force-pushed the lentzi90/e2e-external-inspection branch from 2024ae7 to a0f7f1b Compare October 10, 2023 10:14
@lentzi90
Copy link
Member Author

/metal3-bmo-e2e-test
/test-centos-e2e-integration-main
/test-ubuntu-integration-main

@lentzi90
Copy link
Member Author

Connection issues with the jenkins worker again...
/metal3-bmo-e2e-test

@kashifest
Copy link
Member

/test-centos-e2e-integration-main
/test-ubuntu-integration-main

@kashifest
Copy link
Member

sorry it seems the test was running already, but for some reason my ui view didnt show it.

@lentzi90
Copy link
Member Author

/hold cancel

@metal3-io-bot metal3-io-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 10, 2023
Copy link
Member

@mboukhalfa mboukhalfa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

test/e2e/config/fixture.yaml Show resolved Hide resolved
@lentzi90
Copy link
Member Author

/cc @kashifest @tuminoid

@metal3-io-bot metal3-io-bot added the needs-rebase Indicates that a PR cannot be merged because it has merge conflicts with HEAD. label Oct 12, 2023
hack/ci-e2e.sh Outdated Show resolved Hide resolved
metal3api "github.com/metal3-io/baremetal-operator/apis/metal3.io/v1alpha1"
)

const hardwareDetails = `{"cpu":{"arch":"x86_64","count":2,"flags":["3dnowprefetch","abm","adx","aes","apic","arat","arch_capabilities","avx","avx2","avx_vnni","bmi1","bmi2","clflush","clflushopt","clwb","cmov","constant_tsc","cpuid","cpuid_fault","cx16","cx8","de","ept","ept_ad","erms","f16c","flexpriority","fma","fpu","fsgsbase","fsrm","fxsr","gfni","hypervisor","ibpb","ibrs","ibrs_enhanced","invpcid","lahf_lm","lm","mca","mce","md_clear","mmx","movbe","movdir64b","movdiri","msr","mtrr","nopl","nx","ospke","pae","pat","pclmulqdq","pdpe1gb","pge","pku","pni","popcnt","pse","pse36","rdpid","rdrand","rdseed","rdtscp","rep_good","sep","serialize","sha_ni","smap","smep","ss","ssbd","sse","sse2","sse4_1","sse4_2","ssse3","stibp","syscall","tpr_shadow","tsc","tsc_adjust","tsc_deadline_timer","tsc_known_freq","umip","vaes","vme","vmx","vnmi","vpclmulqdq","vpid","waitpkg","x2apic","xgetbv1","xsave","xsavec","xsaveopt","xsaves","xtopology"],"model":"12th Gen Intel(R) Core(TM) i9-12900H"},"firmware":{"bios":{"date":"04/01/2014","vendor":"SeaBIOS","version":"1.15.0-1"}},"hostname":"localhost.localdomain","nics":[{"ip":"192.168.222.122","mac":"00:60:2f:31:81:01","model":"0x1af4 0x0001","name":"enp1s0","pxe":true},{"ip":"fe80::570a:edf2:a3a7:4eb8%enp1s0","mac":"00:60:2f:31:81:01","model":"0x1af4 0x0001","name":"enp1s0","pxe":true}],"ramMebibytes":4096,"storage":[{"alternateNames":["/dev/vda","/dev/disk/by-path/pci-0000:04:00.0"],"name":"/dev/disk/by-path/pci-0000:04:00.0","rotational":true,"sizeBytes":21474836480,"type":"HDD","vendor":"0x1af4"}],"systemVendor":{"manufacturer":"QEMU","productName":"Standard PC (Q35 + ICH9, 2009)"}}`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure it could be formatted properly still?

hack/ci-e2e.sh Show resolved Hide resolved
.gitignore Show resolved Hide resolved
The test simply creates a BMH with annotations for disabling inspection
and for adding hardware details. Then it checks that the BMH becomes
available without going through inspection.

This also adds cert-manager images to the e2e config so that if they
happen to be available locally, then they can be loaded into the kind
cluster (if used). It helps save some time when running the test
repetedly.

Minor details:
- Don't exit on e2e test failure, gather artifacts/logs first
- Destroy the network before trying to undefine it
- Wait for namespace deletion between e2e tests since we are using the
  same VM for the BMHs in multiple tests
@lentzi90 lentzi90 force-pushed the lentzi90/e2e-external-inspection branch from a0f7f1b to 181d95c Compare October 13, 2023 05:40
@metal3-io-bot metal3-io-bot removed the needs-rebase Indicates that a PR cannot be merged because it has merge conflicts with HEAD. label Oct 13, 2023
@lentzi90
Copy link
Member Author

/metal3-bmo-e2e-test

@lentzi90
Copy link
Member Author

/test-centos-e2e-integration-main
/test-ubuntu-integration-main

Copy link
Member

@kashifest kashifest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@@ -128,3 +129,5 @@ require (
replace sigs.k8s.io/cluster-api => sigs.k8s.io/cluster-api v1.5.0

replace github.com/metal3-io/baremetal-operator => ../

replace github.com/metal3-io/baremetal-operator/apis => ../apis
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, correct but IMO we should also keep the go.mod up to date as well with latest release versions so as not to confuse the user. We should perhaps do that in all the go modules.

@metal3-io-bot metal3-io-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 13, 2023
@hardys
Copy link
Member

hardys commented Oct 13, 2023

lgtm

Copy link
Member

@Rozzii Rozzii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@metal3-io-bot
Copy link
Contributor

@Rozzii: adding LGTM is restricted to approvers and reviewers in OWNERS files.

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@metal3-io-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kashifest, Rozzii

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Member

@tuminoid tuminoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Thanks again for good work on BMO e2e!

@metal3-io-bot metal3-io-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 13, 2023
@metal3-io-bot metal3-io-bot merged commit 7247b42 into metal3-io:main Oct 13, 2023
12 checks passed
@lentzi90 lentzi90 deleted the lentzi90/e2e-external-inspection branch October 13, 2023 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add external inspection e2e test
7 participants