kola testiso --scenarios pxe-install is flaky #1597

Closed
jlebon opened this issue Jul 15, 2020 · 1 comment · Fixed by coreos/ignition#1032

Comments

@jlebon
Member

jlebon commented Jul 15, 2020

The pipeline is sometimes hitting:

```
Error: scenario pxe-install: Unexpected string from completion channel: coreos-installer-test-OK expected: live-test-OK
2020-07-15T17:44:49Z cli: scenario pxe-install: Unexpected string from completion channel: coreos-installer-test-OK expected: live-test-OK
qemu-system-x86_64: terminating on signal 15 from pid 5945 ()
qemu-system-x86_64: tpm-emulator: Could not cleanly shutdown the TPM: Invalid argument
```

I can sometimes reproduce this locally. At first glance, it looks like during the first (live PXE) boot, Ignition sometimes isn't getting the complete config: it either gets an empty config or misses the `ignition.config.url` karg. Hard to tell. Either way, the end result is that the files stage doesn't write out `live-signal-ok.service`, which the test harness relies on.
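To make the failure message concrete, here is a minimal Go sketch of the kind of ordered check that produces it. The function `checkCompletion` and the "signals arrive in order on one channel" model are assumptions for illustration, not the actual kola code; only the two signal strings and the shape of the message come from the log.

```go
package main

import (
	"errors"
	"fmt"
)

// checkCompletion is a hypothetical, simplified stand-in for the harness
// check: it reads signals from a completion channel and requires them to
// arrive in the expected order.
func checkCompletion(completion <-chan string, expected []string) error {
	for _, want := range expected {
		got, ok := <-completion
		if !ok {
			return errors.New("completion channel closed early")
		}
		if got != want {
			return fmt.Errorf("unexpected string from completion channel: %s expected: %s", got, want)
		}
	}
	return nil
}

func main() {
	// Simulate the flaky run: the live boot never signals (no
	// live-signal-ok.service), so the first string seen comes from the
	// installed system instead.
	completion := make(chan string, 2)
	completion <- "coreos-installer-test-OK"
	close(completion)

	err := checkCompletion(completion, []string{"live-test-OK", "coreos-installer-test-OK"})
	fmt.Println(err)
}
```

A passing run would deliver `live-test-OK` first and `coreos-installer-test-OK` second, and the check returns nil.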

jlebon added a commit to jlebon/fedora-coreos-pipeline that referenced this issue Jul 15, 2020
dustymabe pushed a commit to coreos/fedora-coreos-pipeline that referenced this issue Jul 15, 2020
jlebon added a commit to jlebon/ignition that referenced this issue Jul 16, 2020
Regression from coreos#958. We switched the list of providers from an array to
a map, but iteration order over a Go map is unspecified (the runtime
deliberately randomizes it), so we lost the precedence of providers.

I think this is the cause of many of the FCOS installer test
timeouts, such as:

coreos/coreos-assembler#1597

There, we pass the Ignition config for the PXE boot via
`ignition.config.url`, but if the metal (no-op) fetcher appears earlier
than the `cmdline` fetcher, we get no config. The same happens on the
installed system when the no-op fetcher appears before the `system`
fetcher (which coreos-installer's `--ignition-file` relies on).

The likelihood of this happening increased in the v2.4.0 release due to
coreos#1002, which left only one chance (at the `fetch` stage) for the
correct provider to be iterated over first, rather than letting every
stage have a go at it.
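A rough way to see why the odds got worse, under a simplified model that is an assumption rather than anything measured: suppose each attempt independently iterates the no-op fetcher ahead of the correct one with probability $p$, and a boot succeeds if any attempt picks the correct provider first. Comparing one attempt (at the `fetch` stage) with one attempt in each of $n$ stages:

```latex
P_{\text{fail}}^{\text{one attempt}} = p,
\qquad
P_{\text{fail}}^{n\ \text{attempts}} = p^{n} \le p \quad (0 \le p \le 1,\ n \ge 1)
```

For instance, with a hypothetical $p = 1/2$ and $n = 3$, the failure rate rises from $1/8$ to $1/2$ once only a single attempt remains.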

Closes: coreos/coreos-assembler#1597
@jlebon
Member Author

jlebon commented Jul 16, 2020

Fix in coreos/ignition#1032.
