
[RFE] failure during ignition first_boot causes loop, no way to capture logs #454

Closed
ericb-summit opened this issue Jul 29, 2021 · 6 comments · Fixed by flatcar-archive/coreos-overlay#1262


@ericb-summit

ericb-summit commented Jul 29, 2021

Current situation

I am deploying Flatcar using Terraform, vSphere and Ignition. If first_boot fails (for example, due to a bad Ignition config or any other error during first_boot), errors scroll rapidly on the console. It then pauses for a few seconds, and the scripts appear to retry, repeatedly, in a loop, with no way to see the underlying cause. (I should point out that if I power off and start the VM, it boots fine.)

Impact

It is very difficult to troubleshoot the root cause, as I have no logs (only the virtual console).

Ideal future situation

It would be useful to have some way to tell the boot process not to retry on failure (e.g. an additional GRUB parameter), or some other mechanism to abort the boot and access journalctl.

@t-lo
Member

t-lo commented Jul 29, 2021

Though options are limited (and we have roadmap items to address this in the future), have you had a look at Mayday reports?

@tormath1
Contributor

If you have access to a large console buffer, you can also edit the kernel command line to add the options systemd.journald.max_level_console=debug console=ttyS0; this will increase Ignition's verbosity.
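As a hypothetical illustration of that suggestion, the sketch below appends the two options to an existing kernel command line string, skipping any option that is already present verbatim. The function name `with_debug_args` and the sample command line are assumptions made up for this example; the sketch does not show how the command line is actually edited on Flatcar (for example at the GRUB prompt).

```python
# Options suggested in the comment above (hypothetical helper, for illustration only).
DEBUG_ARGS = ("systemd.journald.max_level_console=debug", "console=ttyS0")


def with_debug_args(cmdline: str, extra=DEBUG_ARGS) -> str:
    """Append each option in `extra` to `cmdline` unless that exact option
    is already present. Existing options are kept as-is, because some
    parameters such as console= may legitimately appear more than once."""
    args = cmdline.split()
    for arg in extra:
        if arg not in args:
            args.append(arg)
    return " ".join(args)


if __name__ == "__main__":
    # Sample command line is an assumption, not taken from a real Flatcar boot entry.
    print(with_debug_args("root=LABEL=ROOT console=tty0"))
```

Appending rather than replacing is deliberate here: the kernel accepts several console= entries, so dropping an existing one could hide output from a console you rely on.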

In case you missed it, Ignition v3 support is in progress (see #387), so make sure your configuration version is below v3.

Let us know how it goes!

@ericb-summit
Author

ericb-summit commented Jul 29, 2021

Hi guys. I've solved my original problem. For context, it was caused by the Terraform partition Ignition directive, which instructed Ignition to create a new partition at each boot.

However, I didn't solve it by finding the cause in the logs. It was pretty much trial and error.

I read about Mayday, but I could never actually get to a shell to run it.

This is what I mean: in some cases it seems Ignition failures are considered temporary. It keeps retrying forever, no login prompt ever appears, and I have no way of collecting logs other than scrolling back on the virtual console (Shift+PageUp). Even that isn't really an option, because as soon as new console output appears, the VT scrolls to the end.

Aaaand I fat-fingered and closed the issue; that wasn't intentional. I've run into this many times and probably will again. What is the proper strategy for collecting journalctl logs from a first boot that never completes?

@pothos pothos reopened this Jul 29, 2021
@pothos
Member

pothos commented Jul 29, 2021

As you said, it's a problem that Ignition failures result in a loop. The way forward is to fail the boot so that it drops to a dracut rescue shell prompt, as happens with other initramfs boot errors.

@jepio
Member

jepio commented Jul 29, 2021

A similar issue was reported in #434.

@ericb-summit
Copy link
Author

Yes, I had read #434. In fact, if I specify an Ignition config pointing to a remote URI but provide no pre-boot IP or DHCP, I get the same behaviour I describe here.

Basically, any failure and I'm in trouble with Ignition.

So I take it there's no way to collect logs? Is there some magic parameter I can pass to GRUB to bypass first_boot, such as init=/bin/sh as I would with other OSes, so that I can scour /var/log for the logs?

Labels
None yet
Projects
None yet

5 participants