cloud_init: fix errors when loaded before acquiring IP address #1992

Merged
merged 1 commit into from
Jan 20, 2024

Conversation

francescolavra
Member

Attempting to connect to a cloud_init server before an IP address is acquired via DHCP results in a connection error, which causes the cloud_init klib to fail to initialize (and thus the VM to be stopped without executing the user program). This is causing sporadic "program startup failed before exec: (result:connect failed (-4))" errors to occur when running the cloud_init e2e test as part of the Jenkins CI tests.

This change fixes the above issue by adding a check for a suitable IP address configuration before attempting to connect to a server or resolve its host name, and retrying connecting at a later time if this check fails.
The code that retried a connection when the DNS query function returned ERR_INPROGRESS has been removed, and a call to cloud_download_connect() has been added in the DNS callback function. Otherwise, a long delay in DNS resolution may leave many DNS requests pending at the same time, which can result in a DNS query error (and subsequent klib initialization failure) because no free request slots are available.
The code that checks for ERR_VAL (which had been added to retry the connection if no IP address has been acquired) has been removed because it no longer works: the kernel now sets up a default DNS server during initialization, so ERR_VAL is never returned.
The "exec_wait_for_ip4_secs" manifest flag has been removed because it does not serve its intended purpose of delaying klib initialization (this flag can only delay execution of the user program).
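
For readers who want the reasoning in a form they can run, here is a toy model of the two behaviours described above. It is not the klib code: `run_model()`, `dns_query()`, `DNS_SLOTS`, the tick-based timer, and the slot count of 4 are all assumptions made up for illustration. The model waits for an IP address before issuing a DNS query, then either re-issues the query on every tick while one is pending (roughly the old ERR_INPROGRESS retry) or issues it once and connects from the DNS callback (the approach in this change).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical values; the real DNS request table size may differ. */
#define DNS_SLOTS 4     /* assumed number of DNS request slots */
#define MAX_TICKS 100   /* give up after this many timer ticks */

enum result { RES_CONNECTED, RES_DNS_SLOTS_EXHAUSTED, RES_TIMEOUT };

struct sim {
    int tick;           /* current simulated timer tick */
    int ip_ready_tick;  /* tick at which DHCP assigns an address */
    int dns_reply_tick; /* tick at which the DNS reply arrives, -1 if none */
    int dns_pending;    /* DNS requests currently in flight */
    int dns_peak;       /* highest number of simultaneous requests */
};

/* Stand-in for the "suitable IP address configuration" check. */
static bool ip_configured(const struct sim *s)
{
    return s->tick >= s->ip_ready_tick;
}

/* Issue a DNS request; returns false when no request slot is free. */
static bool dns_query(struct sim *s, int delay)
{
    if (s->dns_pending == DNS_SLOTS)
        return false;
    s->dns_pending++;
    if (s->dns_pending > s->dns_peak)
        s->dns_peak = s->dns_pending;
    if (s->dns_reply_tick < 0)
        s->dns_reply_tick = s->tick + delay;
    return true;
}

/*
 * retry_while_pending = true  models re-issuing the query on each tick
 *                             while a previous query is in progress;
 * retry_while_pending = false models a single query whose callback
 *                             performs the connection.
 */
static enum result run_model(int ip_ready_tick, int dns_delay,
                             bool retry_while_pending, int *dns_peak)
{
    struct sim s = { .ip_ready_tick = ip_ready_tick, .dns_reply_tick = -1 };
    enum result res = RES_TIMEOUT;

    assert(dns_peak != NULL);
    for (s.tick = 0; s.tick < MAX_TICKS; s.tick++) {
        /* DNS callback: the reply has arrived, so connect from here. */
        if (s.dns_reply_tick >= 0 && s.tick >= s.dns_reply_tick) {
            res = RES_CONNECTED;
            break;
        }
        /* No address yet: do nothing and try again on the next tick. */
        if (!ip_configured(&s))
            continue;
        if (s.dns_pending == 0 || retry_while_pending) {
            if (!dns_query(&s, dns_delay)) {
                res = RES_DNS_SLOTS_EXHAUSTED;
                break;
            }
        }
    }
    *dns_peak = s.dns_peak;
    return res;
}
```

With an address arriving at tick 3 and a DNS reply delayed by 10 ticks, calling `run_model(3, 10, true, &peak)` runs out of request slots before the reply arrives, while `run_model(3, 10, false, &peak)` keeps a single request in flight and connects once the callback fires.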

Attempting to connect to a cloud_init server before an IP address
is acquired via DHCP results in a connection error, which causes
the cloud_init klib to fail to initialize (and thus the VM to be
stopped without executing the user program).
This change fixes the above issue by adding a check for a suitable
IP address configuration before attempting to connect to a server
or resolve its host name, and retrying connecting at a later time
if this check fails.
The code to retry a connection when the DNS query function returns
ERR_INPROGRESS has been removed, and a call to
cloud_download_connect() has been added in the DNS callback
function, because otherwise a large delay in the DNS resolution
process may cause many DNS requests to be pending at the same time,
which can result in a DNS query error (and subsequent klib
initialization failure) due to unavailable free request slots.
The code that checks for ERR_VAL (which had been added to retry the
connection if no IP address has been acquired) has been removed
because it no longer works (now the kernel sets up a default DNS
server during initialization, therefore the ERR_VAL value is never
returned).
The "exec_wait_for_ip4_secs" manifest flag has been removed because
it does not serve its intended purpose to delay klib initialization
(this flag can only delay execution of the user program).
@francescolavra francescolavra merged commit 74e68e2 into master Jan 20, 2024
5 checks passed
@francescolavra francescolavra deleted the fix/cloud_init branch January 20, 2024 08:54