cloud_init: fix errors when loaded before acquiring IP address #1992
Attempting to connect to a cloud_init server before an IP address is acquired via DHCP results in a connection error, which causes the cloud_init klib to fail to initialize (and thus the VM to be stopped without executing the user program). This causes sporadic "program startup failed before exec: (result:connect failed (-4))" errors when the cloud_init e2e test runs as part of the Jenkins CI tests.
This change fixes the above issue by adding a check for a suitable IP address configuration before attempting to connect to a server or resolve its host name, and retrying connecting at a later time if this check fails.
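The check-then-retry decision described above can be sketched in isolation as follows. This is only an illustrative model, not the code in this change: `struct fake_netif`, `ip_config_ready()`, `download_try_connect()` and the `connect_action` values are hypothetical names, and the real klib works with the kernel's network interface structures and timers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a network interface (not the kernel's struct). */
struct fake_netif {
    bool up;            /* interface is administratively up */
    uint32_t ip4_addr;  /* 0 means no IPv4 address has been acquired yet */
};

enum connect_action {
    ACTION_CONNECT,      /* safe to resolve the host name and connect now */
    ACTION_RETRY_LATER,  /* no usable IP configuration: schedule a retry */
};

/* Assumption: "suitable IP configuration" means link up and a non-zero address. */
static bool ip_config_ready(const struct fake_netif *n)
{
    return n->up && n->ip4_addr != 0;
}

/* Decide what a single connection attempt should do; count deferred attempts. */
enum connect_action download_try_connect(const struct fake_netif *n,
                                         unsigned *retries)
{
    assert(n != NULL && retries != NULL);
    if (!ip_config_ready(n)) {
        (*retries)++;
        return ACTION_RETRY_LATER;
    }
    return ACTION_CONNECT;
}
```

In this model the caller would re-arm a timer on `ACTION_RETRY_LATER` and call `download_try_connect()` again when it fires, so no connection or DNS query is attempted until the check passes.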
The code to retry a connection when the DNS query function returns ERR_INPROGRESS has been removed, and a call to cloud_download_connect() has been added in the DNS callback function, because otherwise a large delay in the DNS resolution process may cause many DNS requests to be pending at the same time, which can result in a DNS query error (and subsequent klib initialization failure) due to unavailable free request slots.
The code that checks for ERR_VAL (which had been added to retry the connection if no IP address has been acquired) has been removed because it no longer works: the kernel now sets up a default DNS server during initialization, so ERR_VAL is never returned.
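To illustrate why retrying on ERR_INPROGRESS can run out of request slots, here is a toy model under stated assumptions. `MAX_PENDING`, `struct resolver` and both helper functions are hypothetical. The model assumes each retry occupies a new slot and that no reply arrives during the slow period, and it does not reproduce the actual DNS implementation.

```c
#include <stdbool.h>

#define MAX_PENDING 4  /* hypothetical size of the pending-request table */

/* Toy resolver that only tracks how many requests are still unanswered. */
struct resolver {
    int pending;
};

/* Queue one request; fails when every slot is already occupied. */
bool resolver_query(struct resolver *r)
{
    if (r->pending >= MAX_PENDING)
        return false;
    r->pending++;
    return true;
}

/*
 * Retry-based approach: issue a new query on every timer tick while the
 * resolution is slow. Returns the tick at which a query first fails,
 * or 0 if no query fails within slow_ticks.
 */
int ticks_until_failure_with_retry(int slow_ticks)
{
    struct resolver r = { 0 };
    for (int tick = 1; tick <= slow_ticks; tick++) {
        if (!resolver_query(&r))
            return tick;
    }
    return 0;
}

/*
 * Callback-based approach: issue a single query and wait for the callback
 * to continue the connection. Returns the number of pending requests,
 * which does not depend on how long the resolution takes.
 */
int pending_with_callback(int slow_ticks)
{
    struct resolver r = { 0 };
    (void)slow_ticks;  /* resolution delay does not add requests */
    return resolver_query(&r) ? r.pending : -1;
}
```

With a single outstanding query, a slow resolution only delays the connection instead of turning into a query error.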
The "exec_wait_for_ip4_secs" manifest flag has been removed because it does not serve its intended purpose of delaying klib initialization (this flag can only delay execution of the user program).