install fails if enroll fails and surface errors #3554

Merged
merged 5 commits into from
Oct 17, 2023

AndersonQ
Member

What does this PR do?

This change makes the enroll command fail if it cannot restart the agent after enrolling in Fleet.

Why is it important?

For the agent to actually be enrolled, it needs to restart after the enroll process completes, so that it picks up the new config and "connects" to fleet-server.

Checklist

  • My code follows the style guidelines of this project
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test

Related issues

  • N/A

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

* surface errors that might occur during enroll
* fail install command if agent cannot be restarted
* do not print success message if there was an enroll error. Print an error message and the error instead
* add logs to show the different enroll attempts
* add more context to errors
* refactor internal/pkg/agent/install/perms_unix.go and add more context to errors
restore main version
* ignore agent restart error on enroll tests as there is no agent to be restarted
* daemonReloadWithBackoff does not retry on context deadline exceeded
@AndersonQ AndersonQ self-assigned this Oct 6, 2023
@mergify
Contributor

mergify bot commented Oct 6, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b fail-enroll upstream/fail-enroll
git merge upstream/main
git push upstream fail-enroll

@mergify
Contributor

mergify bot commented Oct 6, 2023

This pull request does not have a backport label. Could you fix it @AndersonQ? 🙏
To fix up this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip label Oct 6, 2023
@AndersonQ AndersonQ added the bug, Team:Elastic-Agent, and backport-v8.11.0 labels and removed the backport-skip label Oct 6, 2023
@elasticmachine
Contributor

elasticmachine commented Oct 6, 2023

💚 Build Succeeded

Build stats

  • Start Time: 2023-10-16T14:03:07.013+0000

  • Duration: 26 min 58 sec

Test stats 🧪

Test Results
Failed 0
Passed 6493
Skipped 59
Total 6552

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@AndersonQ
Member Author

/test

@elasticmachine
Contributor

elasticmachine commented Oct 9, 2023

🌐 Coverage report

Name          Metrics % (covered/total)   Diff
Packages      98.78% (81/82)              👍
Files         67.559% (202/299)           👍
Classes       66.187% (368/556)           👍
Methods       53.585% (1166/2176)         👍 0.046
Lines         39.769% (13586/34162)       👍 0.059
Conditionals  100.0% (0/0)                💚

@AndersonQ AndersonQ marked this pull request as ready for review October 11, 2023 12:02
@AndersonQ AndersonQ requested a review from a team as a code owner October 11, 2023 12:02
@elasticmachine
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

Member

@pchila pchila left a comment

Aside from the usual backoff awkwardness (not the point of this PR), there are just a couple of small nitpicks.
I agree with @ycombinator that, since we squash commits, it would be better to move small typo corrections and refactors into a dedicated PR.

internal/pkg/agent/cmd/enroll_cmd.go
Comment on lines 169 to 176
    if err != nil &&
        // There is no agent running, therefore nothing to be restarted.
        // However, this will cause the Enroll command to return an error
        // which we'll ignore here.
        !strings.Contains(err.Error(),
            "could not reload agent daemon, unable to trigger restart") {
        t.Fatalf("enrrol coms returned and unexpected error: %v", err)
    }

From the description it seems that we expect the error about not being able to trigger a restart, we can simplify with:

Suggested change
-    if err != nil &&
-        // There is no agent running, therefore nothing to be restarted.
-        // However, this will cause the Enroll command to return an error
-        // which we'll ignore here.
-        !strings.Contains(err.Error(),
-            "could not reload agent daemon, unable to trigger restart") {
-        t.Fatalf("enrrol coms returned and unexpected error: %v", err)
-    }
+    // There is no agent running, therefore nothing to be restarted.
+    // However, this will cause the Enroll command to return an error
+    // which we'll ignore here.
+    +    require.ErrorContainsf(t, err, "could not reload agent daemon, unable to trigger restart", "enroll command returned an unexpected error: %v", err)

@AndersonQ AndersonQ merged commit f7e558f into elastic:main Oct 17, 2023
21 checks passed
@AndersonQ AndersonQ deleted the fail-enroll branch October 17, 2023 10:24
mergify bot pushed a commit that referenced this pull request Oct 17, 2023
* fix install/enroll cmd not failing when agent restart fails
* surface errors that might occur during enroll
* fail install command if agent cannot be restarted
* do not print success message if there was an enroll error. Print an error message and the error instead
* add logs to show the different enroll attempts
* add more context to errors
* refactor internal/pkg/agent/install/perms_unix.go and add more context to errors
restore main version
* ignore agent restart error on enroll tests as there is no agent to be restarted
* daemonReloadWithBackoff does not retry on context deadline exceeded and context cancelled
* fix typos

(cherry picked from commit f7e558f)
pierrehilbert pushed a commit that referenced this pull request Oct 17, 2023
* fix install/enroll cmd not failing when agent restart fails
* surface errors that might occur during enroll
* fail install command if agent cannot be restarted
* do not print success message if there was an enroll error. Print an error message and the error instead
* add logs to show the different enroll attempts
* add more context to errors
* refactor internal/pkg/agent/install/perms_unix.go and add more context to errors
restore main version
* ignore agent restart error on enroll tests as there is no agent to be restarted
* daemonReloadWithBackoff does not retry on context deadline exceeded and context cancelled
* fix typos

(cherry picked from commit f7e558f)

Co-authored-by: Anderson Queiroz <[email protected]>
AndersonQ added a commit that referenced this pull request Oct 18, 2023
AndersonQ added a commit that referenced this pull request Oct 18, 2023
cmacknz pushed a commit that referenced this pull request Oct 18, 2023
cmacknz pushed a commit that referenced this pull request Oct 18, 2023
AndersonQ added a commit to AndersonQ/elastic-agent that referenced this pull request Oct 19, 2023
Labels
backport-v8.11.0: Automated backport with mergify
bug: Something isn't working
Team:Elastic-Agent: Label for the Agent team
4 participants