Better error logs for Agent - Controller connectivity issues #822

Closed
antoninbas opened this issue Jun 10, 2020 · 1 comment · Fixed by #1946
Labels
area/component/agent: Issues or PRs related to the agent component.
area/monitoring/logging: Issues or PRs related to logging.
enhancement: New feature or request.
good first issue: Good for newcomers.
lifecycle/active: Indicates that an issue or PR is actively being worked on by a contributor.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@antoninbas
Contributor

Describe the problem/challenge you have
One thing that came out of #802 is that Agent logs are not very user-friendly in case of such a connectivity issue. The log messages are as follows:

I0605 03:20:17.197548       1 networkpolicy_controller.go:407] Started watch for AddressGroup
W0605 03:20:17.197574       1 networkpolicy_controller.go:425] Result channel for AddressGroup was closed
I0605 03:20:17.197579       1 networkpolicy_controller.go:411] Stopped watch for AddressGroup, total items received: 0
I0605 03:20:17.197587       1 networkpolicy_controller.go:400] Starting watch for AddressGroup
I0605 03:20:17.203155       1 networkpolicy_controller.go:407] Started watch for NetworkPolicy
W0605 03:20:17.203175       1 networkpolicy_controller.go:425] Result channel for NetworkPolicy was closed
I0605 03:20:17.203180       1 networkpolicy_controller.go:411] Stopped watch for NetworkPolicy, total items received: 0
I0605 03:20:17.203186       1 networkpolicy_controller.go:400] Starting watch for NetworkPolicy
I0605 03:20:17.208324       1 networkpolicy_controller.go:407] Started watch for AppliedToGroup
W0605 03:20:17.208345       1 networkpolicy_controller.go:425] Result channel for AppliedToGroup was closed
I0605 03:20:17.208350       1 networkpolicy_controller.go:411] Stopped watch for AppliedToGroup, total items received: 0
I0605 03:20:17.208358       1 networkpolicy_controller.go:400] Starting watch for AppliedToGroup

And they are repeated every 30 seconds.

Describe the solution you'd like
@tnqn do you think it would make sense to have more user-friendly error logs, indicating that there is a connectivity issue to the Antrea service?
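
A minimal sketch of what a more explicit message could look like. It rests on an assumption that is not confirmed in this issue: that a watch which closes almost immediately with zero items (about 26 microseconds in the AddressGroup lines above) points to a connectivity problem rather than a normal watch expiry. The function name `describeWatchEnd` and the constant `quickCloseThreshold` are hypothetical and are not part of the Antrea code base.

```go
package main

import (
	"fmt"
	"time"
)

// quickCloseThreshold is a hypothetical cutoff: a watch that ends sooner than
// this without delivering any item is treated as a likely connectivity failure.
const quickCloseThreshold = time.Second

// describeWatchEnd returns the message to log when a watch ends, and whether
// a connectivity problem is suspected. This is an illustrative heuristic only.
func describeWatchEnd(resource string, itemsReceived int, elapsed time.Duration) (string, bool) {
	if itemsReceived == 0 && elapsed < quickCloseThreshold {
		return fmt.Sprintf("Watch for %s closed after %v with no items received; "+
			"the Agent may be unable to reach the Antrea Controller, "+
			"check connectivity to the Antrea Service", resource, elapsed), true
	}
	return fmt.Sprintf("Stopped watch for %s, total items received: %d",
		resource, itemsReceived), false
}

func main() {
	// Values taken from the AddressGroup log lines quoted in this issue.
	msg, suspect := describeWatchEnd("AddressGroup", 0, 26*time.Microsecond)
	fmt.Println(suspect, msg)
}
```

A real implementation would more likely inspect the error returned when the watch is established; the timing heuristic here only stands in for that signal.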

Additionally, one thing I would like to consider is improving the readiness probe for the Agent so that it reflects the status of the connection to the Antrea controller. Without this, there is no feedback about this issue until the user looks at the agent logs.
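
As a rough illustration of the readiness idea, the sketch below exposes an HTTP endpoint that returns 503 until a flag records that the Controller connection is up. Everything in it is an assumption made for illustration: the `controllerStatus` type, the `/readyz` path, and the way the flag is set are hypothetical and do not describe how the Antrea Agent implements its probe.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// controllerStatus is a hypothetical holder for the connection state.
// The watch loop would call set(true) once a watch is established and
// set(false) when it fails.
type controllerStatus struct{ connected int32 }

func (s *controllerStatus) set(ok bool) {
	var v int32
	if ok {
		v = 1
	}
	atomic.StoreInt32(&s.connected, v)
}

func (s *controllerStatus) isConnected() bool {
	return atomic.LoadInt32(&s.connected) == 1
}

// readyzHandler reports 503 while the Controller is unreachable, so the
// kubelet marks the Pod as not ready, and 200 otherwise.
func readyzHandler(s *controllerStatus) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		if !s.isConnected() {
			http.Error(w, "not connected to Antrea Controller", http.StatusServiceUnavailable)
			return
		}
		fmt.Fprintln(w, "ok")
	}
}

// probe issues one in-memory request against the handler and returns the status code.
func probe(s *controllerStatus) int {
	rec := httptest.NewRecorder()
	readyzHandler(s)(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return rec.Code
}

func main() {
	s := &controllerStatus{}
	fmt.Println(probe(s)) // before any successful watch
	s.set(true)
	fmt.Println(probe(s)) // after the connection is established
}
```

With a probe like this, the connectivity problem would show up in the Pod's readiness status instead of only in the Agent logs.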

antoninbas added the enhancement, area/component/agent, area/monitoring/logging, and priority/important-soon labels on Jun 10, 2020
antoninbas added the good first issue label on Jun 30, 2020
@github-actions
Contributor

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment, or this will be closed in 180 days

github-actions bot added the lifecycle/stale label (denotes an issue or PR that has remained open with no activity and has become stale) on Dec 27, 2020
hty690 added a commit to hty690/antrea that referenced this issue Mar 11, 2021
The agent logs will print a failure message when the connection to
the controller times out. Also add a HealthCheck to alert
users to the connectivity issue.

Fixes antrea-io#822
hty690 added a commit to hty690/antrea that referenced this issue Mar 16, 2021
The agent logs will print a failure message when the connection to
the controller times out. Also add a HealthCheck to alert
users to the connectivity issue.

Fixes antrea-io#822
hty690 added a commit to hty690/antrea that referenced this issue Mar 16, 2021
The agent logs will print a failure message when the connection to
the controller times out. Also add a readiness probe to alert
users to the connectivity issue.

Fixes antrea-io#822
antoninbas added the lifecycle/active label and removed the lifecycle/stale label on Mar 17, 2021
tnqn pushed a commit that referenced this issue Mar 18, 2021
The agent logs will print a failure message when the connection to
the controller times out. Also add a readiness probe to alert
users to the connectivity issue.

Fixes #822