Routes and IP configuration of gw0 removed when networkd restarts on a Node #626

Closed
antoninbas opened this issue Apr 17, 2020 · 2 comments · Fixed by #640
Labels
area/component/agent Issues or PRs related to the agent component kind/bug Categorizes issue or PR as related to a bug. kind/documentation Categorizes issue or PR as related to a documentation. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.

Comments

@antoninbas
Contributor

Describe the bug
Thanks to @alex-vmw for reporting this issue.

When networkd crashes or is restarted manually (we observed this on a live cluster), it appears to assume that it manages gw0, and it therefore deletes the IP configuration for gw0 along with all the routes associated with gw0. The only fix at the moment is to restart the antrea-agent on the Node (or restart the Node altogether).

To Reproduce
On a Node running networkd, restart the service, then observe the gw0 configuration and the routes.

Expected
IP configuration / routes should either be preserved or re-configured within a reasonable time frame.

Versions:

===> Version information <===
VERSION: v0.6.0-dev
GIT_SHA: e1d56aa
GIT_TREE_STATE: clean
RELEASE_STATUS: unreleased
DOCKER_IMG_VERSION: v0.6.0-dev-e1d56aa
@antoninbas antoninbas added the bug label Apr 17, 2020
@antoninbas antoninbas self-assigned this Apr 17, 2020
@antoninbas antoninbas added kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. area/component/agent Issues or PRs related to the agent component labels Apr 17, 2020
@antoninbas
Contributor Author

antoninbas commented Apr 18, 2020

The original plan was to have the antrea initContainer write this file to /run/systemd/network/10-antrea.network:

[Match]
Name=gw0

[Link]
Unmanaged=yes

However, there are 2 issues with that approach:

  • The host gateway may not be named gw0. In theory Antrea users can edit the config to choose a different name. This makes it hard to write the appropriate file in the initContainer.
  • The entire /run directory from the host needs to be mounted in the Antrea container, because we don't know whether /run/systemd already exists, and if it does not, we do not want to create it.
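
To make the two concerns above concrete, here is a minimal sketch of what such an initContainer step could look like. It is not Antrea's actual implementation: the function names and the `run_dir` parameter are hypothetical, and it simply assumes the host's /run is mounted somewhere inside the container. The gateway name is passed in rather than hard-coded, and nothing is written unless the systemd directory already exists.

```python
from pathlib import Path

# File name taken from the comment above.
NETWORK_FILE_NAME = "10-antrea.network"


def render_unmanaged_network(gateway_name: str) -> str:
    """Return a systemd.network file asking networkd to leave the link alone."""
    return (
        "[Match]\n"
        f"Name={gateway_name}\n"
        "\n"
        "[Link]\n"
        "Unmanaged=yes\n"
    )


def write_unmanaged_network(run_dir: Path, gateway_name: str) -> bool:
    """Write the file under <run_dir>/systemd/network.

    run_dir is wherever the host's /run is mounted (hypothetical parameter).
    Returns True if the file was written, False if the host was left untouched.
    """
    systemd_dir = run_dir / "systemd"
    if not systemd_dir.is_dir():
        # Do not create /run/systemd on hosts that do not already have it.
        return False
    network_dir = systemd_dir / "network"
    network_dir.mkdir(exist_ok=True)
    (network_dir / NETWORK_FILE_NAME).write_text(
        render_unmanaged_network(gateway_name)
    )
    return True
```

A call such as `write_unmanaged_network(Path("/host/run"), "gw0")` would then be a no-op on a host without systemd; the `/host/run` mount point is only an example.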

Since this issue is specific to CoreOS (see weaveworks/weave#2601), I am tempted to turn this into a "documentation" issue.

@jianjuns what do you think?

@jianjuns
Contributor

Yeah, I do not really like to get into network manager business. But I still think we should check and sync routes periodically. I think most other solutions do so.
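
As a rough illustration of the "check and sync routes periodically" idea, the sketch below compares the routes the agent wants with the routes currently installed and re-adds whatever is missing. Everything here is hypothetical (the `Route` type, `sync_once`, and the two callbacks); a real agent would back the callbacks with netlink calls, and this is not how Antrea implements it.

```python
from typing import Callable, Iterable, List, NamedTuple


class Route(NamedTuple):
    dst: str  # destination CIDR, e.g. "10.10.1.0/24"
    via: str  # next-hop IP address
    dev: str  # output interface, e.g. "gw0"


def routes_to_restore(desired: Iterable[Route],
                      observed: Iterable[Route]) -> List[Route]:
    """Return the desired routes that are not currently installed."""
    observed_set = set(observed)
    return [route for route in desired if route not in observed_set]


def sync_once(desired: Iterable[Route],
              list_routes: Callable[[], Iterable[Route]],
              add_route: Callable[[Route], None]) -> int:
    """Run one reconciliation pass and return how many routes were re-added.

    list_routes and add_route are injected so the logic can be tested
    without touching the kernel routing table.
    """
    missing = routes_to_restore(desired, list_routes())
    for route in missing:
        add_route(route)
    return len(missing)
```

Calling `sync_once` from a timer would bound how long the Node stays broken after a networkd restart to roughly one sync interval. Note that this only covers routes; the gateway's IP configuration would need a similar check.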

For your issue #1, I think the initContainer can mount the Agent ConfigMap and read the gateway interface name from there (BTW, I would prefer to add an "antrea-" prefix to all the interfaces we create, like antrea-gw0).
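
A minimal sketch of that lookup, assuming the mounted agent config is a flat `key: value` file and that the gateway name lives under a key called `hostGateway` (the key name is an assumption here, as is the helper itself). It falls back to gw0 when the key is absent or commented out.

```python
DEFAULT_GATEWAY = "gw0"


def gateway_name_from_config(config_text: str, key: str = "hostGateway") -> str:
    """Return the gateway interface name from a flat "key: value" config.

    The key name "hostGateway" is an assumption; pass a different key if the
    agent config uses another one.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out settings
        name, sep, value = line.partition(":")
        if sep and name.strip() == key:
            # Drop a trailing comment and surrounding quotes, if any.
            value = value.split("#", 1)[0].strip().strip("\"'")
            if value:
                return value
    return DEFAULT_GATEWAY
```

The initContainer could read the mounted file, pass its contents to this function, and use the result as the `Name=` value in the [Match] section.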

@antoninbas antoninbas added the kind/documentation Categorizes issue or PR as related to a documentation. label Apr 21, 2020
antoninbas added a commit to antoninbas/antrea that referenced this issue Apr 21, 2020
antoninbas added a commit that referenced this issue Apr 24, 2020
@antoninbas antoninbas added this to the Antrea v0.6.0 release milestone Apr 28, 2020
McCodeman pushed a commit to McCodeman/antrea that referenced this issue Jun 2, 2020
McCodeman pushed a commit that referenced this issue Jun 2, 2020
GraysonWu pushed a commit to GraysonWu/antrea that referenced this issue Sep 22, 2020
2 participants