-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
print featuregates via antctl #2082
Conversation
2f72305
to
6a5ec02
Compare
76520f8
to
2e28d68
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this could be a bit misleading because:
- some features are only significant for the Agent and some only for the Controller. For example, if someone has disabled AntreaProxy in the Agent configuration but uses a recent version of Antrea (where AntreaProxy is Beta), running this command from the Controller Pod will say that AntreaProxy is enabled which is incorrect.
- it seems that the command won't run "out-of-cluster", yet this is a useful scenario IMO. If someone wants to quickly & accurately check which features are enabled, they should be able to run it out-of-cluster and get correct information based on the contents of the ConfigMap.
@antoninbas for item 1, I will check and split them into two parts for agent and controller, for item 2, I will move it and make sure it can be run out of cluster, if I am not wrong, I remember only controller support it? and I am a little concern about get the data from configmap directly since I noticed that even I manually change the configmap, it won't reflect to agent/controller until I restart them. I think it can do better to use configmap checksum to make sure agent/controller will be restarted automatically when it's updated. |
For featuregates on the agent side, I feel the only way is to read from ConfigMap if we want to get them remotely. |
760007b
to
b4cddf8
Compare
Hi @jianjuns @antoninbas Could you help to review again? I have refined it to get feature gates info from configmap when it's out-of-cluster, and read from config file directly when it's in pod. thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought you will add a controller API and an agent API (for in-Pod execution) to get the accurate featuregate information. But I realized anyway you need to rely on ConfigMap for the remote execution case.
However, one problem with the current approach is that antctl version must match the controller/agent version to have the right default featuregate values. To address this, seems we still need API calls to controller (for remote execution). Not sure it is overkill.
@antoninba ?
I was also expecting that.
I think it's a bit of an issue. One solution would be to add the FeatureGate information (default values, graduation stage) to the ControllerInfo and have antctl consume that (along with the ConfigMap). That is, as an alternative to introducing a new API. |
Returning featuregate default values by ControllerInfo sounds strange to me. |
Then maybe the actual values directly? When I was working on telemetry, I had that code in the Controller. |
@jianjuns ,I was thinking to make the source of feature gates consistent in both in-pod or out-of-cluster way. so remove the original nonReourceURL API. but it's indeed a problem when antctl version is inconsistent with controller/agent. I can add a nonResourceURL to return default feature gates and call it from out-of-cluster. |
No issue on my side with adding an API in If we do it, there is no need to check for version mismatch any more. |
Using the actual values as source of truth makes sense to me. I debugged an issue that it was clear that the issue was because the feature was not enabled according to logs but the configmap shows it's enabled (it must be overwritten after the processes have read them). |
Yes, I was suggesting using aggregation to have it work out of cluster. We can also have antctl connect to the Controller's apiserver directly like we do for some other commands. I guess the less we rely on aggregation, the better. And non-versioned is not an issue as long as we don't evolve the API in the future... |
Command.Long = "Get current Antrea agent feature gates info" | ||
} else if runtime.Mode == runtime.ModeController && runtime.InPod { | ||
Command.RunE = controllerLocalRunE | ||
Command.Long = "Get current Antrea controller feature gates info." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we typically capitalize "Agent" and "Controller", like you did below
if mode == "remote" { | ||
if agentClient, err = getAgentClient(k8sClientset, antreaClientset, restconfigTmpl); err != nil { | ||
return err | ||
} | ||
if controllerClient, err = getControllerClient(k8sClientset, antreaClientset, restconfigTmpl); err != nil { | ||
return err | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not do this in the switch statement above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to reduce the error check through switch case for agent/controller only mode, considering the code consistence, I will add it back to switch case.
agentClient, err = rest.RESTClientFor(restconfigTmpl) | ||
case runtime.ModeController: | ||
controllerClient, err = rest.RESTClientFor(restconfigTmpl) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra empty line?
} | ||
} | ||
|
||
if mode == runtime.ModeAgent || mode == "remote" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it would be better to test agentClient
for nil
here instead of this if condition. Same below for controllerClient
.
isController := false | ||
if runtime.Mode == runtime.ModeController { | ||
isController = true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more of an antctl concept. I'd rather have 2 different handlers or add a parameter to this handler function.
if mode == "remote" { | ||
if agentClient, err = getAgentClient(k8sClientset, antreaClientset, restconfigTmpl); err != nil { | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about this part. It seems that in getAgentClient
you look for a "random" Node and you select the Agent instance running on that specific Node. I don't think this is great, as the feature gates may not be the same for all agent, if we are in the middle of an upgrade for example. IMO, the best way for "remote" mode would be to only provide the feature gates for the Controller by default, but potentially introduce a new command-line option which would accept a Node name. We would then select the Agent instance running on that specific Node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, OK, I thought all agent check the same configmap, so the featuregates should be the same, didn't consider the upgrade case. but it will be more complicate to add a node flag. looks like @jianjuns 's suggestion #2082 (review) is more applicable to simplify the case, what's your thought?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@luolanzone @antoninbas : so I wonder why we need to complicate the featuregates command. Could we just follow other normal commands like "get versions", that:
- When running remotely, antctl connects to controller via K8s API, and the controller can return the features based on the ConfigMap.
- When running inside agent Pod, antctl connects to agent, and agent returns the features based on its internal state.
That works for me. I thought that's what we were going to do from the get go, since 1) is what I had done for telemetry (retrieve the ConfigMap and "merge" it with the "defaults" information). This is the most useful case, although I can see some value in being able to get the "actual" feature gate configuration for an agent (bullet point 2). |
Hi @jianjuns after checking the code, looks like "version" only print controller's version no matter it's in-pod or out of cluster, if we only want to show the controller's feature gates when we issue 'antctl featuregates', we can use internal state from controller pod. just need to have two handlers in both controller and agent package. here are two ways to implement it in my mind:
since you mentioned configmap, so I would like to clarify with you about it, are you preferring the first simple way to show controller feature gates itself? |
Hi @luolanzone : I would go your #2. As @antoninbas said, we want most a command to get all features. |
@luolanzone the out-of-cluster command (for the Controller) should definitely print both sets of feature gates (Agent + Controller), based on the ConfigMap. Some code similar to this: https://github.com/antoninbas/antrea/blob/488da983bf3c8b581a03f4550a409629ab66d6a4/pkg/controller/usagereport/reporter.go#L178-L219 |
6941ff9
to
1866513
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall. Two minor comments left.
@jianjuns @antoninbas do you want to take a look too since you reviewed it a few times?
pkg/antctl/raw/helper.go
Outdated
kubeconfig.Insecure = true | ||
kubeconfig.CAFile = "" | ||
kubeconfig.CAData = nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function may be confusing. There is one SetupKubeconfig
too so looks like that's a secure one and this is insecure one. Can't it caller use SetupKubeconfig
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just refined it with name SetupBaseKubeconfig, this simple one will be called in proxy command. we haven't had a secure one yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, I may misunderstood it, let me check if it's possible to just use SetupKubeconfig for proxy command
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after tried it locally, proxy command can work with SetupKubeconfig, so I just removed SetupBaseKubeconfig.
podNameEnvKey = "POD_NAME" | ||
podNamespaceEnvKey = "POD_NAMESPACE" | ||
svcAcctNameEnvKey = "SERVICEACCOUNT_NAME" | ||
antreaConfigMapEnvKey = "ANTREA_CONFIG_MAP_NAME" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other variables don't use antreaSvcAcctNameEnvKey
and ANTREA_SERVICEACCOUNT_NAME
, maybe keep same here? i.e.:
configMapEnvKey = "CONFIGMAP_NAME"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am following the name used in downstream which is provided by @antoninbas , I guess we should keep them the same for less maintenance effort?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. No strong opinion.
9ca1c06
to
12743f4
Compare
@antoninbas @jianjuns do you have any new comments for this PR? there was a few new comments from @tnqn recently which have been addressed, please let me know if there is any new comments or Quan can help to move on, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@luolanzone since this is not urgent to be included in v1.1. I will merge it after v1.1 is released to not impact the releasing. Thanks for your patience. |
@luolanzone as Antrea is a CNCF project now, it has adopted DCO on all PRs. The PR is not mergable because of the check. Could you sign the commits, following https://github.com/antrea-io/antrea/blob/main/CONTRIBUTING.md#sign-off-your-work? Meanwhile, you can squash the commits. |
ef03183
to
5a5ee55
Compare
@tnqn sure, I just rebased and squashed the code, updated the commit with sign-off info, please help to check, thanks! |
/test-all |
All e2e tests failed for the antctl case, may due to some code conflict.
|
pkg/antctl/command_list.go
Outdated
if group, ok := groupCommands[cmd.commandGroup]; ok { | ||
currentCommand = append(currentCommand, group.Use) | ||
currentCommand = append(currentCommand, cmd.cobraCommand.Use) | ||
allCommands = append(allCommands, currentCommand) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currentCommand
should be declared in this block if it's only used in it.
But to be more generic and reuse more code, it could be similar to L132-L137:
var currentCommand []string
if group, ok := groupCommands[cmd.commandGroup]; ok {
currentCommand = append(currentCommand, group.Use)
}
currentCommand = append(currentCommand, strings.Split(cmd.cobraCommand.Use, " ")[:1])
allCommands = append(allCommands, currentCommand)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strings.Split(cmd.cobraCommand.Use, " ")[:1]
is a string slice and can't be append to another slice directly, not sure what's purpose to split cmd.cobraCommand.Use
originally, I will check how to reuse code. thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like strings.Split(cmd.cobraCommand.Use, " ")[:1]
is only converting cmd.cobraCommand.Use
into a string slice so it can be appended to allCommands
which is two dimensions array, so I change the code just the same as L132-L137 which will do the same thing once there is currentCommand
string slice defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My mistake. It could use strings.Split(cmd.cobraCommand.Use, " ")[0]
. It splitted it because the "use" of other commands includes some arguments.
I guess the current commit will cause supportbundle
command fail.
Signed-off-by: Luo Lan <[email protected]>
/test-all |
/test-networkpolicy |
/skip-e2e It actually succeeded:
|
Signed-off-by: Luo Lan <[email protected]>
allow user to run
antctl get featuregates
inside of agent or controller container or out of cluster to get feature gates information.Resolves #2010
sample cli:
Signed-off-by: Lan Luo [email protected]