Support log level setting from policy #3090
Conversation
This pull request does not have a backport label. Could you fix it @pchila? 🙏
💔 Tests Failed
This pull request is now in conflicts. Could you fix it? 🙏
Force-pushed from 0275469 to 86c740f
Force-pushed from bfe26f2 to 50f8bf6
Force-pushed from 4fd0d9c to a045482
Force-pushed from 30fde5b to d512cab
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
Overall this looks really good. Easy to follow and very well tested.
Would like to see an integration test covering all the different modes of operation. The control protocol should expose the currently set log level, so it should be easy to validate the logging information.
```diff
@@ -570,13 +570,16 @@ func (c *Coordinator) PerformComponentDiagnostics(ctx context.Context, additiona
 // SetLogLevel changes the entire log level for the running Elastic Agent.
 // Called from external goroutines.
-func (c *Coordinator) SetLogLevel(ctx context.Context, lvl logp.Level) error {
+func (c *Coordinator) SetLogLevel(ctx context.Context, lvl *logp.Level) error {
```
Why change this to allow a nil value, to then error if a nil value is passed?
You also convert the pointer to value in the function. Seems to me all the changes in this function could be removed and it would have the same result.
Mostly it is to have a single interface for the log level setter and leave the check to the implementations...
If I didn't change this signature I would have to define 2 interfaces: one taking a pointer (for setting the log level coming from policy, which can be cleared) and one taking a value, implemented by the coordinator (which needs to receive a real value).
The signal-to-noise ratio of having 2 interfaces with similar names differing only by a pointer seemed a bit much to me, but I still wanted to abstract away the coordinator package, so this is the tradeoff I came to.
There's no technical reason why we could not have 2 separate interfaces though...
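For illustration, a minimal sketch of the single-setter-interface approach described above; the package, interface name, and comments are assumptions, not the actual code from this PR:

```go
package handlers

import (
	"context"

	"github.com/elastic/elastic-agent-libs/logp"
)

// logLevelSetter is the single abstraction the action handlers could depend
// on (illustrative name). A nil level means "no explicit level for this
// agent, fall back to the policy or default"; each implementation decides
// how to handle nil, including returning an error if it needs a real value.
type logLevelSetter interface {
	SetLogLevel(ctx context.Context, lvl *logp.Level) error
}
```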
No that makes complete sense. I was just confused on the change, but if that unifies it I prefer that.
@blakerouse I am currently implementing integration tests using `inspect`.
You might not need to use `inspect`. Would be easier to use the status output of the control protocol that should include the `agent_info` that has the current log level.
Force-pushed from 7bc8dd2 to b9394d4
Depending on how the AgentInfo object is created, it may not contain the actual log level: it could be empty in the config/state file but set via policy at runtime when we process the policy_change action... I implemented the integration test with `inspect` and it seems to work pretty well.
> You might not need to use `inspect`. Would be easier to use the status output of the control protocol that should include the `agent_info` that has the current log level.

> Depending on how the AgentInfo object is created it may not contain the actual log level as it could be empty in the config/state file but it could be set via policy at runtime when we process the policy_change action... I implemented the integration test with `inspect` and it seems to work pretty well

That is not good. We should ensure that the output from `elastic-agent status --output=yaml` includes the current active log level. It should not be obscure to the user, it should be very clear. Even if the log level is not set in the agent info context, the output from the `status` command should show the active log level.
Quality Gate passed
Had a quick look at the `State()` handler in the control server:

```go
// State returns the overall state of the agent.
func (s *Server) State(_ context.Context, _ *cproto.Empty) (*cproto.StateResponse, error) {
	state := s.coord.State()
	return stateToProto(&state, s.agentInfo)
}
```

and then in the stateToProto() function:

```go
func stateToProto(state *coordinator.State, agentInfo info.Agent) (*cproto.StateResponse, error) {
	// some code omitted here...
	return &cproto.StateResponse{
		Info: &cproto.StateAgentInfo{
			Id:           agentInfo.AgentID(),
			Version:      release.Version(),
			Commit:       release.Commit(),
			BuildTime:    release.BuildTime().Format(control.TimeFormat()),
			Snapshot:     release.Snapshot(),
			Pid:          int32(os.Getpid()),
			Unprivileged: agentInfo.Unprivileged(),
		},
		State:          state.State,
		Message:        state.Message,
		FleetState:     state.FleetState,
		FleetMessage:   state.FleetMessage,
		Components:     components,
		UpgradeDetails: upgradeDetails,
	}, nil
}
```

It turns out that the log level is not part of the `cproto.StateAgentInfo` we send back. To have the correct representation, if we decide to add the log level to the `status` output, we would need a follow-up change.
My bad! I was thinking it was there in the output because I know some of the log level state is stored in the same interface as the agent information.
Thanks for explaining it.
‼️ Should be reverted if elastic/elastic-agent#4747 does not make 8.15.0.

## Summary

Resolves #180778

This PR allows agent log level to be reset back to the level set on its policy (or if not set, simply the default agent level, see elastic/elastic-agent#3090). To achieve this, this PR:

- Allows `null` to be passed for the log level settings action, i.e.:
  ```
  POST kbn:/api/fleet/agents/<AGENT_ID>/actions
  {"action":{"type":"SETTINGS","data":{"log_level": null}}}
  ```
- Enables the agent policy log level setting implemented in #180607
- Always show `Apply changes` on the agent details > Logs tab
- For agents >= 8.15.0, always show `Reset to policy` on the agent details > Logs tab
- Ensures both buttons are disabled if user does not have access to write to agents

<img width="1254" alt="image" src="https://github.com/elastic/kibana/assets/1965714/bcdf763e-2053-4071-9aa8-8bcb57b8fee1">
<img width="1267" alt="image" src="https://github.com/elastic/kibana/assets/1965714/182ac54d-d5ad-435f-9376-70bb24f288f9">

### Caveats

1. The reported agent log level is not accurate if the agent is using the level from its policy and does not have a log level set on its own level (elastic/elastic-agent#4747), so the initial selection on the agent log level could be wrong
2. We have no way to tell where the log level came from (elastic/elastic-agent#4748), so that's why `Apply changes` and `Reset to policy` are always shown

### Testing

Use the latest `8.15.0-SNAPSHOT` for agents or fleet server to test this change

### Checklist

Delete any items that are not applicable to this PR.

- [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
What does this PR do?
Support setting elastic-agent log level from fleet policy.
The agent will apply (in decreasing order of priority):

1. the log level set for the specific agent via a `settings` action, if any
2. the log level set in the agent policy, if any
3. the default agent log level

Whenever a `policy_change` or `settings` action is received, the settings action handler will reevaluate the log levels specified and set the log level according to the priority above.
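As a rough illustration of that priority order (not this PR's actual implementation; the helper name and the hard-coded default are assumptions):

```go
package main

import (
	"fmt"

	"github.com/elastic/elastic-agent-libs/logp"
)

// resolveLogLevel applies the priority described above:
// per-agent settings action > policy level > default agent level.
func resolveLogLevel(actionLevel, policyLevel *logp.Level) logp.Level {
	if actionLevel != nil {
		return *actionLevel // 1. level set for this specific agent via a settings action
	}
	if policyLevel != nil {
		return *policyLevel // 2. level set in the fleet policy
	}
	return logp.InfoLevel // 3. assumed default agent level
}

func main() {
	policyLevel := logp.DebugLevel
	// No per-agent override: the policy level wins.
	fmt.Println(resolveLogLevel(nil, &policyLevel))
}
```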
Why is it important?
It allows users to manage elastic-agent verbosity easily through the fleet policy, while at the same time allowing them to set a different log level for specific agents when troubleshooting issues.
Checklist
- [ ] I have made corresponding changes to the documentation
- [ ] I have made corresponding changes to the default configuration files
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have added an entry in `./changelog/fragments` using the changelog tool
- [ ] I have added an integration test or an E2E test

Author's Checklist
How to test this PR locally
Create a simple policy in fleet and enroll an elastic-agent with that policy.
Then start setting log levels in the policy and for the specific agent using dev tools and the requests below.
Set policy log level
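The original request body was not preserved here; a plausible shape uses Fleet's agent policy update API in dev tools, where the `advanced_settings.agent_logging_level` field name is an assumption:

```
PUT kbn:/api/fleet/agent_policies/<POLICY_ID>
{
  "name": "my-policy",
  "namespace": "default",
  "advanced_settings": {
    "agent_logging_level": "debug"
  }
}
```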
Set log level for a specific agent
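This request body was also lost; based on the settings action shown in the linked Kibana PR description above, it looks like the following (send `null` instead of `"debug"` to clear the per-agent override):

```
POST kbn:/api/fleet/agents/<AGENT_ID>/actions
{"action":{"type":"SETTINGS","data":{"log_level":"debug"}}}
```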
A good way to check the effect of the log level changes is to keep a terminal with the `elastic-agent logs -f` command running: that way we can see the changes in elastic-agent logging in real time.

Another way to check the log level currently in use by the agent is the `inspect` subcommand (grepping or using `yq` is recommended as the output is very verbose), for example:

```
sudo elastic-agent inspect | yq .agent
```
Related issues
Use cases
Screenshots
Logs
Questions to ask yourself