
Support log level setting from policy #3090

Merged · 17 commits into elastic:main · May 14, 2024

Conversation

@pchila (Member) commented on Jul 17, 2023

What does this PR do?

Support setting elastic-agent log level from fleet policy.
The agent will apply (in decreasing order of priority):

  1. log level set specifically to the agent via settings action, if any
  2. log level specified in fleet policy, if any
  3. default hard-coded log level for elastic-agent

Whenever a policy_change or settings action is received, the settings action handler reevaluates the specified log levels and sets the log level according to the priority above.
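
For illustration, here is a minimal sketch of that precedence (names are illustrative rather than the PR's actual code; the hard-coded default is assumed here to be logp.InfoLevel, and a nil pointer means the level is unset at that layer):

// Hedged sketch of the precedence described above; not the PR's actual code.
package main

import (
	"fmt"

	"github.com/elastic/elastic-agent-libs/logp"
)

func effectiveLogLevel(actionLevel, policyLevel *logp.Level) logp.Level {
	if actionLevel != nil {
		return *actionLevel // 1. level set via a settings action
	}
	if policyLevel != nil {
		return *policyLevel // 2. level from the fleet policy
	}
	return logp.InfoLevel // 3. hard-coded default (assumed)
}

func main() {
	policyLevel := logp.ErrorLevel
	actionLevel := logp.DebugLevel
	fmt.Println(effectiveLogLevel(&actionLevel, &policyLevel)) // settings action wins
	fmt.Println(effectiveLogLevel(nil, &policyLevel))          // falls back to policy
	fmt.Println(effectiveLogLevel(nil, nil))                   // falls back to default
}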

Why is it important?

It allows users to easily manage elastic-agent verbosity through the fleet policy, while still allowing a different log level to be set on specific agents for troubleshooting.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test

Author's Checklist

How to test this PR locally

Create a simple policy in fleet and enroll an elastic-agent with that policy.
Then start setting log levels in the policy and for the specific agent using Dev Tools and the requests below.

Set policy log level

PUT kbn:/api/fleet/agent_policies/<policy id>
{
   "name": "<policy name>",
   "namespace": "default",
   "overrides": {
       "agent":{
         "logging": {
           "level": "error"
         }
       }
   }
}

Set log level for a specific agent

POST kbn:/api/fleet/agents/<elastic agent id>/actions
{
  "action": {
    "type": "SETTINGS",
    "data": {
     "log_level": "debug"   
    }
  }
}

A good way to check the effect of log level changes is to keep a terminal running the elastic-agent logs -f command: that way the changes to elastic-agent logging are visible in real time.

Another way to check the log level currently in use by the agent is the inspect subcommand (grepping or using yq is recommended, as the output is very verbose). For example:

sudo elastic-agent inspect | yq .agent

download:
  sourceURI: https://artifacts.elastic.co/downloads/
features: null
headers: null
id: 25bd1b94-9b76-4a2d-bdd8-09d778e8cf44
logging:
  level: warning
monitoring:
  enabled: true
  http:
    buffer: null
    enabled: false
    host: localhost
    port: 6791
  logs: true
  metrics: true
  namespace: default
  use_output: default
protection:
  enabled: false
  signing_key: <redacted>


@mergify mergify bot assigned pchila Jul 17, 2023
@mergify (bot) commented on Jul 17, 2023

This pull request does not have a backport label. Could you fix it @pchila? 🙏
To fix up this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v\d.\d.\d is the label to automatically backport to the 8.\d branch, where \d is the digit

NOTE: backport-skip has been added to this pull request.

@elasticmachine (Contributor) commented

💔 Tests Failed


Build stats

  • Start Time: 2023-07-17T12:43:18.470+0000

  • Duration: 24 min 5 sec

Test stats 🧪

Test Results
Failed 20
Passed 5955
Skipped 27
Total 6002

Test errors 20


Test / Matrix - PLATFORM = 'ubuntu-22 && immutable' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_a_new_Host_and_no_proxy_changes_the_Host – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
     === RUN   TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_a_new_Host_and_no_proxy_changes_the_Host
        handler_action_policy_change_test.go:273: 
            	Error Trace:	/var/lib/jenkins/workspace/_agent_elastic-agent-mbp_PR-3090/src/github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers/handler_action_policy_change_test.go:273
            	Error:      	Not equal: 
            	            	expected: 1
            	            	actual  : 0
            	Test:       	TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_a_new_Host_and_no_proxy_changes_the_Host
    --- FAIL: TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_a_new_Host_and_no_proxy_changes_the_Host (0.01s)

Test / Matrix - PLATFORM = 'ubuntu-22 && immutable' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_new_Hosts_and_no_proxy_changes_the_Hosts – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
     === RUN   TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_new_Hosts_and_no_proxy_changes_the_Hosts
        handler_action_policy_change_test.go:317: 
            	Error Trace:	/var/lib/jenkins/workspace/_agent_elastic-agent-mbp_PR-3090/src/github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers/handler_action_policy_change_test.go:317
            	Error:      	Not equal: 
            	            	expected: 1
            	            	actual  : 0
            	Test:       	TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_new_Hosts_and_no_proxy_changes_the_Hosts
    --- FAIL: TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_new_Hosts_and_no_proxy_changes_the_Hosts (0.00s)

Test / Matrix - PLATFORM = 'ubuntu-22 && immutable' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_proxy_changes_the_fleet_client – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
     === RUN   TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_proxy_changes_the_fleet_client
        handler_action_policy_change_test.go:366: 
            	Error Trace:	/var/lib/jenkins/workspace/_agent_elastic-agent-mbp_PR-3090/src/github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers/handler_action_policy_change_test.go:366
            	Error:      	Not equal: 
            	            	expected: 1
            	            	actual  : 0
            	Test:       	TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_proxy_changes_the_fleet_client
    --- FAIL: TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_proxy_changes_the_fleet_client (0.00s)

Test / Matrix - PLATFORM = 'ubuntu-22 && immutable' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_empty_proxy_don't_change_the_fleet_client – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
     === RUN   TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_empty_proxy_don't_change_the_fleet_client
        handler_action_policy_change_test.go:422: 
            	Error Trace:	/var/lib/jenkins/workspace/_agent_elastic-agent-mbp_PR-3090/src/github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers/handler_action_policy_change_test.go:422
            	Error:      	Not equal: 
            	            	expected: 1
            	            	actual  : 0
            	Test:       	TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_empty_proxy_don't_change_the_fleet_client
    --- FAIL: TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_empty_proxy_don't_change_the_fleet_client (0.00s)

Test / Matrix - PLATFORM = 'ubuntu-22 && immutable' / Test / TestPolicyChangeHandler_handleFleetServerHosts – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
     === RUN   TestPolicyChangeHandler_handleFleetServerHosts
    --- FAIL: TestPolicyChangeHandler_handleFleetServerHosts (0.06s)

Test / Matrix - PLATFORM = 'aws && aarch64 && gobld/diskSizeGb:200' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_a_new_Host_and_no_proxy_changes_the_Host – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
     === RUN   TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_a_new_Host_and_no_proxy_changes_the_Host
        handler_action_policy_change_test.go:273: 
            	Error Trace:	/var/lib/jenkins/workspace/_agent_elastic-agent-mbp_PR-3090/src/github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers/handler_action_policy_change_test.go:273
            	Error:      	Not equal: 
            	            	expected: 1
            	            	actual  : 0
            	Test:       	TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_a_new_Host_and_no_proxy_changes_the_Host
    --- FAIL: TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_a_new_Host_and_no_proxy_changes_the_Host (0.00s)

Test / Matrix - PLATFORM = 'aws && aarch64 && gobld/diskSizeGb:200' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_new_Hosts_and_no_proxy_changes_the_Hosts – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
     === RUN   TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_new_Hosts_and_no_proxy_changes_the_Hosts
        handler_action_policy_change_test.go:317: 
            	Error Trace:	/var/lib/jenkins/workspace/_agent_elastic-agent-mbp_PR-3090/src/github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers/handler_action_policy_change_test.go:317
            	Error:      	Not equal: 
            	            	expected: 1
            	            	actual  : 0
            	Test:       	TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_new_Hosts_and_no_proxy_changes_the_Hosts
    --- FAIL: TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_new_Hosts_and_no_proxy_changes_the_Hosts (0.00s)

Test / Matrix - PLATFORM = 'aws && aarch64 && gobld/diskSizeGb:200' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_proxy_changes_the_fleet_client – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
     === RUN   TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_proxy_changes_the_fleet_client
        handler_action_policy_change_test.go:366: 
            	Error Trace:	/var/lib/jenkins/workspace/_agent_elastic-agent-mbp_PR-3090/src/github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers/handler_action_policy_change_test.go:366
            	Error:      	Not equal: 
            	            	expected: 1
            	            	actual  : 0
            	Test:       	TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_proxy_changes_the_fleet_client
    --- FAIL: TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_proxy_changes_the_fleet_client (0.00s)

Test / Matrix - PLATFORM = 'aws && aarch64 && gobld/diskSizeGb:200' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_empty_proxy_don't_change_the_fleet_client – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
     === RUN   TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_empty_proxy_don't_change_the_fleet_client
        handler_action_policy_change_test.go:422: 
            	Error Trace:	/var/lib/jenkins/workspace/_agent_elastic-agent-mbp_PR-3090/src/github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers/handler_action_policy_change_test.go:422
            	Error:      	Not equal: 
            	            	expected: 1
            	            	actual  : 0
            	Test:       	TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_empty_proxy_don't_change_the_fleet_client
    --- FAIL: TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_empty_proxy_don't_change_the_fleet_client (0.00s)

Test / Matrix - PLATFORM = 'aws && aarch64 && gobld/diskSizeGb:200' / Test / TestPolicyChangeHandler_handleFleetServerHosts – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
     === RUN   TestPolicyChangeHandler_handleFleetServerHosts
    --- FAIL: TestPolicyChangeHandler_handleFleetServerHosts (0.02s)

Steps errors 8


Go unitTest
  • Took 5 min 37 sec
  • Description: mage unitTest
Publish Cobertura Coverage Report
  • Took 0 min 1 sec
  • Description: [2023-07-17T12:58:53.486Z] [Cobertura] Publishing Cobertura coverage report... [2023-07-17T12:58:53
Go unitTest
  • Took 4 min 14 sec
  • Description: mage unitTest
Checks if running on a Unix-like node
  • Took 0 min 0 sec
  • Description: script returned exit code 1
Go unitTest
  • Took 5 min 20 sec
  • Description: mage unitTest
Checks if running on a Unix-like node
  • Took 0 min 0 sec
  • Description: script returned exit code 1
Go unitTest
  • Took 5 min 50 sec
  • Description: mage unitTest
Checks if running on a Unix-like node
  • Took 0 min 0 sec
  • Description: script returned exit code 1

🐛 Flaky test report

❕ There are test failures but not known flaky tests.


Genuine test errors 20

💔 There are test failures but not known flaky tests, most likely a genuine test failure.

  • Name: Test / Matrix - PLATFORM = 'ubuntu-22 && immutable' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_a_new_Host_and_no_proxy_changes_the_Host – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
  • Name: Test / Matrix - PLATFORM = 'ubuntu-22 && immutable' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_new_Hosts_and_no_proxy_changes_the_Hosts – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
  • Name: Test / Matrix - PLATFORM = 'ubuntu-22 && immutable' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_proxy_changes_the_fleet_client – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
  • Name: Test / Matrix - PLATFORM = 'ubuntu-22 && immutable' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_empty_proxy_don't_change_the_fleet_client – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
  • Name: Test / Matrix - PLATFORM = 'ubuntu-22 && immutable' / Test / TestPolicyChangeHandler_handleFleetServerHosts – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
  • Name: Test / Matrix - PLATFORM = 'aws && aarch64 && gobld/diskSizeGb:200' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_a_new_Host_and_no_proxy_changes_the_Host – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
  • Name: Test / Matrix - PLATFORM = 'aws && aarch64 && gobld/diskSizeGb:200' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_new_Hosts_and_no_proxy_changes_the_Hosts – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
  • Name: Test / Matrix - PLATFORM = 'aws && aarch64 && gobld/diskSizeGb:200' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_proxy_changes_the_fleet_client – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
  • Name: Test / Matrix - PLATFORM = 'aws && aarch64 && gobld/diskSizeGb:200' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_empty_proxy_don't_change_the_fleet_client – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
  • Name: Test / Matrix - PLATFORM = 'aws && aarch64 && gobld/diskSizeGb:200' / Test / TestPolicyChangeHandler_handleFleetServerHosts – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
  • Name: Test / Matrix - PLATFORM = 'windows-2022 && windows-immutable' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_a_new_Host_and_no_proxy_changes_the_Host – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
  • Name: Test / Matrix - PLATFORM = 'windows-2022 && windows-immutable' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_new_Hosts_and_no_proxy_changes_the_Hosts – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
  • Name: Test / Matrix - PLATFORM = 'windows-2022 && windows-immutable' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_proxy_changes_the_fleet_client – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
  • Name: Test / Matrix - PLATFORM = 'windows-2022 && windows-immutable' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_empty_proxy_don't_change_the_fleet_client – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
  • Name: Test / Matrix - PLATFORM = 'windows-2022 && windows-immutable' / Test / TestPolicyChangeHandler_handleFleetServerHosts – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
  • Name: Test / Matrix - PLATFORM = 'windows-2016 && windows-immutable' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_a_new_Host_and_no_proxy_changes_the_Host – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
  • Name: Test / Matrix - PLATFORM = 'windows-2016 && windows-immutable' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_new_Hosts_and_no_proxy_changes_the_Hosts – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
  • Name: Test / Matrix - PLATFORM = 'windows-2016 && windows-immutable' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_proxy_changes_the_fleet_client – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
  • Name: Test / Matrix - PLATFORM = 'windows-2016 && windows-immutable' / Test / TestPolicyChangeHandler_handleFleetServerHosts/A_policy_with_empty_proxy_don't_change_the_fleet_client – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers
  • Name: Test / Matrix - PLATFORM = 'windows-2016 && windows-immutable' / Test / TestPolicyChangeHandler_handleFleetServerHosts – github.com/elastic/elastic-agent/internal/pkg/agent/application/actions/handlers

🤖 GitHub comments


To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@mergify (bot) commented on Feb 12, 2024

This pull request now has conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b support_loglvl_from_policy upstream/support_loglvl_from_policy
git merge upstream/main
git push upstream support_loglvl_from_policy

@pchila pchila force-pushed the support_loglvl_from_policy branch from 0275469 to 86c740f on April 18, 2024 09:52
@pchila pchila force-pushed the support_loglvl_from_policy branch 2 times, most recently from bfe26f2 to 50f8bf6 on May 8, 2024 09:33
@pchila pchila closed this May 8, 2024
@pchila pchila force-pushed the support_loglvl_from_policy branch from 4fd0d9c to a045482 on May 8, 2024 16:56
@pchila pchila reopened this May 8, 2024
@pchila pchila force-pushed the support_loglvl_from_policy branch from 30fde5b to d512cab on May 9, 2024 09:23
@pchila pchila changed the title [DRAFT] - support log level setting from policy Support log level setting from policy May 9, 2024
@pchila pchila marked this pull request as ready for review May 9, 2024 13:11
@pchila pchila requested a review from a team as a code owner May 9, 2024 13:11
@ycombinator ycombinator requested review from blakerouse and removed request for andrzej-stencel May 9, 2024 21:50
@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label May 10, 2024
@elasticmachine (Contributor) commented

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@blakerouse (Contributor) left a comment

Overall this looks really good. Easy to follow and very well tested.

Would like to see an integration test covering all the different modes of operation. The control protocol should expose the currently set log level, so it should be easy to validate the log information.

@@ -570,13 +570,16 @@ func (c *Coordinator) PerformComponentDiagnostics(ctx context.Context, additiona

// SetLogLevel changes the entire log level for the running Elastic Agent.
// Called from external goroutines.
-func (c *Coordinator) SetLogLevel(ctx context.Context, lvl logp.Level) error {
+func (c *Coordinator) SetLogLevel(ctx context.Context, lvl *logp.Level) error {
@blakerouse (Contributor) commented:

Why change this to allow a nil value, to then error if a nil value is passed?

You also convert the pointer to a value inside the function. It seems to me all the changes in this function could be removed and it would have the same result.

@pchila (Member, Author) replied:

Mostly it is to have a single interface for the log level setter, leaving the nil check to the implementations...
If I didn't change this signature I would have to define 2 interfaces: one taking a pointer (for setting the log level coming from policy, which can be cleared) and one taking the value, implemented by the coordinator (which needs to receive a real value).

The signal-to-noise ratio of having 2 interfaces with similar names, differing only by a pointer, seemed a bit much to me, but I still wanted to abstract away the coordinator package, so this is the tradeoff I came to.

There's no technical reason why we could not have 2 separate interfaces though...
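
[Editor's note] A self-contained sketch of the tradeoff described above (type and method names are illustrative, not the PR's exact code): a single interface taking *logp.Level, where nil can represent a cleared policy level and the nil check is pushed into the implementation.

package main

import (
	"context"
	"errors"
	"fmt"

	"github.com/elastic/elastic-agent-libs/logp"
)

// logLevelSetter is the single shared abstraction: a nil level is how a
// policy-driven caller expresses "no explicit level / cleared".
type logLevelSetter interface {
	SetLogLevel(ctx context.Context, lvl *logp.Level) error
}

// fakeCoordinator stands in for the real Coordinator, which needs a
// concrete value and therefore rejects nil itself.
type fakeCoordinator struct{ level logp.Level }

func (c *fakeCoordinator) SetLogLevel(_ context.Context, lvl *logp.Level) error {
	if lvl == nil {
		return errors.New("nil log level requested")
	}
	c.level = *lvl
	return nil
}

func main() {
	var s logLevelSetter = &fakeCoordinator{}
	lvl := logp.DebugLevel
	fmt.Println(s.SetLogLevel(context.Background(), &lvl)) // <nil>
	fmt.Println(s.SetLogLevel(context.Background(), nil))  // nil log level requested
}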

@blakerouse (Contributor) replied:

No that makes complete sense. I was just confused on the change, but if that unifies it I prefer that.

@pchila (Member, Author) commented on May 10, 2024

Would like to see an integration test, testing all the different modes of operations. Control protocol should expose the currently set log level, so should be easy to validate the log information.

@blakerouse I am currently implementing integration tests using inspect to check that the log value is correctly set.
Hopefully I will be done by the end of the day :)

@blakerouse (Contributor) commented:

Would like to see an integration test, testing all the different modes of operations. Control protocol should expose the currently set log level, so should be easy to validate the log information.

@blakerouse I am currently implementing integration tests using inspect to check that the log value is correctly set. Hopefully I will be done by the end of the day :)

You might not need to use inspect. Would be easier to use the status output of the control protocol that should include the agent_info that has the current log level.

@pchila pchila force-pushed the support_loglvl_from_policy branch from 7bc8dd2 to b9394d4 on May 13, 2024 13:28
@pchila (Member, Author) commented on May 13, 2024

You might not need to use inspect. Would be easier to use the status output of the control protocol that should include the agent_info that has the current log level.

Depending on how the AgentInfo object is created, it may not contain the actual log level: the level could be empty in the config/state file but set via policy at runtime when we process the policy_change action... I implemented the integration test with inspect and it seems to work pretty well.
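
[Editor's note] A hedged sketch of that kind of check (helper and type names are illustrative, not the PR's actual integration test): run elastic-agent inspect, parse the YAML, and assert on agent.logging.level.

package integration

import (
	"os/exec"
	"testing"

	"gopkg.in/yaml.v3"
)

// inspectOutput models only the part of the `elastic-agent inspect`
// YAML output we care about (see the sample output above).
type inspectOutput struct {
	Agent struct {
		Logging struct {
			Level string `yaml:"level"`
		} `yaml:"logging"`
	} `yaml:"agent"`
}

// assertLogLevel is a hypothetical helper; inspect may require elevated
// privileges, so the test binary is assumed to run as root.
func assertLogLevel(t *testing.T, want string) {
	t.Helper()
	out, err := exec.Command("elastic-agent", "inspect").Output()
	if err != nil {
		t.Fatalf("running elastic-agent inspect: %v", err)
	}
	var parsed inspectOutput
	if err := yaml.Unmarshal(out, &parsed); err != nil {
		t.Fatalf("parsing inspect output: %v", err)
	}
	if got := parsed.Agent.Logging.Level; got != want {
		t.Fatalf("log level = %q, want %q", got, want)
	}
}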

@blakerouse (Contributor) left a comment:

You might not need to use inspect. Would be easier to use the status output of the control protocol that should include the agent_info that has the current log level.

Depending on how the AgentInfo object is created it may not contain the actual log level as it could be empty in the config/state file but it could be set via policy at runtime when we process the policy_change action... I implemented the integration test with inspect and it seems to work pretty well

That is not good. We should ensure that the output from elastic-agent status --output=yaml includes the currently active log level. It should not be obscure to the user; it should be very clear. Even if the log level is not set in the agent info context, the output from the status command should show the active log level.

@pchila (Member, Author) commented on May 13, 2024

the integration test with inspect and it seems to work pretty well

That is not good. We should ensure that the output from elastic-agent status --output=yaml includes the current active log level. It should not be obscure to the user, it should be very clear. Even if the log level is not set in the agent info context, the output from the status command should show the active log level.

@blakerouse

Had a quick look at the elastic-agent status command output and I didn't see any information about the log level, so I dug deeper into the State() function in our gRPC server

// State returns the overall state of the agent.
func (s *Server) State(_ context.Context, _ *cproto.Empty) (*cproto.StateResponse, error) {
	state := s.coord.State()
	return stateToProto(&state, s.agentInfo)
}

and then in the stateToProto() function

func stateToProto(state *coordinator.State, agentInfo info.Agent) (*cproto.StateResponse, error) {

// some code omitted here...


	return &cproto.StateResponse{
		Info: &cproto.StateAgentInfo{
			Id:           agentInfo.AgentID(),
			Version:      release.Version(),
			Commit:       release.Commit(),
			BuildTime:    release.BuildTime().Format(control.TimeFormat()),
			Snapshot:     release.Snapshot(),
			Pid:          int32(os.Getpid()),
			Unprivileged: agentInfo.Unprivileged(),
		},
		State:          state.State,
		Message:        state.Message,
		FleetState:     state.FleetState,
		FleetMessage:   state.FleetMessage,
		Components:     components,
		UpgradeDetails: upgradeDetails,
	}, nil
}

It turns out that the log level is not part of cproto.StateAgentInfo, so it is not output to the user at this point.

To have the correct representation, if we decide to add the log level to the elastic-agent status output we need to take it from coordinator.State rather than the info.AgentInfo struct (we pass both structs in), as the coordinator always has the applied log level, wherever it came from (policy or agent settings).
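
[Editor's note] A purely hypothetical sketch of that idea: the LogLevel field below does not exist in cproto.StateAgentInfo at this point, and the struct here is a local stand-in for illustration. The key point is that the value would be sourced from coordinator.State, which always holds the level actually applied.

package main

import (
	"fmt"

	"github.com/elastic/elastic-agent-libs/logp"
)

// stateAgentInfo mimics cproto.StateAgentInfo plus the assumed LogLevel field.
type stateAgentInfo struct {
	ID       string
	LogLevel string // hypothetical field, populated from coordinator.State
}

// buildAgentInfo takes the applied level from the coordinator's state
// (policy or settings action), not from info.Agent, which may be empty.
func buildAgentInfo(agentID string, appliedLevel logp.Level) stateAgentInfo {
	return stateAgentInfo{
		ID:       agentID,
		LogLevel: appliedLevel.String(),
	}
}

func main() {
	fmt.Printf("%+v\n", buildAgentInfo("25bd1b94-9b76-4a2d-bdd8-09d778e8cf44", logp.WarnLevel))
}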

@pchila pchila requested a review from blakerouse May 13, 2024 15:43
@blakerouse (Contributor) left a comment:

My bad! I was thinking it was there in the output because I know some of the log level state is stored in the same interface as the agent information.

Thanks for explaining it.

@pchila pchila merged commit 209aff4 into elastic:main May 14, 2024
9 checks passed
jen-huang added a commit to elastic/kibana that referenced this pull request May 16, 2024
‼️ Should be reverted if
elastic/elastic-agent#4747 does not make
8.15.0.

## Summary

Resolves #180778 

This PR allows agent log level to be reset back to the level set on its
policy (or if not set, simply the default agent level, see
elastic/elastic-agent#3090).

To achieve this, this PR:
- Allows `null` to be passed for the log level settings action, i.e.:

```
POST kbn:/api/fleet/agents/<AGENT_ID>/actions
{"action":{"type":"SETTINGS","data":{"log_level": null}}}
```
- Enables the agent policy log level setting implemented in
#180607
- Always show `Apply changes` on the agent details > Logs tab
- For agents >= 8.15.0, always show `Reset to policy` on the agent
details > Logs tab
- Ensures both buttons are disabled if user does not have access to
write to agents

<img width="1254" alt="image"
src="https://github.com/elastic/kibana/assets/1965714/bcdf763e-2053-4071-9aa8-8bcb57b8fee1">

<img width="1267" alt="image"
src="https://github.com/elastic/kibana/assets/1965714/182ac54d-d5ad-435f-9376-70bb24f288f9">

### Caveats
1. The reported agent log level is not accurate if the agent is using the
level from its policy and does not have a log level set on its own
(elastic/elastic-agent#4747), so the initial
selection of the agent log level could be wrong
2. We have no way to tell where the log level came from
(elastic/elastic-agent#4748), so that's why
`Apply changes` and `Reset to policy` are always shown

### Testing
Use the latest `8.15.0-SNAPSHOT` for agents or fleet server to test this
change

### Checklist

Delete any items that are not applicable to this PR.

- [x] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
Labels: backport-skip, Team:Elastic-Agent-Control-Plane (Label for the Agent Control Plane team)

Successfully merging this pull request may close these issues:

  • Configuring agent.logging.level through agent policy do not work for managed agents