Unstable connection between master and workers according to cluster.log in Reliability test #4385
Comments
Update
It seems related to wazuh/wazuh#17928.
All the reliability tests:
are not currently prepared to analyze logs where one of the managers has been restarted. They should be adapted to work correctly when this happens, since it is a usual scenario in the workload benchmarks when the API performance is being tested. For example, the
It was reported here that some tests did not run in the last build. We should fix that as part of this issue if the problem keeps happening:
Update
I was reading the test code and taking the first steps toward a resolution.
Update
I was debugging the test. Relevant cluster.log excerpt:

2023/08/14 19:42:19 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Agent-info sync] Permission to synchronize granted.
2023/08/14 19:42:19 INFO: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Agent-info sync] Starting.
2023/08/14 19:42:19 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Agent-info sync] Obtained 29 chunks of data in 0.235s.
2023/08/14 19:42:19 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Agent-info sync] Sending chunks.
2023/08/14 19:42:20 INFO: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Main] The master closed the connection
2023/08/14 19:43:50 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Agent-info sync] Permission to synchronize granted.
2023/08/14 19:43:50 INFO: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Agent-info sync] Starting.
2023/08/14 19:43:50 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Agent-info sync] Obtained 29 chunks of data in 0.232s.
2023/08/14 19:43:52 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Agent-info sync] Sending chunks.
2023/08/14 19:43:54 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Main] Command received: 'b'dapi''
2023/08/14 19:43:54 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [D API] Received request: b'master*124633 {"f": {"__callable__": {"__name__": "restart", "__module__": "wazuh.manager", "__qualname__": "restart", "__type__": "function"}}, "f_kwargs": {}, "request_type": "distributed_master", "wait_for_complete": false, "from_cluster": true, "is_async": false, "local_client_arg": null, "basic_services": ["wazuh-modulesd", "wazuh-analysisd", "wazuh-execd", "wazuh-db", "wazuh-remoted"], "rbac_permissions": {"agent:create": {"*:*:*": "allow"}, "group:create": {"*:*:*": "allow"}, "agent:read": {"agent:id:*": "allow", "agent:group:*": "allow"}, "agent:delete": {"agent:id:*": "allow", "agent:group:*": "allow"}, "agent:modify_group": {"agent:id:*": "allow", "agent:group:*": "allow"}, "agent:reconnect": {"agent:id:*": "allow", "agent:group:*": "allow"}, "agent:restart": {"agent:id:*": "allow", "agent:group:*": "allow"}, "agent:upgrade": {"agent:id:*": "allow", "agent:group:*": "allow"}, "group:read": {"group:id:*": "allow"}, "group:delete": {"group:id:*": "allow"}, "group:update_config": {"group:id:*": "allow"}, "group:modify_assignments": {"group:id:*": "allow"}, "active-response:command": {"agent:id:*": "allow"}, "security:create": {"*:*:*": "allow"}, "security:create_user": {"*:*:*": "allow"}, "security:read_config": {"*:*:*": "allow"}, "security:update_config": {"*:*:*": "allow"}, "security:revoke": {"*:*:*": "allow"}, "security:edit_run_as": {"*:*:*": "allow"}, "security:read": {"role:id:*": "allow", "policy:id:*": "allow", "user:id:*": "allow", "rule:id:*": "allow"}, "security:update": {"role:id:*": "allow", "policy:id:*": "allow", "user:id:*": "allow", "rule:id:*": "allow"}, "security:delete": {"role:id:*": "allow", "policy:id:*": "allow", "user:id:*": "allow", "rule:id:*": "allow"}, "cluster:status": {"*:*:*": "allow"}, "manager:read": {"*:*:*": "allow"}, "manager:read_api_config": {"*:*:*": "allow"}, "manager:update_config": {"*:*:*": 
"allow"}, "manager:restart": {"*:*:*": "allow"}, "cluster:read_api_config": {"node:id:*": "allow"}, "cluster:read": {"node:id:*": "allow"}, "cluster:restart": {"node:id:*": "allow"}, "cluster:update_config": {"node:id:*": "allow"}, "ciscat:read": {"agent:id:*": "allow"}, "decoders:read": {"decoder:file:*": "allow"}, "decoders:delete": {"decoder:file:*": "allow"}, "decoders:update": {"*:*:*": "allow"}, "lists:read": {"list:file:*": "allow"}, "lists:delete": {"list:file:*": "allow"}, "lists:update": {"*:*:*": "allow"}, "rootcheck:clear": {"agent:id:*": "allow"}, "rootcheck:read": {"agent:id:*": "allow"}, "rootcheck:run": {"agent:id:*": "allow"}, "rules:read": {"rule:file:*": "allow"}, "rules:delete": {"rule:file:*": "allow"}, "rules:update": {"*:*:*": "allow"}, "mitre:read": {"*:*:*": "allow"}, "sca:read": {"agent:id:*": "allow"}, "syscheck:clear": {"agent:id:*": "allow"}, "syscheck:read": {"agent:id:*": "allow"}, "syscheck:run": {"agent:id:*": "allow"}, "syscollector:read": {"agent:id:*": "allow"}, "logtest:run": {"*:*:*": "allow"}, "task:status": {"*:*:*": "allow"}, "vulnerability:read": {"agent:id:*": "allow"}, "vulnerability:run": {"*:*:*": "allow"}, "rbac_mode": "white"}, "current_user": "", "broadcasting": true, "nodes": ["master", "CLUSTER-Workload_benchmarks_metrics_B283_manager_15", "CLUSTER-Workload_benchmarks_metrics_B283_manager_1", "CLUSTER-Workload_benchmarks_metrics_B283_manager_10", "CLUSTER-Workload_benchmarks_metrics_B283_manager_25", "CLUSTER-Workload_benchmarks_metrics_B283_manager_13", "CLUSTER-Workload_benchmarks_metrics_B283_manager_2", "CLUSTER-Workload_benchmarks_metrics_B283_manager_19", "CLUSTER-Workload_benchmarks_metrics_B283_manager_11", "CLUSTER-Workload_benchmarks_metrics_B283_manager_6", "CLUSTER-Workload_benchmarks_metrics_B283_manager_20", "CLUSTER-Workload_benchmarks_metrics_B283_manager_8", "CLUSTER-Workload_benchmarks_metrics_B283_manager_4", "CLUSTER-Workload_benchmarks_metrics_B283_manager_7", 
"CLUSTER-Workload_benchmarks_metrics_B283_manager_9", "CLUSTER-Workload_benchmarks_metrics_B283_manager_22", "CLUSTER-Workload_benchmarks_metrics_B283_manager_12", "CLUSTER-Workload_benchmarks_metrics_B283_manager_16", "CLUSTER-Workload_benchmarks_metrics_B283_manager_17", "CLUSTER-Workload_benchmarks_metrics_B283_manager_14", "CLUSTER-Workload_benchmarks_metrics_B283_manager_24", "CLUSTER-Workload_benchmarks_metrics_B283_manager_23", "CLUSTER-Workload_benchmarks_metrics_B283_manager_5", "CLUSTER-Workload_benchmarks_metrics_B283_manager_18", "CLUSTER-Workload_benchmarks_metrics_B283_manager_21", "CLUSTER-Workload_benchmarks_metrics_B283_manager_3"], "api_timeout": 10}'
2023/08/14 19:43:54 INFO: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [D API] Receiving request: restart from master (124633)
2023/08/14 19:43:54 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [D API] Receiving parameters {}
2023/08/14 19:43:54 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [D API] Starting to execute request locally
2023/08/14 19:43:54 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [D API] Finished executing request locally
2023/08/14 19:43:54 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [D API] Time calculating request result: 0.024s
2023/08/14 19:43:55 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Main] Command received: 'b'syn_m_a_e''
2023/08/14 19:43:55 INFO: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Agent-info sync] Finished in 5.318s. Updated 29 chunks.
2023/08/14 19:43:58 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Integrity check] Permission to synchronize granted.
2023/08/14 19:43:58 INFO: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Integrity check] Starting.
2023/08/14 19:43:58 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Integrity check] Compressing 'files_metadata.json' of 34 files.
2023/08/14 19:43:58 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Integrity check] Sending zip file to master.
2023/08/14 19:43:58 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Integrity check] Zip file sent to master.
2023/08/14 19:43:58 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Main] Command received: 'b'syn_m_c_ok''
2023/08/14 19:43:58 INFO: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Integrity check] Finished in 0.051s. Sync not required.
2023/08/14 19:43:59 INFO: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Main] The master closed the connection
2023/08/14 19:43:59 INFO: [Cluster] [Main] SIGNAL [(15)-(SIGTERM)] received. Exit...
2023/08/14 19:44:18 DEBUG: [Cluster] [Main] Removing '/var/ossec/queue/cluster/'.
2023/08/14 19:44:18 DEBUG: [Cluster] [Main] Removed '/var/ossec/queue/cluster/'.
2023/08/14 19:44:19 INFO: [Local Server] [Main] Serving on /var/ossec/queue/cluster/c-internal.sock
2023/08/14 19:44:19 DEBUG: [Local Server] [Keep alive] Calculating.
2023/08/14 19:44:19 DEBUG: [Local Server] [Keep alive] Calculated.

(wqa) ➜ wazuh-qa git:(fix/4385-unstable-realiability-tests) ✗ pytest -s --artifacts_path=/home/nstefani/Downloads/283/ tests/reliability/test_cluster/test_cluster_logs/test_cluster_worker_logs_order/test_cluster_worker_logs_order.py
================================================================================================ test session starts ================================================================================================
platform linux -- Python 3.9.16, pytest-7.1.2, pluggy-0.13.1
rootdir: /home/nstefani/git/wazuh-qa/tests, configfile: pytest.ini
plugins: html-3.1.1, testinfra-5.0.0, metadata-2.0.4
collected 1 item
tests/reliability/test_cluster/test_cluster_logs/test_cluster_worker_logs_order/test_cluster_worker_logs_order.py .
================================================================================================= 1 passed in 0.05s =================================================================================================

The next step is to generalize this solution to more cases.
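As a sketch of what that generalization could look like, the log analysis could split each worker's cluster.log into per-run sessions whenever the node is restarted, and validate ordering within each session independently. This is a hypothetical illustration, not the actual wazuh-qa implementation: the function name and `RESTART_MARKER` regex are my own, with the marker text taken from the SIGTERM line in the excerpt above.

```python
import re

# Hypothetical restart marker, based on the cluster.log excerpt above:
# "INFO: [Cluster] [Main] SIGNAL [(15)-(SIGTERM)] received. Exit..."
RESTART_MARKER = re.compile(r"SIGNAL \[\(15\)-\(SIGTERM\)\] received")

def split_log_into_sessions(log_lines):
    """Split cluster.log lines into per-run sessions so that ordering
    checks can be reset whenever the manager was restarted."""
    sessions, current = [], []
    for line in log_lines:
        current.append(line)
        if RESTART_MARKER.search(line):
            # The node is going down: close the current session and
            # collect the post-restart lines in a fresh one.
            sessions.append(current)
            current = []
    if current:
        sessions.append(current)
    return sessions
```

With something like this, a restart in the middle of an Agent-info sync would no longer be flagged as an out-of-order log, since each session is checked on its own.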
Update
I've been working on the fix. Still, I need to investigate the behavior of
Update
I've been working on the requested changes.
Description
An error in the automated tests is produced by the output generated in the cluster.log file. The attached log snippet shows that the machine fails to connect to its master node, and the same message appears consistently throughout the log. It was detected when running reliability/test_cluster/test_cluster_logs/test_cluster_connection/test_cluster_connection.py::test_cluster_connection.
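The repeated disconnection can be confirmed directly against the attached cluster.log by counting the offending message. A minimal sketch (the helper name is hypothetical; the message text is taken verbatim from the log excerpt above):

```python
from pathlib import Path

# Message text as it appears in the worker's cluster.log above.
DISCONNECT_MSG = "The master closed the connection"

def count_disconnections(log_path):
    """Count how many times a worker reported losing its master,
    to confirm the failure repeats throughout the whole log."""
    return sum(
        DISCONNECT_MSG in line
        for line in Path(log_path).read_text().splitlines()
    )
```

A count much greater than one over the test window would confirm the connection is unstable rather than failing once.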
Steps to reproduce
It can be reproduced by running the Workload benchmarks test pipeline and then running the reliability tests manually with the wazuh-qa tests for 4.6.0.
The pipeline should be run with the same parameters as https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/274/
Report is attached here.