
Unstable connection between master and workers according to cluster.log in Reliability test #4385

Closed
pro-akim opened this issue Aug 2, 2023 · 7 comments · Fixed by #5477
Assignees
Labels
level/task (Task issue) · qa_known (Issues that are already known by the QA team) · type/bug

Comments

@pro-akim
Member

pro-akim commented Aug 2, 2023

Wazuh version: 4.6.0
Component: Wazuh server cluster

Description

An error in the automated tests is produced by the output generated in the cluster.log file. The log shows that the machine fails to connect to its master node, and the same message appears consistently throughout the file.

Running reliability/test_cluster/test_cluster_logs/test_cluster_connection/test_cluster_connection.py::test_cluster_connection

  if disconnected_nodes:
>           pytest.fail(f'The following nodes disconnected from master at any point:\n- ' + '\n- '.join(disconnected_nodes))
E           Failed: The following nodes disconnected from master at any point:
E           - worker_21
E           - worker_13
E           - worker_4
E           - worker_6
E           - worker_18
E           - worker_20
E           - worker_11
E           - worker_2
E           - worker_14
E           - worker_3
E           - worker_9
E           - worker_17
E           - worker_22
E           - worker_7
E           - worker_5
E           - worker_19
E           - worker_12
E           - worker_25
E           - worker_8
E           - worker_24
E           - worker_23
E           - worker_10
E           - worker_16
E           - worker_15
E           - worker_1

test_cluster_logs/test_cluster_connection/test_cluster_connection.py:47: Failed
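For context, the check that produces this failure essentially scans each worker's cluster.log under the artifacts path and flags any worker that logged a disconnection from the master. Below is a minimal sketch of that idea, assuming a hypothetical worker_*/logs/cluster.log layout, hypothetical disconnection patterns, and an artifacts_path fixture; the real test in test_cluster_connection.py may differ:

```python
# Minimal sketch, not the actual test: flag workers whose cluster.log
# contains a disconnection message at any point.
import re
from pathlib import Path

import pytest

# Assumed disconnection markers; the real test may match other lines.
DISCONNECTION_RE = re.compile(r'The master closed the connection|Connection reset')


def find_disconnected_nodes(artifacts_path):
    """Return the names of workers that logged a disconnection from the master."""
    disconnected = []
    # Assumed artifacts layout: <artifacts_path>/worker_N/logs/cluster.log
    for log_file in Path(artifacts_path).glob('worker_*/logs/cluster.log'):
        if DISCONNECTION_RE.search(log_file.read_text(errors='replace')):
            disconnected.append(log_file.parent.parent.name)
    return disconnected


def test_cluster_connection(artifacts_path):  # artifacts_path: assumed pytest fixture
    disconnected_nodes = find_disconnected_nodes(artifacts_path)
    if disconnected_nodes:
        pytest.fail('The following nodes disconnected from master at any point:\n- '
                    + '\n- '.join(disconnected_nodes))
```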

Steps to reproduce

It can be reproduced by running the Workload benchmark test pipeline and then running the reliability tests manually using the wazuh-qa tests for 4.6.0.
The pipeline should be run with the same parameters as https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/274/

Report is attached here.

@pro-akim pro-akim closed this as completed Aug 2, 2023
@pro-akim pro-akim transferred this issue from wazuh/wazuh Aug 2, 2023
@pro-akim pro-akim reopened this Aug 2, 2023
@pro-akim
Member Author

pro-akim commented Aug 3, 2023

Update

It seems related to wazuh/wazuh#17928

@Selutario
Contributor

Selutario commented Dec 13, 2023

All the reliability tests:

  • test_cluster_connection
  • test_cluster_error_logs
  • test_check_logs_order_workers
  • test_check_logs_order_master
  • test_cluster_sync
  • test_cluster_task_order

are not currently prepared to analyze logs where one of the managers has been restarted. They should be adapted to handle this correctly, since it is a common scenario in the workload benchmarks when the API performance is being tested.

For example, test_check_logs_order_workers checks whether the logs of every cluster task (Agent-info sync, Integrity check, etc.) appear in the expected order. However, tasks are reinitialized when the connection is broken by a cluster restart, so the test should also reset the expected order when it finds a broken-connection log, as in the sketch below.
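A minimal sketch of that adaptation, assuming the test keeps a per-task pointer into an expected message sequence (the task names and message prefixes are taken from worker cluster.log lines; the actual test code is structured differently and may track more tasks):

```python
# Hypothetical sketch: restart the expected-order check whenever a
# broken-connection line appears, since cluster tasks are reinitialized then.
import re

# Assumed reset markers; the real test may use other patterns.
CONNECTION_BROKEN_RE = re.compile(r'The master closed the connection|Connection reset')
TASK_RE = re.compile(r'\[(Agent-info sync|Integrity check)\] (.+)')

# Illustrative per-task message order (prefixes of the expected log lines).
EXPECTED_ORDER = {
    'Agent-info sync': ['Permission to synchronize granted', 'Starting', 'Obtained',
                        'Sending chunks', 'Finished'],
    'Integrity check': ['Permission to synchronize granted', 'Starting', 'Compressing',
                        'Sending zip file', 'Zip file sent', 'Finished'],
}


def check_worker_log_order(lines):
    """Assert that task logs appear in order, restarting after broken connections."""
    position = {task: 0 for task in EXPECTED_ORDER}
    for line in lines:
        if CONNECTION_BROKEN_RE.search(line):
            # Connection broken: tasks start over, so the expected order does too.
            position = {task: 0 for task in EXPECTED_ORDER}
            continue
        match = TASK_RE.search(line)
        if not match:
            continue
        task, message = match.groups()
        expected = EXPECTED_ORDER[task][position[task]]
        assert message.startswith(expected), f'Unexpected {task} log: {message}'
        position[task] = (position[task] + 1) % len(EXPECTED_ORDER[task])
```

Without the reset, the first 'Permission to synchronize granted' line logged after a reconnection would be compared against the 'Finished' entry still pending from the interrupted cycle, and the check would fail.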

@Selutario
Contributor

It was reported here that some tests did not run in the latest build. We should fix it as part of this issue if the problem keeps happening:

@nico-stefani
Member

Update

I have been reading the tests' code and taking the first steps toward a resolution.

@nico-stefani
Member

Update

I have been debugging the test_check_logs_order_workers test and got the check working for a particular log sample:

2023/08/14 19:42:19 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Agent-info sync] Permission to synchronize granted.
2023/08/14 19:42:19 INFO: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Agent-info sync] Starting.
2023/08/14 19:42:19 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Agent-info sync] Obtained 29 chunks of data in 0.235s.
2023/08/14 19:42:19 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Agent-info sync] Sending chunks.
2023/08/14 19:42:20 INFO: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Main] The master closed the connection
2023/08/14 19:43:50 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Agent-info sync] Permission to synchronize granted.
2023/08/14 19:43:50 INFO: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Agent-info sync] Starting.
2023/08/14 19:43:50 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Agent-info sync] Obtained 29 chunks of data in 0.232s.
2023/08/14 19:43:52 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Agent-info sync] Sending chunks.
2023/08/14 19:43:54 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Main] Command received: 'b'dapi''
2023/08/14 19:43:54 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [D API] Received request: b'master*124633 {"f": {"__callable__": {"__name__": "restart", "__module__": "wazuh.manager", "__qualname__": "restart", "__type__": "function"}}, "f_kwargs": {}, "request_type": "distributed_master", "wait_for_complete": false, "from_cluster": true, "is_async": false, "local_client_arg": null, "basic_services": ["wazuh-modulesd", "wazuh-analysisd", "wazuh-execd", "wazuh-db", "wazuh-remoted"], "rbac_permissions": {"agent:create": {"*:*:*": "allow"}, "group:create": {"*:*:*": "allow"}, "agent:read": {"agent:id:*": "allow", "agent:group:*": "allow"}, "agent:delete": {"agent:id:*": "allow", "agent:group:*": "allow"}, "agent:modify_group": {"agent:id:*": "allow", "agent:group:*": "allow"}, "agent:reconnect": {"agent:id:*": "allow", "agent:group:*": "allow"}, "agent:restart": {"agent:id:*": "allow", "agent:group:*": "allow"}, "agent:upgrade": {"agent:id:*": "allow", "agent:group:*": "allow"}, "group:read": {"group:id:*": "allow"}, "group:delete": {"group:id:*": "allow"}, "group:update_config": {"group:id:*": "allow"}, "group:modify_assignments": {"group:id:*": "allow"}, "active-response:command": {"agent:id:*": "allow"}, "security:create": {"*:*:*": "allow"}, "security:create_user": {"*:*:*": "allow"}, "security:read_config": {"*:*:*": "allow"}, "security:update_config": {"*:*:*": "allow"}, "security:revoke": {"*:*:*": "allow"}, "security:edit_run_as": {"*:*:*": "allow"}, "security:read": {"role:id:*": "allow", "policy:id:*": "allow", "user:id:*": "allow", "rule:id:*": "allow"}, "security:update": {"role:id:*": "allow", "policy:id:*": "allow", "user:id:*": "allow", "rule:id:*": "allow"}, "security:delete": {"role:id:*": "allow", "policy:id:*": "allow", "user:id:*": "allow", "rule:id:*": "allow"}, "cluster:status": {"*:*:*": "allow"}, "manager:read": {"*:*:*": "allow"}, "manager:read_api_config": {"*:*:*": "allow"}, "manager:update_config": {"*:*:*": "allow"}, "manager:restart": {"*:*:*": "allow"}, "cluster:read_api_config": {"node:id:*": "allow"}, "cluster:read": {"node:id:*": "allow"}, "cluster:restart": {"node:id:*": "allow"}, "cluster:update_config": {"node:id:*": "allow"}, "ciscat:read": {"agent:id:*": "allow"}, "decoders:read": {"decoder:file:*": "allow"}, "decoders:delete": {"decoder:file:*": "allow"}, "decoders:update": {"*:*:*": "allow"}, "lists:read": {"list:file:*": "allow"}, "lists:delete": {"list:file:*": "allow"}, "lists:update": {"*:*:*": "allow"}, "rootcheck:clear": {"agent:id:*": "allow"}, "rootcheck:read": {"agent:id:*": "allow"}, "rootcheck:run": {"agent:id:*": "allow"}, "rules:read": {"rule:file:*": "allow"}, "rules:delete": {"rule:file:*": "allow"}, "rules:update": {"*:*:*": "allow"}, "mitre:read": {"*:*:*": "allow"}, "sca:read": {"agent:id:*": "allow"}, "syscheck:clear": {"agent:id:*": "allow"}, "syscheck:read": {"agent:id:*": "allow"}, "syscheck:run": {"agent:id:*": "allow"}, "syscollector:read": {"agent:id:*": "allow"}, "logtest:run": {"*:*:*": "allow"}, "task:status": {"*:*:*": "allow"}, "vulnerability:read": {"agent:id:*": "allow"}, "vulnerability:run": {"*:*:*": "allow"}, "rbac_mode": "white"}, "current_user": "", "broadcasting": true, "nodes": ["master", "CLUSTER-Workload_benchmarks_metrics_B283_manager_15", "CLUSTER-Workload_benchmarks_metrics_B283_manager_1", "CLUSTER-Workload_benchmarks_metrics_B283_manager_10", "CLUSTER-Workload_benchmarks_metrics_B283_manager_25", "CLUSTER-Workload_benchmarks_metrics_B283_manager_13", 
"CLUSTER-Workload_benchmarks_metrics_B283_manager_2", "CLUSTER-Workload_benchmarks_metrics_B283_manager_19", "CLUSTER-Workload_benchmarks_metrics_B283_manager_11", "CLUSTER-Workload_benchmarks_metrics_B283_manager_6", "CLUSTER-Workload_benchmarks_metrics_B283_manager_20", "CLUSTER-Workload_benchmarks_metrics_B283_manager_8", "CLUSTER-Workload_benchmarks_metrics_B283_manager_4", "CLUSTER-Workload_benchmarks_metrics_B283_manager_7", "CLUSTER-Workload_benchmarks_metrics_B283_manager_9", "CLUSTER-Workload_benchmarks_metrics_B283_manager_22", "CLUSTER-Workload_benchmarks_metrics_B283_manager_12", "CLUSTER-Workload_benchmarks_metrics_B283_manager_16", "CLUSTER-Workload_benchmarks_metrics_B283_manager_17", "CLUSTER-Workload_benchmarks_metrics_B283_manager_14", "CLUSTER-Workload_benchmarks_metrics_B283_manager_24", "CLUSTER-Workload_benchmarks_metrics_B283_manager_23", "CLUSTER-Workload_benchmarks_metrics_B283_manager_5", "CLUSTER-Workload_benchmarks_metrics_B283_manager_18", "CLUSTER-Workload_benchmarks_metrics_B283_manager_21", "CLUSTER-Workload_benchmarks_metrics_B283_manager_3"], "api_timeout": 10}'
2023/08/14 19:43:54 INFO: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [D API] Receiving request: restart from master (124633)
2023/08/14 19:43:54 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [D API] Receiving parameters {}
2023/08/14 19:43:54 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [D API] Starting to execute request locally
2023/08/14 19:43:54 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [D API] Finished executing request locally
2023/08/14 19:43:54 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [D API] Time calculating request result: 0.024s
2023/08/14 19:43:55 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Main] Command received: 'b'syn_m_a_e''
2023/08/14 19:43:55 INFO: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Agent-info sync] Finished in 5.318s. Updated 29 chunks.
2023/08/14 19:43:58 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Integrity check] Permission to synchronize granted.
2023/08/14 19:43:58 INFO: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Integrity check] Starting.
2023/08/14 19:43:58 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Integrity check] Compressing 'files_metadata.json' of 34 files.
2023/08/14 19:43:58 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Integrity check] Sending zip file to master.
2023/08/14 19:43:58 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Integrity check] Zip file sent to master.
2023/08/14 19:43:58 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Main] Command received: 'b'syn_m_c_ok''
2023/08/14 19:43:58 INFO: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Integrity check] Finished in 0.051s. Sync not required.
2023/08/14 19:43:59 INFO: [Worker CLUSTER-Workload_benchmarks_metrics_B283_manager_23] [Main] The master closed the connection
2023/08/14 19:43:59 INFO: [Cluster] [Main] SIGNAL [(15)-(SIGTERM)] received. Exit...
2023/08/14 19:44:18 DEBUG: [Cluster] [Main] Removing '/var/ossec/queue/cluster/'.
2023/08/14 19:44:18 DEBUG: [Cluster] [Main] Removed '/var/ossec/queue/cluster/'.
2023/08/14 19:44:19 INFO: [Local Server] [Main] Serving on /var/ossec/queue/cluster/c-internal.sock
2023/08/14 19:44:19 DEBUG: [Local Server] [Keep alive] Calculating.
2023/08/14 19:44:19 DEBUG: [Local Server] [Keep alive] Calculated.
(wqa) ➜  wazuh-qa git:(fix/4385-unstable-realiability-tests) ✗ pytest -s --artifacts_path=/home/nstefani/Downloads/283/ tests/reliability/test_cluster/test_cluster_logs/test_cluster_worker_logs_order/test_cluster_worker_logs_order.py
================================================================================================ test session starts ================================================================================================
platform linux -- Python 3.9.16, pytest-7.1.2, pluggy-0.13.1
rootdir: /home/nstefani/git/wazuh-qa/tests, configfile: pytest.ini
plugins: html-3.1.1, testinfra-5.0.0, metadata-2.0.4
collected 1 item

tests/reliability/test_cluster/test_cluster_logs/test_cluster_worker_logs_order/test_cluster_worker_logs_order.py .

================================================================================================= 1 passed in 0.05s =================================================================================================

The next step is to generalize this solution for more cases.
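One possible way to generalize it (an assumption on my part, not necessarily what the eventual fix does) is to treat the other restart indicators visible in the sample above, not only the closed-connection line, as points where the per-task state is reset:

```python
# Hypothetical extension of the reset markers from the earlier sketch:
# any of these lines in the sample above indicates the node is restarting.
import re

RESET_MARKERS_RE = re.compile(
    r'The master closed the connection'
    r'|Connection reset'
    r'|SIGNAL \[\(15\)-\(SIGTERM\)\] received'
    r'|Serving on .*c-internal\.sock'
)
```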

@nico-stefani
Member

nico-stefani commented May 29, 2024

Update

I've been working on the fix for test_cluster_connection.

I still need to investigate the behavior of test_check_logs_order_workers with the changes introduced in wazuh/wazuh#19888.

@nico-stefani nico-stefani linked a pull request May 30, 2024 that will close this issue
@nico-stefani
Member

Update

I've been working on the requested changes.
