Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_reload_config.py test failure due to interface-config.service dependency #11654

Open
anamehra opened this issue Feb 6, 2024 · 11 comments
Open

Comments

@anamehra
Copy link
Contributor

anamehra commented Feb 6, 2024

Description
test_reload_config test case has a dependency on interfaces-config.service. In a test to validate config reload failure when swss is not up, the test checks for interfaces-config.service state (https://github.com/sonic-net/sonic-mgmt/blob/master/tests/platform_tests/test_reload_config.py#L129) to be up to test config reload, expecting swss is not up at that time. This change was introduced by #7289
This code assumes that the swss service has a dependency on interface-config.service and starts after the interface-config, which was the case earlier. This dependency was removed a year ago via sonic-net/sonic-buildimage#13084

There is no direct dependency between interfaces-config and swss, so systemd won't guarantee swss service starts only after interfaces-config.

Steps to reproduce the issue:
1.
2.
3.

Describe the results you received:

Describe the results you expected:

Additional information you deem important:

**Output of `show version`:**

```
(paste your output here)
```

**Attach debug file `sudo generate_dump`:**

```
(paste your output here)
```
@anamehra
Copy link
Contributor Author

anamehra commented Feb 6, 2024

Hi @abdosi , @JibinBao , @stepanblyschak , for your viz. Thanks

@stepanblyschak
Copy link
Contributor

Hi @anamehra, I am looking at the test code, not sure I understand why test expects swss.service to depend on interfaces-config.service, also, why "config reload" should throw an error?

@anamehra
Copy link
Contributor Author

anamehra commented Feb 6, 2024

Hi @stepanblyschak , the test code validates a negative scenario where config reload should fail when swss is not up yet. It relies on the timing to run the command right at the moment when redis is accessible but swss has not started yet and I think interfaces-config check was introduced just to achieve ive that. Given that swss has no dependency on interfaces-config now, the test case needs to change. I think instead of validating this negative scenario during boot, it should just explicitly stop swss and validate that config reload fails.

@stepanblyschak
Copy link
Contributor

@anamehra Thanks, so as I understand "config reload" should fail when swss is down, any reason why? "config reload" cold restarts all services, so not sure such check in "config reload" is needed or are there any issues restarting services when swss is down?

@JibinBao
Copy link
Contributor

JibinBao commented Feb 7, 2024

@dgsudharsan, Can you help check the above issue? If there is no direct dependency between interfaces-config and swss, shoud we revert the fix #7289? And then how to stabilize the test_reload_configuration_checks?

@dgsudharsan
Copy link
Contributor

@anamehra This check was introduced to overcome a race condition where config reload gets executed while interface-services.config is running and the loopback ip is not accessible giving traceback for the command the resulting in the test failure. Below is example

Jan  4 06:29:19.434440 r-bulldog-02 INFO systemd[1]: Starting Update interfaces configuration...
Jan  4 06:29:21.659292 r-bulldog-02 INFO python[11575]: ansible-command Invoked with executable=/bin/bash _uses_shell=True _raw_params=sudo config reload -y warn=True stdin_add_newline=True strip_empty_ends=True argv=None chdir=None creates=None removes=None stdin=None
Jan  4 06:29:25.786037 r-bulldog-02 INFO systemd[1]: Finished Update interfaces configuration.

You will get

File "/usr/local/...: Unable to connect to redis: Cannot assign requested address

However this issue was fixed by @saiarcot895 sonic-net/sonic-buildimage#15080. Given we have this fix the change in sonic-mgmt can be reverted.

@JibinBao please revert the change to check interface-services.config.

JibinBao added a commit to JibinBao/sonic-mgmt that referenced this issue Feb 8, 2024
1. Revert PR: sonic-net#7289, because the relevant issue has been fixed by PR:sonic-net/sonic-buildimage#15080
2. Remove  wait_until(60, 1, 0, check_interfaces_config_service_status, duthost) in PR: sonic-net#7953 due to the above same reason

Change-Id: Ifde0a83020bb06fb191a646e5ca91738b864ad7f
JibinBao added a commit to JibinBao/sonic-mgmt that referenced this issue Feb 8, 2024
1. Revert PR: sonic-net#7289, because the relevant issue has been fixed by PR:sonic-net/sonic-buildimage#15080
2. Remove  wait_until(60, 1, 0, check_interfaces_config_service_status, duthost) in PR: sonic-net#7953 due to the above same reason

Change-Id: I022a948cb2e36ba7d5538258159ea047d13d80a7
@JibinBao
Copy link
Contributor

JibinBao commented Feb 8, 2024

Fixed:
#11674
#11675

wangxin pushed a commit that referenced this issue Mar 4, 2024
1. Revert PR: #7289, because the relevant issue has been fixed by PR:sonic-net/sonic-buildimage#15080
2. Remove  wait_until(60, 1, 0, check_interfaces_config_service_status, duthost) in PR: #7953 due to the above same reason
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this issue Mar 4, 2024
1. Revert PR: sonic-net#7289, because the relevant issue has been fixed by PR:sonic-net/sonic-buildimage#15080
2. Remove  wait_until(60, 1, 0, check_interfaces_config_service_status, duthost) in PR: sonic-net#7953 due to the above same reason
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this issue Mar 4, 2024
1. Revert PR: sonic-net#7289, because the relevant issue has been fixed by PR:sonic-net/sonic-buildimage#15080
2. Remove  wait_until(60, 1, 0, check_interfaces_config_service_status, duthost) in PR: sonic-net#7953 due to the above same reason
wangxin pushed a commit that referenced this issue Mar 4, 2024
Cherry-pick #11674 to 202205 branch
1. Revert PR: #7289, because the relevant issue has been fixed by PR:sonic-net/sonic-buildimage#15080
2. Remove  wait_until(60, 1, 0, check_interfaces_config_service_status, duthost) in PR: #7953 due to the above same reason
mssonicbld pushed a commit that referenced this issue Mar 4, 2024
1. Revert PR: #7289, because the relevant issue has been fixed by PR:sonic-net/sonic-buildimage#15080
2. Remove  wait_until(60, 1, 0, check_interfaces_config_service_status, duthost) in PR: #7953 due to the above same reason
mssonicbld pushed a commit that referenced this issue Mar 4, 2024
1. Revert PR: #7289, because the relevant issue has been fixed by PR:sonic-net/sonic-buildimage#15080
2. Remove  wait_until(60, 1, 0, check_interfaces_config_service_status, duthost) in PR: #7953 due to the above same reason
@abdosi
Copy link
Contributor

abdosi commented Mar 25, 2024

@anamehra : Please check if below PR fixes this issue.

Fixed:
#11674
#11675

@anamehra
Copy link
Contributor Author

anamehra commented May 11, 2024

The original issue is still not fixed. interfcae-config.service dependency is removed but now we see the original "redis connection failure" error if the config reload is done quickly after database check passes.

The config reload check after database up and before swss start is totally timing based here. If we want to validate config reload to fail for swss down, we can have a test case to stop swss and check the config reload.
We are relying on swss state, so the test case should be designed to make sure we have the setup in condition which we want to test.

@bmridul , @vperumal , for your viz

@bmridul
Copy link
Contributor

bmridul commented May 11, 2024

Log for the above..

10/05/2024 20:43:12 utilities.wait_until                     L0146 DEBUG  | check_database_status is True, exit early with True
10/05/2024 20:43:12 test_reload_config.test_reload_configura L0131 INFO   | Reload configuration check
10/05/2024 20:43:12 base._run                                L0067 DEBUG  | /data/tests/common/devices/multi_asic.py::_run_on_asics#128: [croc4-aaa14-dut] AnsibleModule::shell, args=["sudo config reload -y"], kwargs={"executable": "/bin/bash", "module_ignore_errors": true}
10/05/2024 20:43:14 base._run                                L0104 DEBUG  | /data/tests/common/devices/multi_asic.py::_run_on_asics#128: [croc4-aaa14-dut] AnsibleModule::shell Result => {"failed": true, "msg": "non-zero return code", "cmd": "sudo config reload -y", "stdout": "", "stderr": "Traceback (most recent call last):\n  File \"/usr/local/bin/config\", line 8, in <module>\n    sys.exit(config())\n  File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 764, in __call__\n    return self.main(*args, **kwargs)\n  File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 717, in main\n    rv = self.invoke(ctx)\n  File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 1134, in invoke\n    Command.invoke(self, ctx)\n  File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 956, in invoke\n    return ctx.invoke(self.callback, **ctx.params)\n  File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 555, in invoke\n    return callback(*args, **kwargs)\n  File \"/usr/local/lib/python3.9/dist-packages/click/decorators.py\", line 17, in new_func\n    return f(get_current_context(), *args, **kwargs)\n  File \"/usr/local/lib/python3.9/dist-packages/config/main.py\", line 1186, in config\n    ctx.obj = Db()\n  File \"/usr/local/lib/python3.9/dist-packages/utilities_common/db.py\", line 12, in __init__\n    self.cfgdb.connect()\n  File \"/usr/lib/python3/dist-packages/swsscommon/swsscommon.py\", line 2301, in connect\n    return _swsscommon.ConfigDBConnector_Native_connect(self, wait_for_init, retry_on)\nRuntimeError: Unable to connect to redis: Cannot assign requested address", "rc": 1, "start": "2024-05-10 20:46:50.911295", "end": "2024-05-10 20:46:52.270540", "delta": "0:00:01.359245", "changed": true, "invocation": {"module_args": {"executable": "/bin/bash", "_raw_params": "sudo config reload -y", "_uses_shell": true, "warn": true, "stdin_add_newline": true, "strip_empty_ends": true, "argv": null, "chdir": null, "creates": null, "removes": null, "stdin": null}}, "warnings": ["Consider using 'become', 'become_method', and 'become_user' rather than running sudo"], "stdout_lines": [], "stderr_lines": ["Traceback (most recent call last):", "  File \"/usr/local/bin/config\", line 8, in <module>", "    sys.exit(config())", "  File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 764, in __call__", "    return self.main(*args, **kwargs)", "  File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 717, in main", "    rv = self.invoke(ctx)", "  File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 1134, in invoke", "    Command.invoke(self, ctx)", "  File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 956, in invoke", "    return ctx.invoke(self.callback, **ctx.params)", "  File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 555, in invoke", "    return callback(*args, **kwargs)", "  File \"/usr/local/lib/python3.9/dist-packages/click/decorators.py\", line 17, in new_func", "    return f(get_current_context(), *args, **kwargs)", "  File \"/usr/local/lib/python3.9/dist-packages/config/main.py\", line 1186, in config", "    ctx.obj = Db()", "  File \"/usr/local/lib/python3.9/dist-packages/utilities_common/db.py\", line 12, in __init__", "    self.cfgdb.connect()", "  File \"/usr/lib/python3/dist-packages/swsscommon/swsscommon.py\", line 2301, in connect", "    return _swsscommon.ConfigDBConnector_Native_connect(self, wait_for_init, retry_on)", "RuntimeError: Unable to connect to redis: Cannot assign requested address"], "_ansible_no_log": false}
10/05/2024 20:43:14 __init__.pytest_runtest_call             L0040 ERROR  | Traceback (most recent call last):

@anamehra
Copy link
Contributor Author

@anamehra This check was introduced to overcome a race condition where config reload gets executed while interface-services.config is running and the loopback ip is not accessible giving traceback for the command the resulting in the test failure. Below is example

Jan  4 06:29:19.434440 r-bulldog-02 INFO systemd[1]: Starting Update interfaces configuration...
Jan  4 06:29:21.659292 r-bulldog-02 INFO python[11575]: ansible-command Invoked with executable=/bin/bash _uses_shell=True _raw_params=sudo config reload -y warn=True stdin_add_newline=True strip_empty_ends=True argv=None chdir=None creates=None removes=None stdin=None
Jan  4 06:29:25.786037 r-bulldog-02 INFO systemd[1]: Finished Update interfaces configuration.

You will get

File "/usr/local/...: Unable to connect to redis: Cannot assign requested address

However this issue was fixed by @saiarcot895 sonic-net/sonic-buildimage#15080. Given we have this fix the change in sonic-mgmt can be reverted.

@JibinBao please revert the change to check interface-services.config.

HI @dgsudharsan , looks like the issue is not completely fixed. We still observe the error
if config reload is done with few secs of database service active. The test_reload_config.py is still failing intermittently due to this.
"RuntimeError: Sonic database config file doesn't exist at /var/run/redis/sonic-db/database_config.json"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants