[Z9332][202012]: "FLEX_COUNTER_TABLE" is missing from config_db.json after fast-reboot #9597

Closed
chitra-raghavan opened this issue Dec 20, 2021 · 5 comments

Comments

@chitra-raghavan
Contributor

Description

"FLEX_COUNTER_TABLE" is missing from config_db.json after fast-reboot.
The configs are added back to config_db.json only after running config save, once the system has settled down a few minutes later.

Steps to reproduce the issue:

  1. Fast-reboot the device.
  2. For the first few minutes after boot, the FLEX_COUNTER configs are not present on the device.
  3. Check for FLEX_COUNTER_TABLE in config_db.json.
  4. Wait a few minutes for the flex counters to be re-initiated.
  5. Run config save -y so that the FLEX_COUNTER configs are written back.
  6. Check for FLEX_COUNTER_TABLE in config_db.json again.
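
The checks in steps 3 and 6 can also be scripted instead of using grep. The following is a minimal sketch, assuming config_db.json is a JSON object whose top-level keys are table names and that each FLEX_COUNTER_TABLE entry carries a FLEX_COUNTER_STATUS field, as the grep output in the logs suggests. The function names are hypothetical and are not part of any SONiC tool.

```python
import json


def load_config(path="/etc/sonic/config_db.json"):
    """Load a saved SONiC config file and return it as a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def flex_counter_status(config):
    """Return {counter_group: status} from FLEX_COUNTER_TABLE.

    Returns None when the table is absent, which is the state
    reported in this issue right after fast-reboot.
    """
    table = config.get("FLEX_COUNTER_TABLE")
    if table is None:
        return None
    # Each entry is assumed to look like {"FLEX_COUNTER_STATUS": "enable"}.
    return {group: entry.get("FLEX_COUNTER_STATUS") for group, entry in table.items()}
```

Calling flex_counter_status(load_config()) on the device would be expected to return None at step 3 and a dict of enabled groups at step 6, if the file behaves as described in this report.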

Describe the results you received:

When the device comes up after fast-reboot, the FLEX_COUNTER_TABLE configs are re-initiated only after a few minutes.
FLEX_COUNTER_TABLE is missing from config_db.json and is re-added only after config save -y (which itself requires waiting a few minutes for the flex counters to start).

Until the flex counters are initiated, PFC and PFCWD do not work.

Logs

root@sonic:/home/admin# fast-reboot
..
[    9.228928] rc.local[564]: + gunzip -d -c /var/log/fsck.log.gz
[    9.310088] rc.local[565]: + logger -t FSCK
[    9.365120] rc.local[454]: + rm -f /var/log/fsck.log.gz
[    9.435012] rc.local[454]: + exit 0

Debian GNU/Linux 10 sonic ttyS0

sonic login: admin
Password:

Login incorrect
sonic login: admin
Password:
Last login: Mon Dec 20 12:48:27 UTC 2021 on ttyS0
Linux sonic 4.19.0-12-2-amd64 #1 SMP Debian 4.19.152-1 (2020-10-18) x86_64
You are on
  ____   ___  _   _ _  ____
 / ___| / _ \| \ | (_)/ ___|
 \___ \| | | |  \| | | |
  ___) | |_| | |\  | | |___
 |____/ \___/|_| \_|_|\____|

-- Software for Open Networking in the Cloud --

Unauthorized access and/or use are prohibited.
All access and/or use are subject to monitoring.

Help:      http://azure.github.io/SONiC/
Wiki:      https://microsoft.sharepoint.com/teams/WAG/AzureNetworking/Wiki/SONiC.aspx
On-Call:   https://portal.microsofticm.com/imp/v3/oncall/current?serviceId=10045&teamIds=26162
Dashboard: https://aka.ms/sonic-dri
Contact:   [email protected]

sudo -admin@sonic:~$ sudo -i
root@sonic:~#
root@sonic:~# show reboot-cause
User issued 'fast-reboot' command [User: admin, Time: Mon 20 Dec 2021 12:42:55 PM UTC]
root@sonic:~#
root@sonic:~#
root@sonic:~# grep -i flex /etc/sonic/config_db.json
root@sonic:~# 
root@sonic:~# uptime
 12:49:10 up 5 min,  1 user,  load average: 0.90, 0.76, 0.38
root@sonic:~#
root@sonic:~#
root@sonic:~# config save -y
Running command: /usr/local/bin/sonic-cfggen -d --print-data > /etc/sonic/config_db.json
root@sonic:~# !g
grep -i flex /etc/sonic/config_db.json
    "FLEX_COUNTER_TABLE": {
            "FLEX_COUNTER_STATUS": "enable"
            "FLEX_COUNTER_STATUS": "enable"
            "FLEX_COUNTER_STATUS": "enable"
            "FLEX_COUNTER_STATUS": "enable"
            "FLEX_COUNTER_STATUS": "enable"
            "FLEX_COUNTER_STATUS": "enable"
            "FLEX_COUNTER_STATUS": "enable"
            "FLEX_COUNTER_STATUS": "enable"
            "FLEX_COUNTER_STATUS": "enable"
root@sonic:~#  

Output of show version:

root@sonic:/etc/sonic# show ver

SONiC Software Version: SONiC.20201231.44
Distribution: Debian 10.11
Kernel: 4.19.0-12-2-amd64
Build commit: 63ef7f53bc
Build date: Wed Nov 24 02:53:38 UTC 2021
Built by: cloudtest@876d3befc000000

Platform: x86_64-dellemc_z9332f_d1508-r0
HwSKU: DellEMC-Z9332f-O32
ASIC: broadcom
ASIC Count: 1
Serial Number: TH04CN21CET009BR0023
Uptime: 13:09:13 up 5 min,  1 user,  load average: 1.28, 0.90, 0.43

Docker images:
REPOSITORY                 TAG                 IMAGE ID            SIZE
docker-syncd-brcm          20201231.44         9f91a09d0e9c        672MB
docker-syncd-brcm          latest              9f91a09d0e9c        672MB
docker-teamd               20201231.44         48053e1d92e8        390MB
docker-teamd               latest              48053e1d92e8        390MB
docker-router-advertiser   20201231.44         b6f6ccc4ba7f        380MB
docker-router-advertiser   latest              b6f6ccc4ba7f        380MB
docker-platform-monitor    20201231.44         76c085c0a328        561MB
docker-platform-monitor    latest              76c085c0a328        561MB
docker-lldp                20201231.44         a4a5ad2cde5e        420MB
docker-lldp                latest              a4a5ad2cde5e        420MB
docker-snmp                20201231.44         2439767a3279        422MB
docker-snmp                latest              2439767a3279        422MB
docker-dhcp-relay          20201231.44         aee1d0cc3dbb        393MB
docker-dhcp-relay          latest              aee1d0cc3dbb        393MB
docker-database            20201231.44         da24c051253b        379MB
docker-database            latest              da24c051253b        379MB
docker-orchagent           20201231.44         820bdc795273        408MB
docker-orchagent           latest              820bdc795273        408MB
docker-sonic-telemetry     20201231.44         178c953e9f05        469MB
docker-sonic-telemetry     latest              178c953e9f05        469MB
docker-mux                 20201231.44         bcd196bd7a3c        432MB
docker-mux                 latest              bcd196bd7a3c        432MB
docker-fpm-frr             20201231.44         0e266910f4b7        408MB
docker-fpm-frr             latest              0e266910f4b7        408MB
docker-sonic-restapi       20201231.44         6b791c7e73e3        345MB
docker-sonic-restapi       latest              6b791c7e73e3        345MB
docker-acms                20201231.44         d0848884cc78        181MB
docker-acms                latest              d0848884cc78        181MB
k8s.gcr.io/pause           3.4.1               0f8457a4c2ec        683kB

root@sonic:/etc/sonic#

Output of show techsupport:

sonic_dump_sonic_20211220_124940.tar.gz

config_db files

config_db-Before_FastReboot.txt
config_db-AfterFastReboot.txt
config_db-AfterConfigSave.txt

Additional information you deem important (e.g. issue happens only occasionally):

@gechiang
Collaborator

gechiang commented Jan 4, 2022

@vaibhavhd Can you please help take a look at this issue raised by DELL?
It appears to be related to the fix you made in an attempt to reduce the fast-reboot downtime...
(sonic-net/sonic-utilities#1774)
The question is: once the FLEX_COUNTER_TABLE is deleted from the CONFIG DB during fast-reboot, a restore action seems to be missing after fast-reboot completes, so the FLEX_COUNTER_TABLE is only in the running config and is not restored to the config_db.json file itself...

@gechiang
Collaborator

@chitra-raghavan We have a question in regard to the PFC, PFCWD issue that you observed.

  • Do PFC and PFCWD remain non-functional permanently after fast-reboot if no one performs "config save -y"?

@chitra-raghavan
Contributor Author

@gechiang, PFCWD functions after a few minutes, once the flex counters are initiated.
PFCWD does not work only in the interval between the ports coming up and the flex counters being initiated.

The configs are written back to config_db.json in /etc/sonic/ only after config save -y.
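
To see which tables exist in the running config but have not yet been written to the saved file, the two can be compared by their top-level keys. This is a rough sketch under stated assumptions: both inputs are already parsed into dicts, and the running config has been dumped to a scratch file, for example with the sonic-cfggen -d --print-data command that config save prints in the logs above. The function name is hypothetical.

```python
def tables_missing_from_saved(running, saved):
    """Return the sorted names of tables present in the running config
    but absent from the saved config_db.json contents."""
    return sorted(set(running) - set(saved))
```

In the state described in this issue, the result would be expected to include FLEX_COUNTER_TABLE until config save -y is run.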

@gechiang
Collaborator

@chitra-raghavan I have discussed this with @vaibhavhd, and the changes that he made were needed to reduce the fast-reboot downtime. The problem you observed is in line with what was expected. For images after 202012 and for master, a more "complete" solution is in place, so there we do not expect to see what you observed here in 202012.
We understand this impacts PFCWD during this window, and since it is triggered by a fast-reboot, we believe it is expected that PFCWD will not function during this brief period. Given the tradeoff between reducing fast-reboot downtime and a small window in which PFCWD is unavailable, we are leaning towards not addressing this issue.

As for config_db.json not matching the running config, that is also expected. If the user performs additional configuration at that time, they will perform a config save, which ensures that the FLEX COUNTER config is written into the config_db.json file. Even if the switch is somehow rebooted, explicitly or implicitly, the FLEX COUNTER config will be in the running config once the reboot completes and will not be lost.
Let me know if there is anything else you are not comfortable with.
If not, perhaps we can close this issue as known expected behavior that will not be addressed.
Thanks!

@chitra-raghavan
Contributor Author

@gechiang, if this is expected behavior, we can close this issue.


2 participants