Observe that after create_switch() the SAI discovery process runs and takes 1.02 sec in this case:
Feb 10 11:38:20.926013 r-panther-13 NOTICE syncd#SDK: :- discover: discover took 0.203495 sec
Feb 10 11:38:20.926309 r-panther-13 NOTICE syncd#SDK: :- discover: discovered objects count: 1386
Feb 10 11:38:20.926489 r-panther-13 NOTICE syncd#SDK: :- discover: SAI_OBJECT_TYPE_PORT: 33
Feb 10 11:38:20.926597 r-panther-13 NOTICE syncd#SDK: :- discover: SAI_OBJECT_TYPE_VIRTUAL_ROUTER: 1
Feb 10 11:38:20.926722 r-panther-13 NOTICE syncd#SDK: :- discover: SAI_OBJECT_TYPE_STP: 1
Feb 10 11:38:20.926823 r-panther-13 NOTICE syncd#SDK: :- discover: SAI_OBJECT_TYPE_HOSTIF_TRAP_GROUP: 1
Feb 10 11:38:20.926943 r-panther-13 NOTICE syncd#SDK: :- discover: SAI_OBJECT_TYPE_QUEUE: 512
Feb 10 11:38:20.927045 r-panther-13 NOTICE syncd#SDK: :- discover: SAI_OBJECT_TYPE_SCHEDULER_GROUP: 512
Feb 10 11:38:20.927165 r-panther-13 NOTICE syncd#SDK: :- discover: SAI_OBJECT_TYPE_INGRESS_PRIORITY_GROUP: 256
Feb 10 11:38:20.927267 r-panther-13 NOTICE syncd#SDK: :- discover: SAI_OBJECT_TYPE_HASH: 2
Feb 10 11:38:20.927387 r-panther-13 NOTICE syncd#SDK: :- discover: SAI_OBJECT_TYPE_SWITCH: 1
Feb 10 11:38:20.927520 r-panther-13 NOTICE syncd#SDK: :- discover: SAI_OBJECT_TYPE_VLAN: 1
Feb 10 11:38:20.927711 r-panther-13 NOTICE syncd#SDK: :- discover: SAI_OBJECT_TYPE_VLAN_MEMBER: 32
Feb 10 11:38:20.927813 r-panther-13 NOTICE syncd#SDK: :- discover: SAI_OBJECT_TYPE_BRIDGE: 1
Feb 10 11:38:20.928017 r-panther-13 NOTICE syncd#SDK: :- discover: SAI_OBJECT_TYPE_BRIDGE_PORT: 33
Feb 10 11:38:20.928882 r-panther-13 NOTICE syncd#SDK: :- helperSaveDiscoveredObjectsToRedis: objects in ASIC state table present: 0
Feb 10 11:38:20.929008 r-panther-13 NOTICE syncd#SDK: :- helperSaveDiscoveredObjectsToRedis: putting ALL discovered objects to redis
Feb 10 11:38:21.601662 r-panther-13 NOTICE syncd#SDK: :- helperSaveDiscoveredObjectsToRedis: save discovered objects to redis took 0.673484 sec
Feb 10 11:38:21.602082 r-panther-13 NOTICE syncd#SDK: :- redisSaveInternalOids: put switch internal discovered rid oid:0x1 to Asic View and COLDVIDS
Feb 10 11:38:21.602592 r-panther-13 NOTICE syncd#SDK: :- redisSaveInternalOids: put switch internal discovered rid oid:0x100000026 to Asic View and COLDVIDS
Feb 10 11:38:21.603029 r-panther-13 NOTICE syncd#SDK: :- redisSaveInternalOids: put switch internal discovered rid oid:0x10 to Asic View and COLDVIDS
Feb 10 11:38:21.603480 r-panther-13 NOTICE syncd#SDK: :- redisSaveInternalOids: put switch internal discovered rid oid:0x3 to Asic View and COLDVIDS
Feb 10 11:38:21.603693 r-panther-13 WARNING syncd#SDK: [SAI_UTILS.WARNING] mlnx_sai_utils.c[1691]- check_attribs_metadata: Not implemented attribute SAI_SWITCH_ATTR_DEFAULT_OVERRIDE_VIRTUAL_ROUTER_ID (vendor data not found)
Feb 10 11:38:21.603769 r-panther-13 WARNING syncd#SDK: [SAI_UTILS.WARNING] mlnx_sai_utils.c[2060]- sai_get_attributes: Failed attribs check, key:Switch ID 1
Feb 10 11:38:21.603861 r-panther-13 WARNING syncd#SDK: :- helperGetSwitchAttrOid: failed to get SAI_SWITCH_ATTR_DEFAULT_OVERRIDE_VIRTUAL_ROUTER_ID: SAI_STATUS_ATTR_NOT_IMPLEMENTED_0
Feb 10 11:38:21.604488 r-panther-13 NOTICE syncd#SDK: :- redisSaveInternalOids: put switch internal discovered rid oid:0x10010039 to Asic View and COLDVIDS
Feb 10 11:38:21.605191 r-panther-13 NOTICE syncd#SDK: :- redisSaveInternalOids: put switch internal discovered rid oid:0x11 to Asic View and COLDVIDS
Feb 10 11:38:21.608251 r-panther-13 NOTICE syncd#SDK: :- redisSaveInternalOids: put switch internal discovered rid oid:0x1c to Asic View and COLDVIDS
Feb 10 11:38:21.608999 r-panther-13 NOTICE syncd#SDK: :- redisSaveInternalOids: put switch internal discovered rid oid:0x10000001c to Asic View and COLDVIDS
Feb 10 11:38:21.738942 r-panther-13 NOTICE syncd#SDK: :- helperLoadColdVids: read 1386 COLD VIDS
Feb 10 11:38:21.739078 r-panther-13 NOTICE syncd#SDK: :- SaiSwitch: constructor took 1.018046 sec
Describe the results you received:
The SAI discovery process took 1.02 sec, but we have seen other results on different platforms/configurations (up to 4 sec).
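To see where the 1.02 sec goes, the "took ... sec" lines in the log above can be pulled out and compared. This is only a rough sketch for reading the log: the regular expression and the `stage_timings` helper are my own, not part of syncd, and the three log lines are copied verbatim from the output above.

```python
import re

# Three lines copied verbatim from the syncd log above.
LOG = """\
Feb 10 11:38:20.926013 r-panther-13 NOTICE syncd#SDK: :- discover: discover took 0.203495 sec
Feb 10 11:38:21.601662 r-panther-13 NOTICE syncd#SDK: :- helperSaveDiscoveredObjectsToRedis: save discovered objects to redis took 0.673484 sec
Feb 10 11:38:21.739078 r-panther-13 NOTICE syncd#SDK: :- SaiSwitch: constructor took 1.018046 sec
"""

# Matches ":- <function>: <text> took <seconds> sec" (pattern is an assumption
# based on the lines above, not a documented log format).
TOOK = re.compile(r":- (?P<func>\w+): .* took (?P<sec>\d+\.\d+) sec")


def stage_timings(log_text):
    """Return {function name: seconds} for every 'took N sec' log line."""
    timings = {}
    for line in log_text.splitlines():
        match = TOOK.search(line)
        if match:
            timings[match.group("func")] = float(match.group("sec"))
    return timings


timings = stage_timings(LOG)
total = timings["SaiSwitch"]
discover = timings["discover"]
save_to_redis = timings["helperSaveDiscoveredObjectsToRedis"]
# Time inside the constructor that neither of the two logged stages accounts for.
other = total - discover - save_to_redis

for name, sec in [("discover", discover), ("save to redis", save_to_redis), ("other", other)]:
    print(f"{name}: {sec:.3f} s ({sec / total:.0%} of the SaiSwitch constructor)")
```

In this run the GET-heavy discover step itself is only part of the cost; saving the discovered objects to redis is the larger of the two logged stages.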
Describe the results you expected:
From a fast/warm reboot design standpoint, performing a lot of GET operations in the middle of switch boot delays the replay of the configuration. Syncd could blindly replay the configuration as fast as possible and discover default objects afterwards.
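The effect of the proposed reordering can be illustrated with a toy timeline. This is not syncd code: `BootTimeline`, both functions, and the 5-second replay duration are hypothetical, and only the discovery duration is taken from the log above. It also assumes replay does not depend on the discovery results, which is the open question here.

```python
from dataclasses import dataclass


@dataclass
class BootTimeline:
    replay_done_at: float  # seconds after create_switch() when config replay finishes
    boot_done_at: float    # seconds after create_switch() when both steps have finished


def discover_then_replay(discovery_s, replay_s):
    """Current order: discovery runs first, replay has to wait for it."""
    done = discovery_s + replay_s
    return BootTimeline(replay_done_at=done, boot_done_at=done)


def replay_then_discover(discovery_s, replay_s):
    """Proposed order: replay starts immediately, discovery runs afterwards."""
    return BootTimeline(replay_done_at=replay_s, boot_done_at=replay_s + discovery_s)


DISCOVERY_S = 1.018046  # SaiSwitch constructor time from the log above
REPLAY_S = 5.0          # hypothetical replay duration, for illustration only

current = discover_then_replay(DISCOVERY_S, REPLAY_S)
proposed = replay_then_discover(DISCOVERY_S, REPLAY_S)
gain = current.replay_done_at - proposed.replay_done_at

print(f"replay finishes {gain:.2f} s earlier; "
      f"total time changes by {proposed.boot_done_at - current.boot_done_at:.2f} s")
```

Under these assumptions the total work is unchanged; the configuration replay simply finishes earlier by the discovery time.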
stepanblyschak changed the title from "[boot performance] SAI discovery process running after switch creation causes delays" to "[boot performance] SAI discovery process running after switch creation in fast/warm boot causes delays" on Feb 10, 2023
We are now working on optimizations to the fast-reboot flow for switches with a high number of ports. With 256 ports, the SAI discovery that runs for each port consumes more than 8 seconds, during which orchagent is idle, waiting for syncd to finish creating ports.
Is SAI discovery after port creation required in the fast-reboot init flow?
In the fast-reboot flow there is no comparison logic, since the current view is empty (https://github.com/sonic-net/SONiC/blob/4ab89a9fdba3ced17f4e4d7f97892f93045905d1/doc/fast-reboot/Fast-reboot_Flow_Improvements_HLD.md#42-syncd-point-of-view---initapply-view-framework).
We tried skipping the SAI discovery that follows port creation in the fast-reboot flow on Nvidia platforms (running the community fast-reboot test multiple times). At least in that case, this saved about 6.5 seconds of dataplane downtime, which is more than 20% of the allowed disruption length. The system was also stable and no issues were observed.
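For a sense of scale, the "more than 20%" figure can be checked with simple arithmetic. The 6.5-second saving and the 20% claim come from the comment above; the 30-second disruption budget is an assumption of mine, since the comment does not state the allowed length.

```python
SAVED_S = 6.5             # dataplane downtime saved, from the comment above
CLAIMED_FRACTION = 0.20   # "more than 20%" from the comment above
ASSUMED_BUDGET_S = 30.0   # assumption: allowed dataplane disruption for fast-reboot

# The saving exceeds 20% of the budget for any budget below this value.
max_budget_for_claim = SAVED_S / CLAIMED_FRACTION

# Share of the assumed budget that the saving represents.
fraction_of_budget = SAVED_S / ASSUMED_BUDGET_S

print(f"claim holds for budgets below {max_budget_for_claim:.1f} s")
print(f"share of an assumed {ASSUMED_BUDGET_S:.0f} s budget: {fraction_of_budget:.1%}")
```

With a 30-second budget the saving works out to a bit under 22%, which is consistent with the figure quoted above.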
Additional information you deem important (e.g. issue happens only occasionally):
sonic_dump_r-panther-13_20230210_114940.tar.gz