
[201911] Warmboot fails on Arista 7050 #5255

Closed
abdosi opened this issue Aug 26, 2020 · 1 comment · Fixed by #5315

@abdosi
Contributor

abdosi commented Aug 26, 2020

Issue:
A warm reboot from a 201911 image to the same image fails on Arista 7050 with the logs below.

Root Cause

The issue is generic and can happen on any platform; it is a timing issue. With the new supervisord-based way of starting processes, swssconfig.sh is started only after orchagent reaches the RUNNING state (dependent_startup_wait_for=orchagent:running):

[program:swssconfig]
command=/usr/bin/swssconfig.sh
priority=6
autostart=false
autorestart=unexpected
startretries=0
startsecs=0
stdout_logfile=syslog
stderr_logfile=syslog
dependent_startup=true
dependent_startup_wait_for=orchagent:running

Because of this timing, the warm-restart check in swssconfig.sh can be bypassed. The check reads orchagent's restore_count from STATE_DB, and if orchagent has not yet updated it, swssconfig.sh goes on to load APP_DB again from the config files (00-copp.config.json, ipinip.json, ports.json, switch.json). This causes the errors seen in the logs below.
A possible fix is to add some delay so that orchagent can finish its initial processing and update STATE_DB.

Current check in swssconfig.sh:

if [[ "$SYSTEM_WARM_START" == "true" ]] || [[ "$SWSS_WARM_START" == "true" ]]; then
    # Skip reloading config only if orchagent has already recorded a restore in STATE_DB
    RESTORE_COUNT=$(sonic-db-cli STATE_DB hget "WARM_RESTART_TABLE|orchagent" restore_count)
    if [[ -n "$RESTORE_COUNT" ]] && [[ "$RESTORE_COUNT" != "0" ]]; then
        exit 0
    fi
fi
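To make the race concrete, the following sketch replays the two possible orderings in plain bash. It is only an illustration: STATE_DB is stubbed with an associative array, no real sonic-db-cli call is made, and `orchagent_restore` and `swssconfig_check` are hypothetical stand-ins (not SONiC functions) for orchagent updating restore_count and for the check above.

```bash
#!/usr/bin/env bash
# Sketch only: simulates the restore_count race with an in-memory stub of STATE_DB.
# orchagent_restore and swssconfig_check are hypothetical names, not SONiC code.

declare -A STATE_DB=()          # stub for the "WARM_RESTART_TABLE|orchagent" hash
SYSTEM_WARM_START="true"        # assume a warm reboot is in progress
SWSS_WARM_START="false"

orchagent_restore() {
    # Stand-in for orchagent recording that it restored state during warm start.
    STATE_DB["restore_count"]=$(( ${STATE_DB["restore_count"]:-0} + 1 ))
}

swssconfig_check() {
    # Mirrors the original check: skip config loading only if restore_count is non-zero.
    if [[ "$SYSTEM_WARM_START" == "true" ]] || [[ "$SWSS_WARM_START" == "true" ]]; then
        local restore_count="${STATE_DB["restore_count"]:-}"
        if [[ -n "$restore_count" ]] && [[ "$restore_count" != "0" ]]; then
            echo "skip"
            return
        fi
    fi
    echo "load"   # in the real script this would push the JSON files into APP_DB again
}

# Ordering A: orchagent updates STATE_DB before swssconfig.sh runs the check.
STATE_DB=()
orchagent_restore
order_a=$(swssconfig_check)

# Ordering B: swssconfig.sh runs the check before orchagent has updated STATE_DB.
STATE_DB=()
order_b=$(swssconfig_check)
orchagent_restore

echo "orchagent first:  $order_a"
echo "swssconfig first: $order_b"
```

Only ordering B reloads the config during a warm start, which matches the symptom in the logs: orchagent tries to re-apply COPP and tunnel attributes that already exist.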

Logs

Aug 26 22:06:23.358466 str-a7050-acs-1 ERR swss#orchagent: :- processCoppRule: Failed to apply attribute[2].id=0 to policer for trap group:default, error:-5
Aug 26 22:06:23.358466 str-a7050-acs-1 ERR swss#orchagent: :- doTask: Processing copp task item failed, exiting.
Aug 26 22:06:23.358642 str-a7050-acs-1 ERR swss#orchagent: :- meta_generic_validation_set: SAI_POLICER_ATTR_METER_TYPE:SAI_ATTR_VALUE_TYPE_INT32 attr is create only and cannot be modified
Aug 26 22:06:23.358693 str-a7050-acs-1 ERR swss#orchagent: :- processCoppRule: Failed to apply attribute[2].id=0 to policer for trap group:trap.group.arp, error:-5
Aug 26 22:06:23.358693 str-a7050-acs-1 ERR swss#orchagent: :- doTask: Processing copp task item failed, exiting.
Aug 26 22:06:23.359964 str-a7050-acs-1 ERR swss#orchagent: :- meta_generic_validation_set: SAI_TUNNEL_ATTR_DECAP_DSCP_MODE:SAI_ATTR_VALUE_TYPE_INT32 attr is create only and cannot be modified
Aug 26 22:06:23.359964 str-a7050-acs-1 ERR swss#orchagent: :- setTunnelAttribute: Failed to set attribute dscp_mode with value pipe
Aug 26 22:06:23.359964 str-a7050-acs-1 ERR swss#orchagent: :- addDecapTunnelTermEntries: 192.168.0.1 already exists. Did not create entry.
Aug 26 22:06:23.359964 str-a7050-acs-1 ERR swss#orchagent: :- addDecapTunnelTermEntries: 10.1.0.32 already exists. Did not create entry.
Aug 26 22:06:23.359964 str-a7050-acs-1 ERR swss#orchagent: :- addDecapTunnelTermEntries: 10.0.0.56 already exists. Did not create entry.
Aug 26 22:06:23.359964 str-a7050-acs-1 ERR swss#orchagent: :- addDecapTunnelTermEntries: 10.0.0.58 already exists. Did not create entry.
Aug 26 22:06:23.360052 str-a7050-acs-1 ERR swss#orchagent: :- addDecapTunnelTermEntries: 10.0.0.60 already exists. Did not create entry.
Aug 26 22:06:23.360052 str-a7050-acs-1 ERR swss#orchagent: :- addDecapTunnelTermEntries: 10.0.0.62 already exists. Did not create entry.
Aug 26 22:06:23.360066 str-a7050-acs-1 ERR swss#orchagent: :- meta_generic_validation_set: SAI_TUNNEL_ATTR_DECAP_ECN_MODE:SAI_ATTR_VALUE_TYPE_INT32 attr is create only and cannot be modified
Aug 26 22:06:23.360093 str-a7050-acs-1 ERR swss#orchagent: :- setTunnelAttribute: Failed to set attribute ecn_mode with value copy_from_outer
Aug 26 22:06:23.360093 str-a7050-acs-1 ERR swss#orchagent: :- meta_generic_validation_set: SAI_TUNNEL_ATTR_DECAP_TTL_MODE:SAI_ATTR_VALUE_TYPE_INT32 attr is create only and cannot be modified
Aug 26 22:06:23.360940 str-a7050-acs-1 ERR swss#orchagent: :- setTunnelAttribute: Failed to set attribute ttl_mode with value pipe
Aug 26 22:06:23.360940 str-a7050-acs-1 ERR swss#orchagent: :- meta_generic_validation_set: SAI_TUNNEL_ATTR_DECAP_DSCP_MODE:SAI_ATTR_VALUE_TYPE_INT32 attr is create only and cannot be modified
Aug 26 22:06:23.360940 str-a7050-acs-1 ERR swss#orchagent: :- setTunnelAttribute: Failed to set attribute dscp_mode with value pipe
Aug 26 22:06:23.361022 str-a7050-acs-1 ERR swss#orchagent: :- addDecapTunnelTermEntries: fc00::71 already exists. Did not create entry.
Aug 26 22:06:23.361022 str-a7050-acs-1 ERR swss#orchagent: :- addDecapTunnelTermEntries: fc00::75 already exists. Did not create entry.
Aug 26 22:06:23.361039 str-a7050-acs-1 ERR swss#orchagent: :- addDecapTunnelTermEntries: fc00::79 already exists. Did not create entry.
Aug 26 22:06:23.361062 str-a7050-acs-1 ERR swss#orchagent: :- addDecapTunnelTermEntries: fc00::7d already exists. Did not create entry.
Aug 26 22:06:23.361062 str-a7050-acs-1 ERR swss#orchagent: :- addDecapTunnelTermEntries: fc00:1::32 already exists. Did not create entry.
Aug 26 22:06:23.361080 str-a7050-acs-1 ERR swss#orchagent: :- meta_generic_validation_set: SAI_TUNNEL_ATTR_DECAP_ECN_MODE:SAI_ATTR_VALUE_TYPE_INT32 attr is create only and cannot be modified
Aug 26 22:06:23.361092 str-a7050-acs-1 ERR swss#orchagent: :- setTunnelAttribute: Failed to set attribute ecn_mode with value copy_from_outer
Aug 26 22:06:23.361121 str-a7050-acs-1 ERR swss#orchagent: :- meta_generic_validation_set: SAI_TUNNEL_ATTR_DECAP_TTL_MODE:SAI_ATTR_VALUE_TYPE_INT32 attr is create only and cannot be modified
Aug 26 22:06:23.361252 str-a7050-acs-1 ERR swss#orchagent: :- setTunnelAttribute: Failed to set attribute ttl_mode with value pipe
Aug 26 22:06:23.361359 str-a7050-acs-1 NOTICE swss#orchagent: :- processCoppRule: Set trap group trap.group.bgp.lacp to host interface
Aug 26 22:06:23.361406 str-a7050-acs-1 ERR swss#orchagent: :- meta_generic_validation_create: attribute key SAI_HOSTIF_TRAP_ATTR_TRAP_TYPE:16387; already exists, can't create
Aug 26 22:06:23.361406 str-a7050-acs-1 ERR swss#orchagent: :- applyAttributesToTrapIds: Failed to create trap 16387, rv:-5
Aug 26 22:06:23.361423 str-a7050-acs-1 ERR swss#orchagent: :- doTask: Processing copp task item failed, exiting.
Aug 26 22:06:23.361880 str-a7050-acs-1 ERR swss#orchagent: :- meta_generic_validation_set: SAI_POLICER_ATTR_METER_TYPE:SAI_ATTR_VALUE_TYPE_INT32 attr is create only and cannot be modified
Aug 26 22:06:23.361880 str-a7050-acs-1 ERR syncd#syncd: [none] brcm_sai_set_policer_attribute:470 policer create failed with error Operation still running (0xfffffff6).
Aug 26 22:06:23.361880 str-a7050-acs-1 ERR syncd#syncd: :- processEvent: VID: oid:0x120000000007f8 RID: oid:0x41200000002
Aug 26 22:06:23.361880 str-a7050-acs-1 ERR syncd#syncd: :- processEvent: attr: SAI_POLICER_ATTR_CBS: 600
Aug 26 22:06:23.361880 str-a7050-acs-1 ERR syncd#syncd: :- processEvent: failed to execute api: set, key: SAI_OBJECT_TYPE_POLICER:oid:0x120000000007f8, status: SAI_STATUS_OBJECT_IN_USE
Aug 26 22:06:23.361931 str-a7050-acs-1 ERR syncd#syncd: :- syncd_main: Runtime error: :- processEvent: failed to execute api: set, key: SAI_OBJECT_TYPE_POLICER:oid:0x120000000007f8, status: SAI_STATUS_OBJECT_IN_USE
Aug 26 22:06:23.361931 str-a7050-acs-1 NOTICE syncd#syncd: :- notify_OA_about_syncd_exception: sending switch_shutdown_request notification to OA

@abdosi abdosi self-assigned this Aug 26, 2020
@abdosi abdosi changed the title Warmboot fails with 201911 on Arista 7050 [201911] Warmboot fails on Arista 7050 Aug 29, 2020
@abdosi
Contributor Author

abdosi commented Aug 29, 2020

Looking into a fix.

abdosi added a commit to abdosi/sonic-buildimage that referenced this issue Sep 4, 2020
sonic-net#5255

Root Cause: Waiting on restore_count != 0 can lead to a race condition
between the orchagent process and swssconfig.sh.

Ideally, the restore_count != 0 check is not needed: STATE_DB cannot have
been flushed at this point, because if it had been, the warm-restart or
swss-restart flag would not be true either.
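Following the reasoning in the commit message, a minimal sketch of the check without the restore_count dependency could look like the following. It decides purely from the warm-start flags, so the result no longer depends on whether orchagent has written STATE_DB yet. This is an illustration under that assumption, not necessarily the exact code merged in #5315; `should_skip_config_load` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch only: decide whether swssconfig.sh should skip reloading APP_DB,
# based solely on the warm-start flags (no STATE_DB restore_count lookup).
# should_skip_config_load is a hypothetical helper, not part of SONiC.

should_skip_config_load() {
    local system_warm_start="$1"
    local swss_warm_start="$2"
    if [[ "$system_warm_start" == "true" ]] || [[ "$swss_warm_start" == "true" ]]; then
        return 0    # warm start: existing state is reused, do not reload the config files
    fi
    return 1        # cold start: load the JSON config files as usual
}

# The decision is the same no matter when orchagent updates STATE_DB.
warm_result=$(should_skip_config_load "true" "false" && echo skip || echo load)
swss_result=$(should_skip_config_load "false" "true" && echo skip || echo load)
cold_result=$(should_skip_config_load "false" "false" && echo skip || echo load)

echo "system warm start: $warm_result"
echo "swss warm start:   $swss_result"
echo "cold start:        $cold_result"
```

In swssconfig.sh this would be used as `if should_skip_config_load "$SYSTEM_WARM_START" "$SWSS_WARM_START"; then exit 0; fi`, removing the timing dependency on orchagent instead of papering over it with a delay.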
abdosi added a commit that referenced this issue Sep 4, 2020
abdosi added a commit that referenced this issue Sep 6, 2020
santhosh-kt pushed a commit to santhosh-kt/sonic-buildimage that referenced this issue Feb 25, 2021