Orchagent crash on latest SONiC images #458

Open
ciju-juniper opened this issue Sep 3, 2019 · 13 comments

Comments

@ciju-juniper

Orchagent is crashing on the latest SONiC images. Until August 2nd, there were no issues. The last commit on which things were fine is 'c6e442b946d7bb46d7e53d3ce1263d44b0ef3810'.

Here are a few logs which will help to debug the problem.

admin@sonic:~$ show version

SONiC Software Version: SONiC.master.0-dirty-20190829.210940
Distribution: Debian 9.9
Kernel: 4.9.0-9-2-amd64
Build commit: 3323e9b8
Build date: Thu Aug 29 17:57:57 UTC 2019
Built by: bala@vlinux-5

Platform: x86_64-juniper_qfx5210-r0
HwSKU: Juniper-QFX5210-64C
ASIC: broadcom
Serial Number: YB0217500013
Uptime: 17:00:35 up 3 min, 2 users, load average: 0.62, 0.36, 0.14

Docker images:
REPOSITORY TAG IMAGE ID SIZE
docker-syncd-brcm latest 6c5c47f159ff 392MB
docker-syncd-brcm master.0-dirty-20190829.210940 6c5c47f159ff 392MB
docker-lldp-sv2 latest 57dfcda211c2 298MB
docker-lldp-sv2 master.0-dirty-20190829.210940 57dfcda211c2 298MB
docker-snmp-sv2 latest 8eac3da58656 323MB
docker-snmp-sv2 master.0-dirty-20190829.210940 8eac3da58656 323MB
docker-dhcp-relay latest 27f3f670833a 289MB
docker-dhcp-relay master.0-dirty-20190829.210940 27f3f670833a 289MB
docker-database latest 87420c49d8e3 281MB
docker-database master.0-dirty-20190829.210940 87420c49d8e3 281MB
docker-teamd latest 10693a6d0f14 302MB
docker-teamd master.0-dirty-20190829.210940 10693a6d0f14 302MB
docker-orchagent latest 366c48a62e70 321MB
docker-orchagent master.0-dirty-20190829.210940 366c48a62e70 321MB
docker-fpm-frr latest 7ce2e1efeebd 319MB
docker-fpm-frr master.0-dirty-20190829.210940 7ce2e1efeebd 319MB
docker-sonic-telemetry latest 8b3b68beed1f 304MB
docker-sonic-telemetry master.0-dirty-20190829.210940 8b3b68beed1f 304MB
docker-router-advertiser latest ea0b1d0ddd01 281MB
docker-router-advertiser master.0-dirty-20190829.210940 ea0b1d0ddd01 281MB
docker-platform-monitor latest 5038208af485 326MB
docker-platform-monitor master.0-dirty-20190829.210940 5038208af485 326MB

admin@sonic:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4a2e0dbd8b38 docker-snmp-sv2:latest "/usr/bin/supervisord" 2 minutes ago Up 30 seconds snmp
28c26910047d docker-fpm-frr:latest "/usr/bin/supervisord" 3 minutes ago Up 3 minutes bgp
8c10dab37f55 docker-lldp-sv2:latest "/usr/bin/supervisord" 3 minutes ago Up 3 minutes lldp
ab7c2cedf149 docker-platform-monitor:latest "/usr/bin/docker_ini…" 3 minutes ago Up 3 minutes pmon
591487be163e docker-sonic-telemetry:latest "/usr/bin/supervisord" 3 minutes ago Up 3 minutes telemetry
eff380465b1b docker-database:latest "/usr/local/bin/dock…" 3 minutes ago Up 3 minutes database

The following messages appear in syslog:

Sep 3 17:00:51.203721 sonic NOTICE swss#orchagent: :- initializePort: Initializing port alias:Ethernet208 pid:1000000000002
Sep 3 17:00:51.204278 sonic ERR syncd#syncd: :- processEvent: failed to execute api: remove, key: SAI_OBJECT_TYPE_PORT:oid:0x1000000000022, status: SAI_STATUS_NOT_SUPPORTED
Sep 3 17:00:51.204363 sonic ERR syncd#syncd: :- syncd_main: Runtime error: :- processEvent: failed to execute api: remove, key: SAI_OBJECT_TYPE_PORT:oid:0x1000000000022, status: SAI_STATUS_NOT_SUPPORTED
Sep 3 17:00:51.204387 sonic NOTICE syncd#syncd: :- notify_OA_about_syncd_exception: sending switch_shutdown_request notification to OA
Sep 3 17:00:51.204433 sonic NOTICE syncd#syncd: :- notify_OA_about_syncd_exception: notification send successfull
Sep 3 17:00:51.204527 sonic NOTICE swss#orchagent: :- handle_switch_shutdown_request: switch shutdown request
Sep 3 17:00:51.204865 sonic INFO swss#supervisord: orchagent terminate called after throwing an instance of 'std::invalid_argument'
Sep 3 17:00:51.204865 sonic INFO swss#supervisord: orchagent what(): parse error - unexpected end of input
Sep 3 17:00:51.339267 sonic INFO swss#supervisor-proc-exit-listener: Process orchagent exited unxepectedly. Terminating supervisor...
sonic_dump_sonic_20190903_170341.tar.gz
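For readers triaging a similar dump, here is a minimal sketch of pulling the failing SAI call out of syslog lines like the ones above. The regular expression is inferred only from the two syncd lines quoted in this report, so other syncd versions may format the message differently; `find_failed_sai_calls` is a hypothetical helper, not part of any SONiC tool.

```python
import re

# Pattern for the syncd "failed to execute api" lines quoted above.
# The field layout is an assumption inferred from those two lines only.
FAILED_API = re.compile(
    r"failed to execute api: (?P<api>\w+), "
    r"key: (?P<object_type>SAI_OBJECT_TYPE_\w+):oid:(?P<oid>0x[0-9a-fA-F]+), "
    r"status: (?P<status>SAI_STATUS_\w+)"
)


def find_failed_sai_calls(lines):
    """Return one dict per distinct failed SAI call found in syslog lines."""
    seen = []
    for line in lines:
        match = FAILED_API.search(line)
        if match and match.groupdict() not in seen:
            seen.append(match.groupdict())
    return seen


# Three lines copied from the syslog excerpt in this report.
sample = [
    "Sep 3 17:00:51.204278 sonic ERR syncd#syncd: :- processEvent: failed to execute api: remove, key: SAI_OBJECT_TYPE_PORT:oid:0x1000000000022, status: SAI_STATUS_NOT_SUPPORTED",
    "Sep 3 17:00:51.204363 sonic ERR syncd#syncd: :- syncd_main: Runtime error: :- processEvent: failed to execute api: remove, key: SAI_OBJECT_TYPE_PORT:oid:0x1000000000022, status: SAI_STATUS_NOT_SUPPORTED",
    "Sep 3 17:00:51.204527 sonic NOTICE swss#orchagent: :- handle_switch_shutdown_request: switch shutdown request",
]

failed_calls = find_failed_sai_calls(sample)
for call in failed_calls:
    print(call)
```

The two ERR lines describe the same call, so the sketch reports it once: a `remove` on a `SAI_OBJECT_TYPE_PORT` object that returned `SAI_STATUS_NOT_SUPPORTED`.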

@ciju-juniper
Author

@lguohan Here is the issue that I mentioned to you last Friday. Please let me know if any further details are needed.

@ciju-juniper
Author

ciju-juniper commented Sep 6, 2019

@zhenggen-xu @stcheng @lguohan @wendani @kcudnik

We have narrowed down the problem to a specific commit. The issue is observed from the following sonic-buildimage commit onwards:

    commit 6f40933d3d7b9f21a97de275fbd14ea3598d9a0a
    Author: zhenggen-xu <[email protected]>
    Date:   Wed Aug 7 10:59:54 2019 -0700

        [Feature: DynamicPortBreakout] Use consolidated bcm file for Seastone platform (#3240)

The corresponding sonic-swss commit is:

    commit 5be3963793d5d04807931f016faf1fcca87f6286
    Author: zhenggen-xu <[email protected]>
    Date:   Wed Jul 31 09:05:11 2019 -0700

We tested the commit immediately before the sonic-buildimage commit mentioned above. With it, we did not observe the orchagent crash, and all the docker containers were running.

    commit 49f3b22de50bceb18c16910163400812135e0fe1
    Author: simonJi2018 <[email protected]>
    Date:   Thu Aug 8 00:33:56 2019 +0800

The corresponding sonic-swss commit is:

    commit 63afbd5f0c89de8ce00cf717a266381f0822ce86
    Author: Volodymyr Samotiy <[email protected]>
    Date:   Mon Jul 22 15:24:55 2019 +0300

There are about 10 sonic-swss commits between these two.

Please look into this issue and let us know if you need any further details.
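If anyone wants to narrow this further within sonic-swss, the roughly 10 candidate commits can be bisected by hand. The sketch below shows the idea as a plain binary search. The commit labels and the `is_bad` check are hypothetical placeholders; in practice `is_bad` would mean building an image at that commit and checking whether orchagent crashes.

```python
def first_bad_commit(commits, is_bad):
    """Binary search for the first bad commit.

    commits: list ordered oldest to newest; commits[0] is known good
             and commits[-1] is known bad.
    is_bad:  callable that tests one commit and returns True on crash.
    Returns (first bad commit, number of commits tested).
    """
    good, bad = 0, len(commits) - 1
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tested += 1
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad], tested


# Hypothetical history: a known-good commit, 10 candidates, a known-bad commit.
commits = ["known-good"] + [f"candidate-{i}" for i in range(1, 11)] + ["known-bad"]

# Assumption for illustration only: the regression was introduced in candidate-7.
culprit, builds = first_bad_commit(commits, lambda c: commits.index(c) >= 7)
print(culprit, builds)
```

With 10 candidates in between, this needs at most four builds instead of ten.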

@kcudnik

kcudnik commented Sep 7, 2019

It seems like you want to remove a port, and the vendor SAI returns "not supported".

@ciju-juniper
Author

@kcudnik There were no changes to the configuration and no cable OIR. The issue happens even when no cables are connected.

This issue is reported by Dell also: sonic-net/sonic-buildimage#3314

Any Broadcom-based switch will hit this problem.

@kcudnik

kcudnik commented Sep 9, 2019

I just concluded that from the syslog you pasted:

Sep 3 17:00:51.204278 sonic ERR syncd#syncd: :- processEvent: failed to execute api: remove, key: SAI_OBJECT_TYPE_PORT:oid:0x1000000000022, status: SAI_STATUS_NOT_SUPPORTED
Sep 3 17:00:51.204363 sonic ERR syncd#syncd: :- syncd_main: Runtime error: :- processEvent: failed to execute api: remove, key: SAI_OBJECT_TYPE_PORT:oid:0x1000000000022, status: SAI_STATUS_NOT_SUPPORTED

The error is "not supported" for the action "remove" on the key "SAI_OBJECT_TYPE_PORT", so something is trying to remove a PORT object, and the Broadcom SAI doesn't support that operation.

@ciju-juniper
Author

@kcudnik Would you know how to debug this issue?

@habeebmohammed

Can you please share your bcm config file?

@habeebmohammed

We faced the same problem on our Inventec switches. It can be resolved in two ways:

  1. Comment out the loopback and mgmt ports in the bcm config.
  2. Add the EP/consolidated config file. You can refer to the config file in the commit below.

    commit 6f40933d3d7b9f21a97de275fbd14ea3598d9a0a
    Author: zhenggen-xu <[email protected]>
    Date:   Wed Aug 7 10:59:54 2019 -0700

        [Feature: DynamicPortBreakout] Use consolidated bcm file for Seastone platform (#3240)

I need to check with @zhenggen-xu for more details.

@BaluAlluru

BaluAlluru commented Sep 10, 2019

@habeebmohammed, as you suggested, we commented out the loopback and mgmt ports in the bcm config.

We are not seeing the issue now. There are no cores and all the docker containers are running.

    #add loopback port
    # port 33 is the first loopback port
    #portmap_33=260:10
    # port 66 is the first management port
    #portmap_66=257:10
    # port 67 is the second loopback port
    #portmap_67=261:10
    # port 100 is the second management port
    #portmap_100=259:10
    # port 101 is the third loopback port
    #portmap_101=262:10
    # port 135 is the fourth loopback port
    #portmap_135=263:10

Ideally, though, the commit ([Feature: DynamicPortBreakout]) should not have introduced this issue. We still don't know the implications of commenting out these lines in the bcm config.
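For anyone applying the same workaround to another platform's file, here is a minimal sketch of the edit as a script. The port numbers come from the excerpt above. `comment_out_portmaps` is a hypothetical helper, the `portmap_1=1:100` line in the sample is made up to show an untouched front-panel port, and handling of an optional unit suffix such as `portmap_33.0=` is an assumption about how some bcm configs are written.

```python
import re

# Logical port numbers taken from the config excerpt above
# (loopback: 33, 67, 101, 135; management: 66, 100).
PORTS_TO_DISABLE = {33, 66, 67, 100, 101, 135}

# Matches active (uncommented) portmap lines; the optional ".<unit>"
# suffix is an assumption, not something shown in this thread.
PORTMAP_LINE = re.compile(r"^\s*portmap_(\d+)(?:\.\d+)?\s*=")


def comment_out_portmaps(config_text, ports):
    """Prefix '#' to every active portmap_<N> line whose N is in ports."""
    result = []
    for line in config_text.splitlines():
        match = PORTMAP_LINE.match(line)
        if match and int(match.group(1)) in ports:
            line = "#" + line
        result.append(line)
    return "\n".join(result)


# Hypothetical input: one front-panel port, two active special ports,
# and one line that is already commented out.
sample = """portmap_1=1:100
portmap_33=260:10
portmap_66=257:10
#portmap_67=261:10"""

patched = comment_out_portmaps(sample, PORTS_TO_DISABLE)
print(patched)
```

Lines that are already commented out are left alone, so the script can be run more than once on the same file. This only automates the workaround; it does not answer the open question about its side effects.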

@kcudnik

kcudnik commented Sep 11, 2019

> @kcudnik Would you know how to debug this issue?

From the sairedis side there is nothing to debug here: the user wants to remove a port, but the vendor SAI doesn't implement that feature, so there is nothing you can do about it.

@ciju-juniper
Author

@kcudnik As I said earlier, no user action was taken.

Also note that this problem started to appear with commit 6f40933d3d7b9f21a97de275fbd14ea3598d9a0a. If we go back to the immediately preceding commit, the issue is not seen.

@kcudnik

kcudnik commented Sep 11, 2019

By "user" I mean OA (orchagent), and the log you pasted shows that a port is being removed.
