Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

To delete neighbor entries which are next-hop of routing entries from NEIGH_TABLE causes DEL operations pending in the neighorch::m_toSync #4400

Closed
stephenxs opened this issue Apr 8, 2020 · 10 comments

Comments

@stephenxs
Copy link
Collaborator

stephenxs commented Apr 8, 2020

Description
To delete neighbor entries which are next-hop of routing entries from NEIGH_TABLE causes DEL operations pending in the neighorch::m_toSync
It can be reproduced in any scenario where the kernel ages/removes a neighbor entry (ARP or ND) who is the next-hop of a routing entry. An example is the warm reboot test: after warm reboot orchagent receives neigh remove from kernel/net link.
This issue can prevent orchagent from frozen for the next reboot.

I suspect even in the 201911 branch the kernel will notify the removal of the neighbors which are the next-hop of the routing entries. But the issue isn't observed in 201911 branch because

  • there is a logic of merging a DEL operation followed by a SET operation if both the DEL and SET operate the same key (orch::addToSync)
  • the neighbor entries with non-zero reference count remain in the m_toSync when handling the DEL operation
  • the DEL operation is removed due to the above logic when the subsequent SET operations on the same key come

However, this logic has been updated by PR #1184. IMO the logic implemented by PR 1184 is better because it provides the ability to remove and recreate an object, which is necessary when we want to modify a create-only attribute of an object. So I suggest improving the neighorch implemention.

Currently, the notification of removing a neighbor with non-zero reference will remain in m_toSync. I think this is based on an assumption that the reference will be decreased to zero in a short time. However, I'm afraid the assumption is not correct because there is no guarantee that the kernel won't age an ARP entry which is a next-hop of some routing entries. In this case, the notification can remain in m_toSync for a long time.
Is it possible to handle this scenario in the following way?

  • add a PENDING_REMOVE flag to the neighbor entry for which a removing notification is received when the reference count isn't zero and remove it from m_toSync.
  • if an updating notification received for a flagged ARP entry, just remove the flag.
  • if the referencing count of the ARP entry is decreased to zero, execute the removing procedure, removing the entry from ASIC via calling the SAI API.

Steps to reproduce the issue:

  1. Deploy SONiC switch in t0 topo
  2. Run warm-reboot test (advanced-reboot)
  3. After test finished, try warm reboot the DUT again.
  4. Adjust log level of orchagent to INFO, neighbor removing logs can be observed.

Describe the results you received:
After warm-reboot test, try warm-reboot for the second time, error message logged

admin@r-tigris-04:~$ sudo warm-reboot 
RESTARTCHECK failed

Adjust orchagent's log level to INFO, the following logs are found:

Apr  8 10:32:01.637562 r-tigris-04 NOTICE swss#orchagent: :- set: setting attribute 0x10000004 status: SAI_STATUS_SUCCESS
Apr  8 10:32:01.698064 r-tigris-04 INFO swss#orchagent: :- removeNeighbor: Failed to remove still referenced neighbor 52:54:00:5c:9d:d5 on PortChannel0001
Apr  8 10:32:01.698064 r-tigris-04 INFO swss#orchagent: :- removeNeighbor: Failed to remove still referenced neighbor 52:54:00:43:97:5f on PortChannel0002
Apr  8 10:32:01.698064 r-tigris-04 INFO swss#orchagent: :- removeNeighbor: Failed to remove still referenced neighbor 52:54:00:9b:07:af on PortChannel0003
Apr  8 10:32:01.698064 r-tigris-04 INFO swss#orchagent: :- removeNeighbor: Failed to remove still referenced neighbor 52:54:00:fc:fe:74 on PortChannel0004

However, the neighbors are up and portchannels aren't down after warm-reboot. Once I built an image with additional debug info and found neighbor removing messages are received from netlink socket, which means they're removed by kernel.

Describe the results you expected:
The orchagent shouldn't receive BGP neighbors removing message.

Additional information you deem important (e.g. issue happens only occasionally):

Output of show version:

admin@r-tigris-04:~$ show version

SONiC Software Version: SONiC.HEAD.0-dirty-20200407.121233
Distribution: Debian 9.12
Kernel: 4.9.0-11-2-amd64
Build commit: 249265ad
Build date: Tue Apr  7 06:33:17 UTC 2020
Built by: kebol@arc-build-server-2

Platform: x86_64-mlnx_msn3800-r0
HwSKU: Mellanox-SN3800-D112C8
ASIC: mellanox
Serial Number: MT1925X00008
Uptime: 10:30:09 up 16 min,  1 user,  load average: 2.07, 2.36, 1.87

Docker images:
REPOSITORY                    TAG                            IMAGE ID            SIZE
docker-syncd-mlnx             HEAD.0-dirty-20200407.121233   a8c51f1141a3        383MB
docker-syncd-mlnx             latest                         a8c51f1141a3        383MB
docker-router-advertiser      HEAD.0-dirty-20200407.121233   c228771a23d4        283MB
docker-router-advertiser      latest                         c228771a23d4        283MB
docker-platform-monitor       HEAD.0-dirty-20200407.121233   86fabb434565        628MB
docker-platform-monitor       latest                         86fabb434565        628MB
docker-fpm-frr                HEAD.0-dirty-20200407.121233   d5dbc9f4988b        327MB
docker-fpm-frr                latest                         d5dbc9f4988b        327MB
docker-lldp-sv2               HEAD.0-dirty-20200407.121233   dac0f386ea2d        304MB
docker-lldp-sv2               latest                         dac0f386ea2d        304MB
docker-dhcp-relay             HEAD.0-dirty-20200407.121233   757410483e93        293MB
docker-dhcp-relay             latest                         757410483e93        293MB
docker-database               HEAD.0-dirty-20200407.121233   ac7684a5969e        283MB
docker-database               latest                         ac7684a5969e        283MB
docker-sflow                  HEAD.0-dirty-20200407.121233   0490a8ade857        308MB
docker-sflow                  latest                         0490a8ade857        308MB
docker-teamd                  HEAD.0-dirty-20200407.121233   b3242c1f1f3f        307MB
docker-teamd                  latest                         b3242c1f1f3f        307MB
docker-snmp-sv2               HEAD.0-dirty-20200407.121233   b197a084c2b2        340MB
docker-snmp-sv2               latest                         b197a084c2b2        340MB
docker-orchagent              HEAD.0-dirty-20200407.121233   422bb10beebe        326MB
docker-orchagent              latest                         422bb10beebe        326MB
docker-nat                    HEAD.0-dirty-20200407.121233   a5ca9f43bed4        309MB
docker-nat                    latest                         a5ca9f43bed4        309MB
docker-sonic-mgmt-framework   HEAD.0-dirty-20200407.121233   1713225fe846        421MB
docker-sonic-mgmt-framework   latest                         1713225fe846        421MB
docker-sonic-telemetry        HEAD.0-dirty-20200407.121233   d474172e495b        344MB
docker-sonic-telemetry        latest                         d474172e495b        344MB
**Attach debug file `sudo generate_dump`:**

sonic_dump_r-tigris-04_20200408_103340.tar.gz
test-record.log
lacp-state-on-vm.log

```
(paste your output here)
```
@xinliu-seattle
Copy link
Contributor

please take this to the warm reboot subgroup meeting and discuss there.

@stephenxs
Copy link
Collaborator Author

This issue appeared recently.
sonic_dump_mtbc-sonic-01-2410_20200923_104248.tar.gz

@stephenxs
Copy link
Collaborator Author

Hi @lguohan
I encountered this issue when I was testing warm reboot from 201911 to dynamic buffer which is based on master.
So I can't say for sure whether it appears in 201911.

@stephenxs
Copy link
Collaborator Author

After warm reboot test finished, the neighbors were somehow removed and then learned approximately 1-second latter.
I added some debug in neighsyncd and saw the following log which indicated the arp entries were removed for each of the IPv4 neighbor learnt from port channel.

Sep 23 10:37:13.795126 mtbc-sonic-01-2410 INFO swss#neighsyncd: :- onMsg: ND-TEST RECEIVING NLMSG REMOVE neighbour PortChannel0002:10.0.0.59 none

and the following messages indicating the entries were learned

Sep 23 10:37:14.081397 mtbc-sonic-01-2410 INFO swss#neighsyncd: :- onMsg: ND-TEST RECEIVING NLMSG UPDATE neighbour PortChannel0002:10.0.0.59 52:54:00:2a:03:ab

And then the following messages observed:

Sep 23 10:37:14.123964 mtbc-sonic-01-2410 INFO swss#orchagent: :- removeNeighbor: Failed to remove still referenced neighbor 52:54:00:64:0c:a4 on PortChannel0001
Sep 23 10:37:14.124029 mtbc-sonic-01-2410 INFO swss#orchagent: :- removeNeighbor: Failed to remove still referenced neighbor 52:54:00:2a:03:ab on PortChannel0002
Sep 23 10:37:14.124096 mtbc-sonic-01-2410 INFO swss#orchagent: :- removeNeighbor: Failed to remove still referenced neighbor 52:54:00:8e:01:fb on PortChannel0003
Sep 23 10:37:14.124162 mtbc-sonic-01-2410 INFO swss#orchagent: :- removeNeighbor: Failed to remove still referenced neighbor 52:54:00:53:c7:ea on PortChannel0004

This means the neighbor removing notifications can not be handled and will remain in m_toSync, which will prevent the orchagent from being prepared for the next warm reboot.
Currently, the notification of removing a neighbor with non-zero reference will remain in m_toSync. I think this is based on an assumption that the reference will be decreased to zero in a short time. However, I'm afraid the assumption is not correct because there is no guarantee that the kernel won't age an ARP entry which is a next-hop of some routing entries. In this case, the notification can remain in m_toSync for a long time.
Is it possible to handle this scenario in the following way?

  • add a PENDING_REMOVE flag to the neighbor entry for which a removing notification is received when the reference count isn't zero and remove it from m_toSync.
  • if an updating notification received for a flagged ARP entry, just remove the flag.
  • if the referencing count of the ARP entry is decreased to zero, execute the removing procedure, removing the entry from ASIC via calling the SAI API.

@stephenxs
Copy link
Collaborator Author

I suspect even in the 201911 branch the kernel will notify the removal of the neighbors which are the next-hop of the routing entries. But the issue isn't observed in 201911 branch because

  • there is a logic of merging a DEL operation followed by a SET operation if both the DEL and SET operate the same key (orch::addToSync)
  • the neighbor entries with non-zero reference count remain in the m_toSync when handling the DEL operation
  • the DEL operation is removed due to the above logic when the subsequent SET operations on the same key come

However, this logic has been updated by PR #1184, which causes the issue.

IMO the logic implemented by PR 1184 is better because it provides the ability to remove and recreate an object, which is necessary when we want to modify a create-only attribute of an object.

So I suggest improving the neighorch design by improving the reference count mechanism as above mentioned.

@shi-su
Copy link
Contributor

shi-su commented Oct 22, 2020

I experienced exactly the same issue in the master image. I found it happens and blocks the following warm-reboot as long as SONiC boots up (no matter it is cold/fast/warm) with the ping traffic in the warm/fast reboot test. And the logic in sonic-net/sonic-swss#1184 makes the DEL operation stay in m_toSync. So I think this might not completely be a warm-reboot problem, it just happens to block warm-reboot. There would be a DEL operation stuck in m_toSync and keep retrying even if there is no warm-reboot attempts following.

@stephenxs
Copy link
Collaborator Author

You’re correct. This issue can also be reproduced without warmreboot.
I m going to update the title

@stephenxs stephenxs changed the title After warm reboot orchagent receives neigh remove from kernel/net link, which prevents orchagent from frozen for the next reboot To delete neighbor entries which are next-hop of routing entries from NEIGH_TABLE causes DEL operations pending in the neighorch::m_toSync Oct 23, 2020
@stephenxs
Copy link
Collaborator Author

The title and description have been updated.
The lable Issues for 201911 has also been removed since this is a master issue and can't be reproduced on 201911.

@yxieca
Copy link
Contributor

yxieca commented Nov 3, 2020

@stephenxs can you validate and close the issue?

@stephenxs
Copy link
Collaborator Author

Hi @yxieca ,
With PR #1485 merged, this issue has been fixed.
Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants