merge azure/sonic-swss to aclorch branch #3

Merged
merged 50 commits into from
Oct 2, 2019


shine4chen
Owner

What I did

Why I did it

How I verified it

Details if related

wendani and others added 30 commits July 23, 2019 22:44
* ARM32 bit fixes, for 64bit printf format specifier

Signed-off-by: Antony Rheneus <[email protected]>
The buffer pool watermark design is built on a READ_AND_CLEAR polling mode at the syncd level. We have observed that some SAI implementations lack clear_stats support for the buffer pool watermark, either because it is not implemented yet or because of a hardware limitation. In such cases, the actual polling behavior does not match what a user reads from FLEX_COUNTER_GROUP_TABLE in FLEX_COUNTER_DB.

To keep the view consistent, we propose a per-buffer-pool watermark stats polling mode at the orchagent level for switches where not all buffer pools support the clear_stats operation. Orchagent detects this situation by first issuing a clear_stats operation against every pool to probe its capability. If some pool lacks support, we do not set the "STATS_MODE" field in "FLEX_COUNTER_GROUP_TABLE:BUFFER_POOL_WATERMARK_STAT_COUNTER"; instead we set it in the per-buffer-pool table "FLEX_COUNTER_TABLE:BUFFER_POOL_WATERMARK_STAT_COUNTER:oid:<buffer_pool_oid>".
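
The probe-then-decide logic described above can be sketched roughly as follows. This is a simplified Python model, not the orchagent code: `probe_clear_stats` is a hypothetical stand-in for the orchagent-level clear_stats probe, and the two mode strings are assumed values used only for illustration.

```python
from typing import Callable, Dict, Iterable

GROUP_KEY = "FLEX_COUNTER_GROUP_TABLE:BUFFER_POOL_WATERMARK_STAT_COUNTER"
POOL_KEY_PREFIX = "FLEX_COUNTER_TABLE:BUFFER_POOL_WATERMARK_STAT_COUNTER:"

# Assumed mode strings, used here for illustration only.
READ_AND_CLEAR = "STATS_MODE_READ_AND_CLEAR"
READ = "STATS_MODE_READ"


def plan_stats_mode(pool_oids: Iterable[str],
                    probe_clear_stats: Callable[[str], bool]) -> Dict[str, str]:
    """Return a mapping of table key -> STATS_MODE value to write."""
    # Probe every pool first, as the commit message describes.
    supported = {oid: probe_clear_stats(oid) for oid in pool_oids}
    if all(supported.values()):
        # Every pool can clear its stats: one group-level setting is enough.
        return {GROUP_KEY: READ_AND_CLEAR}
    # Mixed support: leave the group entry unset and decide per pool.
    return {
        POOL_KEY_PREFIX + oid: (READ_AND_CLEAR if ok else READ)
        for oid, ok in supported.items()
    }
```

Passing the probe in as a callable keeps the sketch independent of any SAI binding; a real implementation would issue the clear_stats call there.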
Fail explicitly if the team kernel module is not available on the system

Signed-off-by: Guohan Lu <[email protected]>
default image is docker-sonic-vs:latest

Signed-off-by: Guohan Lu <[email protected]>
This is to get a better JUnitXML file and align all
the tests under the same level of hierarchy.

Signed-off-by: Shu0T1an ChenG <[email protected]>
#997)

* Trap DHCPv6 packets for supporting ZTP over in-band interfaces using DHCPv6 discovery

Also increase the incoming packet rate on in-band interfaces to support faster
download of large files. Downloading a SONiC firmware image over in-band can
take a long time if the incoming packet rate is limited to 600pps. This
change increases it to 6000pps.
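
To give a feel for the difference, here is a back-of-the-envelope estimate in Python. The image size and the per-packet payload are assumptions chosen for round numbers, not values taken from this change; only the 600pps and 6000pps limits come from the commit message.

```python
def download_seconds(image_bytes: int, payload_bytes: int, pps: int) -> float:
    """Lower bound on transfer time when the receive path is capped at `pps` packets/s."""
    packets = -(-image_bytes // payload_bytes)  # ceiling division
    return packets / pps


IMAGE_BYTES = 600_000_000   # assumed image size (600 MB), illustration only
PAYLOAD_BYTES = 1500        # assumed payload per packet, illustration only

slow = download_seconds(IMAGE_BYTES, PAYLOAD_BYTES, 600)
fast = download_seconds(IMAGE_BYTES, PAYLOAD_BYTES, 6000)
```

Under these assumptions the transfer needs 400,000 packets, so the cap alone costs about 667 seconds at 600pps and about 67 seconds at 6000pps.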

Signed-off-by: Rajendra Dendukuri <[email protected]>
The Config DB manual is being moved from the Wiki to the SWSS repo.
Once this is merged, the Wiki page will point to this Config DB manual.
After that, any developer who changes Config DB is expected to make the corresponding changes in this manual.
The change includes two parts:

<1> Bring member out of default VLAN 1 upon putting port/lag in a VLAN

<2> Second part was done by @tieguoevan ( https://github.com/tieguoevan) and incorporated here to avoid test error that would follow change <1>.

Summary:
The Bridge interface needs to be up all the time. Otherwise, the command bridge vlan will fail.
It is not clear whether this is a kernel bug, but it causes an error when all VLAN members are cleared and then reconfigured.

Create a dummy interface in the Bridge to keep it up all the time.

Signed-off-by: Jipan Yang <[email protected]>
The vxlan tunnel_id and tunnel_term_id are not created until the map entries are added. If a vxlan tunnel is configured without a map entry, removing it without a validity check is therefore invalid.

Signed-off-by: sundandan <[email protected]>
Three PRs for adding BGP eoiu support to speed up route reconciliation in fpmsyncd

sonic-buildimage: sonic-net/sonic-buildimage#2823
sonic-swss-common: sonic-net/sonic-swss-common#273
sonic-swss: #856

Why I did it

Similar to restore_neighbors.py for neighsyncd, start a bgp_eoiu_mark.py for the bgp docker.

The script checks the bgp neighbor state via the cli interface periodically (every 1 second).
It looks for an explicit EOR and an implicit EOR (keepalive after established) in the json output of show ip bgp neighbors A.B.C.D json.

Once the script has collected all needed EORs, it sets an EOIU flag in stateDB.

fpmsyncd can hold for a few seconds (3 seconds) after getting the flag before starting route reconciliation.

If for any reason the script fails to set the EOIU flag in stateDB, the current warm_restart bgp_timer will kick in later.

This approach may add a few more seconds of delay compared with the FRR embedded EOIU solution, but it is simpler and less risky.
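
The polling loop described above might look roughly like the sketch below. It is not the bgp_eoiu_mark.py script: the dictionary keys `explicit_eor` and `keepalives_after_established` are hypothetical placeholders for whatever the neighbor json actually exposes, and `fetch_neighbor` and `set_eoiu_flag` are hypothetical callables standing in for the cli query and the stateDB write.

```python
import time
from typing import Callable, Dict, Iterable


def has_eor(neighbor: Dict) -> bool:
    """True if the neighbor has signalled end-of-RIB, explicitly or implicitly."""
    if neighbor.get("state") != "Established":
        return False
    explicit = bool(neighbor.get("explicit_eor"))  # hypothetical field
    implicit = neighbor.get("keepalives_after_established", 0) > 0  # hypothetical field
    return explicit or implicit


def wait_for_eoiu(peers: Iterable[str],
                  fetch_neighbor: Callable[[str], Dict],
                  set_eoiu_flag: Callable[[], None],
                  max_polls: int = 120,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll once per second until every peer has sent an EOR, then set the flag."""
    pending = set(peers)
    for _ in range(max_polls):
        pending = {p for p in pending if not has_eor(fetch_neighbor(p))}
        if not pending:
            set_eoiu_flag()
            return True
        sleep(1)  # the commit message describes a 1-second polling period
    # Flag never set: the caller relies on the warm_restart bgp_timer instead.
    return False
```

Returning `False` rather than raising mirrors the fallback in the description: failure to mark EOIU is not fatal because the existing timer still triggers reconciliation.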

Signed-off-by: Jipan Yang <[email protected]>
Add some CRM pytest test cases, listed below.

1. Test the CRM threshold mechanism by checking syslog for the SWSS_LOG_WARN message.
2. Add a CRM ACL group test case.
3. Add configuration tests for the threshold and the polling interval.

From: [email protected]
Check app db and state db after create vlan. => succeed
Check app db and state db after add vlan member. => succeed

Signed-off-by: Emma Lin <[email protected]>
…e. (#860)

* [VLAN] Add pytest cases to validate different use-case of tagging_mode.

Signed-off-by: Emma Lin <[email protected]>
Problem: When a SONiC CLI command is used to display the interface summary, the broadcast address is always 0.0.0.0 irrespective of the prefix length.

Solution: When the interface IP address is added using the command "ip addr add ...", we can specify the broadcast address as well. I did NOT set the broadcast address for interfaces with point-to-point link addresses (/31 and /127).
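
As an illustration of the rule above, the broadcast address can be derived from the prefix with Python's standard `ipaddress` module. This is only a sketch of the calculation, not the code in this change; the function name is hypothetical, and treating /32 and /128 the same as the point-to-point prefixes is an assumption of the sketch.

```python
import ipaddress
from typing import Optional


def broadcast_for(interface_cidr: str) -> Optional[str]:
    """Return the address to pass as broadcast, or None for point-to-point prefixes."""
    iface = ipaddress.ip_interface(interface_cidr)
    # /31 (IPv4) and /127 (IPv6) are point-to-point links: no broadcast is set.
    p2p_prefixlen = 31 if iface.version == 4 else 127
    if iface.network.prefixlen >= p2p_prefixlen:
        return None
    # Last address of the subnet the interface address belongs to.
    return str(iface.network.broadcast_address)
```

For example, `broadcast_for("10.0.0.1/24")` yields `10.0.0.255`, while `broadcast_for("10.0.0.0/31")` yields `None`.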

Signed-off-by: Vasant Patil [email protected]
Monitor the changes in the configuration database and update
the corresponding rate/size for the policers

Signed-off-by: Shu0T1an ChenG <[email protected]>
… to vlan. (#875)

* [vlan] Add pytest cases to validate the behavior about add LAG member to
vlan.

Signed-off-by: Emma Lin <[email protected]>
I noticed that after an swss restart in VS, orchagent does not receive
PortInitDone. The comment about "g_portSet" says:

 When this LinkSync class is
 * initialized, we check the database to see if some of the ports' host
 * interfaces are already created and remove them from this set.
However, g_portSet was filled after LinkSync was initialized, so I
consider this a bug that prevents orchagent from receiving PortInitDone
when portsyncd starts after the host interfaces were created.
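
The ordering problem can be modelled in a few lines of Python. This is a toy model rather than the portsyncd code: `init_link_sync` and `port_init_done` are hypothetical helpers, and the port names are made up.

```python
from typing import Iterable, Set


def init_link_sync(port_set: Set[str], existing_hostifs: Iterable[str]) -> None:
    """Model of LinkSync initialization: drop ports whose host interface already exists."""
    for name in existing_hostifs:
        port_set.discard(name)


def port_init_done(port_set: Set[str]) -> bool:
    """In this model, PortInitDone is reported only once no port is left waiting."""
    return not port_set


configured = ["Ethernet0", "Ethernet4"]
hostifs = ["Ethernet0", "Ethernet4"]  # already created before portsyncd restarts

# Order described as buggy: LinkSync is initialized first, the set is filled afterwards,
# so the pruning step sees an empty set and removes nothing.
buggy: Set[str] = set()
init_link_sync(buggy, hostifs)
buggy.update(configured)

# Corrected order: fill the set first, then let LinkSync prune it.
fixed: Set[str] = set(configured)
init_link_sync(fixed, hostifs)
```

In the first ordering the set still holds both ports, so the done condition is never met; in the second it ends up empty.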

Signed-off-by: Stepan Blyschak <[email protected]>
* Add a global scope to VNet to consider default VRF

* Add VS test to validate default scope changes
Pterosaur and others added 20 commits August 15, 2019 22:27
The log statement will cause a segmentation fault if `observerEntry->second.routeTable` is empty.

Signed-off-by: Ze Gan <[email protected]>
…1031)

Increase the stale timer to "600" seconds so it won't be aged
out in case the test server is busy or slow

Signed-off-by: Zhenggen Xu <[email protected]>
* Add default catch block in portsyncd
* Updated error message with the right spelling
* Update try block to throw exception of type runtime_error
* Remove additional parentheses in throw statement
Adding the table name in the log when creating/removing
the ACL rules.

Signed-off-by: Shu0T1an ChenG <[email protected]>
* Send arp request after first Vlan member port is added

* Add wait logic after Vlan member add, nbrmgr to wait for restore complete

* Address comment to pass db as a parameter and open only once
…#963)


* Add support for egress mirror action

* Move redirect out from PACKET_ACTION to
its own REDIRECT_ACTION key preserving
backwards compatibility with old schema
to be aligned with SAI data types

* Query ACL action list supported by ASIC
per stage and put this information in
STATE DB SWITCH_CAPABILITY table

* perform secondary query for ACL action
attributes whose parameters are enum values

* implement VS test cases
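
The backward-compatible handling of the redirect action could be sketched as a small normalization step. This is a Python illustration, not the aclorch code: it assumes the old schema encodes the target as `REDIRECT:<target>` inside PACKET_ACTION, and `normalize_actions` is a hypothetical helper name.

```python
from typing import Dict


def normalize_actions(rule: Dict[str, str]) -> Dict[str, str]:
    """Rewrite an old-schema rule so that redirect lives under its own key."""
    out = dict(rule)  # do not mutate the caller's rule
    action = out.get("PACKET_ACTION", "")
    # Assumed old-schema encoding: PACKET_ACTION = "REDIRECT:<target>".
    if action.upper().startswith("REDIRECT:"):
        out["REDIRECT_ACTION"] = action.split(":", 1)[1]
        del out["PACKET_ACTION"]
    return out
```

Rules already written against the new schema, and rules with other packet actions, pass through unchanged.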

Signed-off-by: Stepan Blyschak <[email protected]>
…he route object table value is zero (#1048)

* Change fpmsyncd to skip the lookup for the master device name
if the route object table value is zero,
i.e. the route needs to be put in the global routing table
* Cannot ping to link-local ipv6 interface address of the switch.

Fixes:
       1. Packets destined to the switch's routing interface link-local ipv6 address
       are not coming to CPU. Hence the ping fails.
       Since all interfaces have the same link-local ipv6 address, all we need is
       a single ip2me /128 route corresponding to this address added in the hardware.
       We don't need fe80 ip2me route added to hardware for every interface. Hence the
       address overlap issue won't arise for the link-local interface address.

       2. Fixed another issue as part of this PR:
       the link-local ipv6 neighbors are not learned via netlink by neighsync.
       As a result, we could not add an ipv6 route via a link-local nexthop.
       Allow neighsync to learn the link-local neighbors too.

Signed-off-by: [email protected]

* Incremental change to the code changes.

* Incremental change to the code changes.

* Incorporated review comments.

* Incorporated review comments.

* Add fe80::/10 route to CPU to forward all locally destined link-local ipv6 packets to CPU.

* Retain fe80.../128 ip2me route in the hardware along with fe80::/10 subnet route.
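
For reference, membership in the fe80::/10 range mentioned above can be checked with Python's standard `ipaddress` module. This is only an illustration of the address test; the function name is hypothetical and the snippet is unrelated to how the routes are programmed in hardware.

```python
import ipaddress

LINK_LOCAL_NET = ipaddress.ip_network("fe80::/10")


def is_ipv6_link_local(addr: str) -> bool:
    """True if `addr` is an IPv6 address inside fe80::/10."""
    ip = ipaddress.ip_address(addr)
    return ip.version == 6 and ip in LINK_LOCAL_NET
```

Any destination for which this test holds is covered by the single fe80::/10 subnet route, regardless of which interface it arrived on.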

Signed-off-by: Kiran Kella <[email protected]>
In ACL combined mode, v4 and v6 rules are sharing the same
physical table while having separated configuration tables.
The daemon needs to use the configuration table name to store
the counter information.

Signed-off-by: Shu0T1an ChenG <[email protected]>
* [MirrorOrch]: Mirror Session Retention across Warm Reboot

After warm reboot, the monitor port of the mirror session is
expected to be retained: the monitor port must not change within
the ECMP group members or the LAG members. This follows from the
general sairedis comparison logic and the minimization of SAI
function calls during reconciliation.

Changes:
1. Add bake() and postBake() functions in MirrorOrch
   bake() function retrieves the state database information
   and get the VLAN + monitor port information.
   postBake() function leverages the information and recovers
   the active mirror sessions the same as before warm reboot.
2. state database format change
   Instead of storing the object ID of the monitor port, store
   the alias of the monitor port.
   Instead of storing true/false of VLAN header, store the VLAN
   ID.

Update: Freeze doTask() function instead of update() function

With this update, we could fix potential orchagent issues before
the warm reboot when the monitor port was wrongly calculated.

Signed-off-by: Shu0T1an ChenG <[email protected]>
Remove deprecated mirror session states in the state database

Signed-off-by: Shu0t1an Cheng <[email protected]>
* Remove nexthop member from nexthopgroup
on detecting portchannel down

* Code cleanup

* Fix spacing errors

* Create new Test
1. Add 4 PortChannels
2. Add to nexthop group

* Check for 3 NH group members
after bringing down a portchannel
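
The behaviour exercised by the new test can be modelled as a filter over the group members. This is a Python sketch with made-up addresses and a hypothetical `prune_nexthop_group` helper, not the orchagent logic.

```python
from typing import Dict, List, Tuple

NextHop = Tuple[str, str]  # (next-hop ip address, interface alias)


def prune_nexthop_group(members: List[NextHop],
                        oper_up: Dict[str, bool]) -> List[NextHop]:
    """Keep only next hops whose interface is operationally up."""
    return [nh for nh in members if oper_up.get(nh[1], False)]


# Four PortChannels in one next-hop group, as in the test description.
group = [("10.0.%d.1" % i, "PortChannel%03d" % i) for i in range(1, 5)]
state = {alias: True for _, alias in group}

state["PortChannel002"] = False  # bring one portchannel down
remaining = prune_nexthop_group(group, state)
```

After one portchannel goes down, three members remain, which matches the check in the last step of the test.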
@shine4chen shine4chen merged commit b7d6d64 into shine4chen:aclorch Oct 2, 2019
shine4chen pushed a commit that referenced this pull request Feb 3, 2020
* Fix the stack-overflow issue in AclOrch::getTableById()

In case the table_id and m_mirrorTableId/m_mirrorV6TableId are empty,
the function would call itself recursively without terminating.

```
If we have some rule that does not have table defined, it will trigger this issue.

(gdb) bt
#0  0x00007f77ac5af3a9 in swss::Logger::write (this=0x7f77ac801ec0 <swss::Logger::getInstance()::m_logger>, prio=swss::Logger::SWSS_DEBUG, fmt=0x7f77ac5f1690 ":> %s: enter")
    at logger.cpp:209
#1  0x00000000004a5792 in AclOrch::getTableById (this=this@entry=0x1c722a0, table_id="") at aclorch.cpp:2617
#2  0x00000000004a59d1 in AclOrch::getTableById (this=this@entry=0x1c722a0, table_id="") at aclorch.cpp:2634
#3  0x00000000004a59d1 in AclOrch::getTableById (this=this@entry=0x1c722a0, table_id="") at aclorch.cpp:2634
...
#20944 0x00000000004a59d1 in AclOrch::getTableById (this=this@entry=0x1c722a0, table_id="") at aclorch.cpp:2634
#20945 0x00000000004ad3ce in AclOrch::doAclRuleTask (this=this@entry=0x1c722a0, consumer=...) at aclorch.cpp:2437
#20946 0x00000000004b04cd in AclOrch::doTask (this=0x1c722a0, consumer=...) at aclorch.cpp:2141
#20947 0x00000000004231b2 in Orch::doTask (this=0x1c722a0) at orch.cpp:369
#20948 0x000000000041c4e9 in OrchDaemon::start (this=this@entry=0x1c19960) at orchdaemon.cpp:376
#20949 0x0000000000409ffc in main (argc=<optimized out>, argv=0x7ffe2e392d68) at main.cpp:295

(gdb) p table_id
$1 = ""
(gdb) p m_mirrorTableId
$2 = ""
(gdb) p m_mirrorV6TableId
$3 = ""
```
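
The failure mode in the backtrace can be reproduced with a small Python model. This is not the aclorch.cpp code: the class, its fields, and the two lookup methods are hypothetical, and the guarded variant is just one possible way to avoid the loop, not necessarily the fix applied in this commit.

```python
from typing import Dict, Optional


class AclTables:
    def __init__(self, tables: Dict[int, str],
                 mirror_id: str = "", mirror_v6_id: str = "") -> None:
        self.tables = tables            # object id -> configured table name
        self.mirror_id = mirror_id      # empty when no mirror table is configured
        self.mirror_v6_id = mirror_v6_id

    def _find(self, table_id: str) -> Optional[int]:
        for oid, name in self.tables.items():
            if name == table_id:
                return oid
        return None

    def get_unguarded(self, table_id: str) -> Optional[int]:
        """Shape of the bug: "" == "" sends the lookup back into itself forever."""
        oid = self._find(table_id)
        if oid is not None:
            return oid
        if table_id == self.mirror_id:
            return self.get_unguarded(self.mirror_v6_id)
        if table_id == self.mirror_v6_id:
            return self.get_unguarded(self.mirror_id)
        return None

    def get_guarded(self, table_id: str) -> Optional[int]:
        """Reject empty names and try the counterpart table once, without recursing."""
        if not table_id:
            return None
        oid = self._find(table_id)
        if oid is not None:
            return oid
        if table_id == self.mirror_id:
            return self._find(self.mirror_v6_id)
        if table_id == self.mirror_v6_id:
            return self._find(self.mirror_id)
        return None
```

With all three names empty, `get_unguarded("")` recurses until Python raises `RecursionError`, the interpreter's analogue of the stack overflow shown above, while `get_guarded("")` returns `None` immediately.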

Signed-off-by: Zhenggen Xu <[email protected]>
shine4chen pushed a commit that referenced this pull request Feb 21, 2024
Currently, ASAN sometimes reports the BufferOrch::m_buffer_type_maps and QosOrch::m_qos_maps as leaked. However, their lifetime is the lifetime of a process so they are not really 'leaked'.
This also adds a simple way to add more suppressions later if required.

Example of ASAN report:

Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x7f96aa952d30 in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xead30)
    #1 0x55ca1da9f789 in __static_initialization_and_destruction_0 /__w/2/s/orchagent/bufferorch.cpp:39
    #2 0x55ca1daa02af in _GLOBAL__sub_I_bufferorch.cpp /__w/2/s/orchagent/bufferorch.cpp:1321
    #3 0x55ca1e2a9cd4  (/usr/bin/orchagent+0xe89cd4)

Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x7f96aa952d30 in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xead30)
    #1 0x55ca1da6d2da in __static_initialization_and_destruction_0 /__w/2/s/orchagent/qosorch.cpp:80
    #2 0x55ca1da6ecf2 in _GLOBAL__sub_I_qosorch.cpp /__w/2/s/orchagent/qosorch.cpp:2000
    #3 0x55ca1e2a9cd4  (/usr/bin/orchagent+0xe89cd4)

- What I did
Added an lsan suppression config with static variable leak suppression

- Why I did it
To suppress ASAN false positives

- How I verified it
Ran a test that produces the static variable leak report and checked that these leaks are suppressed in the report.

Signed-off-by: Yakiv Huryk <[email protected]>
shine4chen pushed a commit that referenced this pull request Feb 21, 2024
**What I did**

Fix the memory leak by changing the raw pointers in type_maps to smart pointers

**Why I did it**

```
Indirect leak of 83776 byte(s) in 476 object(s) allocated from:
    #0 0x7f0a2a414647 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x5555590cc923 in __gnu_cxx::new_allocator, std::allocator > const, referenced_object> > >::allocate(unsigned long, void const*) /usr/include/c++/10/ext/new_allocator.h:115
    #2 0x5555590cc923 in std::allocator_traits, std::allocator > const, referenced_object> > > >::allocate(std::allocator, std::allocator > const, referenced_object> > >&, unsigned long) /usr/include/c++/10/bits/alloc_traits.h:460
    #3 0x5555590cc923 in std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_get_node() /usr/include/c++/10/bits/stl_tree.h:584
    #4 0x5555590cc923 in std::_Rb_tree_node, std::allocator > const, referenced_object> >* std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_create_node, std::allocator > const&>, std::tuple<> >(std::piecewise_construct_t const&, std::tuple, std::allocator > const&>&&, std::tuple<>&&) /usr/include/c++/10/bits/stl_tree.h:634
    #5 0x5555590cc923 in std::_Rb_tree_iterator, std::allocator > const, referenced_object> > std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_emplace_hint_unique, std::allocator > const&>, std::tuple<> >(std::_Rb_tree_const_iterator, std::allocator > const, referenced_object> >, std::piecewise_construct_t const&, std::tuple, std::allocator > const&>&&, std::tuple<>&&) /usr/include/c++/10/bits/stl_tree.h:2461
    #6 0x5555590e8757 in std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::operator[](std::__cxx11::basic_string, std::allocator > const&) /usr/include/c++/10/bits/stl_map.h:501
    #7 0x5555590d48b0 in Orch::setObjectReference(std::map, std::allocator >, std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >*, std::less, std::allocator > >, std::allocator, std::allocator > const, std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >*> > >&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) orchagent/orch.cpp:450
    #8 0x5555594ff66b in QosOrch::handleQueueTable(Consumer&, std::tuple, std::allocator >, std::__cxx11::basic_string, std::allocator >, std::vector, std::allocator >, std::__cxx11::basic_string, std::allocator > >, std::allocator, std::allocator >, std::__cxx11::basic_string, std::allocator > > > > >&) orchagent/qosorch.cpp:1763
    #9 0x5555594edbd6 in QosOrch::doTask(Consumer&) orchagent/qosorch.cpp:2179
    #10 0x5555590c8743 in Consumer::drain() orchagent/orch.cpp:241
    #11 0x5555590c8743 in Consumer::drain() orchagent/orch.cpp:238
    #12 0x5555590c8743 in Consumer::execute() orchagent/orch.cpp:235
    #13 0x555559090dad in OrchDaemon::start() orchagent/orchdaemon.cpp:755
    #14 0x555558e9be25 in main orchagent/main.cpp:766
    #15 0x7f0a299b6d09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09)
```