A small collection of patches that fix issues found by valgrind #1

donaldsharp · 2016-12-16T02:32:30Z

This cleanup fixes a bgpd crash on shutdown and some other minor issues reported by valgrind.

When we clean up for shutdown, we may attempt to access data that has already been cleaned up. Found via valgrind; the report no longer shows up there after this change. Signed-off-by: Donald Sharp <[email protected]>
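The commit does not name the structures involved. As a minimal sketch of the usual pattern (all names here are hypothetical), a cleanup routine can free and NULL its pointers so that a second cleanup pass, or a late access during shutdown, sees NULL rather than freed memory:

```c
#include <stdlib.h>

/* Hypothetical daemon state; the real structures are not named in the commit. */
struct hypothetical_state {
	int *table;
	size_t count;
};

/* Free the table and clear the pointer, so calling this twice is harmless. */
void hypothetical_state_cleanup(struct hypothetical_state *st)
{
	if (st->table == NULL)
		return; /* already cleaned up */
	free(st->table);
	st->table = NULL;
	st->count = 0;
}

/* Accessors check for NULL, so they stay safe after cleanup. */
size_t hypothetical_state_count(const struct hypothetical_state *st)
{
	return st->table ? st->count : 0;
}
```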

Valgrind found this issue. This change prevents it from happening. Signed-off-by: Donald Sharp <[email protected]>

The first time 'show ip bgp summary' was called, the width of the variable-length hostname field was always calculated incorrectly. Signed-off-by: Donald Sharp <[email protected]>
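The fix itself is not shown. One way to get the width right on the very first call is to size the column in a separate pass over all peers before any row is printed. A sketch with a hypothetical `struct hyp_peer`, not bgpd's actual code:

```c
#include <stdio.h>

/* Hypothetical peer record: only the fields needed for column sizing. */
struct hyp_peer {
	const char *hostname; /* NULL if the hostname is not known */
	const char *addr;
};

/* Compute the neighbor column width over all peers before printing,
 * so the result does not depend on any state left by a previous call. */
int neighbor_col_width(const struct hyp_peer *peers, size_t n, int min_width)
{
	int width = min_width;

	for (size_t i = 0; i < n; i++) {
		char buf[256];
		int len;

		if (peers[i].hostname)
			len = snprintf(buf, sizeof(buf), "%s(%s)", peers[i].hostname, peers[i].addr);
		else
			len = snprintf(buf, sizeof(buf), "%s", peers[i].addr);
		if (len > width)
			width = len;
	}
	return width;
}
```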

…rminated (BUFFER_SIZE_WARNING) Coverity: buffer_size_warning: Calling strncpy with a maximum size argument of 100 bytes on destination array pid_file of size 100 bytes might leave the destination string unterminated. Signed-off-by: Martin Winter <[email protected]>
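A minimal sketch of the standard remedy for this warning, assuming a 100-byte `pid_file` buffer as in the report (the helper name is hypothetical): copy at most size - 1 bytes and always write the terminator explicitly.

```c
#include <string.h>

#define PID_FILE_LEN 100 /* size of pid_file in the Coverity report */

/* strncpy() does not terminate when the source fills the buffer,
 * so reserve the last byte and set it to NUL unconditionally. */
void set_pid_file(char pid_file[PID_FILE_LEN], const char *path)
{
	strncpy(pid_file, path, PID_FILE_LEN - 1);
	pid_file[PID_FILE_LEN - 1] = '\0';
}
```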

… too small (BUFFER_SIZE) Coverity: buffer_size: You might overrun the 108 byte destination string addr.sun_path by writing the maximum 4095 bytes from path. Signed-off-by: Martin Winter <[email protected]>
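A sketch of bounding the copy into `sun_path`, using a hypothetical helper: reject paths that do not fit (including the terminating NUL) instead of copying up to 4095 bytes into a 108-byte field.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fill addr from path; returns 0 on success, -1 if path is too long. */
int fill_sockaddr_un(struct sockaddr_un *addr, const char *path)
{
	size_t len = strlen(path);

	if (len >= sizeof(addr->sun_path))
		return -1; /* would not fit together with its NUL terminator */
	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	memcpy(addr->sun_path, path, len + 1); /* copies the NUL as well */
	return 0;
}
```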

…TCH) The size must be that of the correct structure (prefix instead of prefix_ipv4). Signed-off-by: Martin Winter <[email protected]>
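To illustrate this class of bug, using simplified hypothetical stand-ins for `struct prefix` and `struct prefix_ipv4` (the real layouts differ): sizing a memory operation by the smaller IPv4 type touches only part of a generic prefix, whereas `sizeof(*p)` always matches the destination.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical generic prefix, large enough for IPv6. */
struct hyp_prefix {
	uint8_t family;
	uint16_t prefixlen;
	uint8_t u[16];
};

/* Hypothetical IPv4-only prefix. */
struct hyp_prefix_ipv4 {
	uint8_t family;
	uint16_t prefixlen;
	uint8_t prefix[4];
};

/* memset(p, 0, sizeof(struct hyp_prefix_ipv4)) would leave most of *p
 * untouched; size by the pointed-to type instead. */
void hyp_prefix_clear(struct hyp_prefix *p)
{
	memset(p, 0, sizeof(*p));
}
```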

…ERFLOW) Coverity: string_overflow: You might overrun the 100-character destination string vty_path by writing 4096 characters from vty_sock_path. Signed-off-by: Martin Winter <[email protected]>
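A sketch of a bounded alternative, assuming a 100-byte `vty_path` as in the report; the function name and path format are hypothetical. `snprintf()` never writes past the buffer and always terminates it, and its return value exposes truncation:

```c
#include <stdio.h>

#define VTY_PATH_LEN 100 /* size of vty_path in the Coverity report */

/* Returns 0 on success, -1 if the result was truncated (or on error). */
int build_vty_path(char vty_path[VTY_PATH_LEN], const char *vty_sock_path, const char *daemon)
{
	int n = snprintf(vty_path, VTY_PATH_LEN, "%s/%s.vty", vty_sock_path, daemon);

	return (n < 0 || n >= VTY_PATH_LEN) ? -1 : 0;
}
```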

…eset

* rip_interface.c: The default for split_horizon_default differed between rip_interface_new and rip_interface_reset, causing at least some issues after interface events. See patchwork #604. Fix, and consolidate code.
  (rip_interface_{reset,clean}) rename these to 'interface', as that's more appropriate. Spin the ri-specific bodies of these functions out to rip_interface_{reset,clean} helpers. Factor out the overlaps, so rip_interface_reset uses rip_interface_clean.
  (rip_interface_new) just use rip_interface_reset.
* ripd.h: Update for (rip_interface_{reset,clean})

Reported by xufeng zhang, with a suggested fix on which this commit expands. See patchwork #604. This commit addresses only the split-horizon discrepancy, issue #2. The other issue they reported, #1, is not addressed, though the suggested fix seems inappropriate. Cc: [email protected]
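The consolidation described above can be sketched as follows. The struct, its fields, and the default value are hypothetical simplifications, not ripd's real definitions. `new` delegates to `reset`, and `reset` delegates to `clean`, so the split-horizon default is set in exactly one place:

```c
#include <stdbool.h>
#include <stdlib.h>

enum hyp_split_horizon { HYP_SPLIT_HORIZON_NONE, HYP_SPLIT_HORIZON, HYP_SPLIT_HORIZON_POISONED };

/* Single source of truth for the default (value chosen for illustration). */
#define HYP_SPLIT_HORIZON_DEFAULT HYP_SPLIT_HORIZON

/* Hypothetical, trimmed-down per-interface RIP state. */
struct hyp_rip_interface {
	bool running;                         /* runtime state */
	int recv_badpackets;                  /* runtime counter */
	enum hyp_split_horizon split_horizon; /* configuration */
};

/* clean(): drop runtime state only. */
void hyp_rip_interface_clean(struct hyp_rip_interface *ri)
{
	ri->running = false;
	ri->recv_badpackets = 0;
}

/* reset(): clean() plus restore configuration defaults. */
void hyp_rip_interface_reset(struct hyp_rip_interface *ri)
{
	hyp_rip_interface_clean(ri);
	ri->split_horizon = HYP_SPLIT_HORIZON_DEFAULT;
}

/* new(): allocate, then reuse reset(), so new and reset cannot disagree. */
struct hyp_rip_interface *hyp_rip_interface_new(void)
{
	struct hyp_rip_interface *ri = calloc(1, sizeof(*ri));

	if (ri)
		hyp_rip_interface_reset(ri);
	return ri;
}
```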

…7-whitespace2 to master-frr-upstream-sync-2017-07 * commit '87122d314c90f2e023b5fcebe514a1ddc2a59eb9': (21 commits) Remove FRR-hacking.md documentation Add OSPF API and FRR Hacking documents ospf6d: crash in ospf6_lsdb_show bgpd: fix peer startup for labeled-unicast if linklocal address not found replace space to tabs, add kernel styles multiline, remove trailing whitespaces. Add note about bridge limitations whitespace internal 2 internal reindent lib: route_node_lookup() needs to apply_mask() to prefix Add 1 more identation to correspond to kernel style multi-line comment ospf6d: crash in ospf6_lsdb_show bgpd: fix peer startup for labeled-unicast if linklocal address not found replace space to tabs, add kernel styles multiline, remove trailing whitespaces. *: reindent pt. 2 Add note about bridge limitations eigrpd: remove last vty_outln *: reindent *: add indent control files Remove FRR-hacking.md documentation Add OSPF API and FRR Hacking documents ...

Signed-off-by: Daniel Walton <[email protected]>

Before
======
```
cel-redxp-10# show ip bgp 20.1.3.0/24
BGP routing table entry for 20.1.3.0/24
Paths: (1 available, best #1, table Default-IP-Routing-Table)
  Advertised to non peer-group peers:
  top1(10.1.1.2) bottom0(20.1.2.2)
  4294967292
    20.1.2.2 from bottom0(20.1.2.2) (20.1.1.1)
      Origin IGP, metric 0, localpref 100, valid, external, bestpath-from-AS -4, best
      Community: 99:1
      AddPath ID: RX 0, TX 92
      Last update: Wed Sep 27 16:02:34 2017

cel-redxp-10#
```

After
=====
```
cel-redxp-10# show ip bgp 20.1.3.0/24
BGP routing table entry for 20.1.3.0/24
Paths: (1 available, best #1, table Default-IP-Routing-Table)
  Advertised to non peer-group peers:
  bottom0(20.1.2.2)
  4294967292
    20.1.2.2 from bottom0(20.1.2.2) (20.1.1.1)
      Origin IGP, metric 0, localpref 100, valid, external, bestpath-from-AS 4294967292, best
      Community: 99:1
      AddPath ID: RX 0, TX 2
      Last update: Wed Sep 27 16:07:09 2017

cel-redxp-10#
```
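The commit message shows only the output. The `-4` next to `bestpath-from-AS` is what the unsigned 32-bit AS number 4294967292 looks like when it is formatted as a signed `int`. A minimal sketch of formatting it as unsigned (the typedef and helper are hypothetical, not bgpd's code):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t hyp_as_t; /* 4-octet AS numbers are unsigned 32-bit (RFC 6793) */

/* Format with PRIu32 rather than "%d", which would print 4294967292 as -4. */
int format_asn(char *buf, size_t len, hyp_as_t as)
{
	return snprintf(buf, len, "%" PRIu32, as);
}
```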

The level 2 adjacency list is not guaranteed to be set.

```
#0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f9f0353274f in core_handler (signo=6, siginfo=0x7ffe95260770, context=0x7ffe95260640) at lib/sigevent.c:258
#2  <signal handler called>
#3  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#4  0x00007f9f0324e537 in __GI_abort () at abort.c:79
#5  0x00007f9f035744ea in _zlog_assert_failed (xref=0x7f9f0362c6c0 <_xref.15>, extra=0x0) at lib/zlog.c:789
#6  0x00007f9f034d25ee in listnode_head (list=0x0) at lib/linklist.c:316
#7  0x000055cd65aaa481 in lib_interface_state_isis_adjacencies_adjacency_get_next (args=0x7ffe95261730) at isisd/isis_nb_state.c:101
#8  0x00007f9f034feadd in nb_callback_get_next (nb_node=0x55cd673c0190, parent_list_entry=0x55cd67570d30, list_entry=0x55cd6758f8a0) at lib/northbound.c:1748
#9  0x00007f9f0350bf07 in __walk (ys=0x55cd675782b0, is_resume=false) at lib/northbound_oper.c:1264
#10 0x00007f9f0350deaa in nb_op_walk_start (ys=0x55cd675782b0) at lib/northbound_oper.c:1741
#11 0x00007f9f0350e079 in nb_oper_iterate_legacy (xpath=0x55cd67595c60 "/frr-interface:lib", translator=0x0, flags=0, cb=0x0, cb_arg=0x0, tree=0x7ffe952621b0) at lib/northbound_oper.c:1803
#12 0x00007f9f03507661 in show_yang_operational_data_magic (self=0x7f9f03634a80 <show_yang_operational_data_cmd>, vty=0x55cd675a61f0, argc=4, argv=0x55cd6758eab0,
    xpath=0x55cd67595c60 "/frr-interface:lib", json=0x0, xml=0x0, translator_family=0x0, with_config=0x0) at lib/northbound_cli.c:1576
#13 0x00007f9f035037f0 in show_yang_operational_data (self=0x7f9f03634a80 <show_yang_operational_data_cmd>, vty=0x55cd675a61f0, argc=4, argv=0x55cd6758eab0)
    at ./lib/northbound_cli_clippy.c:906
#14 0x00007f9f0349435d in cmd_execute_command_real (vline=0x55cd6758e490, vty=0x55cd675a61f0, cmd=0x0, up_level=0) at lib/command.c:1003
#15 0x00007f9f03494477 in cmd_execute_command (vline=0x55cd67585340, vty=0x55cd675a61f0, cmd=0x0, vtysh=0) at lib/command.c:1053
#16 0x00007f9f03494a0c in cmd_execute (vty=0x55cd675a61f0, cmd=0x55cd67579040 "do show yang operational-data /frr-interface:lib", matched=0x0, vtysh=0) at lib/command.c:1228
#17 0x00007f9f0355239d in vty_command (vty=0x55cd675a61f0, buf=0x55cd67579040 "do show yang operational-data /frr-interface:lib") at lib/vty.c:625
#18 0x00007f9f03554136 in vty_execute (vty=0x55cd675a61f0) at lib/vty.c:1388
#19 0x00007f9f0355634c in vtysh_read (thread=0x7ffe952647a0) at lib/vty.c:2400
#20 0x00007f9f0354b6f6 in event_call (thread=0x7ffe952647a0) at lib/event.c:1996
#21 0x00007f9f034d1365 in frr_run (master=0x55cd67204da0) at lib/libfrr.c:1231
#22 0x000055cd65a3236e in main (argc=7, argv=0x7ffe952649c8, envp=0x7ffe95264a08) at isisd/isis_main.c:354
```

Fixes: 2a1c520 ("isisd: split northbound callbacks into multiple files")
Signed-off-by: Louis Scalbert <[email protected]>
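A minimal sketch of the defensive pattern, with hypothetical list and circuit types in place of FRR's `struct list` and the isisd circuit: treat a missing per-level adjacency list as empty instead of passing NULL to a list accessor such as `listnode_head()`.

```c
#include <stddef.h>

/* Hypothetical singly linked list standing in for FRR's struct list. */
struct hyp_node {
	struct hyp_node *next;
	void *data;
};
struct hyp_list {
	struct hyp_node *head;
};

/* Hypothetical circuit: a per-level adjacency list may be NULL when that
 * level is not in use on the circuit. */
struct hyp_circuit {
	struct hyp_list *adjdb[2]; /* [0] = level 1, [1] = level 2 */
};

/* Return the first adjacency across both levels, skipping absent lists. */
struct hyp_node *first_adjacency(const struct hyp_circuit *c)
{
	for (int level = 0; level < 2; level++) {
		const struct hyp_list *l = c->adjdb[level];

		if (l == NULL || l->head == NULL)
			continue;
		return l->head;
	}
	return NULL;
}
```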

Fix a crash when a route-map `set as-path exclude` rule without an as-path-access-list is replaced by one with an as-path-access-list:

```
router(config)# route-map routemaptest deny 1
router(config-route-map)# set as-path exclude 33 34 35
router(config-route-map)# set as-path exclude as-path-access-list test
```

```
#0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fb3959327de in core_handler (signo=11, siginfo=0x7ffd122da530, context=0x7ffd122da400) at lib/sigevent.c:258
#2  <signal handler called>
#3  0x000055ab2762a1bd in as_list_list_del (h=0x55ab27897680 <as_exclude_list_orphan>, item=0x55ab28204e20) at ./bgpd/bgp_aspath.h:77
#4  0x000055ab2762d1a8 in as_exclude_remove_orphan (ase=0x55ab28204e20) at bgpd/bgp_aspath.c:1574
#5  0x000055ab27550538 in route_aspath_exclude_free (rule=0x55ab28204e20) at bgpd/bgp_routemap.c:2366
#6  0x00007fb39591f00c in route_map_rule_delete (list=0x55ab28203498, rule=0x55ab28204170) at lib/routemap.c:1357
#7  0x00007fb39591f87c in route_map_add_set (index=0x55ab28203460, set_name=0x55ab276ad2aa "as-path exclude", set_arg=0x55ab281e4f70 "as-path-access-list test") at lib/routemap.c:1674
#8  0x00007fb39591d3f3 in generic_set_add (index=0x55ab28203460, command=0x55ab276ad2aa "as-path exclude", arg=0x55ab281e4f70 "as-path-access-list test", errmsg=0x7ffd122db870 "",
    errmsg_len=8192) at lib/routemap.c:533
#9  0x000055ab2755e78e in lib_route_map_entry_set_action_rmap_set_action_exclude_as_path_modify (args=0x7ffd122db290) at bgpd/bgp_routemap_nb_config.c:2427
#10 0x00007fb3958fe417 in nb_callback_modify (context=0x55ab28205aa0, nb_node=0x55ab27cb31e0, event=NB_EV_APPLY, dnode=0x55ab28202690, resource=0x55ab27c32148, errmsg=0x7ffd122db870 "",
    errmsg_len=8192) at lib/northbound.c:1538
#11 0x00007fb3958ff0ab in nb_callback_configuration (context=0x55ab28205aa0, event=NB_EV_APPLY, change=0x55ab27c32110, errmsg=0x7ffd122db870 "", errmsg_len=8192) at lib/northbound.c:1888
#12 0x00007fb3958ff5e4 in nb_transaction_process (event=NB_EV_APPLY, transaction=0x55ab28205aa0, errmsg=0x7ffd122db870 "", errmsg_len=8192) at lib/northbound.c:2016
#13 0x00007fb3958fddba in nb_candidate_commit_apply (transaction=0x55ab28205aa0, save_transaction=true, transaction_id=0x0, errmsg=0x7ffd122db870 "", errmsg_len=8192)
    at lib/northbound.c:1356
#14 0x00007fb3958fdef0 in nb_candidate_commit (context=..., candidate=0x55ab27c2c9a0, save_transaction=true, comment=0x0, transaction_id=0x0, errmsg=0x7ffd122db870 "", errmsg_len=8192)
    at lib/northbound.c:1389
#15 0x00007fb3959045ba in nb_cli_classic_commit (vty=0x55ab281f6680) at lib/northbound_cli.c:57
#16 0x00007fb395904b5a in nb_cli_apply_changes_internal (vty=0x55ab281f6680, xpath_base=0x7ffd122dfd10 "/frr-route-map:lib/route-map[name='routemaptest']/entry[sequence='1']",
    clear_pending=false) at lib/northbound_cli.c:184
#17 0x00007fb395904ebf in nb_cli_apply_changes (vty=0x55ab281f6680, xpath_base_fmt=0x0) at lib/northbound_cli.c:240
#18 0x000055ab27557d2e in set_aspath_exclude_access_list_magic (self=0x55ab2775c300 <set_aspath_exclude_access_list_cmd>, vty=0x55ab281f6680, argc=5, argv=0x55ab28204c80,
    as_path_filter_name=0x55ab28202040 "test") at bgpd/bgp_routemap.c:6397
#19 0x000055ab2754bdea in set_aspath_exclude_access_list (self=0x55ab2775c300 <set_aspath_exclude_access_list_cmd>, vty=0x55ab281f6680, argc=5, argv=0x55ab28204c80)
    at ./bgpd/bgp_routemap_clippy.c:856
#20 0x00007fb39589435d in cmd_execute_command_real (vline=0x55ab281e61f0, vty=0x55ab281f6680, cmd=0x0, up_level=0) at lib/command.c:1003
#21 0x00007fb3958944be in cmd_execute_command (vline=0x55ab281e61f0, vty=0x55ab281f6680, cmd=0x0, vtysh=0) at lib/command.c:1062
#22 0x00007fb395894a0c in cmd_execute (vty=0x55ab281f6680, cmd=0x55ab28200f20 "set as-path exclude as-path-access-list test", matched=0x0, vtysh=0) at lib/command.c:1228
#23 0x00007fb39595242c in vty_command (vty=0x55ab281f6680, buf=0x55ab28200f20 "set as-path exclude as-path-access-list test") at lib/vty.c:625
#24 0x00007fb3959541c5 in vty_execute (vty=0x55ab281f6680) at lib/vty.c:1388
#25 0x00007fb3959563db in vtysh_read (thread=0x7ffd122e2bb0) at lib/vty.c:2400
#26 0x00007fb39594b785 in event_call (thread=0x7ffd122e2bb0) at lib/event.c:1996
#27 0x00007fb3958d1365 in frr_run (master=0x55ab27b56d70) at lib/libfrr.c:1231
#28 0x000055ab2747f1cc in main (argc=3, argv=0x7ffd122e2e08) at bgpd/bgp_main.c:555
```

Fixes: 094dcc3 ("bgpd: fix "bgp as-pah access-list" with "set aspath exclude" set/unset issues")
Signed-off-by: Louis Scalbert <[email protected]>
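Frames #3 to #5 show `as_exclude_remove_orphan()` unlinking an entry from the orphan list while the rule, originally created without an as-path-access-list, is freed. The commit does not show the fix; the following is a sketch of one way to make such a removal safe, assuming (hypothetically) that each entry records whether it is actually on the list:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry on a doubly linked "orphan" list. */
struct hyp_ase {
	struct hyp_ase *prev, *next;
	bool on_orphan_list;
};

struct hyp_orphan_list {
	struct hyp_ase *head;
};

void orphan_add(struct hyp_orphan_list *l, struct hyp_ase *e)
{
	e->prev = NULL;
	e->next = l->head;
	if (l->head)
		l->head->prev = e;
	l->head = e;
	e->on_orphan_list = true;
}

/* Removing an entry that was never added is a no-op instead of
 * unlinking through uninitialised pointers. */
void orphan_remove(struct hyp_orphan_list *l, struct hyp_ase *e)
{
	if (!e->on_orphan_list)
		return;
	if (e->prev)
		e->prev->next = e->next;
	else
		l->head = e->next;
	if (e->next)
		e->next->prev = e->prev;
	e->prev = e->next = NULL;
	e->on_orphan_list = false;
}
```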

bgpd crashes when 'no rpki' is requested after the rtrlib RPKI object has been freed. In this reproduction, RPKI is configured in VRF red:

```
ip l set red down
ip l del red
printf 'conf\n vrf red\n no rpki' | vtysh
```

```
Core was generated by `/usr/bin/bgpd -A 127.0.0.1 -M snmp -M rpki -M bmp'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __pthread_kill_implementation (no_tid=0, signo=11, threadid=140411103615424) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
[Current thread is 1 (Thread 0x7fb401f419c0 (LWP 190226))]
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=11, threadid=140411103615424) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=11, threadid=140411103615424) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140411103615424, signo=signo@entry=11) at ./nptl/pthread_kill.c:89
#3  0x00007fb4021ad476 in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26
#4  0x00007fb4025ce22b in core_handler (signo=11, siginfo=0x7fff831b2d70, context=0x7fff831b2c40) at lib/sigevent.c:248
#5  <signal handler called>
#6  rtr_mgr_remove_group (config=0x55fe8789f750, preference=11) at /build/make-pkg/output/source/DIST_RTRLIB/rtrlib/rtrlib/rtr_mgr.c:607
#7  0x00007fb40145f518 in rpki_delete_all_cache_nodes (rpki_vrf=0x55fe8789f4f0) at bgpd/bgp_rpki.c:442
#8  0x00007fb401463098 in no_rpki_magic (self=0x7fb40146bba0 <no_rpki_cmd>, vty=0x55fe877f5130, argc=2, argv=0x55fe877fccd0) at bgpd/bgp_rpki.c:1732
#9  0x00007fb40145c09a in no_rpki (self=0x7fb40146bba0 <no_rpki_cmd>, vty=0x55fe877f5130, argc=2, argv=0x55fe877fccd0) at ./bgpd/bgp_rpki_clippy.c:37
#10 0x00007fb402527abc in cmd_execute_command_real (vline=0x55fe877fd150, vty=0x55fe877f5130, cmd=0x0, up_level=0) at lib/command.c:984
#11 0x00007fb402527c35 in cmd_execute_command (vline=0x55fe877fd150, vty=0x55fe877f5130, cmd=0x0, vtysh=0) at lib/command.c:1043
#12 0x00007fb4025281e5 in cmd_execute (vty=0x55fe877f5130, cmd=0x55fe877fb8c0 "no rpki\n", matched=0x0, vtysh=0) at lib/command.c:1209
#13 0x00007fb4025f0aed in vty_command (vty=0x55fe877f5130, buf=0x55fe877fb8c0 "no rpki\n") at lib/vty.c:615
#14 0x00007fb4025f2a11 in vty_execute (vty=0x55fe877f5130) at lib/vty.c:1378
#15 0x00007fb4025f513d in vtysh_read (thread=0x7fff831b5fa0) at lib/vty.c:2373
#16 0x00007fb4025e9611 in event_call (thread=0x7fff831b5fa0) at lib/event.c:2011
#17 0x00007fb402566976 in frr_run (master=0x55fe871a14a0) at lib/libfrr.c:1212
#18 0x000055fe857829fa in main (argc=9, argv=0x7fff831b6218) at bgpd/bgp_main.c:549
```

Fixes: 8156765 ("bgpd: Add `no rpki` command")
Signed-off-by: Louis Scalbert <[email protected]>
Signed-off-by: Donatas Abraitis <[email protected]>
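A sketch of the kind of guard involved, with hypothetical names: `rtr_config` stands in for the rtrlib manager configuration and is assumed to be set to NULL when that object is freed, so cache deletion skips the rtrlib teardown calls when there is nothing left to tear down. This is not the real `rpki_delete_all_cache_nodes()`.

```c
#include <stddef.h>

/* Hypothetical per-VRF RPKI state. */
struct hyp_rpki_vrf {
	void *rtr_config; /* assumed NULL once the rtrlib object is freed */
	int cache_count;  /* number of configured cache groups */
};

static int removed_groups; /* counts teardown calls, for illustration */

/* Hypothetical stand-in for the per-group rtrlib teardown call. */
static void hyp_remove_group(void *config, int preference)
{
	(void)config;
	(void)preference;
	removed_groups++;
}

void hyp_delete_all_cache_nodes(struct hyp_rpki_vrf *rv)
{
	/* Only call into rtrlib while its configuration still exists. */
	if (rv->rtr_config != NULL) {
		for (int pref = 1; pref <= rv->cache_count; pref++)
			hyp_remove_group(rv->rtr_config, pref);
	}
	/* Local bookkeeping is cleared either way. */
	rv->cache_count = 0;
}
```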

The following configuration causes an isisd crash:

```
# cat config
affinity-map green bit-position 0
router isis 1
 flex-algo 129
  affinity exclude-any green
# vtysh -f config
```

```
#0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f650cd32756 in core_handler (signo=6, siginfo=0x7ffc56f93070, context=0x7ffc56f92f40) at lib/sigevent.c:258
#2  <signal handler called>
#3  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#4  0x00007f650c91c537 in __GI_abort () at abort.c:79
#5  0x00007f650cd007c9 in nb_running_get_entry_worker (dnode=0x0, xpath=0x0, abort_if_not_found=true, rec_search=true) at lib/northbound.c:2531
#6  0x00007f650cd007f9 in nb_running_get_entry (dnode=0x55d9ad406e00, xpath=0x0, abort_if_not_found=true) at lib/northbound.c:2537
#7  0x000055d9ab302248 in isis_instance_flex_algo_affinity_set (args=0x7ffc56f947a0, type=2) at isisd/isis_nb_config.c:2998
#8  0x000055d9ab3027c0 in isis_instance_flex_algo_affinity_exclude_any_create (args=0x7ffc56f947a0) at isisd/isis_nb_config.c:3155
#9  0x00007f650ccfe284 in nb_callback_create (context=0x7ffc56f94d20, nb_node=0x55d9ad28b540, event=NB_EV_VALIDATE, dnode=0x55d9ad406e00, resource=0x0, errmsg=0x7ffc56f94de0 "",
    errmsg_len=8192) at lib/northbound.c:1487
#10 0x00007f650ccff067 in nb_callback_configuration (context=0x7ffc56f94d20, event=NB_EV_VALIDATE, change=0x55d9ad406d40, errmsg=0x7ffc56f94de0 "", errmsg_len=8192) at lib/northbound.c:1884
#11 0x00007f650ccfda31 in nb_candidate_validate_code (context=0x7ffc56f94d20, candidate=0x55d9ad20d710, changes=0x7ffc56f94d38, errmsg=0x7ffc56f94de0 "", errmsg_len=8192)
    at lib/northbound.c:1246
#12 0x00007f650ccfdc67 in nb_candidate_commit_prepare (context=..., candidate=0x55d9ad20d710, comment=0x0, transaction=0x7ffc56f94da0, skip_validate=false, ignore_zero_change=false,
    errmsg=0x7ffc56f94de0 "", errmsg_len=8192) at lib/northbound.c:1317
#13 0x00007f650ccfdec4 in nb_candidate_commit (context=..., candidate=0x55d9ad20d710, save_transaction=true, comment=0x0, transaction_id=0x0, errmsg=0x7ffc56f94de0 "", errmsg_len=8192)
    at lib/northbound.c:1381
#14 0x00007f650cd045ba in nb_cli_classic_commit (vty=0x55d9ad3f7490) at lib/northbound_cli.c:57
#15 0x00007f650cd04749 in nb_cli_pending_commit_check (vty=0x55d9ad3f7490) at lib/northbound_cli.c:96
#16 0x00007f650cc94340 in cmd_execute_command_real (vline=0x55d9ad3eea10, vty=0x55d9ad3f7490, cmd=0x0, up_level=0) at lib/command.c:1000
#17 0x00007f650cc94599 in cmd_execute_command (vline=0x55d9ad3eea10, vty=0x55d9ad3f7490, cmd=0x0, vtysh=0) at lib/command.c:1080
#18 0x00007f650cc94a0c in cmd_execute (vty=0x55d9ad3f7490, cmd=0x55d9ad401d30 "XFRR_end_configuration", matched=0x0, vtysh=0) at lib/command.c:1228
#19 0x00007f650cd523a4 in vty_command (vty=0x55d9ad3f7490, buf=0x55d9ad401d30 "XFRR_end_configuration") at lib/vty.c:625
#20 0x00007f650cd5413d in vty_execute (vty=0x55d9ad3f7490) at lib/vty.c:1388
#21 0x00007f650cd56353 in vtysh_read (thread=0x7ffc56f99370) at lib/vty.c:2400
#22 0x00007f650cd4b6fd in event_call (thread=0x7ffc56f99370) at lib/event.c:1996
#23 0x00007f650ccd1365 in frr_run (master=0x55d9ad103cf0) at lib/libfrr.c:1231
#24 0x000055d9ab29036e in main (argc=2, argv=0x7ffc56f99598, envp=0x7ffc56f995b0) at isisd/isis_main.c:354
```

Entering the same configuration interactively in vtysh configure mode works properly. With "vtysh -f", the northbound-compatible configuration is committed as a whole, whereas in interactive mode it is committed line by line. In the first case, nb_running_get_entry() fails during the validation phase because the area is not yet in the running configuration.

Do not call nb_running_get_entry() during the northbound validation phase.

Fixes: 893882e ("isisd: add isis flex-algo configuration backend")
Signed-off-by: Louis Scalbert <[email protected]>
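A sketch of the rule stated above, with hypothetical event and area types rather than FRR's northbound API: only resolve objects from the running configuration in the apply phase, because with `vtysh -f` the area is not there yet during validation.

```c
#include <stddef.h>

/* Hypothetical subset of northbound transaction phases. */
enum hyp_nb_event { HYP_EV_VALIDATE, HYP_EV_PREPARE, HYP_EV_ABORT, HYP_EV_APPLY };

struct hyp_area {
	unsigned int affinity_bits;
};

/* Hypothetical "running configuration": NULL until the area is committed. */
struct hyp_area *hyp_running_area;

int hyp_affinity_create(enum hyp_nb_event event, unsigned int bit)
{
	/* During VALIDATE/PREPARE/ABORT the area may not be in running yet,
	 * so do not look it up there. */
	if (event != HYP_EV_APPLY)
		return 0;

	if (hyp_running_area == NULL)
		return -1;
	hyp_running_area->affinity_bits |= 1u << bit;
	return 0;
}
```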


A crash was observed in zebra (FRR 8.4.4) while FRR processes were being shut down by an automated script:

```
Thread 1 (LWP 16051):
#0  0x00007f63f449bb8f in raise () from /lib64/libpthread.so.0
#1  0x00007f63f5c68300 in core_handler (signo=11, siginfo=0x7ffec6322cb0, context=<optimized out>) at lib/sigevent.c:261
#2  <signal handler called>
#3  zebra_router_get_table (zvrf=zvrf@entry=0x0, tableid=tableid@entry=254, afi=afi@entry=AFI_IP, safi=safi@entry=SAFI_UNICAST) at /usr/include/bits/string_fortified.h:71
#4  0x0000560d74192e6d in zebra_vrf_get_table_with_table_id (afi=AFI_IP, safi=SAFI_UNICAST, vrf_id=<optimized out>, table_id=254) at zebra/zebra_vrf.c:335
#5  0x0000560d74186d20 in process_subq_early_route_add (ere=<optimized out>) at zebra/zebra_rib.c:2649
#6  process_subq_early_route (lnode=0x560d7783c5f0) at zebra/zebra_rib.c:3127
#7  process_subq (qindex=META_QUEUE_EARLY_ROUTE, subq=0x560d75cb1e40) at zebra/zebra_rib.c:3150
#8  meta_queue_process (dummy=<optimized out>, data=0x560d75cba680) at zebra/zebra_rib.c:3202
#9  0x00007f63f5c84550 in work_queue_run (thread=0x7ffec63233d0) at lib/workqueue.c:285
#10 0x00007f63f5c7a4c1 in thread_call (thread=thread@entry=0x7ffec63233d0) at lib/thread.c:2008
#11 0x00007f63f5c32088 in frr_run (master=0x560d75acbf50) at lib/libfrr.c:1216
#12 0x0000560d7411c8f7 in main (argc=<optimized out>, argv=0x7ffec63237a8) at zebra/main.c:499
```

Below is the analysis of the sequence of events that led to the zebra crash:

- The configuration, including the VRF configuration, was deleted in zebra.
- Some route messages for the deleted VRF were still waiting in the zebra meta-queue to be processed.
- When such a route message was dequeued for processing, the VRF had already been deleted.
- The zvrf lookup for the route entry's vrf_id failed, but the NULL return was not checked, which resulted in a SIGSEGV when it was dereferenced later.

Signed-off-by: Jenny Yuan <[email protected]>
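A minimal sketch of the missing check, with a hypothetical VRF registry and handler rather than zebra's real functions: a queued route whose VRF has disappeared is dropped instead of dereferencing a NULL `zvrf`.

```c
#include <stddef.h>

struct hyp_zvrf {
	int vrf_id;
};

/* Hypothetical VRF registry: a deleted VRF is simply absent. */
struct hyp_zvrf *hyp_zvrf_lookup(struct hyp_zvrf **vrfs, size_t n, int vrf_id)
{
	for (size_t i = 0; i < n; i++)
		if (vrfs[i] && vrfs[i]->vrf_id == vrf_id)
			return vrfs[i];
	return NULL;
}

/* Process a queued route add; the VRF may have been deleted while the
 * message was waiting in the queue. */
int hyp_process_early_route_add(struct hyp_zvrf **vrfs, size_t n, int vrf_id)
{
	struct hyp_zvrf *zvrf = hyp_zvrf_lookup(vrfs, n, vrf_id);

	if (zvrf == NULL)
		return -1; /* VRF gone: discard the stale route */
	/* ... look up the table in zvrf and install the route ... */
	return 0;
}
```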

```
==5445==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000008 (pc 0x7ff4c6bedb19 bp 0x7ffc95f2e400 sp 0x7ffc95f2e3c0 T0)
==5445==The signal is caused by a READ memory access.
==5445==Hint: address points to the zero page.
    #0 0x7ff4c6bedb19 in hash_iterate lib/hash.c:246
    #1 0x5618f41f5f59 in bgp_evpn_nh_finish bgpd/bgp_evpn_mh.c:4663
    #2 0x5618f41dcbe8 in bgp_evpn_vrf_delete bgpd/bgp_evpn.c:7336
    #3 0x5618f43bdd35 in bgp_delete bgpd/bgpd.c:4098
    #4 0x5618f417ef6e in bgp_exit bgpd/bgp_main.c:206
    #5 0x5618f417ef6e in sigint bgpd/bgp_main.c:164
    #6 0x7ff4c6cac6c4 in frr_sigevent_process lib/sigevent.c:117
    #7 0x7ff4c6cd8258 in event_fetch lib/event.c:1767
    #8 0x7ff4c6c0dcbc in frr_run lib/libfrr.c:1230
    #9 0x5618f418080d in main bgpd/bgp_main.c:555
    #10 0x7ff4c670c249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #11 0x7ff4c670c304 in __libc_start_main_impl ../csu/libc-start.c:360
    #12 0x5618f417ea20 in _start (/usr/lib/frr/bgpd+0x2e4a20)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV lib/hash.c:246 in hash_iterate
```

Signed-off-by: Donatas Abraitis <[email protected]>
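The fault address 0x8 lies in the zero page, which is consistent with `hash_iterate()` reading a member of a NULL hash table pointer during `bgp_evpn_nh_finish()`. A sketch of the corresponding guard, with a hypothetical hash type rather than FRR's `struct hash`:

```c
#include <stddef.h>

/* Hypothetical minimal hash table. */
struct hyp_hash {
	size_t size;
	void **buckets;
};

typedef void (*hyp_iter_fn)(void *item, void *arg);

void hyp_hash_iterate(struct hyp_hash *h, hyp_iter_fn fn, void *arg)
{
	for (size_t i = 0; i < h->size; i++)
		if (h->buckets[i])
			fn(h->buckets[i], arg);
}

/* Teardown tolerates a table that was never created (or already freed). */
void hyp_evpn_nh_finish(struct hyp_hash *nh_table, hyp_iter_fn free_fn)
{
	if (nh_table == NULL)
		return;
	hyp_hash_iterate(nh_table, free_fn, NULL);
}
```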

```
ERROR: AddressSanitizer: heap-use-after-free on address 0x6160000aecf0 at pc 0x5555557ecdb9 bp 0x7fffffffe350 sp 0x7fffffffe340
READ of size 4 at 0x6160000aecf0 thread T0
    #0 0x5555557ecdb8 in igmp_source_delete pimd/pim_igmpv3.c:340
    #1 0x5555557ed475 in igmp_source_delete_expired pimd/pim_igmpv3.c:405
    #2 0x5555557de574 in igmp_group_timer pimd/pim_igmp.c:1346
    #3 0x7ffff7275421 in event_call lib/event.c:1996
    #4 0x7ffff7140797 in frr_run lib/libfrr.c:1237
    #5 0x5555557f5840 in main pimd/pim_main.c:166
    #6 0x7ffff6a54082 in __libc_start_main ../csu/libc-start.c:308
    #7 0x555555686eed in _start (/usr/lib/frr/pimd+0x132eed)

0x6160000aecf0 is located 112 bytes inside of 600-byte region [0x6160000aec80,0x6160000aeed8)
freed by thread T0 here:
    #0 0x7ffff767b40f in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:122
    #1 0x7ffff716ed34 in qfree lib/memory.c:131
    #2 0x5555557169ae in pim_channel_oil_free pimd/pim_oil.c:84
    #3 0x555555717981 in pim_channel_oil_del pimd/pim_oil.c:199
    #4 0x55555573c42c in tib_sg_gm_prune pimd/pim_tib.c:196
    #5 0x5555557d6d04 in igmp_source_forward_stop pimd/pim_igmp.c:229
    #6 0x5555557d5855 in igmp_anysource_forward_stop pimd/pim_igmp.c:61
    #7 0x5555557de539 in igmp_group_timer pimd/pim_igmp.c:1344
    #8 0x7ffff7275421 in event_call lib/event.c:1996
    #9 0x7ffff7140797 in frr_run lib/libfrr.c:1237
    #10 0x5555557f5840 in main pimd/pim_main.c:166
    #11 0x7ffff6a54082 in __libc_start_main ../csu/libc-start.c:308

previously allocated by thread T0 here:
    #0 0x7ffff767ba06 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:153
    #1 0x7ffff716ebe1 in qcalloc lib/memory.c:106
    #2 0x555555716eb7 in pim_channel_oil_add pimd/pim_oil.c:133
    #3 0x55555573b2b9 in tib_sg_oil_setup pimd/pim_tib.c:30
    #4 0x55555573bdd3 in tib_sg_gm_join pimd/pim_tib.c:119
    #5 0x5555557d6788 in igmp_source_forward_start pimd/pim_igmp.c:193
    #6 0x5555557d5771 in igmp_anysource_forward_start pimd/pim_igmp.c:51
    #7 0x5555557ecaa0 in group_exclude_fwd_anysrc_ifempty pimd/pim_igmpv3.c:310
    #8 0x5555557ef937 in toex_incl pimd/pim_igmpv3.c:839
    #9 0x5555557f00a2 in igmpv3_report_toex pimd/pim_igmpv3.c:938
    #10 0x5555557f543d in igmp_v3_recv_report pimd/pim_igmpv3.c:2000
    #11 0x5555557da2b4 in pim_igmp_packet pimd/pim_igmp.c:787
    #12 0x5555556ee46a in process_igmp_packet pimd/pim_mroute.c:763
    #13 0x5555556ee5f3 in pim_mroute_msg pimd/pim_mroute.c:787
    #14 0x5555556eef58 in mroute_read pimd/pim_mroute.c:877
    #15 0x7ffff7275421 in event_call lib/event.c:1996
    #16 0x7ffff7140797 in frr_run lib/libfrr.c:1237
    #17 0x5555557f5840 in main pimd/pim_main.c:166
    #18 0x7ffff6a54082 in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: heap-use-after-free pimd/pim_igmpv3.c:340 in igmp_source_delete
Shadow bytes around the buggy address:
  0x0c2c8000dd40: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2c8000dd50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2c8000dd60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2c8000dd70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2c8000dd80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c2c8000dd90: fd fd fd fd fd fd fd fd fd fd fd fd fd fd[fd]fd
  0x0c2c8000dda0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2c8000ddb0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2c8000ddc0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2c8000ddd0: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa
  0x0c2c8000dde0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
```

Signed-off-by: Jafar Al-Gharaibeh <[email protected]>
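In this report the channel OIL is freed via `igmp_anysource_forward_stop()` and then read by `igmp_source_delete()` within the same `igmp_group_timer()` callback. The fix is not shown; a sketch of one common remedy, with hypothetical types rather than pimd's structures, is to clear the cached pointer in the same step that drops the reference:

```c
#include <stdlib.h>

/* Hypothetical reference-counted channel OIL. */
struct hyp_oil {
	int refcnt;
	int oif_count;
};

/* Hypothetical IGMP source that caches a pointer to its channel OIL. */
struct hyp_source {
	struct hyp_oil *oil;
};

/* Drop the reference and clear the cached pointer together, so later
 * code in the same callback cannot read freed memory through it. */
void hyp_source_forward_stop(struct hyp_source *src)
{
	struct hyp_oil *oil = src->oil;

	if (oil == NULL)
		return;
	src->oil = NULL;
	if (--oil->refcnt == 0)
		free(oil);
}

/* Safe after forward_stop: the pointer is either valid or NULL. */
int hyp_source_delete(const struct hyp_source *src)
{
	return src->oil ? src->oil->oif_count : 0;
}
```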

When a BFD failure occurs and the same nexthop is shared by paths from different peers, the parent nexthop group is not refreshed, and after a single BGP failover a new parent NHG ID is chosen.

This happens in a multihomed setup with a full routing table containing recursive routes that share the same nexthop. The case below shows the same prefix with the same nexthop learned from three different peers. A single nexthop group is created for all three identical nexthops.

```
r1# show bgp nexthop-group 71428590 detail
ID: 71428590, #paths 6
Flags: 0x000b (allowRecursion, internalBgp, TypeParent)
State: 0x0001 (Installed)
child list count 1 (peer count 3)
child(s) 71428577
Paths:
1/1 172.18.1.101/32 VRF default flags 0xc10
1/1 172.18.1.101/32 VRF default flags 0xc10
1/1 172.18.1.101/32 VRF default flags 0x418
1/1 172.18.1.100/32 VRF default flags 0xc10
1/1 172.18.1.100/32 VRF default flags 0xc10
1/1 172.18.1.100/32 VRF default flags 0x418
r1# show bgp nexthop-group 71428577 detail
ID: 71428577, #paths 6
Flags: 0x0003 (allowRecursion, internalBgp)
State: 0x0001 (Installed)
via 172.16.0.100 (vrf default) inactive
parent list count 1
parent(s) 71428590
Paths:
1/1 172.18.1.101/32 VRF default flags 0x418
1/1 172.18.1.100/32 VRF default flags 0x418
1/1 172.18.1.100/32 VRF default flags 0xc10
1/1 172.18.1.101/32 VRF default flags 0xc10
1/1 172.18.1.101/32 VRF default flags 0xc10
1/1 172.18.1.100/32 VRF default flags 0xc10
r1# show bgp ipv4 172.18.1.101/32
BGP routing table entry for 172.18.1.101/32, version 26
Paths: (3 available, best #1, table default, vrf (null))
  Not advertised to any peer
  Local
    172.16.0.100 from 192.0.2.5 (192.0.2.5)
      Origin incomplete, metric 0, localpref 100, valid, internal, multipath, best (Router ID)
      AddPath ID: RX 5, TX-All 0 TX-Best-Per-AS 0 TX-Best-Selected 0
      Last update: Fri Sep 13 14:47:14 2024
  Local
    172.16.0.100 from 192.0.2.3 (192.0.2.8)
      Origin incomplete, metric 0, localpref 100, valid, internal, multipath
      Originator: 192.0.2.8, Cluster list: 192.0.2.3
      AddPath ID: RX 18, TX-All 0 TX-Best-Per-AS 0 TX-Best-Selected 0
      Last update: Fri Sep 13 14:45:12 2024
  Local
    172.16.0.100 from 192.0.2.3 (192.0.2.6)
      Origin incomplete, metric 0, localpref 100, valid, internal, multipath
      Originator: 192.0.2.6, Cluster list: 192.0.2.3
      AddPath ID: RX 13, TX-All 0 TX-Best-Per-AS 0 TX-Best-Selected 0
      Last update: Fri Sep 13 14:45:12 2024
```

After failover, the parent nexthop group is replaced, and all the remaining BGP paths are updated to use the new parent nexthop group. Instead, the same parent NHG ID should be kept.

Create a hash list of peers for each child nexthop. Before failover, the three peers are recorded along with their exact path counters. At failover, the count for the failed peer is decremented, and the parent NHG ID is refreshed:

```
2024/09/13 13:52:11.277173 BGP: [MNZ4S-8HW2T] NHG 71428578: peer count changed (3 -> 2) for nexthop (172.16.0.100 if 0 VRF 0 wt 0 )
```

Signed-off-by: Philippe Guibert <[email protected]>
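A sketch of the per-child-nexthop peer accounting described above, with hypothetical names; a small fixed array replaces the hash list for brevity. It reports when the number of distinct peers changes, which is the event that should refresh the existing parent NHG rather than replace it. The printed message only mimics the shape of the log line above.

```c
#include <stddef.h>
#include <stdio.h>

#define HYP_MAX_PEERS 8

/* Hypothetical per-child-nexthop table: paths contributed by each peer. */
struct hyp_nh_peers {
	unsigned int peer_id[HYP_MAX_PEERS];
	unsigned int paths[HYP_MAX_PEERS];
	size_t count; /* number of distinct peers */
};

/* delta is +1 (path added) or -1 (path removed). Returns 1 when the
 * number of distinct peers changed, 0 otherwise. */
int hyp_nh_peer_update(struct hyp_nh_peers *np, unsigned int peer, int delta)
{
	size_t before = np->count;
	size_t i;

	if (delta == 0)
		return 0;
	for (i = 0; i < np->count; i++)
		if (np->peer_id[i] == peer)
			break;

	if (i == np->count) {
		/* Unknown peer: only an addition can create an entry. */
		if (delta < 0 || np->count == HYP_MAX_PEERS)
			return 0;
		np->peer_id[i] = peer;
		np->paths[i] = 0;
		np->count++;
	}

	if (delta > 0) {
		np->paths[i]++;
	} else if (--np->paths[i] == 0) {
		/* Last path from this peer: move the last entry into its slot. */
		np->count--;
		np->peer_id[i] = np->peer_id[np->count];
		np->paths[i] = np->paths[np->count];
	}

	if (np->count != before) {
		printf("peer count changed (%zu -> %zu)\n", before, np->count);
		return 1;
	}
	return 0;
}
```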

The following ASAN issue has been observed:

> ERROR: AddressSanitizer: heap-use-after-free on address 0x6160000acba4 at pc 0x55910c5694d0 bp 0x7ffe3a8ac850 sp 0x7ffe3a8ac840
> READ of size 4 at 0x6160000acba4 thread T0
> #0 0x55910c5694cf in ctx_info_from_zns zebra/zebra_dplane.c:3315
> #1 0x55910c569696 in dplane_ctx_ns_init zebra/zebra_dplane.c:3331
> #2 0x55910c56bf61 in dplane_ctx_nexthop_init zebra/zebra_dplane.c:3680
> #3 0x55910c5711ca in dplane_nexthop_update_internal zebra/zebra_dplane.c:4490
> #4 0x55910c571c5c in dplane_nexthop_delete zebra/zebra_dplane.c:4717
> #5 0x55910c61e90e in zebra_nhg_uninstall_kernel zebra/zebra_nhg.c:3413
> #6 0x55910c615d8a in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1919
> #7 0x55910c6404db in route_entry_update_nhe zebra/zebra_rib.c:454
> #8 0x55910c64c904 in rib_re_nhg_free zebra/zebra_rib.c:2822
> #9 0x55910c655be2 in rib_unlink zebra/zebra_rib.c:4212
> #10 0x55910c6430f9 in zebra_rtable_node_cleanup zebra/zebra_rib.c:968
> #11 0x7f26f275b8a9 in route_node_free lib/table.c:75
> #12 0x7f26f275bae4 in route_table_free lib/table.c:111
> #13 0x7f26f275b749 in route_table_finish lib/table.c:46
> #14 0x55910c65db17 in zebra_router_free_table zebra/zebra_router.c:191
> #15 0x55910c65dfb5 in zebra_router_terminate zebra/zebra_router.c:244
> #16 0x55910c4f40db in zebra_finalize zebra/main.c:249
> #17 0x7f26f2777108 in event_call lib/event.c:2011
> #18 0x7f26f264180e in frr_run lib/libfrr.c:1212
> #19 0x55910c4f49cb in main zebra/main.c:531
> #20 0x7f26f2029d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
> #21 0x7f26f2029e3f in __libc_start_main_impl ../csu/libc-start.c:392
> #22 0x55910c4b0114 in _start (/usr/lib/frr/zebra+0x1ae114)

It happens when FRR runs with the kernel dataplane. During shutdown, zebra tries to obtain the namespace identifier while preparing dataplane nexthop messages, but the zebra namespace structure it reads from has already been freed. Fix this by accessing the ns structure. Signed-off-by: Philippe Guibert <[email protected]>
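The trace shows `ctx_info_from_zns()` reading from a namespace object that shutdown had already released. As a rough illustration of one general way to avoid this class of bug (not necessarily what this patch does), the sketch below has the dataplane context look the namespace up by ID in a registry that is cleared on destruction, and fail cleanly if the namespace is gone, instead of dereferencing a cached pointer. All names (`zns_sketch`, `ns_registry`, `ns_create`, `ns_destroy`, `ctx_ns_init`) are hypothetical and heavily simplified compared with zebra.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a zebra namespace. */
struct zns_sketch {
	int ns_id;
	int nl_fd; /* e.g. a netlink socket descriptor */
};

#define MAX_NS 4
static struct zns_sketch *ns_registry[MAX_NS]; /* indexed by ns_id */

static struct zns_sketch *ns_lookup(int ns_id)
{
	if (ns_id < 0 || ns_id >= MAX_NS)
		return NULL;
	return ns_registry[ns_id];
}

static struct zns_sketch *ns_create(int ns_id, int nl_fd)
{
	struct zns_sketch *zns;

	if (ns_id < 0 || ns_id >= MAX_NS || ns_registry[ns_id])
		return NULL;
	zns = calloc(1, sizeof(*zns));
	if (!zns)
		return NULL;
	zns->ns_id = ns_id;
	zns->nl_fd = nl_fd;
	ns_registry[ns_id] = zns;
	return zns;
}

/* Unregister before freeing so later lookups see NULL, not freed memory. */
static void ns_destroy(int ns_id)
{
	struct zns_sketch *zns = ns_lookup(ns_id);

	if (!zns)
		return;
	ns_registry[ns_id] = NULL;
	free(zns);
}

/* Hypothetical dataplane context holding copies of namespace fields. */
struct dplane_ctx_sketch {
	int ns_id;
	int nl_fd;
};

/* Copy the fields the context needs while the namespace is known to be
 * alive. Returns 0 on success, -1 if the namespace is already gone. */
static int ctx_ns_init(struct dplane_ctx_sketch *ctx, int ns_id)
{
	const struct zns_sketch *zns = ns_lookup(ns_id);

	if (!zns)
		return -1;
	ctx->ns_id = zns->ns_id;
	ctx->nl_fd = zns->nl_fd;
	return 0;
}
```

The design choice here is that ownership stays with the registry: consumers never keep a raw pointer across an event boundary, so teardown order during shutdown no longer matters for correctness, only for whether a late message gets dropped.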

donaldsharp added 3 commits December 15, 2016 21:00

lib: Ensure ptrs are NULL on free

8eefe20

When cleaning up for shutdown, we may attempt to access freed pointers again, so set them to NULL when they are freed. Found via valgrind; the issue no longer shows up there after this change. Signed-off-by: Donald Sharp <[email protected]>
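The idea in this commit, clearing a pointer as it is freed so a second cleanup pass sees NULL instead of a dangling address, can be sketched as follows. The `FREE_AND_NULL` macro and `session_sketch` struct are hypothetical illustrations, not FRR's actual helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: free the object and clear the caller's pointer.
 * The do/while(0) wrapper makes the macro behave like one statement. */
#define FREE_AND_NULL(ptr)                                                     \
	do {                                                                   \
		free(ptr);                                                     \
		(ptr) = NULL;                                                  \
	} while (0)

/* Hypothetical object with heap-allocated members. */
struct session_sketch {
	char *name;
	int *counters;
};

/* Safe to call more than once: after the first call both members are
 * NULL, and free(NULL) is a no-op. */
static void session_cleanup(struct session_sketch *s)
{
	FREE_AND_NULL(s->name);
	FREE_AND_NULL(s->counters);
}
```

Calling `session_cleanup()` twice on the same object is harmless in this sketch, which is exactly the situation a shutdown path with overlapping cleanup routines can create.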

bgpd: Cleanup double read of free'd data

26acb92

Valgrind found this issue; this change prevents it from happening. Signed-off-by: Donald Sharp <[email protected]>

bgpd: Fix 'show ip bgp summary' variable output being wrong

c9d5bd2

The first time 'show ip bgp summary' was called, the size of the variable-width hostname field was always calculated incorrectly. Signed-off-by: Donald Sharp <[email protected]>
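A first-call width error of this kind is typically avoided by computing the column width in a separate pass over all rows before printing anything, rather than relying on state left over from an earlier invocation. The sketch below shows that two-pass approach; `peer_row`, `hostname_col_width`, and `print_summary` are hypothetical and are not bgpd's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical row of a summary table. */
struct peer_row {
	const char *hostname;
	unsigned int msg_rcvd;
};

/* Pass 1: widest hostname, with the header width as a floor, so the very
 * first row printed already uses the final column width. */
static int hostname_col_width(const struct peer_row *rows, size_t n,
			      const char *header)
{
	size_t width = strlen(header);

	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(rows[i].hostname);

		if (len > width)
			width = len;
	}
	return (int)width;
}

/* Pass 2: print using the precomputed width. */
static void print_summary(const struct peer_row *rows, size_t n)
{
	int w = hostname_col_width(rows, n, "Neighbor");

	printf("%-*s %8s\n", w, "Neighbor", "MsgRcvd");
	for (size_t i = 0; i < n; i++)
		printf("%-*s %8u\n", w, rows[i].hostname, rows[i].msg_rcvd);
}
```

Because the width depends only on the current data, every call (including the first) produces aligned output.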

donaldsharp requested a review from eqvinox December 16, 2016 02:32

qlyoung approved these changes Dec 16, 2016

eqvinox approved these changes Dec 16, 2016

eqvinox merged commit d5444d2 into FRRouting:stable/2.0 Dec 16, 2016

donaldsharp mentioned this pull request Dec 16, 2016

isis crash #4

Closed

donaldsharp pushed a commit that referenced this pull request Feb 14, 2017

zebra: Fix CID 1399335 (#1 of 1): Wrong sizeof argument (SIZEOF_MISMATCH)

4fdeb6b

Needs to be the size of the correct structure (prefix instead of prefix_ipv4). Signed-off-by: Martin Winter <[email protected]>
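The Coverity finding is about passing the size of one structure when operating on a larger one. The sketch below uses hypothetical, simplified layouts (`prefix_sketch`, `prefix_ipv4_sketch`, not FRR's real `struct prefix` definitions) to show why `sizeof(struct prefix_ipv4)` would under-clear a generic prefix, and the common `sizeof(*p)` idiom that ties the size to the pointed-to type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical IPv4-only layout. */
struct prefix_ipv4_sketch {
	uint8_t family;
	uint16_t prefixlen;
	uint32_t prefix4;
};

/* Hypothetical generic layout: the union makes it larger than the
 * IPv4-only variant. */
struct prefix_sketch {
	uint8_t family;
	uint16_t prefixlen;
	union {
		uint32_t prefix4;
		uint8_t prefix6[16];
	} u;
};

/* sizeof(*p) always matches the pointed-to type; using
 * sizeof(struct prefix_ipv4_sketch) here would leave the tail of the
 * union uninitialized. */
static void prefix_clear(struct prefix_sketch *p)
{
	memset(p, 0, sizeof(*p));
}
```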

donaldsharp deleted the stable_patches branch May 17, 2017 12:01

anithanarasimhamurthy mentioned this pull request May 17, 2017

nhrpd crash on unconfig of nhrp event socket #570

Closed

rwestphal mentioned this pull request Jun 7, 2017

Add Pseudowire management in Zebra #601

Closed

pguibert6WIND mentioned this pull request Jul 17, 2017

frr/master :crash vnc at startup #825

Closed

donaldsharp mentioned this pull request Sep 5, 2017

large-community regex crash #1103

Closed

louberger mentioned this pull request Sep 7, 2017

bgpd: route map changes not propagated to existing static network (ipv4 unicast tested) #1117

Closed

donaldsharp mentioned this pull request Sep 8, 2017

ospfv3 is failing in topotests #1135

Closed

dwalton76 mentioned this pull request Sep 27, 2017

bgpd: fix 4-byte AS display in bestpath-from-AS #1261

Merged

donaldsharp mentioned this pull request Oct 20, 2017

Allow VRFs for FPM #1343

Closed

louberger mentioned this pull request Dec 13, 2017

Sa from clang #1547

Merged

donaldsharp mentioned this pull request Jan 11, 2018

zebra route-leaking for static routes #1618

Merged

nevzorofff mentioned this pull request Jan 12, 2018

Downs't watch on MTU change #1601

Closed

louberger mentioned this pull request Jan 24, 2018

FRR pthread improvements #1672

Merged

skydevil56 mentioned this pull request Mar 31, 2018

NS initialisation failure (Permission denied) #2007

Closed

louberger mentioned this pull request Apr 3, 2018

lib: remove IRDP_NODE #2006

Merged

louis-6wind mentioned this pull request Sep 12, 2024

tests: fix isis_lsp_bits_topo1 race condition #16807

Merged

This was referenced Sep 13, 2024

bgpd: fix as-path exclude modify crash (backport #16779) #16813

Closed

bgpd: fix as-path exclude modify crash (backport #16779) #16814

Closed

louis-6wind mentioned this pull request Sep 14, 2024

bgpd: fix missing addpath withdrawal race condition #16830

Open

jyuan-panw mentioned this pull request Sep 18, 2024

zebra: Skip route table lookup if zvrf is NULL #16858

Closed

ramaraju1007 mentioned this pull request Oct 1, 2024

bgp crash few minutes after enabling peer ipv4+6 route redist (max ipv4+max ipv6) #16963

Open

pguibert6WIND mentioned this pull request Oct 7, 2024

zebra: fix heap-use-after free on ns shutdown #17020

Open

sridharsanthanam mentioned this pull request Oct 9, 2024

Zebra crash in route_node_delete() as the same route node is accessed in two different threads. #17047

Open

A small collection of patches that fix issues found by valgrind #1

donaldsharp commented Dec 16, 2016
