Skip to content

Commit

Permalink
Allow setting the BGP graceful restart deferral time. See RFC4724 4.1
Browse files Browse the repository at this point in the history
GoBGP's default value for deferral time is 360 seconds.
That means that the routes are not sent to the BGP peer until
this timer is elapsed, so a server is unreachable for 360
seconds, when kube-router restarts.

The new parameter is --bgp-graceful-restart-deferral-time duration_with_unit

For example '--bgp-graceful-restart-deferral-time 10s'
  • Loading branch information
Marcus Röder committed Aug 21, 2019
1 parent b54b80c commit 64fa7aa
Show file tree
Hide file tree
Showing 5 changed files with 166 additions and 147 deletions.
91 changes: 46 additions & 45 deletions docs/user-guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,51 +36,52 @@ Also you can choose to run kube-router as agent running on each cluster node. Al

```
Usage of kube-router:
--advertise-cluster-ip Add Cluster IP of the service to the RIB so that it gets advertises to the BGP peers.
--advertise-external-ip Add External IP of service to the RIB so that it gets advertised to the BGP peers.
--advertise-loadbalancer-ip Add LoadbBalancer IP of service status as set by the LB provider to the RIB so that it gets advertised to the BGP peers.
--advertise-pod-cidr Add Node's POD cidr to the RIB so that it gets advertised to the BGP peers. (default true)
--bgp-graceful-restart Enables the BGP Graceful Restart capability so that routes are preserved on unexpected restarts
--bgp-port uint16 The port open for incoming BGP connections and to use for connecting with other BGP peers. (default 179)
--cache-sync-timeout duration The timeout for cache synchronization (e.g. '5s', '1m'). Must be greater than 0. (default 1m0s)
--cleanup-config Cleanup iptables rules, ipvs, ipset configuration and exit.
--cluster-asn uint ASN number under which cluster nodes will run iBGP.
--cluster-cidr string CIDR range of pods in the cluster. It is used to identify traffic originating from and destinated to pods.
--disable-source-dest-check Disable the source-dest-check attribute for AWS EC2 instances. When this option is false, it must be set some other way. (default true)
--enable-cni Enable CNI plugin. Disable if you want to use kube-router features alongside another CNI plugin. (default true)
--enable-ibgp Enables peering with nodes with the same ASN, if disabled will only peer with external BGP peers (default true)
--enable-overlay When enable-overlay is set to true, IP-in-IP tunneling is used for pod-to-pod networking across nodes in different subnets. When set to false no tunneling is used and routing infrastructure is expected to route traffic for pod-to-pod networking across nodes in different subnets (default true)
--enable-pod-egress SNAT traffic from Pods to destinations outside the cluster. (default true)
--enable-pprof Enables pprof for debugging performance and memory leak issues.
--hairpin-mode Add iptables rules for every Service Endpoint to support hairpin traffic.
--health-port uint16 Health check port, 0 = Disabled (default 20244)
-h, --help Print usage information.
--hostname-override string Overrides the NodeName of the node. Set this if kube-router is unable to determine your NodeName automatically.
--iptables-sync-period duration The delay between iptables rule synchronizations (e.g. '5s', '1m'). Must be greater than 0. (default 5m0s)
--ipvs-graceful-period duration The graceful period before removing destinations from IPVS services (e.g. '5s', '1m', '2h22m'). Must be greater than 0. (default 30s)
--ipvs-graceful-termination Enables the experimental IPVS graceful terminaton capability
--ipvs-sync-period duration The delay between ipvs config synchronizations (e.g. '5s', '1m', '2h22m'). Must be greater than 0. (default 5m0s)
--kubeconfig string Path to kubeconfig file with authorization information (the master location is set by the master flag).
--masquerade-all SNAT all traffic to cluster IP/node port.
--master string The address of the Kubernetes API server (overrides any value in kubeconfig).
--metrics-path string Prometheus metrics path (default "/metrics")
--metrics-port uint16 Prometheus metrics port, (Default 0, Disabled)
--nodeport-bindon-all-ip For service of NodePort type create IPVS service that listens on all IP's of the node.
--nodes-full-mesh Each node in the cluster will setup BGP peering with rest of the nodes. (default true)
--overlay-type string Possible values: subnet,full - When set to "subnet", the default, default "--enable-overlay=true" behavior is used. When set to "full", it changes "--enable-overlay=true" default behavior so that IP-in-IP tunneling is used for pod-to-pod networking across nodes regardless of the subnet the nodes are in. (default "subnet")
--override-nexthop Override the next-hop in bgp routes sent to peers with the local ip.
--peer-router-asns uints ASN numbers of the BGP peer to which cluster nodes will advertise cluster ip and node's pod cidr. (default [])
--peer-router-ips ipSlice The ip address of the external router to which all nodes will peer and advertise the cluster ip and pod cidr's. (default [])
--peer-router-multihop-ttl uint8 Enable eBGP multihop supports -- sets multihop-ttl. (Relevant only if ttl >= 2)
--peer-router-passwords strings Password for authenticating against the BGP peer defined with "--peer-router-ips".
--peer-router-ports uints The remote port of the external BGP to which all nodes will peer. If not set, default BGP port (179) will be used. (default [])
--router-id string BGP router-id. Must be specified in a ipv6 only cluster.
--routes-sync-period duration The delay between route updates and advertisements (e.g. '5s', '1m', '2h22m'). Must be greater than 0. (default 5m0s)
--run-firewall Enables Network Policy -- sets up iptables to provide ingress firewall for pods. (default true)
--run-router Enables Pod Networking -- Advertises and learns the routes to Pods via iBGP. (default true)
--run-service-proxy Enables Service Proxy -- sets up IPVS for Kubernetes Services. (default true)
-v, --v string log level for V logs (default "0")
-V, --version Print version information.
--advertise-cluster-ip Add Cluster IP of the service to the RIB so that it gets advertises to the BGP peers.
--advertise-external-ip Add External IP of service to the RIB so that it gets advertised to the BGP peers.
--advertise-loadbalancer-ip Add LoadbBalancer IP of service status as set by the LB provider to the RIB so that it gets advertised to the BGP peers.
--advertise-pod-cidr Add Node's POD cidr to the RIB so that it gets advertised to the BGP peers. (default true)
--bgp-graceful-restart Enables the BGP Graceful Restart capability so that routes are preserved on unexpected restarts
--bgp-graceful-restart-deferral-time duration BGP Graceful restart deferral time according to RFC4724 4.1, maximum 18h. (default 6m0s)
--bgp-port uint16 The port open for incoming BGP connections and to use for connecting with other BGP peers. (default 179)
--cache-sync-timeout duration The timeout for cache synchronization (e.g. '5s', '1m'). Must be greater than 0. (default 1m0s)
--cleanup-config Cleanup iptables rules, ipvs, ipset configuration and exit.
--cluster-asn uint ASN number under which cluster nodes will run iBGP.
--cluster-cidr string CIDR range of pods in the cluster. It is used to identify traffic originating from and destinated to pods.
--disable-source-dest-check Disable the source-dest-check attribute for AWS EC2 instances. When this option is false, it must be set some other way. (default true)
--enable-cni Enable CNI plugin. Disable if you want to use kube-router features alongside another CNI plugin. (default true)
--enable-ibgp Enables peering with nodes with the same ASN, if disabled will only peer with external BGP peers (default true)
--enable-overlay When enable-overlay is set to true, IP-in-IP tunneling is used for pod-to-pod networking across nodes in different subnets. When set to false no tunneling is used and routing infrastructure is expected to route traffic for pod-to-pod networking across nodes in different subnets (default true)
--enable-pod-egress SNAT traffic from Pods to destinations outside the cluster. (default true)
--enable-pprof Enables pprof for debugging performance and memory leak issues.
--hairpin-mode Add iptables rules for every Service Endpoint to support hairpin traffic.
--health-port uint16 Health check port, 0 = Disabled (default 20244)
-h, --help Print usage information.
--hostname-override string Overrides the NodeName of the node. Set this if kube-router is unable to determine your NodeName automatically.
--iptables-sync-period duration The delay between iptables rule synchronizations (e.g. '5s', '1m'). Must be greater than 0. (default 5m0s)
--ipvs-graceful-period duration The graceful period before removing destinations from IPVS services (e.g. '5s', '1m', '2h22m'). Must be greater than 0. (default 30s)
--ipvs-graceful-termination Enables the experimental IPVS graceful terminaton capability
--ipvs-sync-period duration The delay between ipvs config synchronizations (e.g. '5s', '1m', '2h22m'). Must be greater than 0. (default 5m0s)
--kubeconfig string Path to kubeconfig file with authorization information (the master location is set by the master flag).
--masquerade-all SNAT all traffic to cluster IP/node port.
--master string The address of the Kubernetes API server (overrides any value in kubeconfig).
--metrics-path string Prometheus metrics path (default "/metrics")
--metrics-port uint16 Prometheus metrics port, (Default 0, Disabled)
--nodeport-bindon-all-ip For service of NodePort type create IPVS service that listens on all IP's of the node.
--nodes-full-mesh Each node in the cluster will setup BGP peering with rest of the nodes. (default true)
--overlay-type string Possible values: subnet,full - When set to "subnet", the default, default "--enable-overlay=true" behavior is used. When set to "full", it changes "--enable-overlay=true" default behavior so that IP-in-IP tunneling is used for pod-to-pod networking across nodes regardless of the subnet the nodes are in. (default "subnet")
--override-nexthop Override the next-hop in bgp routes sent to peers with the local ip.
--peer-router-asns uints ASN numbers of the BGP peer to which cluster nodes will advertise cluster ip and node's pod cidr. (default [])
--peer-router-ips ipSlice The ip address of the external router to which all nodes will peer and advertise the cluster ip and pod cidr's. (default [])
--peer-router-multihop-ttl uint8 Enable eBGP multihop supports -- sets multihop-ttl. (Relevant only if ttl >= 2)
--peer-router-passwords strings Password for authenticating against the BGP peer defined with "--peer-router-ips".
--peer-router-ports uints The remote port of the external BGP to which all nodes will peer. If not set, default BGP port (179) will be used. (default [])
--router-id string BGP router-id. Must be specified in a ipv6 only cluster.
--routes-sync-period duration The delay between route updates and advertisements (e.g. '5s', '1m', '2h22m'). Must be greater than 0. (default 5m0s)
--run-firewall Enables Network Policy -- sets up iptables to provide ingress firewall for pods. (default true)
--run-router Enables Pod Networking -- Advertises and learns the routes to Pods via iBGP. (default true)
--run-service-proxy Enables Service Proxy -- sets up IPVS for Kubernetes Services. (default true)
-v, --v string log level for V logs (default "0")
-V, --version Print version information.
```

## requirements
Expand Down
9 changes: 9 additions & 0 deletions pkg/cmd/kube-router.go
Original file line number Diff line number Diff line change
Expand Up @@ -142,6 +142,15 @@ func (kr *KubeRouter) Run() error {
go npc.Run(healthChan, stopCh, &wg)
}

if kr.Config.BGPGracefulRestart {
if kr.Config.BGPGracefulRestartDeferralTime > time.Hour*18 {
return errors.New("BGPGracefuleRestartDeferralTime should be less than 18 hours")
}
if kr.Config.BGPGracefulRestartDeferralTime <= 0 {
return errors.New("BGPGracefuleRestartDeferralTime must be positive")
}
}

if kr.Config.RunRouter {
nrc, err := routing.NewNetworkRoutingController(kr.Client, kr.Config, nodeInformer, svcInformer, epInformer)
if err != nil {
Expand Down
9 changes: 6 additions & 3 deletions pkg/controllers/routing/bgp_peers.go
Original file line number Diff line number Diff line change
Expand Up @@ -104,10 +104,12 @@ func (nrc *NetworkRoutingController) syncInternalPeers() {
if nrc.bgpGracefulRestart {
n.GracefulRestart = config.GracefulRestart{
Config: config.GracefulRestartConfig{
Enabled: true,
Enabled: true,
DeferralTime: uint16(nrc.bgpGracefulRestartDeferralTime.Seconds()),
},
State: config.GracefulRestartState{
LocalRestarting: true,
DeferralTime: uint16(nrc.bgpGracefulRestartDeferralTime.Seconds()),
},
}

Expand Down Expand Up @@ -193,13 +195,14 @@ func (nrc *NetworkRoutingController) syncInternalPeers() {
}

// connectToExternalBGPPeers adds all the configured eBGP peers (global or node specific) as neighbours
func connectToExternalBGPPeers(server *gobgp.BgpServer, peerNeighbors []*config.Neighbor, bgpGracefulRestart bool, peerMultihopTtl uint8) error {
func connectToExternalBGPPeers(server *gobgp.BgpServer, peerNeighbors []*config.Neighbor, bgpGracefulRestart bool, bgpGracefulRestartDeferralTime time.Duration, peerMultihopTtl uint8) error {
for _, n := range peerNeighbors {

if bgpGracefulRestart {
n.GracefulRestart = config.GracefulRestart{
Config: config.GracefulRestartConfig{
Enabled: true,
Enabled: true,
DeferralTime: uint16(bgpGracefulRestartDeferralTime.Seconds()),
},
State: config.GracefulRestartState{
LocalRestarting: true,
Expand Down
94 changes: 48 additions & 46 deletions pkg/controllers/routing/network_routes_controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -63,51 +63,52 @@ const (

// NetworkRoutingController is struct to hold necessary information required by controller
type NetworkRoutingController struct {
nodeIP net.IP
nodeName string
nodeSubnet net.IPNet
nodeInterface string
routerId string
isIpv6 bool
activeNodes map[string]bool
mu sync.Mutex
clientset kubernetes.Interface
bgpServer *gobgp.BgpServer
syncPeriod time.Duration
clusterCIDR string
enablePodEgress bool
hostnameOverride string
advertiseClusterIP bool
advertiseExternalIP bool
advertiseLoadBalancerIP bool
advertisePodCidr bool
defaultNodeAsnNumber uint32
nodeAsnNumber uint32
globalPeerRouters []*config.Neighbor
nodePeerRouters []string
enableCNI bool
bgpFullMeshMode bool
bgpEnableInternal bool
bgpGracefulRestart bool
ipSetHandler *utils.IPSet
enableOverlays bool
overlayType string
peerMultihopTTL uint8
MetricsEnabled bool
bgpServerStarted bool
bgpPort uint16
bgpRRClient bool
bgpRRServer bool
bgpClusterID uint32
cniConfFile string
disableSrcDstCheck bool
initSrcDstCheckDone bool
ec2IamAuthorized bool
pathPrependAS string
pathPrependCount uint8
pathPrepend bool
localAddressList []string
overrideNextHop bool
nodeIP net.IP
nodeName string
nodeSubnet net.IPNet
nodeInterface string
routerId string
isIpv6 bool
activeNodes map[string]bool
mu sync.Mutex
clientset kubernetes.Interface
bgpServer *gobgp.BgpServer
syncPeriod time.Duration
clusterCIDR string
enablePodEgress bool
hostnameOverride string
advertiseClusterIP bool
advertiseExternalIP bool
advertiseLoadBalancerIP bool
advertisePodCidr bool
defaultNodeAsnNumber uint32
nodeAsnNumber uint32
globalPeerRouters []*config.Neighbor
nodePeerRouters []string
enableCNI bool
bgpFullMeshMode bool
bgpEnableInternal bool
bgpGracefulRestart bool
bgpGracefulRestartDeferralTime time.Duration
ipSetHandler *utils.IPSet
enableOverlays bool
overlayType string
peerMultihopTTL uint8
MetricsEnabled bool
bgpServerStarted bool
bgpPort uint16
bgpRRClient bool
bgpRRServer bool
bgpClusterID uint32
cniConfFile string
disableSrcDstCheck bool
initSrcDstCheckDone bool
ec2IamAuthorized bool
pathPrependAS string
pathPrependCount uint8
pathPrepend bool
localAddressList []string
overrideNextHop bool

nodeLister cache.Indexer
svcLister cache.Indexer
Expand Down Expand Up @@ -821,7 +822,7 @@ func (nrc *NetworkRoutingController) startBgpServer() error {
}

if len(nrc.globalPeerRouters) != 0 {
err := connectToExternalBGPPeers(nrc.bgpServer, nrc.globalPeerRouters, nrc.bgpGracefulRestart, nrc.peerMultihopTTL)
err := connectToExternalBGPPeers(nrc.bgpServer, nrc.globalPeerRouters, nrc.bgpGracefulRestart, nrc.bgpGracefulRestartDeferralTime, nrc.peerMultihopTTL)
if err != nil {
nrc.bgpServer.Stop()
return fmt.Errorf("Failed to peer with Global Peer Router(s): %s",
Expand Down Expand Up @@ -858,6 +859,7 @@ func NewNetworkRoutingController(clientset kubernetes.Interface,
nrc.enableCNI = kubeRouterConfig.EnableCNI
nrc.bgpEnableInternal = kubeRouterConfig.EnableiBGP
nrc.bgpGracefulRestart = kubeRouterConfig.BGPGracefulRestart
nrc.bgpGracefulRestartDeferralTime = kubeRouterConfig.BGPGracefulRestartDeferralTime
nrc.peerMultihopTTL = kubeRouterConfig.PeerMultihopTtl
nrc.enablePodEgress = kubeRouterConfig.EnablePodEgress
nrc.syncPeriod = kubeRouterConfig.RoutesSyncPeriod
Expand Down
Loading

0 comments on commit 64fa7aa

Please sign in to comment.