reconnect services dynamically #9767

Closed
barby1138 opened this issue Aug 30, 2023 · 10 comments · Fixed by networkservicemesh/sdk#1510
@barby1138

barby1138 commented Aug 30, 2023

Question

Hi, I have an nsc-kernel and two nse-kernel endpoints.
I declared a networkservice to connect the nsc to nse-1
using a selector, and it works fine.
Now I want to connect the nsc to nse-2, so
I redeployed the corresponding networkservice,
but the change was not propagated automatically.
Is it possible to reconnect services dynamically?

@barby1138
Author

As I understand it, the sidecar container inside the client should handle networkservice declaration changes,
but for some reason it does not.
What am I missing?

@denis-tingaikin denis-tingaikin added the bug Something isn't working label Aug 30, 2023
@barby1138
Author

barby1138 commented Sep 1, 2023

It seems that changes to the networkservice declaration are not handled during connection refresh.

But if I break the existing connection,
for example in the forwarder with
vppctl set int state tap0 down

the connection refresh heals it with the changes propagated.

So it seems that during the connection refresh procedure we need to check whether the networkservice declaration has changed.

Hope this helps.

@denis-tingaikin
Member

denis-tingaikin commented Sep 8, 2023

The problem looks interesting to me.

@edwarnicke

I see the following solutions:

  1. We could rework the discoverforwarder chain element so that it can refresh the networkservice to match a new endpoint.
  2. [Preferred] We could make the NSMgr monitor the network services; if an ns changes, the NSMgr could send an event to the NSC to schedule a Request that changes the endpoint immediately.
  3. Your option.

Please share your thoughts.
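
Option 2 above can be sketched roughly as follows. This is only an illustration under stated assumptions, not NSM's actual API: the `NetworkService` struct, the `Watcher` type, the `events` channel, and the service name `example-ns` are all hypothetical stand-ins. The sketch shows the core idea: remember the last seen definition of each network service and emit an event only when the selector actually changes.

```go
package main

import (
	"fmt"
	"reflect"
)

// NetworkService is a simplified, hypothetical stand-in for the real resource:
// a name plus the destination selector labels of its route.
type NetworkService struct {
	Name                string
	DestinationSelector map[string]string
}

// Watcher remembers the last seen definition of each network service.
// It stands in for the monitoring the NSMgr would do; it is not NSM code.
type Watcher struct {
	seen map[string]NetworkService
}

// NewWatcher returns a Watcher with no recorded services.
func NewWatcher() *Watcher {
	return &Watcher{seen: make(map[string]NetworkService)}
}

// Observe records ns and reports whether a previously seen service with the
// same name had a different selector, i.e. whether clients should re-request.
func (w *Watcher) Observe(ns NetworkService) bool {
	prev, ok := w.seen[ns.Name]
	w.seen[ns.Name] = ns
	return ok && !reflect.DeepEqual(prev.DestinationSelector, ns.DestinationSelector)
}

func main() {
	w := NewWatcher()
	events := make(chan string, 1) // hypothetical event channel towards the NSC

	updates := []NetworkService{
		{Name: "example-ns", DestinationSelector: map[string]string{"app": "nse-kernel-1"}},
		{Name: "example-ns", DestinationSelector: map[string]string{"app": "nse-kernel-2"}},
	}
	for _, ns := range updates {
		if w.Observe(ns) {
			events <- ns.Name // only the second update is a change
		}
	}
	close(events)

	for name := range events {
		fmt.Println("schedule a Request to reselect the endpoint for", name)
	}
}
```

The first observation of a service never counts as a change, so a freshly started watcher does not trigger reconnects for services it has simply not seen before.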

@barby1138
Author

barby1138 commented Sep 12, 2023

Hi,

The idea with the NSC seems the best.
During connection refresh it should switch to working in the networkservice context rather than the connection context.

What's more, the current solution allows the following scenario:

  • the network service is changed
  • everything continues working as before
  • the NSC crashes
  • suddenly nsc-init picks up the new networkservice and connects as described there, which leads to frustration.

Hope it helps.
Have a good day!!!

@barby1138
Author

barby1138 commented Sep 24, 2023

Hi

It seems the feature is not functioning as expected.

What was done:

After test install

network service: connects client to nse-1

apiVersion: v1
items:
- apiVersion: networkservicemesh.io/v1
  kind: NetworkService
  metadata:
  ...
  spec:
    matches:
    - routes:
      - destination_selector:
          app: nse-kernel-1
      source_selector: null
    payload: ETHERNET
  ...

client:

Defaulted container "app" out of: app, cmd-nsc, cmd-nsc-init (init)
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: nsm-1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UNKNOWN group default qlen 1000
    link/ether 02:fe:ff:7f:5e:ab brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.101/32 scope global nsm-1
       valid_lft forever preferred_lft forever
    inet6 fe80::fe:ffff:fe7f:5eab/64 scope link
       valid_lft forever preferred_lft forever
972: eth0@if973: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ae:7a:b7:36:bc:ae brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.0.14/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::ac7a:b7ff:fe36:bcae/64 scope link
       valid_lft forever preferred_lft forever

nse-1:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: kernel2ker-d2c2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UNKNOWN group default qlen 1000
    link/ether 02:fe:23:55:9d:52 brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.100/32 scope global kernel2ker-d2c2
       valid_lft forever preferred_lft forever
    inet6 fe80::fe:23ff:fe55:9d52/64 scope link
       valid_lft forever preferred_lft forever
970: eth0@if971: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 4e:3e:c4:21:9d:1a brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.0.145/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::4c3e:c4ff:fe21:9d1a/64 scope link
       valid_lft forever preferred_lft forever

nse-2:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
974: eth0@if975: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether da:46:1f:8a:dc:92 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.0.98/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::d846:1fff:fe8a:dc92/64 scope link
       valid_lft forever preferred_lft forever

All is OK

Now edit network service so client connects to nse-2:

apiVersion: v1
items:
- apiVersion: networkservicemesh.io/v1
  kind: NetworkService
  metadata:
  ...
  spec:
    matches:
    - routes:
      - destination_selector:
          app: nse-kernel-2
      source_selector: null
    payload: ETHERNET
  ...
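
The effect of changing `destination_selector` can be sketched as a label match: an endpoint is a candidate when its labels contain every key/value pair of the selector. This is a minimal sketch under that assumption, not NSM's actual selection code; the `Endpoint` type, the function names, and the endpoint names `nse-1` and `nse-2` are hypothetical.

```go
package main

import "fmt"

// Endpoint is a hypothetical, simplified view of a registered endpoint.
type Endpoint struct {
	Name   string
	Labels map[string]string
}

// matches reports whether labels contain every key/value pair of selector.
func matches(selector, labels map[string]string) bool {
	for k, want := range selector {
		got, ok := labels[k]
		if !ok || got != want {
			return false
		}
	}
	return true
}

// selectEndpoints returns the names of all endpoints matching selector.
func selectEndpoints(selector map[string]string, endpoints []Endpoint) []string {
	var names []string
	for _, ep := range endpoints {
		if matches(selector, ep.Labels) {
			names = append(names, ep.Name)
		}
	}
	return names
}

func main() {
	endpoints := []Endpoint{
		{Name: "nse-1", Labels: map[string]string{"app": "nse-kernel-1"}},
		{Name: "nse-2", Labels: map[string]string{"app": "nse-kernel-2"}},
	}

	// Before the edit the selector points at nse-kernel-1, after it at nse-kernel-2.
	fmt.Println(selectEndpoints(map[string]string{"app": "nse-kernel-1"}, endpoints))
	fmt.Println(selectEndpoints(map[string]string{"app": "nse-kernel-2"}, endpoints))
}
```

Under this model the edited selector matches only the second endpoint, so a client that re-evaluates the network service should end up on nse-2 alone.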

after ~40 sec

client:
Defaulted container "app" out of: app, cmd-nsc, cmd-nsc-init (init)
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: nsm-1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UNKNOWN group default qlen 1000
    link/ether 02:fe:0d:f4:0e:26 brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.101/32 scope global nsm-1
       valid_lft forever preferred_lft forever
    inet 172.16.2.101/32 scope global nsm-1
       valid_lft forever preferred_lft forever
    inet6 fe80::fe:dff:fef4:e26/64 scope link
       valid_lft forever preferred_lft forever
972: eth0@if973: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ae:7a:b7:36:bc:ae brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.0.14/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::ac7a:b7ff:fe36:bcae/64 scope link
       valid_lft forever preferred_lft forever

nse-1
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
970: eth0@if971: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 4e:3e:c4:21:9d:1a brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.0.145/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::4c3e:c4ff:fe21:9d1a/64 scope link
       valid_lft forever preferred_lft forever

nse-2
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: kernel2ker-828b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UNKNOWN group default qlen 1000
    link/ether 02:fe:e9:e4:26:7c brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.100/32 scope global kernel2ker-828b
       valid_lft forever preferred_lft forever
    inet 172.16.2.100/32 scope global kernel2ker-828b
       valid_lft forever preferred_lft forever
    inet6 fe80::fe:e9ff:fee4:267c/64 scope link
       valid_lft forever preferred_lft forever
974: eth0@if975: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether da:46:1f:8a:dc:92 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.0.98/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::d846:1fff:fe8a:dc92/64 scope link
       valid_lft forever preferred_lft forever

This is not what I expected:

  1. The client and nse-2 each have two IPs, one from nse-1 and one from nse-2; only the one from nse-2 is expected.
  2. It took too much time to switch.

I think we need to react to the networkservice change during connection refresh in the NSC.
I see that connection refresh is done often enough.
During refresh, the NSC should simply check whether the networkservice was changed. If it was, it should start connections in the context of the networkservice (new connection(s));
if not, it should continue in the context of the existing connection(s), as is done now.
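
The refresh logic proposed above could look roughly like this. It is a sketch under assumptions, not NSC code: the `Connection` type, the `refresh` function, and the `allocate` callback are hypothetical, and the addresses are simply the ones visible in the `ip addr` output above. The point it illustrates is that a changed target yields a fresh connection that does not inherit the old IP context.

```go
package main

import "fmt"

// Connection is a hypothetical, simplified connection state held by the client.
type Connection struct {
	Endpoint string
	IPs      []string
}

// refresh keeps the current connection when the network service still points
// at the same endpoint. Otherwise it builds a new connection from scratch, so
// addresses from the previous endpoint are not carried over.
func refresh(current Connection, wantEndpoint string, allocate func(endpoint string) []string) (Connection, bool) {
	if current.Endpoint == wantEndpoint {
		return current, false // unchanged: continue in the existing connection context
	}
	return Connection{Endpoint: wantEndpoint, IPs: allocate(wantEndpoint)}, true
}

func main() {
	// Hypothetical allocator; the addresses mirror the ip addr output above.
	allocate := func(endpoint string) []string {
		if endpoint == "nse-2" {
			return []string{"172.16.2.101/32"}
		}
		return []string{"172.16.1.101/32"}
	}

	conn := Connection{Endpoint: "nse-1", IPs: []string{"172.16.1.101/32"}}
	conn, changed := refresh(conn, "nse-2", allocate)
	fmt.Println(changed, conn.Endpoint, conn.IPs)
}
```

With this shape the client would end up with a single address from nse-2 instead of accumulating addresses from both endpoints.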

Thanks.
Have a good day!!!

@denis-tingaikin
Member

denis-tingaikin commented Sep 24, 2023

@barby1138 Hello! 

Many thanks for testing!

> Seems feature is not functioning as expected

Could you please create a new ticket at https://github.com/networkservicemesh/deployments-k8s and attach the logs and the deployments version?

> I think we need to react on networkservice change during connect refresh in nsc.

This option is a bit slower than the current solution. I feel we could start with checking logs and return to this option if we find some blockers in the current solution.

@barby1138
Author

Hi Denis

I use a recent version, ref=4ee0befafb18c3ad5355c1a9ec3ae8c10e3616b6

I use a modified kernel2kernel use case with 1 nsc and 2 nse started. The networkservice initially directs the nsc to connect to nse-1,
then I modify the networkservice to direct the nsc to connect to nse-2 (described above)
and then check the actors' interfaces with the "ip addr" command.

Also, from the nsc logs I see that connection refresh is done more often than every 40 sec. If I understand correctly, it actually happens every 2 sec.

Have a good time!!!

@denis-tingaikin
Member

denis-tingaikin commented Sep 24, 2023

@barby1138 

> Also from nsc logs I see connection refresh is done a bit more often than 40 sec. - actually - if I understand correctly - it happens every 2 seconds.

Hmm.. It seems like it's happening because of the monitoring that we added.

But it seems like something is going wrong with reselecting the endpoint and clearing the previous datapath information.

So, any logs from the nsc could be helpful. Anyway, I'll take a closer look when we get a release candidate ;)

Thank you for testing, it's really useful.

@barby1138
Author

Thank you and good luck!!!

@denis-tingaikin
Member

denis-tingaikin commented Sep 25, 2023

@barby1138 I've created a separate ticket for this issue, because it looks like it's related to our healing feature. Let's continue the discussion there: #9888
