How configuration change should be handled during healing? #9219

Closed
ljkiraly opened this issue Jun 1, 2023 · 2 comments

@ljkiraly
Contributor

ljkiraly commented Jun 1, 2023

Question

A custom NSC is used to connect to NSM v1.9.0. This NSC has a sidecar that auto-generates the interface name configured on the NSC. During a robustness test, the following happened:

  • The NSMgr pod was restarted;
  • The custom NSC tried to close and reopen its connections (kernel2kernel and kernel2ethernet2kernel type connections);
  • NSMgr short-circuited the Close request and did not send it on to the forwarder (forwarder-vpp);
  • When the NSC tried to reopen the connection, a new interface name was generated, different from the previous one;
  • Since the forwarder never received a Close for the previous connection, it handled this request as a refresh;
  • This sequence ended in two faulty NSC interfaces (two connections toward two network services, but the interfaces are mixed up, as is the cross-connect to the remote connection). The routing tables for policy-based routing are also empty.

Can you suggest a way to solve this issue? Data-path healing is not an option here. Currently the forwarder tolerates a configuration change (an interface name change) for a connection. Could that be changed so that it returns an error when certain parameters change for an established connection?

@denis-tingaikin denis-tingaikin added the question Further information is requested label Jun 5, 2023
@glazychev-art glazychev-art moved this from Todo to In Progress in Release v1.10.0 Jul 10, 2023
@glazychev-art
Contributor

glazychev-art commented Jul 11, 2023

Hi @ljkiraly
Thanks for the report

Could you re-check this problem on the latest main branch or on v1.10.0-rc.1 (it will be out soon)?
We have improved how healing works, and it should now cover this scenario: Close will be called on the forwarder.

@glazychev-art glazychev-art moved this from In Progress to Under review in Release v1.10.0 Jul 11, 2023
@denis-tingaikin
Member

@ljkiraly The problem should be fixed by networkservicemesh/sdk#1471

Changes in the datapath should now work as you expect.

Feel free to test it with v1.10.0

If the problem persists, please feel free to re-open the ticket, or create another one if you find something else ;)

@github-project-automation github-project-automation bot moved this from Under review to Done in Release v1.10.0 Jul 18, 2023