netsvcmonitor chain element: nse doesn't match with networkservice #1521

Closed
LionelJouin opened this issue Oct 3, 2023 · 8 comments
Labels: ASAP (The issue that blocking SOW items or core use-cases)

Comments

@LionelJouin
Member

Expected Behavior

/

Current Behavior

The NSC manages to connect to the NSE, but the interfaces are constantly recreated, which causes significant traffic outages and disturbance.

Disabling the netsvcmonitor chain element introduced in this PR fixes the issue:
#1510

Failure Information (for bugs)

The error message comes from here: https://github.com/networkservicemesh/sdk/blob/v1.11.0-rc.1/pkg/networkservice/common/netsvcmonitor/server.go#L112

I believe it is related to the Find function, since the network services here and here do not match for some reason.

Steps to Reproduce

/

Context

Tested with NSM v1.11.0-rc.1 and custom NSE and NSC.
I didn't have any problems with NSM 1.10.0 and NSE/NSC compiled with SDK v1.11.0-rc.1.

Failure Logs

@denis-tingaikin denis-tingaikin added the ASAP label Oct 3, 2023
@denis-tingaikin denis-tingaikin self-assigned this Oct 3, 2023
@denis-tingaikin
Member

denis-tingaikin commented Oct 3, 2023

We also can't rule out that the NSE and NS genuinely don't match, so let's start by verifying whether they do.

For example, we could print nse/ns here:  
https://github.com/networkservicemesh/sdk/blob/v1.11.0-rc.1/pkg/networkservice/common/netsvcmonitor/server.go#L112

logger.Warnf("nse %v doesn't match with networkservice: %v", nses[0], netsvc)

Alternatively, you could simply retrieve this information from the CRDs:

kubectl get nses -A
kubectl get netsvc -A

Also, any sensitive information can be replaced with placeholders such as NSE{Name: "A", Labels: "1, 2"}, NS{Name: "B"}, and so on, to avoid NDA problems.
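If there are many names to hide, a small helper can keep the placeholders consistent across logs and CRD dumps. The sketch below is a hypothetical Go helper (Redactor and Alias are illustrative names, not part of the NSM SDK) that maps each real name to a stable letter the first time it is seen.

```go
package main

import "fmt"

// Redactor maps real resource names to stable placeholders ("A", "B", ...).
// Hypothetical helper for sharing logs; not part of the NSM SDK.
type Redactor struct {
	aliases map[string]string
}

// NewRedactor returns an empty Redactor.
func NewRedactor() *Redactor {
	return &Redactor{aliases: make(map[string]string)}
}

// Alias returns the placeholder for name, allocating the next letter the
// first time a name is seen. After 26 names it falls back to "X26", "X27", ...
func (r *Redactor) Alias(name string) string {
	if a, ok := r.aliases[name]; ok {
		return a
	}
	n := len(r.aliases)
	a := fmt.Sprintf("X%d", n)
	if n < 26 {
		a = string(rune('A' + n))
	}
	r.aliases[name] = a
	return a
}

func main() {
	r := NewRedactor()
	nse := "stateless-lb-frontend-attractor-a-1-85675855-w5bbp"
	ns := "conduit-a-1.trench-a.red"
	fmt.Printf("NSE{Name: %q}, NS{Name: %q}\n", r.Alias(nse), r.Alias(ns))
	fmt.Println(r.Alias(ns)) // same name, same placeholder
}
```

Running it prints NSE{Name: "A"}, NS{Name: "B"} and then B, since a name that was already seen keeps its placeholder.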

@LionelJouin
Member Author

Yes, I printed the NSE and NS names at the place you mentioned, and they do not match.
The NS returned by the Find function does not correspond to the query, so I suspect the Find implementation does not handle the query parameter correctly.
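A quick way to confirm this from the caller's side is to compare every returned service name with the queried one. This is a hypothetical Go sketch: NetworkService here is a minimal local stand-in for the registry type, and unexpectedResults is an illustrative helper, not an SDK function. The service names are the ones reported further down in this thread; the result list itself is made up for the example.

```go
package main

import "fmt"

// NetworkService is a minimal stand-in for the registry NetworkService
// type; only the name matters for this check.
type NetworkService struct {
	Name string
}

// unexpectedResults returns the Find results whose name differs from the
// queried name. A non-empty result means Find returned services the caller
// did not ask for.
func unexpectedResults(query string, results []NetworkService) []NetworkService {
	var bad []NetworkService
	for _, ns := range results {
		if ns.Name != query {
			bad = append(bad, ns)
		}
	}
	return bad
}

func main() {
	query := "conduit-a-1.trench-a.red"
	// Hypothetical results, for illustration only.
	results := []NetworkService{
		{Name: "conduit-a-1.trench-a.red"},
		{Name: "proxy.conduit-a-1.trench-a.red"},
	}
	for _, ns := range unexpectedResults(query, results) {
		fmt.Printf("Find(%q) returned unexpected service %q\n", query, ns.Name)
	}
}
```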

@denis-tingaikin
Member

Do you have an example?

As I understand it:

  1. Call Find with {Name="ns-1"}

Actual: Response is {Name="ns-12" ...}
Expected: Response is {Name="ns-1" ...}
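The Actual/Expected pair above is consistent with a name match that uses substring containment instead of equality. The thread later attributes the root cause to networkservicemesh/api#51 without describing it, so this Go sketch only illustrates that failure mode; containsMatch and exactMatch are hypothetical helpers, not the registry's actual matching code.

```go
package main

import (
	"fmt"
	"strings"
)

// containsMatch treats the query name as a substring of the stored name.
// Under this rule a query for "ns-1" also matches "ns-12".
func containsMatch(query, stored string) bool {
	return query == "" || strings.Contains(stored, query)
}

// exactMatch only accepts the stored name when it equals the query
// (an empty query still matches everything).
func exactMatch(query, stored string) bool {
	return query == "" || stored == query
}

func main() {
	for _, s := range []string{"ns-1", "ns-12"} {
		fmt.Printf("%-6s contains=%v exact=%v\n", s, containsMatch("ns-1", s), exactMatch("ns-1", s))
	}
}
```

With containment, both ns-1 and ns-12 are returned for the query ns-1; with equality, only ns-1 is.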

@bellycat77 bellycat77 moved this to In Progress in Release v1.11.0 Oct 4, 2023
@LionelJouin
Member Author

Right, this is what happens.

@LionelJouin
Member Author

$ kubectl get netsvc -n nsm conduit-a-1.trench-a.red -o yaml
apiVersion: networkservicemesh.io/v1
kind: NetworkService
metadata:
  creationTimestamp: "2023-10-04T09:49:54Z"
  generation: 1
  name: conduit-a-1.trench-a.red
  namespace: nsm
  resourceVersion: "2473"
  uid: 8d3e6c12-2edf-425c-8959-8df136e4995f
spec:
  name: conduit-a-1.trench-a.red
  path_ids:
  - spiffe://example.org/ns/red/sa/meridio-fes
  - spiffe://example.org/ns/nsm/sa/nsmgr-sa
  - spiffe://example.org/ns/nsm/sa/registry-k8s-sa
  payload: ETHERNET
$ kubectl get netsvc -n nsm proxy.conduit-a-1.trench-a.red -o yaml
apiVersion: networkservicemesh.io/v1
kind: NetworkService
metadata:
  creationTimestamp: "2023-10-04T09:49:57Z"
  generation: 1
  name: proxy.conduit-a-1.trench-a.red
  namespace: nsm
  resourceVersion: "2489"
  uid: 6b081004-1ec8-41e6-adb5-4e300d70ab31
spec:
  matches:
  - routes:
    - destination_selector:
        nodeName: '{{.nodeName}}'
    source_selector:
      nodeName: '{{.nodeName}}'
  name: proxy.conduit-a-1.trench-a.red
  path_ids:
  - spiffe://example.org/ns/red/sa/default
  - spiffe://example.org/ns/nsm/sa/nsmgr-sa
  - spiffe://example.org/ns/nsm/sa/registry-k8s-sa
  payload: ETHERNET
$ kubectl get nse -n nsm stateless-lb-frontend-attractor-a-1-85675855-w5bbp -o yaml
apiVersion: networkservicemesh.io/v1
kind: NetworkServiceEndpoint
metadata:
  creationTimestamp: "2023-10-04T09:49:54Z"
  generation: 7
  name: stateless-lb-frontend-attractor-a-1-85675855-w5bbp
  namespace: nsm
  resourceVersion: "3020"
  uid: 759958ca-beb2-4f0f-82b5-43ee9834bbd9
spec:
  expiration_time:
    nanos: 496779739
    seconds: 1696413296
  initial_registration_time:
    nanos: 309464413
    seconds: 1696412994
  name: stateless-lb-frontend-attractor-a-1-85675855-w5bbp
  network_service_labels:
    conduit-a-1.trench-a.red:
      labels:
        nodeName: kind-worker
  network_service_names:
  - conduit-a-1.trench-a.red
  path_ids:
  - spiffe://example.org/ns/red/sa/meridio-fes
  - spiffe://example.org/ns/nsm/sa/nsmgr-sa
  - spiffe://example.org/ns/nsm/sa/registry-k8s-sa
  url: tcp://10.244.2.4:5001
2023-10-04T09:51:26.701447012Z stderr F Oct  4 09:51:26.701 [WARN] [id:4c39cb77-6169-4f91-9faf-f17d6f5482ec] [monitorServer:Find] [type:networkService] (11.3) nse stateless-lb-frontend-attractor-a-1-85675855-w5bbp doesn't match with networkservice: conduit-a-1.trench-a.red

and at https://github.com/networkservicemesh/sdk/blob/v1.11.0-rc.1/pkg/networkservice/common/netsvcmonitor/server.go#L112 the values are:

netsvc.GetNetworkService().GetName() = proxy.conduit-a-1.trench-a.red
conn.GetNetworkService() = conduit-a-1.trench-a.red
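These two values show the mismatch concretely: the connection asks for conduit-a-1.trench-a.red, while the NS coming back from Find is proxy.conduit-a-1.trench-a.red, and the NSE above lists only conduit-a-1.trench-a.red in network_service_names. The Go sketch below assumes a simplified check (Endpoint and serves are hypothetical; the real netsvcmonitor logic may also look at labels and matches) to show why an NSE-versus-NS comparison fails once Find returns the wrong service, which would plausibly produce the WARN log above.

```go
package main

import "fmt"

// Endpoint is a simplified stand-in for a NetworkServiceEndpoint: just the
// name and the network_service_names list from the CRD above.
type Endpoint struct {
	Name                string
	NetworkServiceNames []string
}

// serves reports whether the endpoint advertises the given network service.
// Assumed simplification; not the actual SDK check.
func serves(nse Endpoint, nsName string) bool {
	for _, n := range nse.NetworkServiceNames {
		if n == nsName {
			return true
		}
	}
	return false
}

func main() {
	nse := Endpoint{
		Name:                "stateless-lb-frontend-attractor-a-1-85675855-w5bbp",
		NetworkServiceNames: []string{"conduit-a-1.trench-a.red"},
	}
	// The name the connection asked for vs. the name Find returned.
	for _, ns := range []string{"conduit-a-1.trench-a.red", "proxy.conduit-a-1.trench-a.red"} {
		fmt.Printf("%s serves %s: %v\n", nse.Name, ns, serves(nse, ns))
	}
}
```

The first line reports true and the second false: the endpoint is fine, but it is being compared against the wrong network service.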

@denis-tingaikin
Member

denis-tingaikin commented Oct 4, 2023

Cool! Thanks!

I found the root cause - networkservicemesh/api#51

Reproduced & fixed: #1523

Image for testing: tinden/cmd-nsmgr:fix1521

@LionelJouin, if you have a chance, could you please check tinden/cmd-nsmgr:fix1521?

@LionelJouin
Member Author

LionelJouin commented Oct 4, 2023

Yes, it works fine. Thank you very much!

@bellycat77 bellycat77 moved this from In Progress to Under review in Release v1.11.0 Oct 5, 2023
@bellycat77 bellycat77 moved this from Under review to Done in Release v1.11.0 Oct 6, 2023
@denis-tingaikin
Member

Seems to be done for now. Closing. Feel free to reopen if we missed something ;)

Projects
Status: Done

2 participants