Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix mirroring all services when remote selector is empty #11344

Merged
merged 4 commits into from
Sep 7, 2023

Conversation

mateiidavid
Copy link
Member

The multicluster link supports two selectors: one for normal (endpoint-based) mirrors, and one for remote-discovery where only service imports are created. When the remote-discovery selector is empty (i.e. an empty object '{}'), then all services in a cluster will be mirrored. The created imports also have an underlying Endpoints object created.

There are two distinct checks that decide whether a service import should be created: isExported and isRemote. When a selector is empty, the checks are shortcircuted and return early. The former check (isExported) additionally also checks if a service is remote, without checking if the corresponding selector is empty. This allows services to slip through since an empty selector encompasses everything.

The change fixes the issue by removing any remote discovery checks from isExported. Where necessary, we add another call to isRemote.

Fixes #11309


Reproducing the issue

  • Set-up two clusters and install the latest version of linkerd.
  • Create at least one service in the target cluster.
  • Link the two clusters and use an empty remote discovery selector.

The source cluster will now contain imports for all services in the target, including the kubernetes service.

NAME     SERVERS   AGENTS   LOADBALANCER
source   1/1       0/0      true
target   1/1       0/0      true

# Target cluster services
NAME         TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.43.0.1     <none>        443/TCP   22m
podinfo      ClusterIP   10.43.74.37   <none>        80/TCP    18m
nginx-svc    ClusterIP   10.43.72.33   <none>        80/TCP    8m39s

# Link spec, notice empty selector
spec:
  clusterCredentialsSecret: cluster-credentials-target
  gatewayAddress: 192.168.224.4
  gatewayIdentity: linkerd-gateway.linkerd-multicluster.serviceaccount.identity.linkerd.cluster.local
  gatewayPort: "4143"
  probeSpec:
    path: /ready
    period: 3s
    port: "4191"
  remoteDiscoverySelector: {}
  selector:
    matchLabels:
      mirror.linkerd.io/exported: "true"
  targetClusterDomain: cluster.local
  targetClusterLinkerdNamespace: linkerd
  targetClusterName: target

# One service is exported
:; k get svc podinfo -o yaml | rg 'labels' -A 5
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"service.kubernetes.io/topology-aware-hints":"auto"},"labels":{"mirror.linkerd.io/exported":"true"},"name":"podinfo","namespace":"default"},"spec":{"ports":[{"port":80,"targetPort":9898}],"selector":{"app":"podinfo"},"type":"ClusterIP"}}
    service.kubernetes.io/topology-aware-hints: auto
  creationTimestamp: "2023-09-06T14:58:29Z"
  labels:
    mirror.linkerd.io/exported: "true"

# The other isn't
:; k get svc nginx-svc -o yaml | rg 'labels' -A 3
<empty>

# All are replicated in the source cluster
NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
kubernetes          ClusterIP   10.43.0.1       <none>        443/TCP   6m52s
nginx-svc-target    ClusterIP   10.43.115.177   <none>        80/TCP    21s
podinfo-target      ClusterIP   10.43.120.163   <none>        80/TCP    20s
kubernetes-target   ClusterIP   10.43.81.214    <none>        443/TCP   16s

Testing

With the change in place, tested:

  • An empty selector does not trigger service creation
  • A specified remote-discovery selector can still be used and services are imported.
  • On deletion of the exported (actual) service, imported services are cleaned-up
# Apply change
NAME                                             READY   STATUS        RESTARTS   AGE
linkerd-gateway-59c55f65b6-ggxmh                 2/2     Running       0          8m4s
linkerd-service-mirror-target-7db69c6df5-4r9m6   2/2     Running       0          3s
linkerd-service-mirror-target-777b9fdb95-f7fvt   2/2     Terminating   0          3m43s

# Wait for new deployment to roll out
# Old state
NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
kubernetes          ClusterIP   10.43.0.1       <none>        443/TCP   10m
nginx-svc-target    ClusterIP   10.43.115.177   <none>        80/TCP    3m43s
podinfo-target      ClusterIP   10.43.120.163   <none>        80/TCP    3m42s
kubernetes-target   ClusterIP   10.43.81.214    <none>        443/TCP   3m38s

# New state a few seconds later
:; k get svc
NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
kubernetes       ClusterIP   10.43.0.1       <none>        443/TCP   10m
podinfo-target   ClusterIP   10.43.120.163   <none>        80/TCP    3m51s

# Re-create link
spec:
  clusterCredentialsSecret: cluster-credentials-target
  gatewayAddress: 192.168.224.4
  gatewayIdentity: linkerd-gateway.linkerd-multicluster.serviceaccount.identity.linkerd.cluster.local
  gatewayPort: "4143"
  probeSpec:
    path: /ready
    period: 3s
    port: "4191"
  remoteDiscoverySelector:
    matchLabels:
      mirror.linkerd.io/exported: remote-discovery
  selector:
    matchLabels:
      mirror.linkerd.io/exported: "true"

# Label nginx-svc with remote-discovery
NAME               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
kubernetes         ClusterIP   10.43.0.1       <none>        443/TCP   14m
podinfo-target     ClusterIP   10.43.120.163   <none>        80/TCP    8m5s
nginx-svc-target   ClusterIP   10.43.160.156   <none>        80/TCP    2s

# No endpoints created
NAME             ENDPOINTS            AGE
kubernetes       192.168.224.2:6443   14m
podinfo-target   192.168.224.4:4143   8m7s

# Delete nginx-svc in target cluster
NAME               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
kubernetes         ClusterIP   10.43.0.1       <none>        443/TCP   27m
podinfo-target     ClusterIP   10.43.120.163   <none>        80/TCP    20m
nginx-svc-target   ClusterIP   10.43.160.156   <none>        80/TCP    12m

# Service mirror cleans it up
NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
kubernetes       ClusterIP   10.43.0.1       <none>        443/TCP   27m
podinfo-target   ClusterIP   10.43.120.163   <none>        80/TCP    21m

The multicluster link supports two selectors: one for normal
(endpoint-based) mirrors, and one for remote-discovery where only
service imports are created. When the remote-discovery selector is empty
(i.e. an empty object '{}'), then all services in a cluster will be
mirrored. The created imports also have an underlying Endpoints object
created.

There are two distinct checks that decide whether a service import
should be created: `isExported` and `isRemote`. When a selector is
empty, the checks are shortcircuted and return early. The former check
(`isExported`) additionally also checks if a service is remote, without
checking if the corresponding selector is empty. This allows services to
slip through since an empty selector encompasses everything.

The change fixes the issue by removing any remote discovery checks from
`isExported`. Where necessary, we add another call to `isRemote`.

Fixes #11309

Signed-off-by: Matei David <[email protected]>
@mateiidavid mateiidavid requested a review from a team as a code owner September 6, 2023 15:22
Copy link
Member

@olix0r olix0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd feel a lot more confident about this change if we were able to test it...

How hard would it be to wire up an integration test that catches the original issue?

@mateiidavid
Copy link
Member Author

I'd feel a lot more confident about this change if we were able to test it...

How hard would it be to wire up an integration test that catches the original issue?

That's a great point, I should've added a test. They're trivial to write, they just don't follow the usual integration test patterns we've been using in cluster_watcher. I've added a set of tests that check the empty selector invariant holds in three different scenarios.

--- FAIL: TestEmptyRemoteSelectors (0.61s)
    --- FAIL: TestEmptyRemoteSelectors/empty_remote_discovery_selector_does_not_result_in_exports (0.20s)
    cluster_watcher_mirroring_test.go:80: Was expecting 1 events but got 2
    --- PASS: TestEmptyRemoteSelectors/empty_default_selector_does_not_result_in_exports (0.20s)
    --- PASS: TestEmptyRemoteSelectors/no_selector_in_link_does_not_result_in_exports (0.20s)

Before the change, one of the cases fails (remote discovery label when empty catches all services), which is expected. After the change it passes.

--- PASS: TestEmptyRemoteSelectors (0.61s)
    --- PASS: TestEmptyRemoteSelectors/empty_remote_discovery_selector_does_not_result_in_exports (0.20s)
    --- PASS: TestEmptyRemoteSelectors/empty_default_selector_does_not_result_in_exports (0.20s)
    --- PASS: TestEmptyRemoteSelectors/no_selector_in_link_does_not_result_in_exports (0.20s)

@mateiidavid mateiidavid mentioned this pull request Sep 7, 2023
Copy link
Member

@alpeb alpeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mateiidavid , LGTM 👍

@mateiidavid mateiidavid merged commit fd3d92a into main Sep 7, 2023
35 checks passed
@mateiidavid mateiidavid deleted the matei/fix-remote-disco-select branch September 7, 2023 16:39
adamshawvipps pushed a commit to adamshawvipps/linkerd2 that referenced this pull request Sep 18, 2023
* Fix mirroring all services when remote selector is empty

The multicluster link supports two selectors: one for normal
(endpoint-based) mirrors, and one for remote-discovery where only
service imports are created. When the remote-discovery selector is empty
(i.e. an empty object '{}'), then all services in a cluster will be
mirrored. The created imports also have an underlying Endpoints object
created.

There are two distinct checks that decide whether a service import
should be created: `isExported` and `isRemote`. When a selector is
empty, the checks are shortcircuted and return early. The former check
(`isExported`) additionally also checks if a service is remote, without
checking if the corresponding selector is empty. This allows services to
slip through since an empty selector encompasses everything.

The change fixes the issue by removing any remote discovery checks from
`isExported`. Where necessary, we add another call to `isRemote`.

Fixes linkerd#11309

Signed-off-by: Matei David <[email protected]>

* Add an integration test for empty selectors

Signed-off-by: Matei David <[email protected]>

* Differentiate test service ports

Signed-off-by: Matei David <[email protected]>

* @alpeb's feedback

Signed-off-by: Matei David <[email protected]>

---------

Signed-off-by: Matei David <[email protected]>
adamshawvipps pushed a commit to adamshawvipps/linkerd2 that referenced this pull request Sep 18, 2023
* Fix mirroring all services when remote selector is empty

The multicluster link supports two selectors: one for normal
(endpoint-based) mirrors, and one for remote-discovery where only
service imports are created. When the remote-discovery selector is empty
(i.e. an empty object '{}'), then all services in a cluster will be
mirrored. The created imports also have an underlying Endpoints object
created.

There are two distinct checks that decide whether a service import
should be created: `isExported` and `isRemote`. When a selector is
empty, the checks are shortcircuted and return early. The former check
(`isExported`) additionally also checks if a service is remote, without
checking if the corresponding selector is empty. This allows services to
slip through since an empty selector encompasses everything.

The change fixes the issue by removing any remote discovery checks from
`isExported`. Where necessary, we add another call to `isRemote`.

Fixes linkerd#11309

Signed-off-by: Matei David <[email protected]>

* Add an integration test for empty selectors

Signed-off-by: Matei David <[email protected]>

* Differentiate test service ports

Signed-off-by: Matei David <[email protected]>

* @alpeb's feedback

Signed-off-by: Matei David <[email protected]>

---------

Signed-off-by: Matei David <[email protected]>
Signed-off-by: Adam Shaw <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

linkerd-multicluster remoteDiscoverySelector default selects all remote services
3 participants