Auto-Restoring deleted ServiceImports and EndPointSlices #696

Open
davidohana opened this issue Mar 2, 2022 · 12 comments

Comments

@davidohana
Contributor

davidohana commented Mar 2, 2022

Not sure whether to call this a bug or an enhancement.

What happened:

  • When deleting a service import or an endpoint slice locally, it stays deleted locally and the deletion is synced to the broker.
  • When deleting a service import or an endpoint slice from the broker, nothing happens.

What you expected to happen:

The agent should recreate the service import / endpoint slice after some time, since the "source of truth" is the export and it still exists. Currently this only happens after a lighthouse agent restart.

Additional notes:

For comparison, if I delete a k8s endpoint slice, it is recreated immediately.
I realize that these edge cases are only possible if someone with sufficient permissions deletes those objects manually.

Slack discussion link: https://kubernetes.slack.com/archives/C010RJV694M/p1645966829687039

@davidohana davidohana added the bug Something isn't working label Mar 2, 2022
@vthapar
Contributor

vthapar commented Mar 2, 2022

@davidohana Could you clarify what you mean by deleting locally? I expect you mean deleting locally on the source cluster where the export exists, but just wanted to be sure that is the case.

@davidohana
Contributor Author

Yes, that's what I mean - deleting on the origin cluster.

@dfarrell07 dfarrell07 added enhancement New feature or request and removed bug Something isn't working labels Jun 7, 2022
@dfarrell07
Member

There are likely other objects that would fail in similar ways if a user deletes them, because they are internal objects that we don't expect users to mess with. To handle this properly we would need a more systematic review of the code base, which would likely benefit from an EP and a broader discussion about what should change and how.

@stale

stale bot commented Oct 15, 2022

This issue has been automatically marked as stale because it has not had activity for 60 days. It will be closed if no further activity occurs. Please make a comment if this issue/pr is still valid. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Oct 15, 2022
@dfarrell07
Member

@vthapar pointed out that finalizers may be a good, low-overhead way of doing this.
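
For concreteness, a minimal sketch of what that could look like, using controller-runtime's controllerutil helpers rather than Submariner's actual code (the finalizer name and function here are made up for illustration):

```go
// Hypothetical sketch of the finalizer suggestion; not Submariner's actual code.
package sketch

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
	mcsv1a1 "sigs.k8s.io/mcs-api/pkg/apis/v1alpha1"
)

// protectionFinalizer is a made-up finalizer name used only in this sketch.
const protectionFinalizer = "lighthouse.submariner.io/protect"

// ensureFinalizer adds the protection finalizer so a manual delete only marks the
// ServiceImport for deletion instead of removing it outright, giving the agent a
// chance to react before the object actually disappears.
func ensureFinalizer(ctx context.Context, c client.Client, si *mcsv1a1.ServiceImport) error {
	if controllerutil.ContainsFinalizer(si, protectionFinalizer) {
		return nil
	}

	controllerutil.AddFinalizer(si, protectionFinalizer)

	return c.Update(ctx, si)
}
```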

@stale stale bot removed the wontfix This will not be worked on label Oct 18, 2022
@stale

stale bot commented Feb 18, 2023

This issue has been automatically marked as stale because it has not had activity for 60 days. It will be closed if no further activity occurs. Please make a comment if this issue/pr is still valid. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Feb 18, 2023
@dfarrell07
Member

Discussing this again, it's important to note that monitoring resources across many clusters for deletion comes with overhead that doesn't exist for the parallel case of endpoint slices in a single K8s cluster. If there's a use case for this, we'd be happy to discuss it and prioritize the work. Again, this likely applies to other resources, and finalizers might enable this with reasonable overhead.

This might be kinda-related to submariner-io/enhancements#161, or at least a good time to think about finalizers for various resources.

@stale stale bot removed the wontfix This will not be worked on label Feb 21, 2023
@dfarrell07 dfarrell07 added the wontfix and technical-debt labels Feb 21, 2023
@stale stale bot removed the wontfix This will not be worked on label Feb 21, 2023
@dfarrell07 dfarrell07 added the wontfix and priority:low labels and removed the enhancement and wontfix labels Feb 21, 2023
@stale

stale bot commented Aug 13, 2023

This issue has been automatically marked as stale because it has not had activity for 60 days. It will be closed if no further activity occurs. Please make a comment if this issue/pr is still valid. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Aug 13, 2023
@dfarrell07 dfarrell07 removed the wontfix This will not be worked on label Aug 22, 2023
@tpantelis
Contributor

While a finalizer would prevent out-of-band deletion, we wouldn't be able to update the resource since it's in the process of being deleted.

A simple solution would be to specify a resync period with the informers.
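
As a rough illustration of the resync idea (plain client-go, not the actual lighthouse agent wiring; the 2-minute period is an arbitrary example value):

```go
// Minimal sketch: a shared informer factory with a non-zero resync period, so
// cached objects are periodically re-delivered to the update handler and anything
// missing downstream can be re-checked and restored.
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}

	client := kubernetes.NewForConfigOrDie(config)

	// The 2-minute resync period is an arbitrary example value.
	factory := informers.NewSharedInformerFactory(client, 2*time.Minute)

	informer := factory.Discovery().V1().EndpointSlices().Informer()
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		// UpdateFunc also fires on every resync, even when nothing changed;
		// that is the hook a controller could use to re-check derived state.
		UpdateFunc: func(oldObj, newObj interface{}) {
			fmt.Println("resync/update delivered")
		},
	})

	stopCh := make(chan struct{})
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)

	// Block; in a real agent this would run until shutdown.
	select {}
}
```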

@tpantelis
Contributor

I tested finalizer behavior and, after deletion with a finalizer still present, the resource can still be updated. You just can't modify the DeletionTimestamp or add another finalizer. However, using a finalizer merely to prevent out-of-band deletion can be problematic if the resource isn't deleted by the controller that added the finalizer. So if Submariner isn't properly uninstalled, the finalizers would remain and block namespace deletion, requiring manual removal of the finalizers.

A safer solution is to recreate a resource on out-of-band deletion. We can do this fairly simply via the resource syncer: on deletion of a resource, check if the originating resource still exists and, if so, re-queue it so the deleted resource is re-created. If not, process the deleted resource normally. E.g., ServiceImports are created from ServiceExports, so on deletion of a ServiceImport, check if the originating ServiceExport still exists in the resource syncer cache.
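
A rough sketch of that delete-handling flow, written against plain client-go primitives rather than Submariner's resource syncer (the function, cache, and queue names are illustrative only):

```go
// Sketch of the "restore on out-of-band deletion" flow described above; names
// and wiring are illustrative, not Submariner's actual resource syncer API.
package sketch

import (
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
	mcsv1a1 "sigs.k8s.io/mcs-api/pkg/apis/v1alpha1"
)

// onServiceImportDeleted handles deletion of a ServiceImport. If the originating
// ServiceExport is still present in the local cache, the key is re-queued so the
// ServiceImport gets re-created; otherwise the deletion is processed normally.
func onServiceImportDeleted(si *mcsv1a1.ServiceImport, exportCache cache.Indexer,
	queue workqueue.RateLimitingInterface, processDeletion func(*mcsv1a1.ServiceImport)) {
	key := si.Namespace + "/" + si.Name

	if _, exists, err := exportCache.GetByKey(key); err == nil && exists {
		// The source of truth (the ServiceExport) still exists, so re-queue
		// the key and let the normal create path restore the ServiceImport.
		queue.Add(key)
		return
	}

	// No originating export: treat this as a genuine deletion.
	processDeletion(si)
}
```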

@t0lya

t0lya commented Feb 15, 2024

My team runs a multi-cluster setup with Submariner's service discovery (v0.15.2), where we might be seeing a case of this issue.

Our workloads can be placed on a particular cluster based on nodeSelectors in the Deployment spec. When a Deployment's nodeSelectors are updated in a way that triggers a cluster placement change, we have a controller that deletes the Service/ServiceExport/Deployment from the old source cluster and immediately re-creates these three objects in the new source cluster.

We are seeing that this results in the deletion of the aggregated ServiceImport by the submariner-lighthouse-agent running on the old source cluster. The submariner-lighthouse-agent running on the new source cluster never re-creates the aggregated ServiceImport.

To mitigate the issue, we had to manually re-create the Service/ServiceExport on the new cluster to trigger ServiceImport re-creation on the broker cluster.
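
For reference, the manual mitigation roughly amounts to something like this sketch using the generated mcs-api clientset (the service name and namespace are placeholders, and the Service itself would be re-created analogously with the core clientset):

```go
// Hypothetical sketch of manually re-creating a ServiceExport on the new source
// cluster so Lighthouse re-creates the aggregated ServiceImport.
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"
	mcsv1a1 "sigs.k8s.io/mcs-api/pkg/apis/v1alpha1"
	mcsclient "sigs.k8s.io/mcs-api/pkg/client/clientset/versioned"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}

	client := mcsclient.NewForConfigOrDie(config)

	// Placeholder name/namespace; use the exported Service's actual values.
	export := &mcsv1a1.ServiceExport{
		ObjectMeta: metav1.ObjectMeta{Name: "my-service", Namespace: "my-namespace"},
	}

	_, err = client.MulticlusterV1alpha1().ServiceExports("my-namespace").
		Create(context.TODO(), export, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
}
```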

@tpantelis Will your solution handle the above scenario? And is there a timeline for when a solution will be rolled out?

@github-actions
github-actions bot commented Jun 16, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jun 16, 2024
@tpantelis tpantelis removed the stale label Jun 16, 2024