-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delta SDS incorrectly sends an error response for an empty response #32832
Comments
cc @adisuissa @htuch |
This seems to be a dup of #24373. |
I can try to take this if it's not claimed |
@keithmattix I'll close this as dupe and assign you #24373 |
@howardjohn I think the Istio repro in the original description is expected. If the secret is not supplied by the SDS server, then the client should indeed error, no? |
a NACK from lack of resource is not expected. it should stay warming. but even if we had previously sent it, any removal is just entirely rejected |
Right, I've confirmed this and am iterating on the linked PR; I'm looking to verify the repro behavior with Istio
Ah I see; from the spec (emphasis mine):
So you're saying the desired behavior is:
I agree with 1 but on 2, how can the SDS subscription guarantee that the cluster/listener are being updated? What if the server is saying "this secret no longer exists" (e.g. a cert rotation failed); what should that do to the owning cluster/listener? Does the SDS subscription need to do anything at all (I think this is current behavior)? |
IMO I think the delete should be ignored entirely -- will try to explain
more in a bit or you can look at the recent ecds issue
…On Wed, Mar 20, 2024, 8:54 PM Keith Mattix II ***@***.***> wrote:
but even if we had previously sent it, any removal is just entirely
rejected
Right, I've confirmed this and am iterating on the linked PR; I'm looking
to verify the repro behavior with Istio
a NACK from lack of resource is not expected. it should stay warming.
Ah I see; from the spec
<https://www.envoyproxy.io/docs/envoy/latest/api-v3/service/discovery/v3/discovery.proto#envoy-v3-api-field-service-discovery-v3-deltadiscoveryresponse-removed-resources>
(emphasis mine):
Resources names of resources that have be deleted and to be removed from
the xDS Client. *Removed resources for missing resources can be ignored*.
So you're saying the desired behavior is:
1. if cluster is still warming and secret is removed, the SDS
subscription should ignore the removal (NOT NACK)
2. If cluster is already ready (and serving traffic), the SDS
subscription should also ignore that removal because the owning
listener/cluster should be updated as well
I agree with 1 but on 2, how can the SDS subscription guarantee that the
cluster/listener are being updated? What if the server is saying "this
secret no longer exists" (e.g. a cert rotation failed); what should that do
to the owning cluster/listener? Does the SDS subscription need to do
anything at all (I think this is current behavior)?
—
Reply to this email directly, view it on GitHub
<#32832 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAEYGXIM7SHLFUTTAPOOBW3YZHSOPAVCNFSM6AAAAABERG5OGWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMJQGUYDKNBWHE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I'm familiar with the ecds issue; I'm just not sure which is the more "correct" approach from a protocol perspective. What you're suggesting would mean any removal that isn't LDS or CDS essentially has no effect. From a pure protocol perspective, I would think that an xDS server should be able to indicate that dependent resources are missing/removed and have that mean something |
@keithmattix IMO there are 2 usable implementations:
In (2), we would just never send Remove for dependent types. But if we just do that today, then we cannot tell Envoy a resource it requested is not available, which means it will be warming instead of just failing which is unexpected. |
Ah I see; unless we xDS adopted (2)'s semantics, we wouldn't be able to express "not available". Fair point; I ended up coding up "delete ignored" in #32961, so I think this will solve Istio's issue |
Ignoring delete can be dangerous in case of compromised secrets. Does delete trigger an eventual release of the secret? Or will it potentially be used indefinitely after the deletion? |
Wouldn't that require an update instead of a removal though? Of the cluster is not supposed to use a secret in the interim period before a new secret is available, modify the cluster right? |
@keithmattix I think it depends. I am mostly dis-agreeing with the base assumption that traffic must not be disrupted at all costs. Sometimes, you do want to break traffic to mitigate an ongoing attack, or for compliance reasons. |
Sure, but IIUC, not ignoring removal leaves control planes in a tough spot because they can't signal to the data plane that a requested secret is not/available without breaking traffic. If the CP wants to break traffic, it can just remove the secret from the cluster right? |
My take here is that semantics today are as pointed out above, we don't support removal of singleton subscriptions (SDS, EDS, etc.) independent of the referencing resource (cluster, listener). To @kyessenov point about dangers, we can either rely on the control plane to enforce the correct behavior during secret removal or make a change in semantics as suggested by @howardjohn. It seems reasoable to rely on control plane updating cluster/listener in these cases? Reopening as this is not resolved. |
Fixes #24373 and #32832 Risk Level: Low Testing: Manual testing with the Istio scenario described in #32832. Investigating how to add a unit test Release Notes: Delta SDS removals will no longer result in a "Missing SDS resources" error message Signed-off-by: Keith Mattix II <[email protected]>
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions. |
Fixes envoyproxy#24373 and envoyproxy#32832 Risk Level: Low Testing: Manual testing with the Istio scenario described in envoyproxy#32832. Investigating how to add a unit test Release Notes: Delta SDS removals will no longer result in a "Missing SDS resources" error message Signed-off-by: Keith Mattix II <[email protected]>
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions. |
Title: Delta SDS incorrectly sends an error response for an empty response
Description:
Control plan sends
resources:[], removed:[some resource]
. Envoy rejects with2024-03-11T23:35:13.641010Z warning envoy config external/envoy/source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:138 gRPC config for type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret rejected: Missing SDS resources for kubernetes://sds-credential in onConfigUpdate() thread=25
This comes from
envoy/source/common/secret/sds_api.cc
Line 157 in 708fa7b
Repro steps:
I don't have a trivial Envoy SDS setup for this in pure envoy, but its pretty simple in Istio. Just create something like
With istio latest (which uses delta xds now)
The text was updated successfully, but these errors were encountered: