Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flow validation breakage #485

Merged
merged 2 commits into from
Sep 1, 2021
Merged

Conversation

jrajahalme
Copy link
Member

@jrajahalme jrajahalme commented Aug 12, 2021

DNS exchange happens first, it's useful to have it validated even if
the actual test traffic fails to validate.

Retry flow matching again if all flow requirements can not be
met. Start the new round from the flow following the previous first
match. This effectively retries flow matching from different L4
connections, as the ephemeral source port found from the first match
is fixed for the remaining requirements. By skipping the previous
first match the match can succeed on a different L4 connection.

For example, if successful DNS reply from proxy is required, it may be
on a different UDP or TCP port than the first DNS resolution exchange
that may have failed.

This capability was inadvertently removed in #319.

Merge flow validation results for each flow requirement before returning.
Without this flow validation was reported successful when at least one
flow validation requirement was met, hiding failures in later requirements.

Fixes: #319

DNS exchange happens first, it's useful to have it validated even if
the actual test traffic fails to validate.

Signed-off-by: Jarno Rajahalme <[email protected]>
@jrajahalme jrajahalme requested a review from a team as a code owner August 12, 2021 16:55
@jrajahalme jrajahalme temporarily deployed to ci August 12, 2021 16:56 Inactive
@jrajahalme jrajahalme temporarily deployed to ci August 13, 2021 09:53 Inactive
@jrajahalme jrajahalme requested a review from ti-mo August 13, 2021 13:40
@jrajahalme jrajahalme added the kind/bug Something isn't working label Aug 13, 2021
Retry flow matching again if all flow requirements can not be
met. Start the new round from the flow following the previous first
match. This effectively retries flow matching from different L4
connections, as the ephemeral source port found from the first match
is fixed for the remaining requirements. By skipping the previous
first match the match can succeed on a different L4 connection.

Merge flow validation results for each flow requirement before returning.
Without this flow validation was reported successful when at least one
flow validation requirement was met, hiding failures in later requirements.

For example, if successful DNS reply from proxy is required, it may be
on a different UDP or TCP port than the first DNS resolution exchange
that may have failed.

This capability was inadvertently removed in #319.

Fixes: #319
Signed-off-by: Jarno Rajahalme <[email protected]>
@jrajahalme jrajahalme temporarily deployed to ci August 13, 2021 14:36 Inactive
@jrajahalme jrajahalme changed the title Validate DNS requirements first Fix flow validation breakage Aug 13, 2021
@ti-mo
Copy link
Contributor

ti-mo commented Aug 16, 2021

I never understood the code touched here, so not sure if I can give a comprehensive review. The problem this PR is trying to solve is also not clear. I'm not sure why an offset is still being used, I thought we would remove it completely.

Given a list of flows on the left and a list of requirements on the right, the flow matcher should return a single verdict. If not all of the requirements are met, the error returned should state which one. If DNS resolution was successful, the matcher should make it past the DNS resolution and fail to match establishing e.g. a TCP connection. Why does this need special treatment?

@jrajahalme
Copy link
Member Author

I never understood the code touched here, so not sure if I can give a comprehensive review. The problem this PR is trying to solve is also not clear. I'm not sure why an offset is still being used, I thought we would remove it completely.

typically the 1st DNS flow requirement (First) is a DNS request sent by the source. This request has some ephemeral source port. So when matching for the First requirement (e.g., TCP SYN) we do not know what the source port of the successful connection exchange is, so we are happy to a TCP SYN from any port. All other flow requirements in the same "set" (i.e., Last and Middle) also require the same ephemeral port. This was implemented to match the flow requirements on a single transport connection so that an RST from another TCP connection would not break the flow validation.

See this commit for details on ephemeral port tracking: 6819144

The same applies to UDP exchanges as well. When the flow requirement expects to see the successful DNS response with the final answer, this practically never exists on the first DNS exchange. Without sliding the offset past the last successfully matched First flow filter the matcher would never see any of the later DNS exchanges (with different ephemeral port numbers) so the final answer would never be matched. This is why it is essential to try the flow filters on successive L4 (UDP or TCP) exchanges, implemented by incrementing the offset past the last First match.

Given a list of flows on the left and a list of requirements on the right, the flow matcher should return a single verdict. If not all of the requirements are met, the error returned should state which one. If DNS resolution was successful, the matcher should make it past the DNS resolution and fail to match establishing e.g. a TCP connection. Why does this need special treatment?

Currently the code does not indicate which requirement fails. Prior to this PR the flow matcher ignored failing requirements after the 1st successful one. This was hiding failing DNS requirements after successful TCP requirements. Making DNS requirements the 1st one reveled this. This is fixed by merging the validation results earlier.

@jrajahalme
Copy link
Member Author

I'm not sure why an offset is still being used, I thought we would remove it completely.

The offset is an internal detail of the flow matcher that was previously spilled to the caller. This PR removes the offset parameter from the flow matcher prototype. Internally, offset is needed to find the L4 connection successfully matching all the requirements, given the ephemeral port tracking in 6819144.

@jrajahalme
Copy link
Member Author

Given a list of flows on the left and a list of requirements on the right, the flow matcher should return a single verdict.

matchAllFlowRequirements() does this by calling matchFlowRequirements() successively for each requirement.

If not all of the requirements are met, the error returned should state which one. If DNS resolution was successful, the matcher should make it past the DNS resolution and fail to match establishing e.g. a TCP connection. Why does this need special treatment?

Not sure what you mean with "special treatment" here. matchFlowRequirements() internally uses an index variable we call offset to implement its function, given the presence of ephemeral port tracking done by the L4 filters. matchAllFlowRequirements() needs to merge validation results returned by matchFlowRequirements() to form the single verdict it needs to return.

@jrajahalme
Copy link
Member Author

kind test failed with the same error as cilium/cilium#12141:

kind-chart-testing] Waiting for Service cilium-test/echo-same-node to become ready...
🐛 Error waiting for service cilium-test/echo-same-node: command terminated with exit code 1:
;; reply from unexpected source: 10.244.1.178#53, expected 10.245.0.10#53

;; reply from unexpected source: 10.244.1.178#53, expected 10.245.0.10#53

;; reply from unexpected source: 10.244.1.178#53, expected 10.245.0.10#53

;; connection timed out; no servers could be reached

@jrajahalme
Copy link
Member Author

Multicluster failed with #361

@jrajahalme
Copy link
Member Author

This is a bug fix and the flakes are unrelated, marking as ready-to-merge

@jrajahalme jrajahalme added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Sep 1, 2021
@tklauser tklauser merged commit 088eb3f into master Sep 1, 2021
@tklauser tklauser deleted the pr/jrajahalme/dns-reqs-first branch September 1, 2021 11:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants