test: Validate Service IP #422

Merged (1 commit) on Aug 13, 2021

jrajahalme
Member

Validate resolved service IP when waiting for service to become
available.

This will help avoid test flakes where kube-dns is returning a stale
IP of a service that was just removed, e.g., when running test with
--force-deploy.

Fixes: cilium/cilium#16867
Signed-off-by: Jarno Rajahalme [email protected]

@jrajahalme jrajahalme requested a review from a team as a code owner July 13, 2021 13:33
@jrajahalme jrajahalme temporarily deployed to ci July 13, 2021 13:33 Inactive
@jrajahalme jrajahalme added the area/CI Continuous Integration testing issue or flake label Jul 13, 2021
@jrajahalme jrajahalme force-pushed the pr/jrajahalme/test-wait-cluster-ip branch from 5a7e0e1 to 84af027 Compare July 14, 2021 11:57
@jrajahalme jrajahalme temporarily deployed to ci July 14, 2021 11:57 Inactive
@jrajahalme
Member Author

The multi-cluster test fails, presumably because it expects DNS to return the ClusterIP of the local cluster. Artifact collection is broken, so I can't confirm:

2021-07-14T12:08:47.498637237Z 🐛 Error waiting for service cilium-test/echo-other-node: Service IP "10.100.4.13" not found in nslookup output "Server:\t\t10.112.0.10\nAddress:\t10.112.0.10:53\n\nNon-authoritative answer:\nName:\techo-other-node.cilium-test.svc.cluster.local\nAddress: 10.112.10.71\n\n** server can't find echo-other-node.svc.cluster.local: NXDOMAIN\n\n** server can't find echo-other-node.cluster.local: NXDOMAIN\n\n** server can't find echo-other-node.c.***.internal: NXDOMAIN\n\n** server can't find echo-other-node.google.internal: NXDOMAIN\n\n** server can't find echo-other-node.svc.cluster.local: NXDOMAIN\n\n** server can't find echo-other-node.cluster.local: NXDOMAIN\n\n** server can't find echo-other-node.c.***.internal: NXDOMAIN\n\n** server can't find echo-other-node.google.internal: NXDOMAIN\n\n": 

@jrajahalme jrajahalme force-pushed the pr/jrajahalme/test-wait-cluster-ip branch from 84af027 to 065a943 Compare July 14, 2021 13:06
@jrajahalme jrajahalme temporarily deployed to ci July 14, 2021 13:06 Inactive
@jrajahalme jrajahalme force-pushed the pr/jrajahalme/test-wait-cluster-ip branch from 065a943 to 223f2cc Compare July 14, 2021 14:42
@jrajahalme jrajahalme temporarily deployed to ci July 14, 2021 14:42 Inactive
@jrajahalme jrajahalme force-pushed the pr/jrajahalme/test-wait-cluster-ip branch from 223f2cc to 531539d Compare July 16, 2021 11:37
@jrajahalme jrajahalme temporarily deployed to ci July 16, 2021 11:37 Inactive
@jrajahalme jrajahalme requested a review from tklauser July 16, 2021 11:37
@jrajahalme
Member Author

re-running AWS CI jobs due to infra problems (check for AWS ds scheduled jobs == 0 failed).

@tklauser
Member

> re-running AWS CI jobs due to infra problems (check for AWS ds scheduled jobs == 0 failed).

The failure is not due to infra problems, but an oversight when #400 was merged. The fix will be in #442, so I think this is good to merge once @michi-covalent approves.

@jrajahalme jrajahalme force-pushed the pr/jrajahalme/test-wait-cluster-ip branch from 531539d to 48a377e Compare July 19, 2021 20:29
@jrajahalme jrajahalme temporarily deployed to ci July 19, 2021 20:29 Inactive
@jrajahalme jrajahalme force-pushed the pr/jrajahalme/test-wait-cluster-ip branch from 48a377e to 9c200ba Compare August 12, 2021 14:53
@jrajahalme jrajahalme temporarily deployed to ci August 12, 2021 14:53 Inactive
@jrajahalme
Member Author

rebased to resolve conflicts

@tklauser
Member

AKS failed on #367 which is a known flake: https://github.com/cilium/cilium-cli/pull/422/checks?check_run_id=3312947463

Merging.

@tklauser tklauser merged commit 6dfdda8 into master Aug 13, 2021
@tklauser tklauser deleted the pr/jrajahalme/test-wait-cluster-ip branch August 13, 2021 09:49
Successfully merging this pull request may close these issues.

CI: ConformanceEKS pod-to-service test to echo-same-node fails
3 participants