flake: allow-all/pod-to-world/http(s)-to-cilium-io fails #367

Closed
gandro opened this issue Jun 24, 2021 · 31 comments · Fixed by #558
Labels: area/CI (Continuous Integration testing issue or flake), ci/flake (Issues tracking failing (integration or unit) tests.)


@gandro
Member

gandro commented Jun 24, 2021

Observed in #366 on EKS:

https://github.com/cilium/cilium-cli/pull/366/checks?check_run_id=2905226287

The flow logs indicate that a DNS request was made, but no TCP connection was ever established:

   [.] Action [allow-all/pod-to-world/http-to-www-google: cilium-test/client-7b7bf54b85-h6qvt (10.0.1.205) -> www-google-http (www.google.com:80)]
  ❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --connect-timeout 5 --output /dev/null http://www.google.com:80" failed: command terminated with exit code 28
  📄 Matching flows for pod cilium-test/client-7b7bf54b85-h6qvt
  ℹ️  SYN and(ip(src=10.0.1.205),tcp(dstPort=80),tcpflags(syn)) not found
  ℹ️  SYN and(ip(src=10.0.1.205),tcp(dstPort=80),tcpflags(syn)) not found
  ℹ️  SYN and(ip(src=10.0.1.205),tcp(dstPort=80),tcpflags(syn)) not found
  ℹ️  SYN and(ip(src=10.0.1.205),tcp(dstPort=80),tcpflags(syn)) not found
  ℹ️  SYN and(ip(src=10.0.1.205),tcp(dstPort=80),tcpflags(syn)) not found
  ℹ️  SYN and(ip(src=10.0.1.205),tcp(dstPort=80),tcpflags(syn)) not found
  ℹ️  SYN and(ip(src=10.0.1.205),tcp(dstPort=80),tcpflags(syn)) not found
  ℹ️  SYN and(ip(src=10.0.1.205),tcp(dstPort=80),tcpflags(syn)) not found
  ℹ️  SYN and(ip(src=10.0.1.205),tcp(dstPort=80),tcpflags(syn)) not found
  ℹ️  SYN and(ip(src=10.0.1.205),tcp(dstPort=80),tcpflags(syn)) not found
  ℹ️  SYN and(ip(src=10.0.1.205),tcp(dstPort=80),tcpflags(syn)) not found
  ⌛ Waiting (5s) for flows: Required flows not found yet
  ℹ️  SYN and(ip(src=10.0.1.205),tcp(dstPort=80),tcpflags(syn)) not found
  ℹ️  SYN and(ip(src=10.0.1.205),tcp(dstPort=80),tcpflags(syn)) not found
  ℹ️  SYN and(ip(src=10.0.1.205),tcp(dstPort=80),tcpflags(syn)) not found
  ℹ️  SYN and(ip(src=10.0.1.205),tcp(dstPort=80),tcpflags(syn)) not found
  ℹ️  SYN and(ip(src=10.0.1.205),tcp(dstPort=80),tcpflags(syn)) not found
  ℹ️  SYN and(ip(src=10.0.1.205),tcp(dstPort=80),tcpflags(syn)) not found
  ℹ️  SYN and(ip(src=10.0.1.205),tcp(dstPort=80),tcpflags(syn)) not found
  ℹ️  SYN and(ip(src=10.0.1.205),tcp(dstPort=80),tcpflags(syn)) not found
  ℹ️  SYN and(ip(src=10.0.1.205),tcp(dstPort=80),tcpflags(syn)) not found
  ✅ DNS request found at 0
  ✅ DNS response found at 3
  ❌ Flow validation failed for pod cilium-test/client-7b7bf54b85-h6qvt: 1 failures (first: 0, last: 3, matched: 2)

It's not clear why curl never opened an outgoing TCP connection. Unfortunately, we don't seem to collect any L7 flows, which would give us more insight into what the DNS response was.
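The flow validation in the log checks for a DNS request, a DNS response, and a TCP SYN matching `and(ip(src=10.0.1.205),tcp(dstPort=80),tcpflags(syn))`. The Go sketch below is a simplified, hypothetical model of that kind of check, not the actual cilium-cli code: the `Flow` struct, the filter helpers and the sample flows (including the `10.0.0.10` address) are invented for illustration. It shows how the run ends up with "1 failures (first: 0, last: 3, matched: 2)": both DNS requirements match, but no flow satisfies all three SYN conditions at once.

```go
package main

import "fmt"

// Flow is a hypothetical, heavily simplified stand-in for a Hubble flow
// record; the real connectivity tests work on Hubble's flow protobufs.
type Flow struct {
	SrcIP    string
	DstPort  uint16
	TCP      bool
	SYN      bool // TCP SYN flag set
	DNSQuery bool // DNS request observed
	DNSReply bool // DNS response observed
}

// Filter reports whether a single flow matches.
type Filter func(Flow) bool

// And matches only if every sub-filter matches the same flow.
func And(fs ...Filter) Filter {
	return func(f Flow) bool {
		for _, m := range fs {
			if !m(f) {
				return false
			}
		}
		return true
	}
}

func SrcIP(ip string) Filter        { return func(f Flow) bool { return f.SrcIP == ip } }
func TCPDstPort(port uint16) Filter { return func(f Flow) bool { return f.TCP && f.DstPort == port } }
func TCPSYN() Filter                { return func(f Flow) bool { return f.TCP && f.SYN } }

// Requirement is a named filter that must match at least one flow.
type Requirement struct {
	Name   string
	Filter Filter
}

// Validate returns, per requirement, the index of the first matching flow
// (-1 if none matched) and the number of unmet requirements.
func Validate(flows []Flow, reqs []Requirement) (idx []int, failures int) {
	idx = make([]int, len(reqs))
	for i, r := range reqs {
		idx[i] = -1
		for j, f := range flows {
			if r.Filter(f) {
				idx[i] = j
				break
			}
		}
		if idx[i] == -1 {
			failures++
		}
	}
	return idx, failures
}

// Requirements mirrors the checks in the log: DNS request, DNS response,
// and a TCP SYN from the client to port 80.
func Requirements(client string) []Requirement {
	return []Requirement{
		{"DNS request", func(f Flow) bool { return f.DNSQuery }},
		{"DNS response", func(f Flow) bool { return f.DNSReply }},
		{"SYN", And(SrcIP(client), TCPDstPort(80), TCPSYN())},
	}
}

// SampleFlows imitates the failed run: DNS traffic is seen, but no SYN.
func SampleFlows(client string) []Flow {
	return []Flow{
		{SrcIP: client, DstPort: 53, DNSQuery: true},
		{SrcIP: client, DstPort: 53},
		{SrcIP: "10.0.0.10", DstPort: 41000},
		{SrcIP: "10.0.0.10", DstPort: 41000, DNSReply: true},
	}
}

func main() {
	client := "10.0.1.205"
	reqs := Requirements(client)
	idx, failures := Validate(SampleFlows(client), reqs)
	for i, r := range reqs {
		if idx[i] < 0 {
			fmt.Printf("%s not found\n", r.Name)
		} else {
			fmt.Printf("%s found at %d\n", r.Name, idx[i])
		}
	}
	fmt.Printf("%d failures\n", failures)
}
```

Running it prints `DNS request found at 0`, `DNS response found at 3`, `SYN not found` and `1 failures`. Adding a single flow with `TCP: true, SYN: true, DstPort: 80` from the client would satisfy every requirement.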

gandro added the area/CI (Continuous Integration testing issue or flake) label on Jun 24, 2021
michi-covalent pushed a commit that referenced this issue Jun 25, 2021
Bring back the DNS visibility annotation (466e8f1) that got reverted in
PR #356. I reverted it because it depended on 07a161d which was causing
issue #355. I updated the commit so that it no longer depends on 07a161d.
Having DNS visibility might give us some additional info to debug #367.

Signed-off-by: Jarno Rajahalme <[email protected]>
Signed-off-by: Michi Mutsuzaki <[email protected]>
michi-covalent pushed a commit that referenced this issue Jun 25, 2021
Bring back the DNS visibility annotation (466e8f1) that got reverted in
PR #356. I reverted the commit because it depended on 07a161d which was
causing issue #355. I modified the commit so that it no longer depends
on 07a161d. Having DNS visibility might give us some additional info to
debug #367.

Signed-off-by: Jarno Rajahalme <[email protected]>
Signed-off-by: Michi Mutsuzaki <[email protected]>
michi-covalent added a commit that referenced this issue Jun 25, 2021
Use [jenkins.]cilium.io for FQDN tests to reduce external dependencies.

Ref: #367

Signed-off-by: Michi Mutsuzaki <[email protected]>
michi-covalent added a commit that referenced this issue Jun 29, 2021
nbusseneau pushed a commit that referenced this issue Jul 1, 2021
@tklauser
Member

tklauser commented Jul 2, 2021

#373 changed the connectivity tests to use *.cilium.io instead of *.google.com, but it seems like failures of this kind still occur, see e.g. https://github.com/cilium/cilium-cli/runs/2969863237.

@michi-covalent
Contributor

> #373 changed the connectivity tests to use *.cilium.io instead of *.google.com, but it seems like failures of this kind still occur, see e.g. https://github.com/cilium/cilium-cli/runs/2969863237.

This one is actually even worse: it looks like curl succeeded (no exit code 28), but there is still no SYN packet 🤯

@michi-covalent
Contributor

Still happening with cilium.io: https://github.com/cilium/cilium-cli/actions/runs/1102848558

brb added a commit to cilium/cilium that referenced this issue Jun 20, 2023
To avoid the interference [1][2] after running L7 proxy tests.

[1]: cilium/cilium-cli#367
[2]: #17459

Signed-off-by: Martynas Pumputis <[email protected]>
brb added a commit to cilium/cilium that referenced this issue Jun 22, 2023
brb added a commit to cilium/cilium that referenced this issue Jun 27, 2023
brb added a commit to cilium/cilium that referenced this issue Jul 10, 2023
brb added a commit to cilium/cilium that referenced this issue Jul 19, 2023
brb added a commit to cilium/cilium that referenced this issue Jul 21, 2023
This commit adds an optional IPsec e2e upgrade test. The test does the
following:

* Install the latest stable release version of Cilium.
* Run the CLI's connectivity tests.
* Upgrade Cilium to a PR version.
* Run the CLI's connectivity tests.
* Downgrade to the previous version.
* Run the CLI's connectivity tests.

After each connectivity test case, we flush CT (connection tracking) to
avoid the interference [1][2] seen after running L7 proxy tests.

[1]: cilium/cilium-cli#367
[2]: #17459

The test runs pods that establish long-lived connections and checks
that these connections are not interrupted during the upgrade and
downgrade.

Signed-off-by: Martynas Pumputis <[email protected]>
pchaigno pushed a commit to cilium/cilium that referenced this issue Jul 28, 2023
brb added a commit to cilium/cilium that referenced this issue Aug 2, 2023
[ upstream commit e695db5 ]

brb added a commit to cilium/cilium that referenced this issue Aug 3, 2023
[ upstream commit e695db5 ]

Backport NB: disabled the endpoint routes feature in the 3rd
configuration, as the documented limitation was fixed only in >=v1.13.

Signed-off-by: Martynas Pumputis <[email protected]>
dylandreimerink pushed a commit to cilium/cilium that referenced this issue Aug 3, 2023
[ upstream commit e695db5 ]

aanm pushed a commit to cilium/cilium that referenced this issue Aug 3, 2023
[ upstream commit e695db5 ]

7 participants