[release-4.17] SDN-4919,OCPBUGS-39200: 4.18 merge - 5th Sept #2291

Open
wants to merge 52 commits into
base: release-4.17
from


martinkennelly
Contributor

/cc
/hold

TBD

arghosh93 and others added 30 commits August 9, 2024 17:21
This is to change POD and join subnet used with couple of net-attach-def
in unit tests to satisfy newly introduced subnet overlap check with
ClusterNetwork, ServiceNetwork, join switch and masquerade CIDR.

Signed-off-by: Arnab Ghosh <[email protected]>
UDN API reference generated using the following command:
  crd-ref-docs --source-path ./go-controller/pkg/crd/userdefinednetwork --config=crd-docs-config.yaml --renderer=markdown --output-path=./docs/api-reference/userdefinednetwork-api-spec.md

Signed-off-by: Or Mergi <[email protected]>
The new OVS version is used by the OVN observability.

Signed-off-by: Nadia Pinaeva <[email protected]>
Signed-off-by: Surya Seetharaman <[email protected]>
UDN: Add `MASQUERADE` IPTable Rules
OCPBUGS-38270: Dockerfile: Bump OVS to 3.4.0-1
UDN: allow multiple conditions from different fieldManagers to co-exist in the status.
…nagement-port

UDN: Add RPFilter Loose Mode for management port
Every time a UDN was created, we were adding all remote nodes for
every network all over again, including the default network. This makes
the checks on the annotations network aware.

Signed-off-by: Tim Rozet <[email protected]>
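One way to read "network aware" here, sketched under assumptions: if the per-node subnet annotation holds a JSON map from network name to subnets, each network controller should react only when its own entry changes, so creating a new UDN (which only adds a key) does not make the other controllers re-add every remote node. The annotation key and the list-valued shape are assumptions for illustration (the real annotation may also encode a single subnet as a plain string), and `nodeSubnetChanged` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"reflect"
)

// Assumed annotation holding {"<network>": ["<subnet>", ...], ...}.
const nodeSubnetsAnnotation = "k8s.ovn.org/node-subnets"

// subnetsFor extracts one network's subnets from a node's annotations.
// A missing annotation or missing network key yields nil.
func subnetsFor(annotations map[string]string, network string) ([]string, error) {
	raw, ok := annotations[nodeSubnetsAnnotation]
	if !ok {
		return nil, nil
	}
	var perNetwork map[string][]string
	if err := json.Unmarshal([]byte(raw), &perNetwork); err != nil {
		return nil, err
	}
	return perNetwork[network], nil
}

// nodeSubnetChanged reports a change only if this network's subnets differ
// between the old and new annotations; other networks' keys are ignored.
func nodeSubnetChanged(oldAnn, newAnn map[string]string, network string) (bool, error) {
	oldSubnets, err := subnetsFor(oldAnn, network)
	if err != nil {
		return false, err
	}
	newSubnets, err := subnetsFor(newAnn, network)
	if err != nil {
		return false, err
	}
	return !reflect.DeepEqual(oldSubnets, newSubnets), nil
}
```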
Services controller:
- move it to base network controller
- start one services controller per primary network
- set up filter in the informer so that only endpointslices for the given network are considered
- pass switch and router names according to the network for a given node.
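A sketch of the per-network EndpointSlice filter, assuming slices mirrored for a UDN carry a label naming their network and unlabelled slices belong to the default network. The label key and helper names are assumptions, and the rule is simplified. In client-go, a predicate like this would typically be wrapped in `cache.FilteringResourceEventHandler` through its `FilterFunc` field (which receives an `interface{}` object rather than a label map).

```go
package main

// Assumed label identifying the network an EndpointSlice belongs to.
const networkLabel = "k8s.ovn.org/endpointslice-network"

const defaultNetwork = "default"

// endpointSliceNetwork returns the network a slice belongs to, treating
// slices without the label as default-network slices.
func endpointSliceNetwork(labels map[string]string) string {
	if n, ok := labels[networkLabel]; ok && n != "" {
		return n
	}
	return defaultNetwork
}

// filterForNetwork returns the predicate one per-network services
// controller would use to ignore other networks' slices.
func filterForNetwork(network string) func(labels map[string]string) bool {
	return func(labels map[string]string) bool {
		return endpointSliceNetwork(labels) == network
	}
}
```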

Move getActiveNetworkForNamespace to CommonNetworkControllerInfo, because the services controller only has access to CommonNetworkControllerInfo at initialization and needs to run getActiveNetworkForNamespace.

Make LBs and LB groups network scoped

Add network name & role to OVN external IDs. In a few places in the code we retrieve all logical switches, routers and load balancers to initialize the services controller or to delete stale entries. With one services controller per network, the OVN lookup must only return OVN elements in the network we're interested in. This is achieved by adding the network name and network role (default, primary, secondary) to the ExternalIDs field of logical switches, routers and load balancers.

Signed-off-by: Riccardo Ravaioli <[email protected]>
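The external-ID scoping can be sketched as stamping every row a network creates with its name and role, then filtering lookups on the network key. The key names, the `ovnRow` type, and the rule that rows lacking the key belong to the default network are assumptions made for this sketch; the commit does not spell them out.

```go
package main

// Assumed external-ID keys for network name and role.
const (
	externalIDNetwork = "k8s.ovn.org/network"
	externalIDRole    = "k8s.ovn.org/role"
)

// ovnRow is a stand-in for an OVN NB row (logical switch, router or LB).
type ovnRow struct {
	Name        string
	ExternalIDs map[string]string
}

// networkExternalIDs builds the IDs stamped on every row a network creates;
// role is one of "default", "primary" or "secondary".
func networkExternalIDs(network, role string) map[string]string {
	return map[string]string{externalIDNetwork: network, externalIDRole: role}
}

// rowNetwork returns the owning network. Assumption: rows without the key
// predate it and belong to the default network.
func rowNetwork(r ovnRow) string {
	if n, ok := r.ExternalIDs[externalIDNetwork]; ok {
		return n
	}
	return "default"
}

// rowsForNetwork keeps only rows owned by network, so a per-network
// services controller never initializes from, or deletes, another
// network's load balancers.
func rowsForNetwork(rows []ovnRow, network string) []ovnRow {
	var out []ovnRow
	for _, r := range rows {
		if rowNetwork(r) == network {
			out = append(out, r)
		}
	}
	return out
}
```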
The existing unit tests for services in services_controller_test are now run for UDN as well.

At the same time, a cleanup of unit tests was needed, especially since there was a lot of repetition in the surrounding code, also with respect to global and test-specific variables between services_controller_test.go and lb_config_test.go

Finally, Test_ETPCluster_NodePort_Service_WithMultipleIPAddresses follows the exact same logic found in TestSyncServices, so let's move it there

Signed-off-by: Riccardo Ravaioli <[email protected]>
Allows the execution of the network segmentation tests that are in network_segmentation_*.go (e.g. services, endpoint slice mirroring). For instance:

make control-plane WHAT="Network Segmentation: services"

Signed-off-by: Riccardo Ravaioli <[email protected]>
The test creates a client and nodeport service in a UDN backed by one pod and similarly
a nodeport service and a client in the default network.
We verify that:
- UDN client --> UDN service, with backend pod and client running on the same node, is possible through:
  + clusterIP
  + nodeIP:nodePort, where we only target the node where the client runs (*)

- UDN client --> UDN service, with backend pod and client running on different nodes, is possible through:
  + clusterIP
  + nodeIP:nodePort, where we only target the node where the client runs (*)

- default-network client --> UDN service is NOT possible through:
  + clusterIP
  + nodeIP:nodePort, where we only target the node where the client runs (*)

- UDN client --> default-network service is NOT possible through:
  + clusterIP
  + nodeIP:nodePort, where we only target the node where the client runs (*)

(*) TODO connect to other nodes too once ovnkube-node fully supports UDN

TODO: use the same logic as in network_segmentation.go

Signed-off-by: Riccardo Ravaioli <[email protected]>
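The expectations in this test reduce to one rule: a client reaches a service, whether via clusterIP or nodeIP:nodePort and whether or not the backend pod shares the client's node, only when client and service are on the same network. A hypothetical table-driven encoding of that rule (names are illustrative, not the e2e framework's):

```go
package main

// endpoint identifies which network a client or service lives on.
type endpoint struct{ Network string }

// expectReachable encodes the isolation rule: same network only.
// Access method and node placement do not change the expectation.
func expectReachable(client, service endpoint) bool {
	return client.Network == service.Network
}

type reachCase struct {
	name            string
	client, service endpoint
	want            bool
}

// cases mirrors the client/service combinations exercised by the test.
var cases = []reachCase{
	{"UDN client -> UDN service", endpoint{"udn"}, endpoint{"udn"}, true},
	{"default-network client -> UDN service", endpoint{"default"}, endpoint{"udn"}, false},
	{"UDN client -> default-network service", endpoint{"udn"}, endpoint{"default"}, false},
}
```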
Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
Use faked iptables in UDN gateway tests
Update Dockerfile.fedora to use pre-released 24.09 ovn rpm.
Fixes remote node checks to be network aware
UDN layer 3 networks also have a join switch and gateway router.

Signed-off-by: Dumitru Ceara <[email protected]>
In the "delete" case we don't need the cookie, move the code that builds
the cookie after the section that checks and takes care of deletes.

Signed-off-by: Dumitru Ceara <[email protected]>
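The reordering amounts to returning early on delete before paying for the cookie. The commit does not show the handler, so `flowOp`, `handle`, and the FNV-based `buildCookie` below are hypothetical stand-ins used only to illustrate the control flow.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
)

// flowOp is a hypothetical update: either an add/update or a delete of Key.
type flowOp struct {
	Delete bool
	Key    string
}

// cookieCalls counts cookie computations so the ordering can be observed.
var cookieCalls int

// buildCookie stands in for the cookie the commit moves; here it is just
// an FNV-1a hash of the key.
func buildCookie(key string) (string, error) {
	cookieCalls++
	if key == "" {
		return "", errors.New("cannot build cookie for empty key")
	}
	h := fnv.New32a()
	_, _ = h.Write([]byte(key))
	return fmt.Sprintf("0x%08x", h.Sum32()), nil
}

// handle processes one update. Deletes are handled first and return early,
// so the cookie is only built on the add/update path.
func handle(op flowOp, installed map[string]string) error {
	if op.Delete {
		delete(installed, op.Key)
		return nil
	}
	cookie, err := buildCookie(op.Key)
	if err != nil {
		return err
	}
	installed[op.Key] = cookie
	return nil
}
```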
… namespace active network

Signed-off-by: Dumitru Ceara <[email protected]>
@martinkennelly
Contributor Author

/test e2e-gcp-ovn-techpreview

Init issue

@martinkennelly
Contributor Author

/retest

no issues that seem related to this PR

@martinkennelly changed the title [release-4.17] SDN-4919,OCPBUGS-39200: UDN Merge + OVS bump 5th Sept to [release-4.17] SDN-4919,OCPBUGS-39200: 4.18 merge - 5th Sept on Sep 5, 2024
@martinkennelly
Contributor Author

Get the following unit test failures:

Summarizing 3 Failures:

[Fail] Gateway Init Operations Setting up the gateway bridge [It] sets up a local gateway with predetermined interface 
/go/src/github.com/ovn-org/ovn-kubernetes/go-controller/pkg/node/gateway_init_linux_test.go:1269

[Fail] Gateway Init Operations Setting up the gateway bridge [It] sets up a local gateway with predetermined interface when network-segmentation is enabled 
/go/src/github.com/ovn-org/ovn-kubernetes/go-controller/pkg/node/gateway_init_linux_test.go:1269

[Fail] Gateway Init Operations Setting up the gateway bridge [It] sets up a local gateway with predetermined interface and no default route 
/go/src/github.com/ovn-org/ovn-kubernetes/go-controller/pkg/node/gateway_init_linux_test.go:1269

Ran 155 of 155 Specs in 78.177 seconds
FAIL! -- 152 Passed | 3 Failed | 0 Pending | 0 Skipped

Will investigate tomorrow.

@tssurya
Contributor

tssurya commented Sep 6, 2024

/retest

@martinkennelly
Contributor Author

/test e2e-metal-ipi-ovn-ipv6-techpreview

@martinkennelly
Contributor Author

/payload 4.17 ci blocking
/payload 4.17 nightly blocking

Contributor

openshift-ci bot commented Sep 6, 2024

@martinkennelly: trigger 4 job(s) of type blocking for the ci release of OCP 4.17

  • periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.17-e2e-gcp-ovn-upgrade
  • periodic-ci-openshift-hypershift-release-4.17-periodics-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/cf631b60-6c35-11ef-8dc0-278a6fec7ca9-0

trigger 9 job(s) of type blocking for the nightly release of OCP 4.17

  • periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-gcp-ovn-rt-upgrade
  • periodic-ci-openshift-hypershift-release-4.17-periodics-e2e-aws-ovn-conformance
  • periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-serial
  • periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-nightly-4.17-fips-payload-scan
  • periodic-ci-openshift-release-master-nightly-4.17-e2e-metal-ipi-ovn-bm
  • periodic-ci-openshift-release-master-nightly-4.17-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/cf631b60-6c35-11ef-8dc0-278a6fec7ca9-1

@martinkennelly
Contributor Author

/test e2e-azure-ovn-upgrade

Test passed but exceeded 4 hours and got killed.

{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:169","func":"sigs.k8s.io/prow/pkg/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 4h0m0s timeout","severity":"error","time":"2024-09-06T10:46:20Z"}
INFO[2024-09-06T10:46:20Z] Received signal.                              signal=interrupt
INFO[2024-09-06T10:46:20Z] error: Process interrupted with signal interrupt, cancelling execution... 

@martinkennelly
Contributor Author

/test 4.17-upgrade-from-stable-4.16-e2e-aws-ovn-upgrade

ditto

@martinkennelly
Contributor Author

/test e2e-metal-ipi-ovn-techpreview

nothing indicating this PR is causing failure.

@martinkennelly
Contributor Author

payload jobs are good

@martinkennelly
Contributor Author

martinkennelly commented Sep 6, 2024

Metal tech preview continues to fail in the same manner - nothing obvious sticking out.

@martinkennelly
Contributor Author

/test e2e-azure-ovn-upgrade

INFO[2024-09-06T15:21:31Z] Running step e2e-azure-ovn-upgrade-ipi-deprovision-deprovision.
{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:169","func":"sigs.k8s.io/prow/pkg/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 4h0m0s timeout","severity":"error","time":"2024-09-06T15:23:16Z"}

No tests failed.

@martinkennelly
Contributor Author

/test e2e-azure-ovn-upgrade

{ ["e2e-azure-ovn-upgrade" pod "e2e-azure-ovn-upgrade-openshift-e2e-test" failed: could not watch pod: context canceled Link to step on registry info site: https://steps.ci.openshift.org/reference/openshift-e2e-test Link to job on registry info site: https://steps.ci.openshift.org/job?org=openshift&repo=ovn-kubernetes&branch=release-4.17&test=e2e-azure-ovn-upgrade, cancelled]}

@tssurya
Contributor

tssurya commented Sep 10, 2024

/hold
don't merge this until the bug in services is fixed (martin, contact Ricky for details)

@martinkennelly
Contributor Author

/test e2e-azure-ovn-upgrade

The test reached the 4-hour time limit. No failures were reported, but I'm unsure whether that is only because they went unreported.
Looked at ovn-k and saw no issues in the must-gather logs.

@martinkennelly
Contributor Author

Waiting for ovn-kubernetes/ovn-kubernetes#4734

@jluhrsen
Contributor

/retest

@jluhrsen
Contributor

jluhrsen commented Nov 1, 2024

/retest

Contributor

openshift-ci bot commented Nov 1, 2024

@martinkennelly: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-azure-ovn-upgrade | 129a097 | link | true | /test e2e-azure-ovn-upgrade |
| ci/prow/e2e-metal-ipi-ovn-techpreview | 129a097 | link | false | /test e2e-metal-ipi-ovn-techpreview |
| ci/prow/e2e-aws-ovn-hypershift-conformance-techpreview | 129a097 | link | false | /test e2e-aws-ovn-hypershift-conformance-techpreview |
| ci/prow/security | 129a097 | link | false | /test security |
| ci/prow/e2e-aws-ovn-kubevirt | 129a097 | link | false | /test e2e-aws-ovn-kubevirt |



@jluhrsen
Contributor

jluhrsen commented Nov 4, 2024

the hypershift-conformance-techpreview job is new and we hope that it can pass, but I think it has some permafailing tests that we may need to dig into. I started a slack thread here.

@jluhrsen
Contributor

jluhrsen commented Nov 4, 2024

/test e2e-azure-ovn-upgrade

looks like the last run was good with e2e, but some gather step had trouble. let's see what the next run looks like.

@jluhrsen
Contributor

jluhrsen commented Nov 4, 2024

/payload 4.17 ci blocking
/payload 4.17 nightly blocking

Contributor

openshift-ci bot commented Nov 4, 2024

@jluhrsen: trigger 4 job(s) of type blocking for the ci release of OCP 4.17

  • periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.17-e2e-gcp-ovn-upgrade
  • periodic-ci-openshift-hypershift-release-4.17-periodics-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/83f10a70-9af2-11ef-903c-82b7c58aa129-0

trigger 9 job(s) of type blocking for the nightly release of OCP 4.17

  • periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-serial
  • periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-nightly-4.17-fips-payload-scan
  • periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-gcp-ovn-rt-upgrade
  • periodic-ci-openshift-hypershift-release-4.17-periodics-e2e-aws-ovn-conformance
  • periodic-ci-openshift-release-master-nightly-4.17-e2e-metal-ipi-ovn-bm
  • periodic-ci-openshift-release-master-nightly-4.17-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/83f10a70-9af2-11ef-903c-82b7c58aa129-1

@tssurya
Contributor

tssurya commented Nov 5, 2024

@jluhrsen hmm why are we trying to get this merge in? We should just straight get your opened 4.17 merge in...this introduces the bug which we don't want to introduce...

@jluhrsen
Contributor

jluhrsen commented Nov 5, 2024

@jluhrsen hmm why are we trying to get this merge in? We should just straight get your opened 4.17 merge in...this introduces the bug which we don't want to introduce...

@tssurya, that's fine. we can close this and move on with mine. I figured this one was very close to being good with the team though (passing CI, etc) and would effectively make my other PR built on top of this much smaller.

I also did not know about "the bug" this would introduce.

let's close this and I'll focus my attention on #2335

BTW, I don't have permission to close this.

Labels
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • jira/invalid-bug: Indicates that a referenced Jira bug is invalid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.