Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid reconfiguring unmentioned DPDK VFs #710

Merged

Conversation

zeeke
Copy link
Member

@zeeke zeeke commented Jun 10, 2024

If a Virtual Function is configured with a DPDK driver (e.g. vfio-pci) and it is not referred by any SriovNetworkNodePolicy, NeedToUpdateSriov function must not trigger a reconfiguration. This may happen if a PF is configured by multiple policies (via PF partitioning) and a policy is deleted by the user. In these cases, the VF is not reconfigured [1] and a drain loop is started

refs:
[1]

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@coveralls
Copy link

coveralls commented Jun 10, 2024

Pull Request Test Coverage Report for Build 9447901897

Details

  • 1 of 2 (50.0%) changed or added relevant lines in 1 file are covered.
  • 3 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.02%) to 39.677%

Changes Missing Coverage Covered Lines Changed/Added Lines %
api/v1/helper.go 1 2 50.0%
Files with Coverage Reduction New Missed Lines %
controllers/drain_controller.go 3 67.1%
Totals Coverage Status
Change from base Build 9402838972: 0.02%
Covered Lines: 5177
Relevant Lines: 13048

💛 - Coveralls

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@coveralls
Copy link

coveralls commented Jun 10, 2024

Pull Request Test Coverage Report for Build 9449952944

Details

  • 1 of 2 (50.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.05%) to 39.707%

Changes Missing Coverage Covered Lines Changed/Added Lines %
api/v1/helper.go 1 2 50.0%
Totals Coverage Status
Change from base Build 9402838972: 0.05%
Covered Lines: 5181
Relevant Lines: 13048

💛 - Coveralls

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@coveralls
Copy link

coveralls commented Jun 10, 2024

Pull Request Test Coverage Report for Build 9450305059

Details

  • 1 of 2 (50.0%) changed or added relevant lines in 1 file are covered.
  • 3 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.02%) to 39.677%

Changes Missing Coverage Covered Lines Changed/Added Lines %
api/v1/helper.go 1 2 50.0%
Files with Coverage Reduction New Missed Lines %
controllers/drain_controller.go 3 67.1%
Totals Coverage Status
Change from base Build 9402838972: 0.02%
Covered Lines: 5177
Relevant Lines: 13048

💛 - Coveralls

api/v1/helper.go Outdated
// need to reset VF if it is not a part of a group and:
// a. has DPDK driver loaded
// b. has VDPA device
if !ingroup && vfStatus.VdpaType != "" {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lmilleri @wizhaoredhat I'm thinking about removing also this case. Are you aware of any case in which a VF with VdpaType that is not referenced by any policy should be configured/reset by the operator?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zeeke Vdpa devices should be created/delete with policies, they should not be on their own. IIUC with this change, we won't trigger a reconfiguration if vdpa device is not in a policy? If yes, it looks ok to me

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, this if statement checks if a VF has Vdpa but it is not referenced by any policy. The current behavior is to try to reconcile it (not sure which changes are going to be done). I'm proposing to make the operator leave the device as is, as there is no policy that says it should be managed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me too.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@coveralls
Copy link

coveralls commented Jun 17, 2024

Pull Request Test Coverage Report for Build 9547721609

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 2 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.03%) to 38.646%

Files with Coverage Reduction New Missed Lines %
controllers/drain_controller.go 1 68.06%
api/v1/helper.go 1 72.26%
Totals Coverage Status
Change from base Build 9544543606: 0.03%
Covered Lines: 5177
Relevant Lines: 13396

💛 - Coveralls

zeeke added a commit to zeeke/sriov-network-operator-1 that referenced this pull request Jun 18, 2024
Add an end-to-end test to verify removing a partial
SriovNetworkNodePolicy does not trigger any issue in
the operator's deployment.

Refs:
- https://issues.redhat.com/browse/OCPBUGS-34934
- k8snetworkplumbingwg#710

Signed-off-by: Andrea Panattoni <[email protected]>
Copy link
Collaborator

@ykulazhenkov ykulazhenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link
Collaborator

@adrianchiris adrianchiris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM !

If a Virtual Function is configured with a DPDK driver (e.g. `vfio-pci`) and it is not
referred by any SriovNetworkNodePolicy, `NeedToUpdateSriov` function must not
trigger a  reconfiguration. This may happen if a PF is configured by multiple policies
(via PF partitioning) and a policy is deleted by the user. In these cases, the VF is not
reconfigured [1] and a drain loop is started

The same logic applies to VDPA devices.

refs:
[1] https://github.com/k8snetworkplumbingwg/sriov-network-operator/blob/5f3c4e903f789aa177fe54686efd6c18576b7ab1/pkg/host/internal/sriov/sriov.go#L457

Signed-off-by: Andrea Panattoni <[email protected]>
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@coveralls
Copy link

coveralls commented Jun 26, 2024

Pull Request Test Coverage Report for Build 9677744999

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.001%) to 40.376%

Files with Coverage Reduction New Missed Lines %
api/v1/helper.go 1 72.26%
Totals Coverage Status
Change from base Build 9667839529: 0.001%
Covered Lines: 5452
Relevant Lines: 13503

💛 - Coveralls

@adrianchiris adrianchiris merged commit 50d436d into k8snetworkplumbingwg:master Jul 1, 2024
13 checks passed
zeeke added a commit to zeeke/sriov-network-operator-1 that referenced this pull request Jul 8, 2024
Add an end-to-end test to verify removing a partial
SriovNetworkNodePolicy does not trigger any issue in
the operator's deployment.

Refs:
- https://issues.redhat.com/browse/OCPBUGS-34934
- k8snetworkplumbingwg#710

Signed-off-by: Andrea Panattoni <[email protected]>
zeeke added a commit to zeeke/sriov-network-operator-1 that referenced this pull request Jul 8, 2024
Add an end-to-end test to verify removing a partial
SriovNetworkNodePolicy does not trigger any issue in
the operator's deployment.

Refs:
- https://issues.redhat.com/browse/OCPBUGS-34934
- k8snetworkplumbingwg#710

Signed-off-by: Andrea Panattoni <[email protected]>
zeeke added a commit to zeeke/sriov-network-operator-1 that referenced this pull request Jul 9, 2024
Add an end-to-end test to verify removing a partial
SriovNetworkNodePolicy does not trigger any issue in
the operator's deployment.

Refs:
- https://issues.redhat.com/browse/OCPBUGS-34934
- k8snetworkplumbingwg#710

Signed-off-by: Andrea Panattoni <[email protected]>
zeeke added a commit to zeeke/sriov-network-operator-1 that referenced this pull request Jul 19, 2024
Add an end-to-end test to verify removing a partial
SriovNetworkNodePolicy does not trigger any issue in
the operator's deployment.

Refs:
- https://issues.redhat.com/browse/OCPBUGS-34934
- k8snetworkplumbingwg#710

Signed-off-by: Andrea Panattoni <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants