OCPBUGS-36734,OCPBUGS-36733,OCPBUGS-36731,OCPBUGS-36730: 4.15 critical bugs #954

Merged
merged 10 commits into from
Jul 9, 2024

Commits on Jul 5, 2024

  1. Remove LastState parameter of GenericPlugin

    The generic plugin was applying config changes only if
    the desired spec of interfaces was different from the last
    applied spec. This logic is different from the one in
    OnNodeStateChange where the real status of the interfaces is
    used to detect changes.
    
    By removing the LastState parameter (and related code), the
    generic plugin will also use the real status of interfaces
    to decide whether to apply changes or not. The SyncNodeState
    function has this logic.
    mlguerrero12 authored and zeeke committed Jul 5, 2024
    99ad660
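
The difference between the two comparisons described in this commit can be shown with a minimal Go sketch. The types `IfaceSpec` and `IfaceStatus` and both functions are hypothetical stand-ins, not the operator's actual API. The sketch only illustrates why comparing the desired spec with the real interface status catches drift that a comparison against the last applied spec misses.

```go
package main

import "fmt"

// IfaceSpec is a hypothetical desired configuration for one PF.
type IfaceSpec struct {
	PciAddress string
	NumVfs     int
	Mtu        int
}

// IfaceStatus is a hypothetical view of what the host actually reports.
type IfaceStatus struct {
	PciAddress string
	NumVfs     int
	Mtu        int
}

// changedSinceLastApply mimics the removed behavior: it only looks at
// the desired spec and the spec that was applied last time.
func changedSinceLastApply(desired, lastApplied []IfaceSpec) bool {
	if len(desired) != len(lastApplied) {
		return true
	}
	for i := range desired {
		if desired[i] != lastApplied[i] {
			return true
		}
	}
	return false
}

// needsApply compares the desired spec with the real status, so a
// change made on the host after the last apply is still detected.
func needsApply(desired []IfaceSpec, status []IfaceStatus) bool {
	byAddr := make(map[string]IfaceStatus, len(status))
	for _, s := range status {
		byAddr[s.PciAddress] = s
	}
	for _, d := range desired {
		s, ok := byAddr[d.PciAddress]
		if !ok || s.NumVfs != d.NumVfs || s.Mtu != d.Mtu {
			return true
		}
	}
	return false
}

func main() {
	desired := []IfaceSpec{{PciAddress: "0000:3b:00.0", NumVfs: 4, Mtu: 9000}}
	lastApplied := desired // nothing changed in the spec
	// The MTU was modified on the host after the last apply.
	status := []IfaceStatus{{PciAddress: "0000:3b:00.0", NumVfs: 4, Mtu: 1500}}

	fmt.Println("spec vs last applied:", changedSinceLastApply(desired, lastApplied))
	fmt.Println("spec vs real status: ", needsApply(desired, status))
}
```

In this example the spec-only comparison reports no change, while the status-based comparison reports that the configuration must be applied again.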
  2. Verify status changes on managed interfaces

    Users could modify the settings of VFs which have been
    configured by the sriov operator. This PR starts the
    reconciliation loop when these changes are detected in the
    generic plugin.
    
    Signed-off-by: Marcelo Guerrero <[email protected]>
    mlguerrero12 authored and zeeke committed Jul 5, 2024
    785e646
  3. 7c5573e
  4. Abstract logic to check and load missing KArgs

    Logic to check missing kernel arguments is placed in a method
    to be used by both OnNodeStateChange and CheckStatusChanges.
    mlguerrero12 authored and zeeke committed Jul 5, 2024
    3e67dc8
  5. Avoid reconciling field Webhook.ClientConfig.CABundle

    Webhook resources (`ValidatingWebhookConfiguration` and `MutatingWebhookConfiguration`) in OpenShift
    are annotated with `service.beta.openshift.io/inject-cabundle` so that
    a third component fills in the ClientConfig.CABundle field of the webhook.
    When reconciling webhooks, do not override this field. Overriding it causes flakiness, as
    there might be a window in which the API server is not configured with a valid
    client certificate:
    
    ```
    Error from server (InternalError): error when creating "policies": Internal error occurred: failed calling webhook "operator-webhook.sriovnetwork.openshift.io": failed to call webhook: Post "https://operator-webhook-service.openshift-sriov-network-operator.svc:443/mutating-custom-resource?timeout=10s": tls: failed to verify certificate: x509: certificate signed by unknown authority
    ```
    
    The same behavior also happens when using CertManager.
    
    Refs:
    - https://docs.openshift.com/container-platform/4.15/security/certificates/service-serving-certificate.html
    - https://issues.redhat.com/browse/OCPBUGS-32139
    - https://cert-manager.io/docs/concepts/ca-injector/
    
    Signed-off-by: Andrea Panattoni <[email protected]>
    zeeke committed Jul 5, 2024
    2ec22a3
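
A minimal Go sketch of the idea in this commit follows. `Webhook` and `WebhookClientConfig` are simplified, hypothetical stand-ins for the Kubernetes admission registration types, and `preserveCABundle` is an illustrative helper, not the function used in the operator. It assumes webhooks are matched by name.

```go
package main

import "fmt"

// WebhookClientConfig is a simplified stand-in (assumption) for the
// client configuration of a webhook.
type WebhookClientConfig struct {
	CABundle []byte
}

// Webhook is a simplified stand-in (assumption) for one webhook entry.
type Webhook struct {
	Name         string
	ClientConfig WebhookClientConfig
}

// preserveCABundle copies the CA bundle already present in the cluster
// into the desired webhooks before an update, so the reconcile does
// not wipe a value injected by another component.
func preserveCABundle(desired, existing []Webhook) {
	current := make(map[string][]byte, len(existing))
	for _, w := range existing {
		current[w.Name] = w.ClientConfig.CABundle
	}
	for i := range desired {
		if ca, ok := current[desired[i].Name]; ok && len(ca) > 0 {
			desired[i].ClientConfig.CABundle = ca
		}
	}
}

func main() {
	// The desired object rendered by the operator has no CA bundle.
	desired := []Webhook{{Name: "operator-webhook.sriovnetwork.openshift.io"}}
	// The object in the cluster already had its bundle injected.
	existing := []Webhook{{
		Name:         "operator-webhook.sriovnetwork.openshift.io",
		ClientConfig: WebhookClientConfig{CABundle: []byte("injected-ca")},
	}}

	preserveCABundle(desired, existing)
	fmt.Println(string(desired[0].ClientConfig.CABundle))
}
```

With this approach the update never writes an empty bundle over an injected one, which removes the window in which calls to the webhook fail certificate verification.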
  6. add sort for the policy on the sriovOperatorConfig controller

    We need to keep the policy order consistent.
    
    Signed-off-by: Sebastian Sch <[email protected]>
    SchSeba authored and zeeke committed Jul 5, 2024
    f6405e3
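
The commit message says only that the policy order must be consistent. As a sketch under that statement, the Go example below sorts a list deterministically before it is used. The `Policy` type and the sort key (priority first, then name) are assumptions made for illustration and may differ from what the controller actually does.

```go
package main

import (
	"fmt"
	"sort"
)

// Policy is a hypothetical, reduced view of a node policy.
type Policy struct {
	Name     string
	Priority int
}

// sortPolicies orders policies by priority and breaks ties by name,
// so the same set of policies always yields the same sequence
// regardless of the order in which the API returned them.
func sortPolicies(policies []Policy) {
	sort.SliceStable(policies, func(i, j int) bool {
		if policies[i].Priority != policies[j].Priority {
			return policies[i].Priority < policies[j].Priority
		}
		return policies[i].Name < policies[j].Name
	})
}

func main() {
	policies := []Policy{
		{Name: "policy-b", Priority: 10},
		{Name: "policy-c", Priority: 5},
		{Name: "policy-a", Priority: 10},
	}
	sortPolicies(policies)
	for _, p := range policies {
		fmt.Println(p.Priority, p.Name)
	}
}
```

A deterministic order matters when the sorted list feeds a rendered object: without it, two reconciles over the same input can produce different output and trigger needless updates.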

Commits on Jul 8, 2024

  1. Fix issue with infinite reconciling higher MTU

    When the MTU set in the SRIOV Network Node Policy is lower than the
    actual MTU of the PF, it triggers the reconcile loop for the Node state
    indefinitely, preventing the configuration from completing.
    
    Signed-off-by: amaslennikov <[email protected]>
    almaslennikov authored and zeeke committed Jul 8, 2024
    33ca6a2
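
The loop described in this commit can be reproduced with a small Go sketch. Both functions are hypothetical. The commit message states the symptom but not the exact fix, so `needsMTUUpdate` is one plausible way to avoid the loop, on the assumption that a PF whose MTU is already at least the requested value needs no change.

```go
package main

import "fmt"

// needsMTUUpdateStrict is a hypothetical check that treats any
// difference as a required update. If the PF keeps an MTU higher than
// the one in the policy, this check never becomes false.
func needsMTUUpdateStrict(desiredMTU, actualMTU int) bool {
	return desiredMTU > 0 && desiredMTU != actualMTU
}

// needsMTUUpdate is a hypothetical check that only asks for an update
// when the PF MTU is too small for the policy (assumption).
func needsMTUUpdate(desiredMTU, actualMTU int) bool {
	return desiredMTU > 0 && actualMTU < desiredMTU
}

func main() {
	desired, actual := 1500, 9000 // policy MTU lower than the PF MTU

	fmt.Println("strict check:", needsMTUUpdateStrict(desired, actual))
	fmt.Println("relaxed check:", needsMTUUpdate(desired, actual))
}
```

With a policy MTU of 1500 and a PF MTU of 9000, the strict check keeps returning `true` on every reconcile, while the relaxed check returns `false` and lets the configuration complete.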
  2. Avoid reconfiguring unmentioned DPDK VFs

    If a Virtual Function is configured with a DPDK driver (e.g. `vfio-pci`) and it is not
    referenced by any SriovNetworkNodePolicy, the `NeedToUpdateSriov` function must not
    trigger a reconfiguration. This may happen if a PF is configured by multiple policies
    (via PF partitioning) and a policy is deleted by the user. In these cases, the VF is not
    reconfigured [1] and a drain loop is started.
    
    The same logic applies to VDPA devices.
    
    refs:
    [1] https://github.com/k8snetworkplumbingwg/sriov-network-operator/blob/5f3c4e903f789aa177fe54686efd6c18576b7ab1/pkg/host/internal/sriov/sriov.go#L457
    
    Signed-off-by: Andrea Panattoni <[email protected]>
    zeeke committed Jul 8, 2024
    ef873a8
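
The rule in this commit can be sketched in Go as follows. This is not the real `NeedToUpdateSriov` function: the `VF` type, the `mentioned` set, the driver list, and the handling of non-DPDK drivers are all assumptions chosen to keep the example self-contained.

```go
package main

import "fmt"

// VF is a hypothetical, reduced view of a virtual function.
type VF struct {
	ID     int
	Driver string
}

// dpdkDrivers is an illustrative set; only vfio-pci is named in the
// commit message.
var dpdkDrivers = map[string]bool{"vfio-pci": true}

// needToUpdate reports whether any VF that no policy mentions would
// trigger a reconfiguration. VFs bound to a DPDK driver are skipped,
// because they are not reconfigured anyway and flagging them would
// restart the drain on every reconcile.
func needToUpdate(vfs []VF, mentioned map[int]bool, defaultDriver string) bool {
	for _, vf := range vfs {
		if mentioned[vf.ID] {
			continue // compared against its policy elsewhere (out of scope here)
		}
		if dpdkDrivers[vf.Driver] {
			continue // unmentioned DPDK VF: leave it alone
		}
		if vf.Driver != defaultDriver {
			return true // assumption: other drivers are reset to the default
		}
	}
	return false
}

func main() {
	vfs := []VF{
		{ID: 0, Driver: "iavf"},
		{ID: 1, Driver: "vfio-pci"}, // left over from a deleted policy
	}
	mentioned := map[int]bool{0: true}

	fmt.Println(needToUpdate(vfs, mentioned, "iavf"))
}
```

In the example, VF 1 still carries `vfio-pci` from a deleted policy. Because it is skipped, the function reports that no update is needed and no drain is started.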
  3. Use interface index instead of name

    A race is possible in the VFIsReady function. A VF netdevice can
    have the default eth0 device name, and by the time we call the netlink
    syscall to get the device information, eth0 can be a different device.
    
    This causes duplicate MAC allocation on the VF admin MAC address.
    
    Signed-off-by: Sebastian Sch <[email protected]>
    SchSeba authored and zeeke committed Jul 8, 2024
    068ca52
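
The race can be illustrated with a small Go simulation. It does not call netlink. The `Device` and `Host` types are hypothetical and model only the fact that a device name can be reassigned while an interface index stays attached to the same device.

```go
package main

import "fmt"

// Device is a hypothetical view of a network device.
type Device struct {
	Index int
	Name  string
	MAC   string
}

// Host is a hypothetical in-memory device table.
type Host struct {
	devices []Device
}

// ByName returns the first device that currently carries the name.
func (h *Host) ByName(name string) (Device, bool) {
	for _, d := range h.devices {
		if d.Name == name {
			return d, true
		}
	}
	return Device{}, false
}

// ByIndex returns the device with the given interface index.
func (h *Host) ByIndex(index int) (Device, bool) {
	for _, d := range h.devices {
		if d.Index == index {
			return d, true
		}
	}
	return Device{}, false
}

func main() {
	// A new VF shows up with the default name eth0.
	host := &Host{devices: []Device{{Index: 7, Name: "eth0", MAC: "aa:aa:aa:aa:aa:01"}}}
	vfIndex, vfName := 7, "eth0"

	// Before the lookup runs, the VF is renamed and a second VF
	// appears, again with the default name eth0.
	host.devices[0].Name = "ens1f0v0"
	host.devices = append(host.devices, Device{Index: 8, Name: "eth0", MAC: "aa:aa:aa:aa:aa:02"})

	byName, _ := host.ByName(vfName)
	byIndex, _ := host.ByIndex(vfIndex)
	fmt.Println("lookup by name: ", byName.Index, byName.MAC)
	fmt.Println("lookup by index:", byIndex.Index, byIndex.MAC)
}
```

The lookup by name returns the second device, so its MAC address would be read for the wrong VF. The lookup by index still returns the original one.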
  4. d/s: run make deps-update

    Signed-off-by: Andrea Panattoni <[email protected]>
    zeeke committed Jul 8, 2024
    9c27b3f