OCPBUGS-36753, OCPBUGS-36754, OCPBUGS-36755, OCPBUGS-36756: [release-4.14] Critical Bugs #961

Merged 10 commits on Jul 10, 2024

Commits on Jul 9, 2024

  1. Remove LastState parameter of GenericPlugin

    The generic plugin was applying config changes only if
    the desired spec of interfaces was different from the last
    applied spec. This logic is different from the one in
    OnNodeStateChange where the real status of the interfaces is
    used to detect changes.
    
    By removing the LastState parameter (and related code), the
    generic plugin will also use the real status of interfaces
    to decide whether to apply changes or not. The SyncNodeState
    function has this logic.
    mlguerrero12 authored and zeeke committed Jul 9, 2024
    af0cdfe
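The difference between comparing against the last applied spec and comparing against the real status can be sketched as follows. This is a minimal illustration, not the operator's code: `InterfaceSpec`, `InterfaceStatus`, and `needsApply` are hypothetical, trimmed-down names, and only the VF count is compared.

```go
package main

import "fmt"

// InterfaceSpec is a hypothetical, trimmed-down desired configuration.
type InterfaceSpec struct {
	PciAddress string
	NumVfs     int
}

// InterfaceStatus is a hypothetical, trimmed-down view of what is
// actually configured on the host.
type InterfaceStatus struct {
	PciAddress string
	NumVfs     int
}

// needsApply reports whether any desired interface differs from the real
// status. It never looks at a previously applied spec, so drift introduced
// outside the operator is detected even when the spec is unchanged.
func needsApply(desired []InterfaceSpec, actual []InterfaceStatus) bool {
	byAddr := make(map[string]InterfaceStatus, len(actual))
	for _, st := range actual {
		byAddr[st.PciAddress] = st
	}
	for _, spec := range desired {
		st, found := byAddr[spec.PciAddress]
		if !found || st.NumVfs != spec.NumVfs {
			return true
		}
	}
	return false
}

func main() {
	desired := []InterfaceSpec{{PciAddress: "0000:3b:00.0", NumVfs: 4}}
	drifted := []InterfaceStatus{{PciAddress: "0000:3b:00.0", NumVfs: 2}}
	fmt.Println(needsApply(desired, drifted))
}
```

A comparison against the last applied spec would report no change in this scenario, because the spec itself did not move; only the host did.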
  2. Verify status changes on managed interfaces

    Users could modify the settings of VFs which have been
    configured by the sriov operator. This PR starts the
    reconciliation loop when these changes are detected in the
    generic plugin.
    
    Signed-off-by: Marcelo Guerrero <[email protected]>
    mlguerrero12 authored and zeeke committed Jul 9, 2024
    4892490
  3. f843a58
  4. Abstract logic to check and load missing KArgs

    Logic to check missing kernel arguments is placed in a method
    to be used by both OnNodeStateChange and CheckStatusChanges.
    mlguerrero12 authored and zeeke committed Jul 9, 2024
    6006854
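A shared helper for finding missing kernel arguments might look roughly like the sketch below. It assumes the current arguments are available as a single command-line string (for example the contents of `/proc/cmdline`); the function name `missingKArgs` and the sample arguments are illustrative and not taken from the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// missingKArgs returns the desired kernel arguments that do not appear
// in the given kernel command line. Matching is exact, per argument.
func missingKArgs(cmdline string, desired []string) []string {
	present := make(map[string]struct{})
	for _, arg := range strings.Fields(cmdline) {
		present[arg] = struct{}{}
	}
	var missing []string
	for _, arg := range desired {
		if _, ok := present[arg]; !ok {
			missing = append(missing, arg)
		}
	}
	return missing
}

func main() {
	cmdline := "BOOT_IMAGE=/vmlinuz root=/dev/sda1 intel_iommu=on"
	fmt.Println(missingKArgs(cmdline, []string{"intel_iommu=on", "iommu=pt"}))
}
```

Keeping the check in one function means both callers agree on what "missing" means.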
  5. Avoid reconciling field Webhook.ClientConfig.CABundle

    Webhook resources (`ValidatingWebhookConfiguration` and `MutatingWebhookConfiguration`) in OpenShift
    are annotated with `service.beta.openshift.io/inject-cabundle` so that
    a separate component fills in the ClientConfig.CABundle field of the webhook.
    When reconciling webhooks, do not override that field. Overriding it causes flakiness, as
    there might be a window in which the API server is not configured with a valid
    client certificate:
    
    ```
    Error from server (InternalError): error when creating "policies": Internal error occurred: failed calling webhook "operator-webhook.sriovnetwork.openshift.io": failed to call webhook: Post "https://operator-webhook-service.openshift-sriov-network-operator.svc:443/mutating-custom-resource?timeout=10s": tls: failed to verify certificate: x509: certificate signed by unknown authority
    ```
    
    The same behavior also occurs when using cert-manager.
    
    Refs:
    - https://docs.openshift.com/container-platform/4.15/security/certificates/service-serving-certificate.html
    - https://issues.redhat.com/browse/OCPBUGS-32139
    - https://cert-manager.io/docs/concepts/ca-injector/
    
    Signed-off-by: Andrea Panattoni <[email protected]>
    zeeke committed Jul 9, 2024
    c0b6366
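One way to avoid overriding the injected field is to copy the value already present on the cluster into the desired object before updating it. The sketch below uses hypothetical local types (`Webhook`, `ClientConfig`) that only mirror the shape of the Kubernetes webhook types, and `preserveCABundle` is an illustrative name rather than the operator's function.

```go
package main

import "fmt"

// ClientConfig and Webhook are hypothetical, trimmed-down mirrors of the
// Kubernetes admission webhook types.
type ClientConfig struct {
	CABundle []byte
}

type Webhook struct {
	Name         string
	ClientConfig ClientConfig
}

// preserveCABundle copies the CA bundle found on the existing webhooks
// into the desired webhooks, matched by name, so that a later update
// does not wipe the value injected by another component.
func preserveCABundle(desired, existing []Webhook) {
	current := make(map[string][]byte, len(existing))
	for _, wh := range existing {
		current[wh.Name] = wh.ClientConfig.CABundle
	}
	for i := range desired {
		if bundle, ok := current[desired[i].Name]; ok && len(bundle) > 0 {
			desired[i].ClientConfig.CABundle = bundle
		}
	}
}

func main() {
	name := "operator-webhook.sriovnetwork.openshift.io"
	desired := []Webhook{{Name: name}}
	existing := []Webhook{{Name: name, ClientConfig: ClientConfig{CABundle: []byte("injected-ca")}}}
	preserveCABundle(desired, existing)
	fmt.Println(string(desired[0].ClientConfig.CABundle))
}
```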
  6. add sort for the policy on the sriovOperatorConfig controller

    We need the policy order to be consistent.
    
    Signed-off-by: Sebastian Sch <[email protected]>
    SchSeba authored and zeeke committed Jul 9, 2024
    cff6523
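A deterministic order can be obtained by sorting the policy list before it is used. The sort key below (priority, then name) is an assumption made for illustration; the commit message does not say which key the controller uses, and `Policy` is a hypothetical stand-in type.

```go
package main

import (
	"fmt"
	"sort"
)

// Policy is a hypothetical, trimmed-down stand-in for a node policy.
type Policy struct {
	Name     string
	Priority int
}

// sortPolicies orders policies by priority and breaks ties by name, so
// the result does not depend on the order in which the list was returned.
func sortPolicies(policies []Policy) {
	sort.SliceStable(policies, func(i, j int) bool {
		if policies[i].Priority != policies[j].Priority {
			return policies[i].Priority < policies[j].Priority
		}
		return policies[i].Name < policies[j].Name
	})
}

func main() {
	policies := []Policy{
		{Name: "policy-b", Priority: 10},
		{Name: "policy-a", Priority: 10},
		{Name: "policy-c", Priority: 5},
	}
	sortPolicies(policies)
	fmt.Println(policies)
}
```

Without a tie-breaker, two policies with the same priority could swap places between reconciliations and make otherwise identical output look different.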
  7. Fix issue with infinite reconciling higher MTU

    When the MTU set in the SRIOV Network Node Policy is lower than the
    actual MTU of the PF, it triggers the reconcile loop for the Node state
    indefinitely, preventing the configuration from completing.
    
    Signed-off-by: amaslennikov <[email protected]>
    almaslennikov authored and zeeke committed Jul 9, 2024
    291eb84
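A comparison that avoids this loop could treat a PF whose MTU is already at or above the policy value as satisfied. The commit message does not spell out the exact condition, so the sketch below is an assumption, and `mtuNeedsUpdate` is a hypothetical helper name.

```go
package main

import "fmt"

// mtuNeedsUpdate reports whether the PF MTU must be raised to satisfy the
// policy. A desired MTU of zero or less is treated here as "not set".
// A PF MTU that is already higher than the desired value does not count
// as a difference, so it cannot retrigger reconciliation.
func mtuNeedsUpdate(desiredMTU, actualMTU int) bool {
	if desiredMTU <= 0 {
		return false
	}
	return desiredMTU > actualMTU
}

func main() {
	fmt.Println(mtuNeedsUpdate(1500, 9000)) // PF already has a higher MTU
	fmt.Println(mtuNeedsUpdate(9000, 1500)) // PF must be raised
}
```

A strict inequality check (`desiredMTU != actualMTU`) would report a difference in the first case on every pass, which matches the looping behavior described above.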
  8. Avoid reconfiguring unmentioned DPDK VFs

    If a Virtual Function is configured with a DPDK driver (e.g. `vfio-pci`) and it is not
    referenced by any SriovNetworkNodePolicy, the `NeedToUpdateSriov` function must not
    trigger a reconfiguration. This may happen if a PF is configured by multiple policies
    (via PF partitioning) and a policy is deleted by the user. In these cases, the VF is not
    reconfigured [1] and a drain loop is started.
    
    The same logic applies to VDPA devices.
    
    refs:
    [1] https://github.com/k8snetworkplumbingwg/sriov-network-operator/blob/5f3c4e903f789aa177fe54686efd6c18576b7ab1/pkg/host/internal/sriov/sriov.go#L457
    
    Signed-off-by: Andrea Panattoni <[email protected]>
    zeeke committed Jul 9, 2024
    22a7dda
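The rule described in this commit can be sketched as a small predicate that is evaluated before comparing a VF against the desired state. `VF`, `isDPDKDriver`, and `shouldSkipVF` are hypothetical names for illustration, and the driver list is an assumption; VDPA devices are not modeled here.

```go
package main

import "fmt"

// VF is a hypothetical, trimmed-down view of a virtual function.
type VF struct {
	ID     int
	Driver string
}

// isDPDKDriver reports whether the driver is one of the userspace drivers
// commonly used with DPDK. The list is an assumption for this sketch.
func isDPDKDriver(driver string) bool {
	switch driver {
	case "vfio-pci", "igb_uio", "uio_pci_generic":
		return true
	}
	return false
}

// shouldSkipVF reports whether a VF must be left out of the "needs update"
// comparison: it is bound to a DPDK driver and no policy references it.
func shouldSkipVF(vf VF, referenced map[int]bool) bool {
	return !referenced[vf.ID] && isDPDKDriver(vf.Driver)
}

func main() {
	// VF 0 is still covered by a policy; VF 1 belonged to a deleted policy.
	referenced := map[int]bool{0: true}
	fmt.Println(shouldSkipVF(VF{ID: 1, Driver: "vfio-pci"}, referenced))
	fmt.Println(shouldSkipVF(VF{ID: 0, Driver: "vfio-pci"}, referenced))
}
```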
  9. Use interface index instead of name

    A race is possible in the VFIsReady function. A VF netdevice can
    have the default eth0 device name, and by the time we call the netlink
    syscall to get the device information, eth0 can be a different device.
    
    This causes duplicate MAC allocation on the VF admin MAC address.
    
    Signed-off-by: Sebastian Sch <[email protected]>
    SchSeba authored and zeeke committed Jul 9, 2024
    ce5d16d
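The race can be illustrated with a small in-memory model. `linkTable` below is a hypothetical stand-in for the kernel's link table, not the operator's code; in real code a netlink lookup by index (for example `LinkByIndex` in the vishvananda/netlink package) would play the role of `byIndex`.

```go
package main

import "fmt"

// link is a hypothetical, simplified network device record.
type link struct {
	Index int
	Name  string
	MAC   string
}

// linkTable is an in-memory stand-in for the kernel's link table.
type linkTable struct {
	links []link
}

func (t *linkTable) byName(name string) (link, bool) {
	for _, l := range t.links {
		if l.Name == name {
			return l, true
		}
	}
	return link{}, false
}

func (t *linkTable) byIndex(index int) (link, bool) {
	for _, l := range t.links {
		if l.Index == index {
			return l, true
		}
	}
	return link{}, false
}

// rename changes a device name; the index stays the same.
func (t *linkTable) rename(index int, name string) {
	for i := range t.links {
		if t.links[i].Index == index {
			t.links[i].Name = name
		}
	}
}

func main() {
	// The VF appears with the default name eth0 and index 7.
	table := &linkTable{links: []link{{Index: 7, Name: "eth0", MAC: "02:00:00:00:00:07"}}}
	vfIndex := 7

	// Before the lookup runs, the VF is renamed and another device takes eth0.
	table.rename(vfIndex, "ens1f0v0")
	table.links = append(table.links, link{Index: 8, Name: "eth0", MAC: "02:00:00:00:00:08"})

	byName, _ := table.byName("eth0")
	byIndex, _ := table.byIndex(vfIndex)
	fmt.Println("by name: ", byName.Index, byName.MAC)
	fmt.Println("by index:", byIndex.Index, byIndex.MAC)
}
```

The lookup by name returns the second device, while the lookup by index still returns the original VF, because an interface index does not change when the device is renamed.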
  10. d/s: run make deps-update

    Signed-off-by: Andrea Panattoni <[email protected]>
    zeeke committed Jul 9, 2024
    8f84438