Add switchdev-configuration-after-NM service #202

Merged (1 commit) on Dec 15, 2021

zshi-redhat
Collaborator

@zshi-redhat zshi-redhat commented Nov 15, 2021

The switchdev-configuration-after-NM service rebinds VFs to their
driver and runs after the NetworkManager service. This is
required for features such as VF LAG to take effect when a
bond or other network configuration is set up through the
NetworkManager service.

Signed-off-by: Zenghui Shi [email protected]
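
For readers unfamiliar with the rebind step described above, here is a minimal sketch of rebinding one VF through the standard sysfs `bind`/`unbind` files. It is not the script added by this PR: the function name `rebind_vf`, the `SYSFS_PCI` override, and the example driver name `mlx5_core` are assumptions made for illustration.

```shell
# Hypothetical helper (not the PR's script): rebind one VF to a kernel driver.
# SYSFS_PCI defaults to the real sysfs location; override it for a dry run.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"

rebind_vf() {
    local vf_pci="$1"    # VF PCI address, e.g. 0000:65:00.2 (example value)
    local driver="$2"    # kernel driver name, e.g. mlx5_core (assumption)
    local drv_dir="${SYSFS_PCI}/drivers/${driver}"

    if [ ! -d "$drv_dir" ]; then
        echo "driver directory not found: $drv_dir" >&2
        return 1
    fi

    # Unbind only if the VF is currently bound to this driver.
    if [ -e "${drv_dir}/${vf_pci}" ]; then
        echo "$vf_pci" > "${drv_dir}/unbind" || return 1
    fi

    # Bind the VF so the driver probes it again.
    echo "$vf_pci" > "${drv_dir}/bind"
}
```

A call would look like `rebind_vf 0000:65:00.2 mlx5_core`, run as root once NetworkManager has applied its configuration.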

@pliurh
Collaborator

pliurh commented Nov 22, 2021

Is this a temporary workaround before we fix the issue in the kernel?

@zshi-redhat
Collaborator Author

Is this a temporary workaround before we fix the issue in the kernel?

Not a temporary fix; this is required for the VF LAG feature to function properly.

fi

# Required for NetworkManager configuration(e.g. bond) to settle down
sleep 3
Collaborator

Can we use nmcli or ip command to check the link state, instead of using sleep?

Collaborator Author

This may not be easy, since we don't know which configuration nmcli or ip should wait for. There could be cases where no additional bond or other NM configuration is configured by the user.
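
For the narrower case where the interface name is known in advance, the reviewer's suggestion could look roughly like the sketch below. It polls the interface's `operstate` file in sysfs and gives up after a timeout. The function name `wait_for_link`, the `SYSFS_NET` override, and the interface name `bond0` are assumptions for illustration. It does not address the objection above that the script cannot know which interface, if any, to wait for.

```shell
# Hypothetical alternative to a fixed sleep: poll a known interface's operstate.
# SYSFS_NET defaults to the real sysfs location; override it for a dry run.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

wait_for_link() {
    local ifname="$1"         # interface to wait for, e.g. bond0 (assumption)
    local timeout="${2:-3}"   # maximum number of seconds to wait
    local waited=0

    while :; do
        # operstate reads "up" once the link is operational.
        if [ "$(cat "${SYSFS_NET}/${ifname}/operstate" 2>/dev/null)" = "up" ]; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

A caller could fall back to the existing behaviour when the wait fails, for example `wait_for_link bond0 3 || true`, so the worst case is no slower than the current `sleep 3`.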

Description=Binds SRIOV VFs into switchdev driver
# Removal of this file signals firstboot completion
ConditionPathExists=!/etc/ignition-machine-config-encapsulated.json
# This service is used to move a SRIOV NIC into switchdev mode
Collaborator

This comment is not accurate.

Collaborator Author

Fixed

@@ -0,0 +1,35 @@
mode: 0755
overwrite: true
path: "/usr/local/bin/bind-switchdev.sh"
Collaborator

The name looks unclear to me. Since we split switchdev-configuration.service into two parts, shall we name them switchdev-configuration-before-NM.service and switchdev-configuration-after-NM.service?

Collaborator Author

Done

@pliurh
Collaborator

pliurh commented Dec 8, 2021

Switchdev-configuration-before-NM service rebinds VF to its kernel driver and executes after NetworkManager service

In the commit message, shall it be Switchdev-configuration-after-NM instead?

@zshi-redhat
Collaborator Author

Switchdev-configuration-before-NM service rebinds VF to its kernel driver and executes after NetworkManager service

In the commit message, shall it be Switchdev-configuration-after-NM instead?

updated

@pliurh
Collaborator

pliurh commented Dec 8, 2021

/lgtm

@github-actions github-actions bot added the lgtm label Dec 8, 2021
@zshi-redhat
Collaborator Author

/cc @adrianchiris @e0ne

fi

# Required for NetworkManager configuration(e.g. bond) to settle down
sleep 3
Collaborator

What happens if we don't have this sleep? Do you know which configurations need to settle down?

Collaborator Author

What happened was that VF LAG didn't take effect (a dmesg message indicates the error).

Collaborator Author

@adrianchiris any follow-up questions?

Collaborator

No additional questions. VF LAG has some quirks :(

@@ -67,12 +67,5 @@ contents:

# turn hw-tc-offload on
/usr/sbin/ethtool -K ${name} hw-tc-offload on

Collaborator

@adrianchiris adrianchiris Dec 8, 2021

Not related to this PR, but we could probably save some time during boot if we set sriov_drivers_autoprobe to 0 before creating the VFs.
Then there is no need to unload drivers in line 57:

  1. echo 0 > /sys/bus/pci/devices/PCI_DBDF_OF_PF/sriov_drivers_autoprobe
  2. echo XXX > /sys/bus/pci/devices/PCI_DBDF_OF_PF/sriov_numvfs
  3. Restore drivers autoprobe
    echo 1 > /sys/bus/pci/devices/PCI_DBDF_OF_PF/sriov_drivers_autoprobe

If this sounds like something we'd like, I can open an enhancement for it.
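
The three steps above can be sketched as a single function. This is an illustration, not code from the operator: the function name `create_vfs_without_autoprobe` and the `SYSFS_PCI_DEVICES` override are assumptions. The `sriov_drivers_autoprobe` and `sriov_numvfs` files are the sysfs attributes named in the comment.

```shell
# Hypothetical sketch of the proposal: create VFs with driver autoprobe disabled.
# SYSFS_PCI_DEVICES defaults to the real sysfs location; override for a dry run.
SYSFS_PCI_DEVICES="${SYSFS_PCI_DEVICES:-/sys/bus/pci/devices}"

create_vfs_without_autoprobe() {
    local pf_pci="$1"     # PCI address (DBDF) of the PF
    local num_vfs="$2"    # number of VFs to create
    local pf_dir="${SYSFS_PCI_DEVICES}/${pf_pci}"
    local rc=0

    if [ ! -d "$pf_dir" ]; then
        echo "PF not found: $pf_dir" >&2
        return 1
    fi

    # 1. Stop the kernel from binding a driver to newly created VFs.
    echo 0 > "${pf_dir}/sriov_drivers_autoprobe" || return 1

    # 2. Create the VFs; they come up without a driver bound.
    echo "$num_vfs" > "${pf_dir}/sriov_numvfs" || rc=$?

    # 3. Restore autoprobe even if VF creation failed.
    echo 1 > "${pf_dir}/sriov_drivers_autoprobe" || return 1

    return "$rc"
}
```

Because the VFs are created with no driver bound, the script's existing step that binds the driver to each VF would still be needed afterwards.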

Collaborator

Sounds good to me. Can echo 1 > /sys/bus/pci/devices/PCI_DBDF_OF_PF/sriov_drivers_autoprobe load the driver to VFs as we do in line 75?

Collaborator

need to re-enable sriov_drivers_autoprobe before binding driver in L75

Collaborator

done #214

@zshi-redhat zshi-redhat changed the title Add switchdev-bind service Add switchdev-configuration-after-NM service Dec 14, 2021
Switchdev-configuration-before-NM service rebinds VF to its
kernel driver and executes after NetworkManager service, this
is required for features such as VF LAG to take effect when
bond or other network configuration are configured through
NetworkManager service.

Signed-off-by: Zenghui Shi <[email protected]>
@zshi-redhat
Collaborator Author

resolved merge conflict.

@github-actions

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.

Best regards.

@pliurh pliurh merged commit 5828ce9 into k8snetworkplumbingwg:master Dec 15, 2021