Fix OVS "flow" replay for groups #2134
The Group objects were not reset correctly when attempting to replay them, leading to confusing error log messages and invalid datapath state. We fix the implementation of Reset() for groups and we ensure that the method is called during replay. We also update the TestOVSFlowReplay e2e test to make sure it is more comprehensive: instead of just checking Pod-to-Pod connectivity after a replay, we ensure that the number of OVS flows / groups is the same before and after a restart / replay. We confirmed that the updated test fails when the patch is not applied. Fixes antrea-io#2127
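The before/after comparison described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual TestOVSFlowReplay code: `snapshot`, `countEntries`, and `checkReplay` are hypothetical names, the sample dump text is made up for the example, and the sketch assumes one flow or group per line with a header line containing the word "reply".

```go
package main

import (
	"fmt"
	"strings"
)

// snapshot holds the number of flows and groups seen in one dump.
// Hypothetical type, used only for this illustration.
type snapshot struct {
	flows, groups int
}

// countEntries counts entries in dump text. Assumption: one entry per line,
// with blank lines and "... reply ..." header lines skipped.
func countEntries(dump string) int {
	n := 0
	for _, line := range strings.Split(dump, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.Contains(line, " reply ") {
			continue
		}
		n++
	}
	return n
}

// checkReplay returns an error if the flow or group count changed
// between the two snapshots.
func checkReplay(before, after snapshot) error {
	if before.flows != after.flows {
		return fmt.Errorf("flow count changed: %d before, %d after", before.flows, after.flows)
	}
	if before.groups != after.groups {
		return fmt.Errorf("group count changed: %d before, %d after", before.groups, after.groups)
	}
	return nil
}

func main() {
	// Illustrative dump text only; a real test would collect it from OVS.
	groupsBefore := "OFPST_GROUP_DESC reply (OF1.3) (xid=0x2):\n group_id=1,type=select\n group_id=2,type=select\n"
	groupsAfter := "OFPST_GROUP_DESC reply (OF1.3) (xid=0x2):\n"

	before := snapshot{flows: 3, groups: countEntries(groupsBefore)}
	after := snapshot{flows: 3, groups: countEntries(groupsAfter)}

	// The groups were lost during replay, so the check reports an error.
	fmt.Println(checkReplay(before, after))
}
```

A connectivity-only check could miss this kind of regression if traffic still happens to be forwarded, which is why comparing counts is the stricter test.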
Codecov Report
@@            Coverage Diff             @@
##             main    #2134      +/-   ##
==========================================
- Coverage   61.22%   61.22%   -0.01%
==========================================
  Files         269      269
  Lines       20453    20457       +4
==========================================
+ Hits        12523    12525       +2
- Misses       6633     6636       +3
+ Partials     1297     1296       -1
Thanks for the quick fix! LGTM, just have a question about the comment.
pkg/ovs/openflow/ofctrl_group.go
Outdated
func (g *ofGroup) Reset() {
	g.ofctrl.Switch = g.bridge.ofSwitch
	// An error ("group already exists") is not possible here since the same
	// group was created successfully before. If something is wrong and
Is the comment correct? It seems the error cannot occur because the ofSwitch is a new instance.
I tried to clarify. I wanted to communicate 2 things: 1) it is a new ofSwitch, as you mention, but also 2) all the groups we are creating were created successfully before (with the previous ofSwitch instance), so their creation should succeed this time as well (no duplicate group IDs).
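The two points above can be illustrated with a small model. This is a sketch under stated assumptions, not the Antrea or ofnet implementation: `fakeSwitch`, `bridge`, `group`, `newGroup`, and `reconnect` are hypothetical, and unlike the real `Reset()` shown in the diff, this version returns an error so the outcome is easy to inspect.

```go
package main

import "fmt"

// fakeSwitch stands in for one OpenFlow connection (hypothetical type).
// It rejects a group ID that it has already seen.
type fakeSwitch struct {
	groups map[uint32]bool
}

func newFakeSwitch() *fakeSwitch {
	return &fakeSwitch{groups: map[uint32]bool{}}
}

func (s *fakeSwitch) addGroup(id uint32) error {
	if s.groups[id] {
		return fmt.Errorf("group %d already exists", id)
	}
	s.groups[id] = true
	return nil
}

// bridge owns the current switch instance and a cache of created groups.
type bridge struct {
	sw    *fakeSwitch
	cache []*group
}

// group remembers which switch instance it was installed on.
type group struct {
	id uint32
	sw *fakeSwitch
	br *bridge
}

// newGroup installs a group on the current switch and caches it.
func (b *bridge) newGroup(id uint32) (*group, error) {
	g := &group{id: id, sw: b.sw, br: b}
	if err := g.sw.addGroup(id); err != nil {
		return nil, err
	}
	b.cache = append(b.cache, g)
	return g, nil
}

// Reset rebinds the group to the bridge's current switch, then re-installs it.
// Without the rebind, the group would still point at the stale switch.
func (g *group) Reset() error {
	g.sw = g.br.sw
	return g.sw.addGroup(g.id)
}

// reconnect models a new connection: a fresh switch instance, followed by
// a replay of every cached group.
func (b *bridge) reconnect() error {
	b.sw = newFakeSwitch()
	for _, g := range b.cache {
		if err := g.Reset(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	b := &bridge{sw: newFakeSwitch()}
	for _, id := range []uint32{1, 2} {
		if _, err := b.newGroup(id); err != nil {
			fmt.Println("unexpected:", err)
			return
		}
	}
	// Cached IDs are unique, so replaying them on a fresh switch succeeds.
	fmt.Println("replay error:", b.reconnect())
	fmt.Println("groups on new switch:", len(b.sw.groups))
}
```

Because every cached group was accepted once, the cached IDs are unique, and replaying them on an empty switch instance cannot produce a duplicate.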
I'm wondering whether there is an issue if only antrea-agent restarts. I know the code takes care of cleaning up all stale flows, but not groups. If it reuses group IDs, will there be trouble when installing them?
When only the agent restarts (as opposed to OVS daemons), there is no cache replay. Instead AntreaProxy will trigger all needed groups to be re-created by calling this function:
If the Group already exists, we will get the existing object with
Do you think I am missing something, or was your question about a different scenario?
I mean the same scenario, but both
@tnqn thanks for clarifying, sorry I'm a bit tired :) It seems that ofnet takes care of deleting all groups during initialization, so an Antrea Agent restart will always clear all groups first. This may not be what we want to do for the long term, but in the short term it guarantees that there won't be any issue during reconciliation on restart. Let me add a comment somewhere in the Antrea code about this.
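To make the agent-restart concern concrete, here is a toy model of why clearing groups at initialization avoids ID clashes. It is only a sketch: `datapath`, `deleteAllGroups`, and `agentStart` are hypothetical names and do not reflect the ofnet API.

```go
package main

import "fmt"

// datapath models group state that outlives the agent process
// (hypothetical type for illustration).
type datapath struct {
	groups map[uint32]bool
}

func (d *datapath) addGroup(id uint32) error {
	if d.groups[id] {
		return fmt.Errorf("group %d already exists", id)
	}
	d.groups[id] = true
	return nil
}

// deleteAllGroups models the "clear everything during initialization" step.
func (d *datapath) deleteAllGroups() {
	d.groups = map[uint32]bool{}
}

// agentStart models one agent start: optionally clear stale groups,
// then install the groups the agent needs.
func agentStart(d *datapath, ids []uint32, clearFirst bool) error {
	if clearFirst {
		d.deleteAllGroups()
	}
	for _, id := range ids {
		if err := d.addGroup(id); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	d := &datapath{groups: map[uint32]bool{}}
	ids := []uint32{1, 2}

	fmt.Println("first start:", agentStart(d, ids, false))
	// Restart without clearing: the reused IDs clash with stale groups.
	fmt.Println("restart, no clear:", agentStart(d, ids, false))
	// Restart with clearing: the same IDs install cleanly.
	fmt.Println("restart, clear first:", agentStart(d, ids, true))
}
```

The trade-off noted in the comment above still applies: clearing everything first is simple and safe for reconciliation, but it removes the groups from the datapath for a short time on every restart.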
LGTM
/test-all
/test-windows-e2e
/test-all
/test-e2e