Network traffic segregation by Multus -> OSDs flapping #11642
I tried to revert the network configuration to the default (pod network), but the OSDs are still trying to bind to the public and cluster networks from the previous state (192.168.249.0/24 and 192.168.250.0/24).
In this case the OSDs boot OK and have UP and IN status, but all PGs become stuck in peering or activating state, like #11626. I also tested connectivity within the same Multus networks in the rook-ceph namespace via iperf3, with good results for 10GbE. The configuration is now reverted back to Multus, with the same issues as in my original post. |
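For anyone debugging the same symptom, a minimal sketch of how to check which addresses the OSDs actually registered with the monitors (assuming the standard rook-ceph-tools deployment is installed; the names below are defaults, not values from this report):

```shell
# Exec into the toolbox pod
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash

# List the addresses each OSD registered with the monitors, to confirm whether
# they are still on the old Multus subnets (192.168.249.0/24 / 192.168.250.0/24)
ceph osd dump | grep '^osd\.'
```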
This is similar to a bug I've been tracking elsewhere. Unfortunately, right now I don't have great info. Thank you for providing all the details here in the description; it'll help us debug this. As a debugging step, do you have the same issues if you use host networking mode? Could you try that and report back your findings? |
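For reference, a minimal sketch of the host-networking test suggested above (my illustration, not verified steps from this thread). It assumes the CephCluster CR is named rook-ceph in the rook-ceph namespace; note that Rook may refuse to change network settings on an existing cluster, in which case the cluster would have to be redeployed with `network.provider: host` set in cluster.yaml.

```shell
# Switch the CephCluster to host networking for a test (disruptive; the Ceph
# daemon pods must be restarted before they bind to the host interfaces)
kubectl -n rook-ceph patch cephcluster rook-ceph --type merge \
  -p '{"spec":{"network":{"provider":"host"}}}'
```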
so I need to specify the public network again at
permissions granted,
But from inside the particular pod (osd.1)
|
So I guess it is not a problem with Multus itself, but I'm wondering where the "old" configuration is stored, i.e. the previously configured public and cluster networks (192.168.249.0/24, 192.168.250.0/24). I've looked through the metadata of all manifests except bluestore with no luck. |
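One place worth checking is the Ceph monitors' centralized config store, which persists independently of any Kubernetes manifest. A hedged sketch, run from the rook-ceph-tools pod (verify before removing anything on a production cluster):

```shell
# Show whether the old subnets are still persisted in the mon config store
ceph config dump | grep -E 'public_network|cluster_network'

# The same values can be queried per daemon type
ceph config get osd public_network
ceph config get osd cluster_network

# If the stale values are confirmed to come from here, they can be removed
ceph config rm global public_network
ceph config rm global cluster_network
```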
Hello,
Unfortunately I did not pin the rpm-ostree deployment of the latest OKD release (4.12.0-0.okd-2023-02-04-212953, which was not working correctly), and I'm currently not able to manually boot any node with the "problematic" FCOS version to identify the root cause: whether it is FCOS/the kernel or some OKD service affecting the Ceph cluster. |
At the moment all Ceph volumes cannot be mounted, with the error
Documentation/Troubleshooting/ceph-csi-common-issues.md doesn't help |
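In case it helps others hitting the same mount failures, a rough sketch of where the underlying error usually surfaces (labels and container names assume a default Rook install; the CephFS analogue uses csi-cephfsplugin):

```shell
# Find the CSI RBD plugin pod running on the node where the mount fails
kubectl -n rook-ceph get pod -l app=csi-rbdplugin -o wide

# Its plugin container log normally contains the real mount/map error
kubectl -n rook-ceph logs <csi-rbdplugin-pod> -c csi-rbdplugin

# The events of the stuck application pod usually repeat the same error
kubectl describe pod <stuck-pod> -n <app-namespace>
```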
Reverting to the default network configuration solves the problem, but reverting is only possible by modifying
|
Hello, |
hello, |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions. |
This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation. |
Is this a bug report or feature request?
Hello dear Rook Team
Adding network traffic splitting to the CephCluster via the Multus provider and rebooting all cluster worker nodes leads to all OSDs flapping.
The OSDs report missing heartbeats from all of their peers.
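As a hedged sketch (the pod name is a placeholder, not taken from this report), the missing heartbeats show up directly in the OSD pod logs and can be grepped for:

```shell
# heartbeat_check lines report which peer OSDs have not replied
kubectl -n rook-ceph logs rook-ceph-osd-0-<pod-id> | grep heartbeat_check
```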
There is network connectivity within all three networks:
All OSDs can reach each other, even via curl (a sketch of such a probe follows below)
Network attachment definitions
Clocks on all nodes are synced
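A sketch of the kind of cross-OSD reachability probe described above; the pod name, target IP, and messenger port below are placeholders, not values from this report:

```shell
# From inside one OSD pod, probe another OSD's messenger port on the Multus
# public network; curl's telnet:// scheme is used purely as a TCP check
kubectl -n rook-ceph exec -it rook-ceph-osd-0-<pod-id> -- \
  curl -v --max-time 5 telnet://192.168.249.11:6800

# Confirm the NetworkAttachmentDefinitions referenced by the CephCluster exist
kubectl -n rook-ceph get network-attachment-definitions
```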
`cluster.yaml`, if necessary
Logs to submit:
for example, logs of the `rook-ceph-osd-0-7bd7b967fb-c8hwg` pod
Cluster Status to submit:
Environment:
Kernel (`uname -a`):
Rook version (`rook version` inside of a Rook Pod):
Storage backend version (`ceph -v`):
Kubernetes version (`kubectl version`):
Storage backend status (`ceph health` in the Rook Ceph toolbox):