Network traffic segregation by Multus -> OSDs Flapping #11642

Closed
dsevost opened this issue Feb 9, 2023 · 11 comments
dsevost commented Feb 9, 2023

Is this a bug report or feature request?

  • Bug Report
    Hello dear Rook Team,
    Adding network traffic segregation to the CephCluster via the Multus provider and then rebooting all cluster worker nodes leads to all OSDs flapping.
    The OSDs complain about missing heartbeats from all of their peers.
    There is network connectivity within all three networks:
  • default pod network
  • public network
  • cluster network
    All OSDs can reach each other, even via curl:
sh-4.4# curl 192.168.249.42:6800 2> /dev/null
ceph v2
sh-4.4# curl 192.168.249.44:6800 2> /dev/null
ceph v2

Network attachment definitions

---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: ceph-cluster
  namespace: rook-ceph
spec:
  config: |-
    { 
      "cniVersion": "0.3.1", 
      "type": "macvlan", 
      "master": "br-ex", 
      "mode": "bridge", 
      "ipam": { 
        "type": "whereabouts", 
        "range": "192.168.250.0/24" 
      } 
    }

---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: ceph-public
  namespace: rook-ceph
spec:
  config: |-
    { 
      "cniVersion": "0.3.1", 
      "type": "macvlan", 
      "master": "br-ex", 
      "mode": "bridge", 
      "ipam": { 
        "type": "whereabouts", 
        "range": "192.168.249.0/24" 
      } 
    }
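
A quick way to double-check which Multus IPs each OSD pod actually received (a sketch: app=rook-ceph-osd is Rook's usual OSD pod label, and the exact annotation name can differ between Multus versions):

kubectl -n rook-ceph get pods -l app=rook-ceph-osd \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{.metadata.annotations.k8s\.v1\.cni\.cncf\.io/network-status}{"\n\n"}{end}'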

Clocks on all nodes are synced

$ for h in 172.16.15.24{0..2} ; do  ssh -t core@$h sudo -i date ; done
Thu Feb  9 14:52:37 UTC 2023
Connection to 172.16.15.240 closed.
Thu Feb  9 14:52:37 UTC 2023
Connection to 172.16.15.241 closed.
Thu Feb  9 14:52:37 UTC 2023
Connection to 172.16.15.242 closed.
  • Cluster CR (custom resource), typically called cluster.yaml, if necessary
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  creationTimestamp: '2023-01-22T08:16:33Z'
  name: rook-ceph-cluster
  namespace: rook-ceph
spec:
  mon:
    count: 3
  network:
    provider: multus
    selectors:
      cluster: rook-ceph/ceph-cluster
      public: rook-ceph/ceph-public
  dataDirHostPath: /var/lib/rook
  continueUpgradeAfterChecksEvenIfNotHealthy: true
  dashboard:
    enabled: true
  mgr:
    allowMultiplePerNode: true
    count: 2
    modules:
      - enabled: true
        name: pg_autoscaler
  storage:
    deviceFilter: 'sd[b-d]'
    nodes:
      - devices:
          - name: /dev/disk/by-id/ata-HS-SSD-E100_1024G_30086700245
          - name: /dev/disk/by-id/ata-WDC_WD3200AAKS-75L9A0_WD-WCAV25883507
        name: worker1-rs1
        resources: {}
      - devices:
          - name: /dev/disk/by-id/ata-HS-SSD-E100_1024G_30086700239
          - name: /dev/disk/by-id/ata-WDC_WD3200AAKS-75L9A0_WD-WCAV26368312
        name: worker2-rs1
        resources: {}
      - devices:
          - name: /dev/disk/by-id/ata-HS-SSD-E100_1024G_30086700233
          - name: /dev/disk/by-id/ata-WDC_WD3200AAKS-75L9A0_WD-WCAV26373629
        name: worker3-rs1
        resources: {}
    useAllDevices: false
    useAllNodes: true
  cephVersion:
    image: 'quay.io/ceph/ceph:v17.2.5'
status:
  ceph:
    capacity:
      bytesAvailable: 3625835642880
      bytesTotal: 4032847429632
      bytesUsed: 407011786752
      lastUpdated: '2023-02-09T14:32:07Z'
    details:
      MDS_SLOW_METADATA_IO:
        message: 2 MDSs report slow metadata IOs
        severity: HEALTH_WARN
      OSD_DOWN:
        message: 5 osds down
        severity: HEALTH_WARN
      OSD_FLAGS:
        message: >-
          1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT}
          flags set
        severity: HEALTH_WARN
      OSD_HOST_DOWN:
        message: 2 hosts (4 osds) down
        severity: HEALTH_WARN
      PG_AVAILABILITY:
        message: 'Reduced data availability: 61 pgs inactive, 4 pgs down'
        severity: HEALTH_WARN
      PG_DEGRADED:
        message: >-
          Degraded data redundancy: 846/106014 objects degraded (0.798%), 37 pgs
          degraded, 57 pgs undersized
        severity: HEALTH_WARN
      SLOW_OPS:
        message: '2 slow ops, oldest one blocked for 460 sec, mon.c has slow ops'
        severity: HEALTH_WARN
    fsid: 59e8bdb5-7a55-4fcc-946f-6fb34ad23076
    health: HEALTH_WARN
    lastChanged: '2023-02-09T13:50:59Z'
    lastChecked: '2023-02-09T14:32:07Z'
    previousHealth: HEALTH_ERR
    versions:
      mds:
        ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable): 2
      mgr:
        ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable): 2
      mon:
        ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable): 3
      osd:
        ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable): 1
      overall:
        ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable): 8
  conditions:
    - lastHeartbeatTime: '2023-02-09T14:32:07Z'
      lastTransitionTime: '2023-02-09T11:20:18Z'
      message: Cluster created successfully
      reason: ClusterCreated
      status: 'True'
      type: Ready
  message: Cluster created successfully
  observedGeneration: 21
  phase: Ready
  state: Created
  storage:
    deviceClasses:
      - name: hdd
      - name: ssd
  version:
    image: 'quay.io/ceph/ceph:v17.2.5'
    version: 17.2.5-0

Logs to submit:

  • Operator's logs, if necessary
[...]
2023-02-09 14:36:04.169933 I | clusterdisruption-controller: all "host" failure domains: [worker1-rs1 worker2-rs1 worker3-rs1]. osd is down in failure domain: "". active node drains: false. pg health: "cluster is not fully clean. PGs: [{StateName:undersized+degraded+peered Count:37} {StateName:undersized+peered Count:20} {StateName:stale+down Count:3} {StateName:down Count:1}]"
I0209 14:36:13.338837       1 controller.go:217]  "msg"="reconciling claim" "key"="ds-test-01/ceph-bucket"
I0209 14:36:13.338859       1 helpers.go:107]  "msg"="getting claim for key" "key"="ds-test-01/ceph-bucket"
I0209 14:36:13.341336       1 helpers.go:213]  "msg"="getting ObjectBucketClaim's StorageClass" "key"="ds-test-01/ceph-bucket"
I0209 14:36:13.343230       1 helpers.go:218]  "msg"="got StorageClass" "key"="ds-test-01/ceph-bucket" "name"="rook-ceph-bucket"
I0209 14:36:13.343242       1 controller.go:270]  "msg"="syncing obc creation" "key"="ds-test-01/ceph-bucket"
I0209 14:36:13.343255       1 controller.go:552]  "msg"="updating OBC metadata" "key"="ds-test-01/ceph-bucket"
I0209 14:36:13.343264       1 resourcehandlers.go:277]  "msg"="updating" "key"="ds-test-01/ceph-bucket" "obc"="ds-test-01/ceph-bucket"
I0209 14:36:13.349831       1 controller.go:341]  "msg"="provisioning" "bucket"="ceph-bkt-test-01-f547f139-d1cc-440e-a358-108fa931ee7b" "key"="ds-test-01/ceph-bucket"
2023-02-09 14:36:13.349838 I | op-bucket-prov: initializing and setting CreateOrGrant services
2023-02-09 14:36:13.349845 I | op-bucket-prov: getting storage class "rook-ceph-bucket"
E0209 14:36:28.434438       1 controller.go:204] error syncing 'ds-test-01/ceph-bucket': error provisioning bucket: failed to set admin ops api client: failed to retrieve rgw admin ops user: failed to create object user "rgw-admin-ops-user". error code 1 for object store "object-store": failed to create s3 user. . : command terminated with exit code 124, requeuing
I0209 14:36:28.444625       1 controller.go:217]  "msg"="reconciling claim" "key"="openshift-image-registry/registry-s3"
I0209 14:36:28.444665       1 helpers.go:107]  "msg"="getting claim for key" "key"="openshift-image-registry/registry-s3"
I0209 14:36:28.447779       1 helpers.go:213]  "msg"="getting ObjectBucketClaim's StorageClass" "key"="openshift-image-registry/registry-s3"
I0209 14:36:28.449823       1 helpers.go:218]  "msg"="got StorageClass" "key"="openshift-image-registry/registry-s3" "name"="rook-ceph-bucket"
I0209 14:36:28.449852       1 controller.go:270]  "msg"="syncing obc creation" "key"="openshift-image-registry/registry-s3"
I0209 14:36:28.449876       1 controller.go:552]  "msg"="updating OBC metadata" "key"="openshift-image-registry/registry-s3"
I0209 14:36:28.449896       1 resourcehandlers.go:277]  "msg"="updating" "key"="openshift-image-registry/registry-s3" "obc"="openshift-image-registry/registry-s3"
I0209 14:36:28.456350       1 controller.go:341]  "msg"="provisioning" "bucket"="registry-bucket-07ff14b3-feea-4493-8efa-c904a98121c2" "key"="openshift-image-registry/registry-s3"
2023-02-09 14:36:28.456367 I | op-bucket-prov: initializing and setting CreateOrGrant services
2023-02-09 14:36:28.456397 I | op-bucket-prov: getting storage class "rook-ceph-bucket"
2023-02-09 14:36:32.857055 I | ceph-spec: parsing mon endpoints: f=100.65.168.113:6789,c=100.65.91.38:6789,h=100.65.146.189:6789
2023-02-09 14:36:34.170886 I | clusterdisruption-controller: all "host" failure domains: [worker1-rs1 worker2-rs1 worker3-rs1]. osd is down in failure domain: "". active node drains: false. pg health: "cluster is not fully clean. PGs: [{StateName:undersized+degraded+peered Count:37} {StateName:undersized+peered Count:20} {StateName:stale+down Count:3} {StateName:down Count:1}]"
2023-02-09 14:36:34.924157 I | clusterdisruption-controller: all "host" failure domains: [worker1-rs1 worker2-rs1 worker3-rs1]. osd is down in failure domain: "". active node drains: false. pg health: "cluster is not fully clean. PGs: [{StateName:undersized+degraded+peered Count:37} {StateName:undersized+peered Count:20} {StateName:stale+down Count:3} {StateName:down Count:1}]"
E0209 14:36:43.555359       1 controller.go:204] error syncing 'openshift-image-registry/registry-s3': error provisioning bucket: failed to set admin ops api client: failed to retrieve rgw admin ops user: failed to create object user "rgw-admin-ops-user". error code 1 for object store "object-store": failed to create s3 user. . : command terminated with exit code 124, requeuing
I0209 14:36:43.555384       1 controller.go:217]  "msg"="reconciling claim" "key"="quay/registry-quay-datastore"
I0209 14:36:43.555393       1 helpers.go:107]  "msg"="getting claim for key" "key"="quay/registry-quay-datastore"
I0209 14:36:43.589704       1 helpers.go:213]  "msg"="getting ObjectBucketClaim's StorageClass" "key"="quay/registry-quay-datastore"
I0209 14:36:43.606164       1 helpers.go:218]  "msg"="got StorageClass" "key"="quay/registry-quay-datastore" "name"="rook-ceph-bucket"
I0209 14:36:43.606193       1 controller.go:270]  "msg"="syncing obc creation" "key"="quay/registry-quay-datastore"
I0209 14:36:43.606225       1 controller.go:552]  "msg"="updating OBC metadata" "key"="quay/registry-quay-datastore"
I0209 14:36:43.606233       1 resourcehandlers.go:277]  "msg"="updating" "key"="quay/registry-quay-datastore" "obc"="quay/registry-quay-datastore"
E0209 14:36:43.671756       1 controller.go:204] error syncing 'quay/registry-quay-datastore': obc "registry-quay-datastore" bucketName has changed compared to ob "obc-quay-registry-quay-datastore", requeuing
2023-02-09 14:36:47.927720 E | ceph-bucket-notification: failed to reconcile failed to list bucket notifications in ObjectbucketClaim "quay/obc-quay-registry-quay-datastore": failed to create S3 agent for CephBucketNotification provisioning for bucket "quay-datastore-10ceedda-82ce-43cb-a103-f46fa2d702bc": failed to get admin Ops context for CephObjectStore "rook-ceph/object-store": failed to create or retrieve rgw admin ops user: failed to create object user "rgw-admin-ops-user". error code 1 for object store "object-store": failed to create s3 user. . : command terminated with exit code 124
2023-02-09 14:36:48.465812 I | ceph-spec: parsing mon endpoints: f=100.65.168.113:6789,c=100.65.91.38:6789,h=100.65.146.189:6789
2023-02-09 14:37:03.545798 E | ceph-bucket-notification: failed to reconcile failed to list bucket notifications in ObjectbucketClaim "ds-test-01/obc-ds-test-01-ceph-bucket": failed to create S3 agent for CephBucketNotification provisioning for bucket "ceph-bkt-test-01-f547f139-d1cc-440e-a358-108fa931ee7b": failed to get admin Ops context for CephObjectStore "rook-ceph/object-store": failed to create or retrieve rgw admin ops user: failed to create object user "rgw-admin-ops-user". error code 1 for object store "object-store": failed to create s3 user. . : command terminated with exit code 124
2023-02-09 14:37:03.551326 I | ceph-spec: parsing mon endpoints: f=100.65.168.113:6789,c=100.65.91.38:6789,h=100.65.146.189:6789
2023-02-09 14:37:04.923362 I | clusterdisruption-controller: all "host" failure domains: [worker1-rs1 worker2-rs1 worker3-rs1]. osd is down in failure domain: "". active node drains: false. pg health: "cluster is not fully clean. PGs: [{StateName:active+undersized+degraded Count:24} {StateName:active+undersized Count:19} {StateName:active+recovering+undersized+degraded Count:11} {StateName:stale+down Count:3} {StateName:active+recovering+undersized+degraded+remapped Count:2} {StateName:undersized+peered Count:1} {StateName:down Count:1}]"
2023-02-09 14:37:05.686064 I | clusterdisruption-controller: all "host" failure domains: [worker1-rs1 worker2-rs1 worker3-rs1]. osd is down in failure domain: "". active node drains: false. pg health: "cluster is not fully clean. PGs: [{StateName:active+undersized+degraded Count:24} {StateName:active+undersized Count:19} {StateName:active+recovering+undersized+degraded Count:11} {StateName:stale+down Count:3} {StateName:active+recovering+undersized+degraded+remapped Count:2} {StateName:undersized+peered Count:1} {StateName:down Count:1}]"
2023-02-09 14:37:18.633729 E | ceph-bucket-notification: failed to reconcile failed to list bucket notifications in ObjectbucketClaim "openshift-image-registry/obc-openshift-image-registry-registry-s3": failed to create S3 agent for CephBucketNotification provisioning for bucket "registry-bucket-07ff14b3-feea-4493-8efa-c904a98121c2": failed to get admin Ops context for CephObjectStore "rook-ceph/object-store": failed to create or retrieve rgw admin ops user: failed to create object user "rgw-admin-ops-user". error code 1 for object store "object-store": failed to create s3 user. . : command terminated with exit code 124
2023-02-09 14:37:35.685774 I | clusterdisruption-controller: all "host" failure domains: [worker1-rs1 worker2-rs1 worker3-rs1]. osd is down in failure domain: "". active node drains: false. pg health: "cluster is not fully clean. PGs: [{StateName:active+undersized+degraded Count:29} {StateName:active+undersized Count:19} {StateName:active+recovering+undersized+degraded Count:8} {StateName:stale+down Count:3} {StateName:undersized+peered Count:1} {StateName:down Count:1}]"
2023-02-09 14:37:36.442090 I | clusterdisruption-controller: all "host" failure domains: [worker1-rs1 worker2-rs1 worker3-rs1]. osd is down in failure domain: "". active node drains: false. pg health: "cluster is not fully clean. PGs: [{StateName:active+undersized+degraded Count:29} {StateName:active+undersized Count:19} {StateName:active+recovering+undersized+degraded Count:8} {StateName:stale+down Count:3} {StateName:undersized+peered Count:1} {StateName:down Count:1}]"
2023-02-09 14:37:38.878085 I | ceph-spec: parsing mon endpoints: f=100.65.168.113:6789,c=100.65.91.38:6789,h=100.65.146.189:6789
2023-02-09 14:37:39.211628 I | ceph-block-pool-controller: creating pool "r3-ssd" in namespace "rook-ceph"
2023-02-09 14:37:40.235786 I | cephclient: application "rbd" is already set on pool "r3-ssd"
2023-02-09 14:37:40.235800 I | cephclient: reconciling replicated pool r3-ssd succeeded
2023-02-09 14:37:40.908316 I | ceph-block-pool-controller: initializing pool "r3-ssd" for RBD use
2023-02-09 14:37:55.990871 E | ceph-block-pool-controller: failed to reconcile CephBlockPool "rook-ceph/r3-ssd". failed to create pool "r3-ssd".: failed to create pool "r3-ssd".: failed to initialize pool "r3-ssd" for RBD use. : command terminated with exit code 124
2023-02-09 14:38:06.441614 I | clusterdisruption-controller: all "host" failure domains: [worker1-rs1 worker2-rs1 worker3-rs1]. osd is down in failure domain: "". active node drains: false. pg health: "cluster is not fully clean. PGs: [{StateName:active+undersized+degraded Count:29} {StateName:active+undersized Count:19} {StateName:active+recovering+undersized+degraded Count:8} {StateName:stale+down Count:3} {StateName:undersized+peered Count:1} {StateName:down Count:1}]"
2023-02-09 14:38:07.208077 I | clusterdisruption-controller: all "host" failure domains: [worker1-rs1 worker2-rs1 worker3-rs1]. osd is down in failure domain: "". active node drains: false. pg health: "cluster is not fully clean. PGs: [{StateName:active+undersized+degraded 
[...]
  • Crashing pod(s) logs, if necessary
    for example, logs of the `rook-ceph-osd-0-7bd7b967fb-c8hwg` pod
[...]
debug 2023-02-09T14:34:50.169+0000 7f9598d27700 -1 osd.0 6629 heartbeat_check: no reply from 192.168.249.44:6800 osd.2 ever on either front or back, first ping sent 2023-02-09T14:25:07.212716+0000 (oldest deadline 2023-02-09T14:25:27.212716+0000)
debug 2023-02-09T14:34:51.181+0000 7f9598d27700 -1 osd.0 6629 heartbeat_check: no reply from 192.168.249.44:6800 osd.2 ever on either front or back, first ping sent 2023-02-09T14:25:07.212716+0000 (oldest deadline 2023-02-09T14:25:27.212716+0000)
debug 2023-02-09T14:34:52.227+0000 7f9598d27700 -1 osd.0 6629 heartbeat_check: no reply from 192.168.249.44:6800 osd.2 ever on either front or back, first ping sent 2023-02-09T14:25:07.212716+0000 (oldest deadline 2023-02-09T14:25:27.212716+0000)
debug 2023-02-09T14:34:53.257+0000 7f9598d27700 -1 osd.0 6629 heartbeat_check: no reply from 192.168.249.44:6800 osd.2 ever on either front or back, first ping sent 2023-02-09T14:25:07.212716+0000 (oldest deadline 2023-02-09T14:25:27.212716+0000)
debug 2023-02-09T14:34:54.230+0000 7f9598d27700 -1 osd.0 6629 heartbeat_check: no reply from 192.168.249.42:6800 osd.1 ever on either front or back, first ping sent 2023-02-09T14:34:33.645826+0000 (oldest deadline 2023-02-09T14:34:53.645826+0000)
debug 2023-02-09T14:34:54.230+0000 7f9598d27700 -1 osd.0 6629 heartbeat_check: no reply from 192.168.249.44:6800 osd.2 ever on either front or back, first ping sent 2023-02-09T14:25:07.212716+0000 (oldest deadline 2023-02-09T14:25:27.212716+0000)
debug 2023-02-09T14:34:55.277+0000 7f9598d27700 -1 osd.0 6629 heartbeat_check: no reply from 192.168.249.42:6800 osd.1 ever on either front or back, first ping sent 2023-02-09T14:34:33.645826+0000 (oldest deadline 2023-02-09T14:34:53.645826+0000)
debug 2023-02-09T14:34:55.277+0000 7f9598d27700 -1 osd.0 6629 heartbeat_check: no reply from 192.168.249.44:6800 osd.2 ever on either front or back, first ping sent 2023-02-09T14:25:07.212716+0000 (oldest deadline 2023-02-09T14:25:27.212716+0000)
debug 2023-02-09T14:34:55.307+0000 7f958ed13700 0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.0 down, but it is still running
debug 2023-02-09T14:34:55.307+0000 7f958ed13700 0 log_channel(cluster) log [DBG] : map e6643 wrongly marked me down at e6630
debug 2023-02-09T14:34:55.307+0000 7f958ed13700 1 osd.0 6643 start_waiting_for_healthy
debug 2023-02-09T14:34:55.308+0000 7f9585500700 1 osd.0 pg_epoch: 6630 pg[5.0( v 270'6 (0'0,270'6] local-lis/les=6627/6628 n=2 ec=36/36 lis/c=6627/6330 les/c/f=6628/6331/0 sis=6630) [1] r=-1 lpr=6630 pi=[6330,6630)/3 luod=0'0 crt=270'6 lcod 0'0 mlcod 0'0 active mbc={}] start_peering_interval up [1,0] -> [1], acting [1,0] -> [1], acting_primary 1 -> 1, up_primary 1 -> 1, role 1 -> -1, features acting 4540138320759226367 upacting 4540138320759226367
debug 2023-02-09T14:34:55.308+0000 7f958ed13700 1 osd.0 6643 is_healthy false -- only 0/1 up peers (less than 33%)
debug 2023-02-09T14:34:55.308+0000 7f9584cff700 1 osd.0 pg_epoch: 6630 pg[6.1( empty local-lis/les=6627/6628 n=0 ec=36/36 lis/c=6627/6330 les/c/f=6628/6331/0 sis=6630) [1] r=-1 lpr=6630 pi=[6330,6630)/3 crt=0'0 mlcod 0'0 active mbc={}] start_peering_interval up [0,1] -> [1], acting [0,1] -> [1], acting_primary 0 -> 1, up_primary 0 -> 1, role 0 -> -1, features acting 4540138320759226367 upacting 4540138320759226367
debug 2023-02-09T14:34:55.308+0000 7f958ed13700 1 osd.0 6643 not healthy; waiting to boot
debug 2023-02-09T14:34:55.308+0000 7f9583cfd700 1 osd.0 pg_epoch: 6630 pg[6.3( empty local-lis/les=6627/6628 n=0 ec=36/36 lis/c=6627/6330 les/c/f=6628/6331/0 sis=6630) [1] r=-1 lpr=6630 pi=[6330,6630)/3 crt=0'0 mlcod 0'0 active mbc={}] start_peering_interval up [0,1] -> [1], acting [0,1] -> [1], acting_primary 0 -> 1, up_primary 0 -> 1, role 0 -> -1, features acting 4540138320759226367 upacting 4540138320759226367
debug 2023-02-09T14:34:55.308+0000 7f95844fe700 1 osd.0 pg_epoch: 6630 pg[5.7( v 336'4 (0'0,336'4] local-lis/les=6627/6628 n=2 ec=36/36 lis/c=6627/6330 les/c/f=6628/6331/0 sis=6630) [1] r=-1 lpr=6630 pi=[6330,6630)/3 crt=336'4 lcod 0'0 mlcod 0'0 active mbc={}] start_peering_interval up [0,1] -> [1], acting [0,1] -> [1], acting_primary 0 -> 1, up_primary 0 -> 1, role 0 -> -1, features acting 4540138320759226367 upacting 4540138320759226367
debug 2023-02-09T14:34:55.308+0000 7f95834fc700 1 osd.0 pg_epoch: 6630 pg[5.4( empty local-lis/les=6627/6628 n=0 ec=36/36 lis/c=6627/6330 les/c/f=6628/6331/0 sis=6630) [1] r=-1 lpr=6630 pi=[6330,6630)/3 crt=0'0 mlcod 0'0 active mbc={}] start_peering_interval up [1,0] -> [1], acting [1,0] -> [1], acting_primary 1 -> 1, up_primary 1 -> 1, role 1 -> -1, features acting 4540138320759226367 upacting 4540138320759226367
debug 2023-02-09T14:34:55.308+0000 7f9585500700 1 osd.0 pg_epoch: 6643 pg[5.0( v 270'6 (0'0,270'6] local-lis/les=6627/6628 n=2 ec=36/36 lis/c=6627/6330 les/c/f=6628/6331/0 sis=6630) [1] r=-1 lpr=6630 pi=[6330,6630)/3 crt=270'6 lcod 0'0 mlcod 0'0 unknown NOTIFY mbc={}] state<Start>: transitioning to Stray
debug 2023-02-09T14:34:55.308+0000 7f9583cfd700 1 osd.0 pg_epoch: 6643 pg[6.3( empty local-lis/les=6627/6628 n=0 ec=36/36 lis/c=6627/6330 les/c/f=6628/6331/0 sis=6630) [1] r=-1 lpr=6630 pi=[6330,6630)/3 crt=0'0 mlcod 0'0 unknown NOTIFY mbc={}] state<Start>: transitioning to Stray
debug 2023-02-09T14:34:55.308+0000 7f9584cff700 1 osd.0 pg_epoch: 6643 pg[6.1( empty local-lis/les=6627/6628 n=0 ec=36/36 lis/c=6627/6330 les/c/f=6628/6331/0 sis=6630) [1] r=-1 lpr=6630 pi=[6330,6630)/3 crt=0'0 mlcod 0'0 unknown NOTIFY mbc={}] state<Start>: transitioning to Stray
debug 2023-02-09T14:34:55.308+0000 7f95844fe700 1 osd.0 pg_epoch: 6643 pg[5.7( v 336'4 (0'0,336'4] local-lis/les=6627/6628 n=2 ec=36/36 lis/c=6627/6330 les/c/f=6628/6331/0 sis=6630) [1] r=-1 lpr=6630 pi=[6330,6630)/3 crt=336'4 lcod 0'0 mlcod 0'0 unknown NOTIFY mbc={}] state<Start>: transitioning to Stray
debug 2023-02-09T14:34:55.308+0000 7f95834fc700 1 osd.0 pg_epoch: 6643 pg[5.4( empty local-lis/les=6627/6628 n=0 ec=36/36 lis/c=6627/6330 les/c/f=6628/6331/0 sis=6630) [1] r=-1 lpr=6630 pi=[6330,6630)/3 crt=0'0 mlcod 0'0 unknown NOTIFY mbc={}] state<Start>: transitioning to Stray
[...]
debug 2023-02-09T14:34:55.310+0000 7f9584cff700 1 osd.0 pg_epoch: 6643 pg[2.1( v 6393'102996 (2949'97785,6393'102996] local-lis/les=6627/6628 n=50 ec=35/35 lis/c=6627/6274 les/c/f=6628/6275/0 sis=6630) [1] r=-1 lpr=6630 pi=[6274,6630)/3 crt=6393'102996 lcod 0'0 mlcod 0'0 unknown NOTIFY mbc={}] state<Start>: transitioning to Stray
debug 2023-02-09T14:34:56.042+0000 7f959bd2d700 1 osd.0 6644 is_healthy false -- only 0/1 up peers (less than 33%)
debug 2023-02-09T14:34:56.042+0000 7f959bd2d700 1 osd.0 6644 not healthy; waiting to boot
debug 2023-02-09T14:34:56.042+0000 7f959bd2d700 1 osd.0 6644 tick checking mon for new map
debug 2023-02-09T14:34:57.032+0000 7f959bd2d700 1 osd.0 6644 is_healthy false -- only 0/1 up peers (less than 33%)
debug 2023-02-09T14:34:57.032+0000 7f959bd2d700 1 osd.0 6644 not healthy; waiting to boot
debug 2023-02-09T14:34:58.007+0000 7f959bd2d700 1 osd.0 6644 is_healthy false -- only 0/1 up peers (less than 33%)
debug 2023-02-09T14:34:58.007+0000 7f959bd2d700 1 osd.0 6644 not healthy; waiting to boot
debug 2023-02-09T14:34:59.024+0000 7f959bd2d700 1 osd.0 6644 is_healthy false -- only 0/1 up peers (less than 33%)
debug 2023-02-09T14:34:59.024+0000 7f959bd2d700 1 osd.0 6644 not healthy; waiting to boot
debug 2023-02-09T14:34:59.999+0000 7f959bd2d700 1 osd.0 6644 is_healthy false -- only 0/1 up peers (less than 33%)
[...]

Cluster Status to submit:

cluster:
    id:     59e8bdb5-7a55-4fcc-946f-6fb34ad23076
    health: HEALTH_WARN
            2 MDSs report slow metadata IOs
            2 osds down
            Reduced data availability: 5 pgs inactive, 4 pgs down, 3 pgs stale
            Degraded data redundancy: 424/106014 objects degraded (0.400%), 37 pgs degraded, 57 pgs undersized
            2 slow ops, oldest one blocked for 221 sec, mon.c has slow ops
 
  services:
    mon: 3 daemons, quorum c,f,h (age 41m)
    mgr: a(active, since 40m), standbys: b
    mds: 1/1 daemons up, 1 hot standby
    osd: 6 osds: 2 up (since 4m), 4 in (since 97s); 1 remapped pgs
 
  data:
    volumes: 1/1 healthy
    pools:   12 pools, 61 pgs
    objects: 35.34k objects, 129 GiB
    usage:   253 GiB used, 2.2 TiB / 2.4 TiB avail
    pgs:     8.197% pgs not active
             424/106014 objects degraded (0.400%)
             35 active+undersized+degraded
             19 active+undersized
             3  stale+down
             2  active+recovering+undersized+degraded
             1  undersized+peered
             1  down
 
  io:
    recovery: 0 B/s, 0 objects/s

Environment:

  • OS (e.g. from /etc/os-release):
NAME="Fedora Linux"
VERSION="37.20230110.3.1 (CoreOS)"
ID=fedora
VERSION_ID=37
VERSION_CODENAME=""
PLATFORM_ID="platform:f37"
PRETTY_NAME="Fedora CoreOS 37.20230110.3.1"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:37"
HOME_URL="https://getfedora.org/coreos/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora-coreos/"
SUPPORT_URL="https://github.com/coreos/fedora-coreos-tracker/"
BUG_REPORT_URL="https://github.com/coreos/fedora-coreos-tracker/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=37
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=37
SUPPORT_END=2023-11-14
VARIANT="CoreOS"
VARIANT_ID=coreos
OSTREE_VERSION='37.20230110.3.1'
  • Kernel (e.g. uname -a):
Linux worker1-rs1 6.0.18-300.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Jan 7 17:10:00 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Cloud provider or hardware configuration:
 Baremetal
  • Rook version (use rook version inside of a Rook Pod):
rook: v1.10.10
go: go1.19.4
  • Storage backend version (e.g. for ceph do ceph -v):
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
  • Kubernetes version (use kubectl version):
Client Version: v1.24.1
Kustomize Version: v4.5.7
Server Version: v1.25.0-2653+a34b9e9499e6c3-dirty
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift):
OKD 4.12.0-0.okd-2023-02-04-212953
  • Storage backend status:
HEALTH_WARN 2 MDSs report slow metadata IOs; 5 osds down; 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set; 2 hosts (4 osds) down; Reduced data availability: 61 pgs inactive, 4 pgs down; Degraded data redundancy: 846/106014 objects degraded (0.798%), 37 pgs degraded, 57 pgs undersized; 2 slow ops, oldest one blocked for 340 sec, mon.c has slow ops
dsevost added the bug label on Feb 9, 2023

dsevost commented Feb 9, 2023

I tried to revert the network configuration back to the default (pod network), but the OSDs still try to bind to the public and cluster networks from the previous state (192.168.249.0/24 and 192.168.250.0/24).
So I had to manually add the following to the rook-config-override ConfigMap:

[global]
public network interface = "eth0"
cluster network interface = "eth0"

In this case the OSDs boot OK and reach UP and IN status, but all PGs become stuck in peering or activating state, like #11626.
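
For reference, this is roughly how the override was applied (a sketch: rook-config-override is Rook's standard override ConfigMap with the data key config, and the affected daemon pods need a restart afterwards to pick the change up):

kubectl -n rook-ceph patch configmap rook-config-override --type merge \
  -p '{"data":{"config":"[global]\npublic network interface = \"eth0\"\ncluster network interface = \"eth0\"\n"}}'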

I also tested connectivity within the same Multus networks in the rook-ceph namespace via iperf3, with good results for 10GbE.
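
A rough sketch of that iperf3 check between two OSD pods (pod names are placeholders, the address is the Multus public IP of the server-side pod, and iperf3 has to be available inside the containers):

kubectl -n rook-ceph exec <osd-pod-a> -- iperf3 -s &
kubectl -n rook-ceph exec <osd-pod-b> -- iperf3 -c 192.168.249.42 -t 10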

The configuration is now reverted back to Multus, with the same issues as in my original post.

BlaineEXE (Member) commented:

This is similar to a bug I've been tracking elsewhere. Unfortunately, right now I don't have great info. Thank you for providing all the details here in the description. It'll help us debug this.

As a debugging step, do you have the same issues if you use host networking mode? Could you try that and report back your findings?


dsevost commented Feb 9, 2023

  1. `hostNetwork: true` requires elevated privileges (OK, permitted)
  2. The MGRs failed to start with the error:
debug 2023-02-09T19:01:23.534+0000 7ff8f4ede000 0 set uid:gid to 167:167 (ceph:ceph)
debug 2023-02-09T19:01:23.534+0000 7ff8f4ede000 0 ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable), process ceph-mgr, pid 125
debug 2023-02-09T19:01:23.534+0000 7ff8f4ede000 -1 unable to find any IP address in networks '192.168.249.0/24' interfaces ''

so again I had to specify the public network in rook-config-override manually (success)
3. A similar issue with `rook-ceph-osd` and the cluster network:

debug 2023-02-09T19:09:47.534+0000 7f0b4e9533c0  0 set uid:gid to 167:167 (ceph:ceph)
debug 2023-02-09T19:09:47.534+0000 7f0b4e9533c0  0 ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable), process ceph-osd, pid 539
debug 2023-02-09T19:09:47.534+0000 7f0b4e9533c0  0 pidfile_write: ignore empty --pid-file
debug 2023-02-09T19:09:47.538+0000 7f0b4e9533c0  1 bdev(0x55663770ec00 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
debug 2023-02-09T19:09:47.538+0000 7f0b4e9533c0  0 bdev(0x55663770ec00 /var/lib/ceph/osd/ceph-0/block) ioctl(F_SET_FILE_RW_HINT) on /var/lib/ceph/osd/ceph-0/block failed: (22) Invalid argument
debug 2023-02-09T19:09:47.538+0000 7f0b4e9533c0  1 bdev(0x55663770ec00 /var/lib/ceph/osd/ceph-0/block) open size 320072933376 (0x4a85d56000, 298 GiB) block_size 4096 (4 KiB) rotational discard not supported
debug 2023-02-09T19:09:47.539+0000 7f0b4e9533c0  1 bluestore(/var/lib/ceph/osd/ceph-0) _set_cache_sizes cache_size 1073741824 meta 0.45 kv 0.45 data 0.06
debug 2023-02-09T19:09:47.539+0000 7f0b4e9533c0  1 bdev(0x55663770f400 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
debug 2023-02-09T19:09:47.539+0000 7f0b4e9533c0  0 bdev(0x55663770f400 /var/lib/ceph/osd/ceph-0/block) ioctl(F_SET_FILE_RW_HINT) on /var/lib/ceph/osd/ceph-0/block failed: (22) Invalid argument
debug 2023-02-09T19:09:47.539+0000 7f0b4e9533c0  1 bdev(0x55663770f400 /var/lib/ceph/osd/ceph-0/block) open size 320072933376 (0x4a85d56000, 298 GiB) block_size 4096 (4 KiB) rotational discard not supported
debug 2023-02-09T19:09:47.539+0000 7f0b4e9533c0  1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block size 298 GiB
debug 2023-02-09T19:09:47.539+0000 7f0b4e9533c0  1 bdev(0x55663770f400 /var/lib/ceph/osd/ceph-0/block) close
debug 2023-02-09T19:09:47.804+0000 7f0b4e9533c0  1 bdev(0x55663770ec00 /var/lib/ceph/osd/ceph-0/block) close
debug 2023-02-09T19:09:48.062+0000 7f0b4e9533c0  0 starting osd.0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
debug 2023-02-09T19:09:48.063+0000 7f0b4e9533c0 -1 unable to find any IPv4 address in networks '192.168.250.0/24' interfaces ''
debug 2023-02-09T19:09:48.063+0000 7f0b4e9533c0 -1 Failed to pick cluster address.

Permissions granted and rook-config-override fixed.
4. I did not get as far as fixing the MDS, since the OSDs again complained about missing heartbeats, like:

debug 2023-02-09T19:19:11.015+0000 7fb39617a700 -1 osd.1 7159 heartbeat_check: no reply from 172.16.15.241:6800 osd.5 ever on either front or back, first ping sent 2023-02-09T19:17:02.090638+0000 (oldest deadline 2023-02-09T19:17:22.090638+0000)
debug 2023-02-09T19:19:12.052+0000 7fb39617a700 -1 osd.1 7159 heartbeat_check: no reply from 172.16.15.240:6800 osd.4 ever on either front or back, first ping sent 2023-02-09T19:16:07.885673+0000 (oldest deadline 2023-02-09T19:16:27.885673+0000)
debug 2023-02-09T19:19:12.052+0000 7fb39617a700 -1 osd.1 7159 heartbeat_check: no reply from 172.16.15.241:6800 osd.5 ever on either front or back, first ping sent 2023-02-09T19:17:02.090638+0000 (oldest deadline 2023-02-09T19:17:22.090638+0000)

But from inside that particular pod (osd.1):

sh-4.4# curl 172.16.15.241:6800 2> /dev/null
ceph v2
sh-4.4# curl 172.16.15.240:6800 2> /dev/null
ceph v2
sh-4.4# curl 172.16.15.242:6800 2> /dev/null
ceph v2
  5. Finally, rook-config-override looks like:
    [global]
    #public network = ""
    #cluster network = ""
    #public network interface = "eth0"
    #cluster network interface = "eth0"
    public network = 172.16.15.0/24
    cluster network = 172.16.15.0/24
    #
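
After applying an override like this, a quick sanity check from the rook-ceph-tools pod that the OSDs really registered addresses on the intended networks (a sketch using standard Ceph CLI commands):

ceph osd dump | grep '^osd\.'
ceph osd metadata 0 | grep -E '"(front|back)_addr"'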


dsevost commented Feb 9, 2023

So I guess the problem is not with Multus itself, but I'm wondering where the "old" configuration is stored, i.e. the previously configured public and cluster networks (192.168.249.0/24, 192.168.250.0/24). I've looked through the metadata of all the manifests (everything except BlueStore) with no luck.
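
One more place worth checking (a sketch, run from the rook-ceph-tools pod) is Ceph's centralized configuration store, since network settings can survive there independently of any Kubernetes manifest:

ceph config dump | grep -i network
ceph config get mon public_network
ceph config get osd cluster_network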


dsevost commented Feb 10, 2023

Hello,
I was digging into this problem all day and found a solution.
The problem was with the OKD update. I downgraded OKD to version 4.12.0-0.okd-2023-01-21-055900 and my Ceph cluster is now up and running:

sh-4.4$ ceph -s
  cluster:
    id:     59e8bdb5-7a55-4fcc-946f-6fb34ad23076
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum c,f,h (age 17m)
    mgr: a(active, since 16m), standbys: b
    mds: 1/1 daemons up, 1 hot standby
    osd: 6 osds: 6 up (since 7m), 6 in (since 43m)
    rgw: 1 daemon active (1 hosts, 1 zones)
 
  data:
    volumes: 1/1 healthy
    pools:   12 pools, 61 pgs
    objects: 35.35k objects, 129 GiB
    usage:   379 GiB used, 3.3 TiB / 3.7 TiB avail
    pgs:     61 active+clean
 
  io:
    client:   11 KiB/s rd, 170 B/s wr, 8 op/s rd, 0 op/s wr

Unfortunately I did not pin the rpm-ostree deployment of the latest OKD release (4.12.0-0.okd-2023-02-04-212953, which was not working correctly), and I'm currently not able to manually boot any node with the "problematic" FCOS version to identify the root cause: whether it is FCOS/the kernel or some OKD service that is affecting the Ceph cluster.
I hope this helps you investigate the problem (jointly with the OKD and Ceph teams).


dsevost commented Feb 10, 2023

At the moment none of the Ceph volumes can be mounted; they fail with the error:

MountVolume.MountDevice failed for volume "pvc-5f516295-b4fe-4f9d-86d6-9939cd7fb69f" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-000000000000000a-c4641e98-9fde-11ed-8537-0a5864430343 already exists

Documentation/Troubleshooting/ceph-csi-common-issues.md doesn't help
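
For the record, one workaround sometimes suggested for the "operation with the given Volume ID ... already exists" state is to restart the CSI plugin pods so the stale in-flight operation is dropped (a sketch; the label selectors assume Rook's default CSI pod labels, and restarting is disruptive, so use with care):

kubectl -n rook-ceph delete pod -l app=csi-rbdplugin
kubectl -n rook-ceph delete pod -l app=csi-rbdplugin-provisioner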


dsevost commented Feb 10, 2023

Reverting to the default network configuration solves the problem.
It might be an issue with Services spanning the default and the secondary network, like the one described in "How to Use Kubernetes Services on Secondary Networks with Multus CNI".

But in any case, reverting is only possible by also modifying rook-config-override, like:

    [global]
    public network = ""
    cluster network = ""


dsevost commented Feb 11, 2023

Hello,
the cephfs-plugin daemonset doesn't have hostPID: true and therefore lacks access to /proc/{PID}/ns/net.
I checked upstream and there is a difference between the rbd-plugin and cephfs-plugin daemonset templates regarding the PID namespace.
Also,
the rook-csi-cephfs-plugin-sa ServiceAccount is not associated with the role role.rbac.authorization.k8s.io/rbd-csi-nodeplugin.
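
A hand-applied sketch of a possible workaround (assuming the daemonset is named csi-cephfsplugin, which is what Rook deploys by default; note that the operator may reconcile this change away on its next sync):

kubectl -n rook-ceph patch daemonset csi-cephfsplugin --type merge \
  -p '{"spec":{"template":{"spec":{"hostPID":true}}}}'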


dsevost commented Mar 15, 2023

Hello,
an update of OKD (https://amd64.origin.releases.ci.openshift.org/releasestream/4-stable/release/4.12.0-0.okd-2023-03-05-022504) seems to resolve the issue, but the problem described in #11642 (comment) still exists (I have updated Rook to v1.11.1).

github-actions (bot) commented:

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions (bot) commented:

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on May 22, 2023