Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Promotion of secondaries delayed/not happening #172

Open
Rid opened this issue Aug 19, 2022 · 22 comments
Open

Promotion of secondaries delayed/not happening #172

Rid opened this issue Aug 19, 2022 · 22 comments

Comments

@Rid
Copy link

Rid commented Aug 19, 2022

We are testing a system with 2 worker nodes and 1 further node acting as a tie-breaker.

This issue is twofold:

  • If many pvcs are created at once (in this example we're creating 6 at once) many of the pvcs end up with all nodes being secondary, it can then take up to an hour for any nodes to be promoted to primary, during this time the pods depending on the pvcs are stuck with "ContainerCreating" status.
  • If we experience a node outage where the primary fails, it can take up to 1 hour for all pvcs to switch from secondary, any pods which have been rescheduled are stuck in "ContainerCreating" during this time.

According to https://linbit.com/drbd-user-guide/drbd-guide-9_0-en/#s-automatic-promotion automatic promotion is now baked into drbd, however https://linbit.com/drbd-user-guide/drbd-guide-9_0-en/#s-feature-failover-clusters says this should be done with

a cluster resource manager (CRM) such as DRBD Reactor and its promoter plug-in

and further https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/#s-linstor_ha says that for HA auto-promote should be disabled.

I can see drbd-reactor is running in the drbd-prometheus-exporter container on the satellites, but it doesn't seem to be configured with the promoter plug-in.

The resources are defined as follows:

╭──────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Key                               ┊ Value                                                        ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ Aux/csi-provisioning-completed-by ┊ linstor-csi/v0.20.0-d514e41db7cdcb580769cc69f1c1ef2b8a5def5d ┊
┊ DrbdOptions/Resource/on-no-quorum ┊ suspend-io                                                   ┊
┊ DrbdOptions/Resource/quorum       ┊ majority                                                     ┊
┊ DrbdOptions/auto-verify-alg       ┊ crct10dif-pclmul                                             ┊
┊ DrbdPrimarySetOn                  ┊ DEDI1-NODE1.23-106-60-155.LON-01.UK              ┊
┊ FileSystem/MkfsParams             ┊                                                              ┊
┊ FileSystem/Type                   ┊ xfs                                                          ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

and the storage class is:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: team-100-persistent-replicated
provisioner: linstor.csi.linbit.com
parameters:
  storagePool: "team-100"
  resourceGroup: "team-100-persistent-replicated"
  csi.storage.k8s.io/fstype: xfs
  nodeList: "dedi1-node1.23-106-60-155.lon-01.uk vm9-node2.23-106-61-193.lon-01.uk"
  placementPolicy: "Manual"
  property.linstor.csi.linbit.com/DrbdOptions/auto-quorum: suspend-io
  property.linstor.csi.linbit.com/DrbdOptions/Resource/on-no-data-accessible: suspend-io
  property.linstor.csi.linbit.com/DrbdOptions/Resource/on-suspended-primary-outdated: force-secondary
  property.linstor.csi.linbit.com/DrbdOptions/Net/rr-conflict: retry-connect

Satellites are configured with zfs pools as so:

satelliteSet:
    enabled: true
    satelliteImage: quay.io/piraeusdatastore/piraeus-server:v1.19.1
    storagePools:
      zfsPools:
      - name: team-100
        zPool: rpool/team-100
        thin: true
    sslSecret: ""
    automaticStorageType: None
    affinity: {}
    tolerations: []
    resources: {}
    monitoringImage: quay.io/piraeusdatastore/drbd-reactor:v0.8.0
    monitoringBindAddress: ""
    kernelModuleInjectionImage: quay.io/piraeusdatastore/drbd9-jammy:v9.1.7
    kernelModuleInjectionMode: Compile
    kernelModuleInjectionAdditionalSourceDirectory: ""
    kernelModuleInjectionResources: {}
    kernelModuleInjectionExtraVolumeMounts: []
    additionalEnv: []
    sidecars: []
    extraVolumes: []
    customLabels: {}
    customAnnotations: {}

So I'm not sure if auto-promote is enabled by default (the man page says it is), and if so how to configure it to promote faster?

@Rid
Copy link
Author

Rid commented Aug 19, 2022

Here's an example of the drbd state when adding 6 pvcs at once:

pvc-210c18f6-9a43-4b94-8903-6d3767445044 role:Primary
  disk:UpToDate
  dedi1-node1.23-106-60-155.lon-01.uk role:Secondary
    peer-disk:UpToDate
  vm6-cplane1.23-106-61-231.lon-01.uk role:Secondary
    peer-disk:Diskless peer-client:yes

pvc-23a1dba9-25ab-44c0-a077-9604144d097c role:Secondary
  disk:UpToDate
  dedi1-node1.23-106-60-155.lon-01.uk role:Secondary
    peer-disk:UpToDate
  vm6-cplane1.23-106-61-231.lon-01.uk role:Secondary
    peer-disk:Diskless peer-client:yes

pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483 role:Secondary
  disk:UpToDate
  dedi1-node1.23-106-60-155.lon-01.uk role:Secondary
    peer-disk:UpToDate
  vm6-cplane1.23-106-61-231.lon-01.uk role:Secondary
    peer-disk:Diskless peer-client:yes

pvc-81da7546-1393-4931-9080-b90cd058709d role:Primary
  disk:UpToDate
  dedi1-node1.23-106-60-155.lon-01.uk role:Secondary
    peer-disk:UpToDate
  vm6-cplane1.23-106-61-231.lon-01.uk role:Secondary
    peer-disk:Diskless peer-client:yes

pvc-89bf316f-a382-4d2b-9b72-b0151fd78d1b role:Secondary
  disk:UpToDate
  dedi1-node1.23-106-60-155.lon-01.uk role:Secondary
    peer-disk:UpToDate
  vm6-cplane1.23-106-61-231.lon-01.uk role:Secondary
    peer-disk:Diskless peer-client:yes

pvc-9fe0df48-e921-4e5e-8fed-21d2e76c3423 role:Secondary
  disk:UpToDate
  dedi1-node1.23-106-60-155.lon-01.uk role:Secondary
    peer-disk:UpToDate
  vm6-cplane1.23-106-61-231.lon-01.uk role:Secondary
    peer-disk:Diskless peer-client:yes

pvc-a568b69c-b5a2-48e0-b64a-c2c82c64baf3 role:Primary
  disk:UpToDate
  dedi1-node1.23-106-60-155.lon-01.uk role:Secondary
    peer-disk:UpToDate
  vm6-cplane1.23-106-61-231.lon-01.uk role:Secondary
    peer-disk:Diskless peer-client:yes

pvc-bcbc88d7-2dfa-462b-9f76-1ece7e945abc role:Secondary
  disk:UpToDate
  dedi1-node1.23-106-60-155.lon-01.uk role:Secondary
    peer-disk:UpToDate
  vm6-cplane1.23-106-61-231.lon-01.uk role:Secondary
    peer-disk:Diskless peer-client:yes

pvc-ccc2551c-f7d6-44ae-a148-7b567c5a5801 role:Secondary
  disk:UpToDate
  dedi1-node1.23-106-60-155.lon-01.uk role:Secondary
    peer-disk:UpToDate
  vm6-cplane1.23-106-61-231.lon-01.uk role:Secondary
    peer-disk:Diskless peer-client:yes

Eventually they are promoted, the only logs I can find relating to this are in /var/log/kern.log (I cannot find any related logs on any of the piraeus containers):

Aug 19 14:52:19 dedi1-node1 kernel: [ 2420.548756] drbd pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483: Preparing cluster-wide state change 2815340248 (0->-1 3/1)
Aug 19 14:52:19 dedi1-node1 kernel: [ 2420.549358] drbd pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483: State change 2815340248: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFF8
Aug 19 14:52:19 dedi1-node1 kernel: [ 2420.549363] drbd pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483: Committing cluster-wide state change 2815340248 (0ms)
Aug 19 14:52:19 dedi1-node1 kernel: [ 2420.549372] drbd pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483: role( Secondary -> Primary )
Aug 19 14:52:19 dedi1-node1 kernel: [ 2420.552361] XFS (drbd1009): Mounting V5 Filesystem
Aug 19 14:52:19 dedi1-node1 kernel: [ 2420.575686] XFS (drbd1009): Ending clean mount
Aug 19 14:52:19 dedi1-node1 kernel: [ 2420.576200] xfs filesystem being mounted at /var/lib/kubelet/pods/8424cf60-4d08-428c-91d6-cca08f5b9c2e/volumes/kubernetes.io~csi/pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483/mount supports timestamps until 2038 (0x7fffffff)

@Rid
Copy link
Author

Rid commented Aug 19, 2022

It appears I got the causation wrong, it appears something else is causing the delay, and once whatever is causing the AttachVolume.Attach to stop timing out, the pvc is promoted to primary (the docs say a node is promoted to primary automatically upon successful mount, which could be the reason).

I came to this conclusion as manually forcing the node to be primary does not speed up AttachVolume.Attach in the pods:

Events:
  Type     Reason              Age                  From                     Message
  ----     ------              ----                 ----                     -------
  Warning  FailedMount         23m (x2 over 30m)    kubelet                  Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[kube-api-access-flfr4 data]: timed out waiting for the condition
  Warning  FailedAttachVolume  108s (x37 over 63m)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-ccc2551c-f7d6-44ae-a148-7b567c5a5801" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  FailedMount         42s (x26 over 61m)   kubelet                  Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[data kube-api-access-flfr4]: timed out waiting for the condition

The csi-attacher container in the piraeus-op-csi-controller is full of the following repeating until the volume attaches sucessfully:

W0819 16:04:34.774528       1 csi_handler.go:189] VA csi-d9e810a254b0ffa73804d0480a395b9e575625dd0482256f9213db53b5fc908b for volume pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483 has attached status false but actual state true. Adding back to VA queue for forced reprocessing
I0819 16:05:26.028697       1 connection.go:183] GRPC call: /csi.v1.Controller/ControllerPublishVolume
I0819 16:05:26.028697       1 connection.go:184] GRPC request: {"node_id":"vm9-node2.23-106-61-193.lon-01.uk","volume_capability":{"AccessType":{"Mount":{"fs_type":"xfs"}},"access_mode":{"mode":5}},"volume_context":{"linstor.csi.linbit.com/mount-options":"","linstor.csi.linbit.com/post-mount-xfs-opts":"","linstor.csi.linbit.com/remote-access-policy":"true","linstor.csi.linbit.com/uses-volume-context":"true","storage.kubernetes.io/csiProvisionerIdentity":"1660915975988-8081-linstor.csi.linbit.com"},"volume_id":"pvc-ccc2551c-f7d6-44ae-a148-7b567c5a5801"}
I0819 16:05:29.736469       1 connection.go:186] GRPC response: {}
I0819 16:05:29.736559       1 connection.go:187] GRPC error: rpc error: code = Internal desc = ControllerPublishVolume failed for pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483: failed to check existing assignment: context deadline exceeded
I0819 16:05:29.736617       1 csi_handler.go:591] Saving attach error to "csi-d9e810a254b0ffa73804d0480a395b9e575625dd0482256f9213db53b5fc908b"
I0819 16:05:29.747875       1 csi_handler.go:602] Saved attach error to "csi-d9e810a254b0ffa73804d0480a395b9e575625dd0482256f9213db53b5fc908b"
I0819 16:05:29.747914       1 controller.go:167] Ignoring VolumeAttachment "csi-d9e810a254b0ffa73804d0480a395b9e575625dd0482256f9213db53b5fc908b" change
I0819 16:05:29.747960       1 csi_handler.go:234] Error processing "csi-d9e810a254b0ffa73804d0480a395b9e575625dd0482256f9213db53b5fc908b": failed to attach: rpc error: code = Internal desc = ControllerPublishVolume failed for pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483: failed to check existing assignment: context deadline exceeded
I0819 16:05:29.748042       1 controller.go:210] Started VA processing "csi-d9e810a254b0ffa73804d0480a395b9e575625dd0482256f9213db53b5fc908b"
I0819 16:05:29.748066       1 csi_handler.go:224] CSIHandler: processing VA "csi-d9e810a254b0ffa73804d0480a395b9e575625dd0482256f9213db53b5fc908b"
I0819 16:05:29.748080       1 csi_handler.go:251] Attaching "csi-d9e810a254b0ffa73804d0480a395b9e575625dd0482256f9213db53b5fc908b"
I0819 16:05:29.748098       1 csi_handler.go:421] Starting attach operation for "csi-d9e810a254b0ffa73804d0480a395b9e575625dd0482256f9213db53b5fc908b"
I0819 16:05:29.748145       1 csi_handler.go:335] PV finalizer is already set on "pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483"
I0819 16:05:29.748187       1 csi_handler.go:740] Found NodeID vm9-node2.23-106-61-193.lon-01.uk in CSINode vm9-node2.23-106-61-193.lon-01.uk
I0819 16:05:29.748207       1 csi_handler.go:304] VA finalizer is already set on "csi-d9e810a254b0ffa73804d0480a395b9e575625dd0482256f9213db53b5fc908b"
I0819 16:05:29.748220       1 csi_handler.go:318] NodeID annotation is already set on "csi-d9e810a254b0ffa73804d0480a395b9e575625dd0482256f9213db53b5fc908b"
I0819 16:41:36.919246       1 connection.go:183] GRPC call: /csi.v1.Controller/ListVolumes
I0819 16:41:36.919267       1 connection.go:184] GRPC request: {}
I0819 16:41:36.972002       1 connection.go:186] GRPC response: {"entries":[{"status":{"published_node_ids":["dedi1-node1.23-106-60-155.lon-01.uk","vm6-cplane1.23-106-61-231.lon-01.uk","vm9-node2.23-106-61-193.lon-01.uk"],"volume_condition":{"abnormal":true,"message":"Resource with issues on node(s): dedi1-node1.23-106-60-155.lon-01.uk"}},"volume":{"capacity_bytes":2147483648,"volume_id":"pvc-210c18f6-9a43-4b94-8903-6d3767445044"}},{"status":{"volume_condition":{"message":"Volume healthy"}},"volume":{"capacity_bytes":11811160064,"volume_id":"pvc-21c4f64e-cc21-46b0-9403-d8c3959c2095"}},{"status":{"published_node_ids":["dedi1-node1.23-106-60-155.lon-01.uk","vm6-cplane1.23-106-61-231.lon-01.uk","vm9-node2.23-106-61-193.lon-01.uk"],"volume_condition":{"abnormal":true,"message":"Resource with issues on node(s): dedi1-node1.23-106-60-155.lon-01.uk"}},"volume":{"capacity_bytes":10737418240,"volume_id":"pvc-23a1dba9-25ab-44c0-a077-9604144d097c"}},{"status":{"volume_condition":{"message":"Volume healthy"}},"volume":{"capacity_bytes":2147483648,"volume_id":"pvc-27f173d8-af36-4f44-8dc3-59ba86039075"}},{"status":{"published_node_ids":["dedi1-node1.23-106-60-155.lon-01.uk","vm6-cplane1.23-106-61-231.lon-01.uk","vm9-node2.23-106-61-193.lon-01.uk"],"volume_condition":{"abnormal":true,"message":"Resource with issues on node(s): dedi1-node1.23-106-60-155.lon-01.uk"}},"volume":{"capacity_bytes":209715200,"volume_id":"pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483"}},{"status":{"published_node_ids":["dedi1-node1.23-106-60-155.lon-01.uk","vm6-cplane1.23-106-61-231.lon-01.uk","vm9-node2.23-106-61-193.lon-01.uk"],"volume_condition":{"abnormal":true,"message":"Resource with issues on node(s): dedi1-node1.23-106-60-155.lon-01.uk"}},"volume":{"capacity_bytes":11811160064,"volume_id":"pvc-81da7546-1393-4931-9080-b90cd058709d"}},{"status":{"published_node_ids":["dedi1-node1.23-106-60-155.lon-01.uk","vm6-cplane1.23-106-61-231.lon-01.uk","vm9-node2.23-106-61-193.lon-01.uk"],"volume_condition":{"abnormal":true,"message":"Resource with issues on node(s): dedi1-node1.23-106-60-155.lon-01.uk"}},"volume":{"capacity_bytes":209715200,"volume_id":"pvc-89bf316f-a382-4d2b-9b72-b0151fd78d1b"}},{"status":{"published_node_ids":["dedi1-node1.23-106-60-155.lon-01.uk","vm6-cplane1.23-106-61-231.lon-01.uk","vm9-node2.23-106-61-193.lon-01.uk"],"volume_condition":{"abnormal":true,"message":"Resource with issues on node(s): dedi1-node1.23-106-60-155.lon-01.uk"}},"volume":{"capacity_bytes":2147483648,"volume_id":"pvc-9fe0df48-e921-4e5e-8fed-21d2e76c3423"}},{"status":{"published_node_ids":["dedi1-node1.23-106-60-155.lon-01.uk","vm6-cplane1.23-106-61-231.lon-01.uk","vm9-node2.23-106-61-193.lon-01.uk"],"volume_condition":{"abnormal":true,"message":"Resource with issues on node(s): dedi1-node1.23-106-60-155.lon-01.uk"}},"volume":{"capacity_bytes":2147483648,"volume_id":"pvc-a568b69c-b5a2-48e0-b64a-c2c82c64baf3"}},{"status":{"published_node_ids":["dedi1-node1.23-106-60-155.lon-01.uk","vm6-cplane1.23-106-61-231.lon-01.uk","vm9-node2.23-106-61-193.lon-01.uk"],"volume_condition":{"abnormal":true,"message":"Resource with issues on node(s): dedi1-node1.23-106-60-155.lon-01.uk"}},"volume":{"capacity_bytes":11811160064,"volume_id":"pvc-bcbc88d7-2dfa-462b-9f76-1ece7e945abc"}},{"status":{"published_node_ids":["dedi1-node1.23-106-60-155.lon-01.uk","vm6-cplane1.23-106-61-231.lon-01.uk","vm9-node2.23-106-61-193.lon-01.uk"],"volume_condition":{"abnormal":true,"message":"Resource with issues on node(s): dedi1-node1.23-106-60-155.lon-01.uk"}},"volume":{"capacity_bytes":22548578304,"volume_id":"pvc-ccc2551c-f7d6-44ae-a148-7b567c5a5801"}}]}
I0819 16:41:36.972761       1 connection.go:187] GRPC error: <nil>

The linstor-csi-plugin container in the piraeus-op-csi-controller is full of the following repeating until the volume attaches sucessfully:

time="2022-08-19T16:20:29Z" level=error msg="method failed" error="rpc error: code = Internal desc = ControllerPublishVolume failed for pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483: failed to check existing assignment: context deadline exceeded" linstorCSIComponent=driver method=/csi.v1.Controller/ControllerPublishVolume nodeID=vm9-node2.23-106-61-193.lon-01.uk provisioner=linstor.csi.linbit.com req="volume_id:\"pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483\" node_id:\"vm9-node2.23-106-61-193.lon-01.uk\" volume_capability:<mount:<fs_type:\"xfs\" > access_mode:<mode:MULTI_NODE_MULTI_WRITER > > volume_context:<key:\"linstor.csi.linbit.com/mount-options\" value:\"\" > volume_context:<key:\"linstor.csi.linbit.com/post-mount-xfs-opts\" value:\"\" > volume_context:<key:\"linstor.csi.linbit.com/remote-access-policy\" value:\"true\" > volume_context:<key:\"linstor.csi.linbit.com/uses-volume-context\" value:\"true\" > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1660915975988-8081-linstor.csi.linbit.com\" > " resp="<nil>" version=v0.20.0-d514e41db7cdcb580769cc69f1c1ef2b8a5def5d

Here's the VolumeAttachment resource:

Name:         csi-d9e810a254b0ffa73804d0480a395b9e575625dd0482256f9213db53b5fc908b
Namespace:
Labels:       <none>
Annotations:  csi.alpha.kubernetes.io/node-id: vm9-node2.23-106-61-193.lon-01.uk
API Version:  storage.k8s.io/v1
Kind:         VolumeAttachment
Metadata:
  Creation Timestamp:  2022-08-19T15:48:29Z
  Finalizers:
    external-attacher/linstor-csi-linbit-com
  Managed Fields:
    API Version:  storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:csi.alpha.kubernetes.io/node-id:
        f:finalizers:
          .:
          v:"external-attacher/linstor-csi-linbit-com":
    Manager:      csi-attacher
    Operation:    Update
    Time:         2022-08-19T15:48:29Z
    API Version:  storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:attacher:
        f:nodeName:
        f:source:
          f:persistentVolumeName:
    Manager:      kube-controller-manager
    Operation:    Update
    Time:         2022-08-19T15:48:29Z
    API Version:  storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:attachError:
          .:
          f:message:
          f:time:
    Manager:         csi-attacher
    Operation:       Update
    Subresource:     status
    Time:            2022-08-19T15:49:29Z
  Resource Version:  67251623
  UID:               461b03ea-fd77-4567-b6b8-b91e8f302fab
Spec:
  Attacher:   linstor.csi.linbit.com
  Node Name:  vm9-node2.23-106-61-193.lon-01.uk
  Source:
    Persistent Volume Name:  pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483
Status:
  Attach Error:
    Message:  rpc error: code = DeadlineExceeded desc = context deadline exceeded
    Time:     2022-08-19T16:11:29Z
  Attached:   false
Events:       <none>

Here's the linstor resource file:

cat /var/lib/linstor.d/pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483.res
# This file was generated by linstor(1.19.1), do not edit manually.
# Local node: vm9-node2.23-106-61-193.lon-01.uk
# Host name : vm9-node2.23-106-61-193.lon-01.uk

resource "pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483"
{

    options
    {
        on-no-data-accessible suspend-io;
        on-no-quorum suspend-io;
        on-suspended-primary-outdated force-secondary;
        quorum majority;
    }

    net
    {
        cram-hmac-alg     sha1;
        shared-secret     "eJAAPCGDJHtnovtB/VOM";
        rr-conflict retry-connect;
        verify-alg "crct10dif-pclmul";
    }

    on vm9-node2.23-106-61-193.lon-01.uk
    {
        volume 0
        {
            disk        /dev/zvol/rpool/team-100/pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483_00000;
            disk
            {
                discard-zeroes-if-aligned yes;
                rs-discard-granularity 8192;
            }
            meta-disk   internal;
            device      minor 1009;
        }
        node-id    1;
    }

    on dedi1-node1.23-106-60-155.lon-01.uk
    {
        volume 0
        {
            disk        /dev/drbd/this/is/not/used;
            disk
            {
                discard-zeroes-if-aligned yes;
                rs-discard-granularity 8192;
            }
            meta-disk   internal;
            device      minor 1009;
        }
        node-id    0;
    }

    on vm6-cplane1.23-106-61-231.lon-01.uk
    {
        volume 0
        {
            disk        none;
            disk
            {
                discard-zeroes-if-aligned yes;
                rs-discard-granularity 8192;
            }
            meta-disk   internal;
            device      minor 1009;
        }
        node-id    2;
    }

    connection
    {
        host vm9-node2.23-106-61-193.lon-01.uk address ipv4 23.106.61.193:7009;
        host dedi1-node1.23-106-60-155.lon-01.uk address ipv4 23.106.60.155:7009;
    }

    connection
    {
        host vm9-node2.23-106-61-193.lon-01.uk address ipv4 23.106.61.193:7009;
        host vm6-cplane1.23-106-61-231.lon-01.uk address ipv4 23.106.61.231:7009;
    }
}

I can successfully mount the volume manually:

# mount /dev/drbd1009 /tmp/test
# ls -la /tmp/test
total 41
drwxr-xr-x  5 root             root              149 Aug 19 14:52 .
drwxrwxrwt 11 root             root               11 Aug 19 17:59 ..
-rw-r--r--  1 root             root             6902 Aug 19 15:47 ganesha.log
-rw-------  1 root             root               36 Aug 19 14:52 nfs-provisioner.identity
drwxrwsrwx  5 systemd-coredump systemd-coredump   44 Aug 19 14:53 pvc-5199abc1-5c27-4940-8a68-52bf1febfd47
drwxr-xr-x  3 root             root               19 Aug 19 14:52 v4old
drwxr-xr-x  3 root             root               19 Aug 19 14:52 v4recov
-rw-------  1 root             root              922 Aug 19 14:52 vfs.conf
linstor node list
╭───────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node                                            ┊ NodeType   ┊ Addresses                ┊ State   ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ dedi1-node1.23-106-60-155.lon-01.uk ┊ SATELLITE  ┊ 23.106.60.155:3367 (SSL) ┊ EVICTED ┊
┊ piraeus-op-cs-controller-5dcf464ff-spzdh        ┊ CONTROLLER ┊ 10.0.2.141:3367 (SSL)    ┊ OFFLINE ┊
┊ piraeus-op-cs-controller-5dcf464ff-vd8lx        ┊ CONTROLLER ┊ 10.0.1.24:3367 (SSL)     ┊ Online  ┊
┊ piraeus-op-cs-controller-5dcf464ff-wpx95        ┊ CONTROLLER ┊ 10.0.4.166:3367 (SSL)    ┊ OFFLINE ┊
┊ vm6-cplane1.23-106-61-231.lon-01.uk ┊ SATELLITE  ┊ 23.106.61.231:3367 (SSL) ┊ Online  ┊
┊ vm7-cplane2.23-106-61-232.lon-01.uk ┊ SATELLITE  ┊ 23.106.61.232:3367 (SSL) ┊ Online  ┊
┊ vm9-node2.23-106-61-193.lon-01.uk   ┊ SATELLITE  ┊ 23.106.61.193:3367 (SSL) ┊ Online  ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────╯

Some commands are taking a long time to complete:

time linstor resource list
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                             ┊ Node                                            ┊ Port ┊ Usage  ┊ Conns ┊      State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483 ┊ dedi1-node1.23-106-60-155.lon-01.uk ┊ 7009 ┊        ┊ Ok    ┊   INACTIVE ┊ 2022-08-19 14:32:30 ┊
┊ pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483 ┊ vm6-cplane1.23-106-61-231.lon-01.uk ┊ 7009 ┊ Unused ┊ Ok    ┊ TieBreaker ┊ 2022-08-19 14:31:59 ┊
┊ pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483 ┊ vm9-node2.23-106-61-193.lon-01.uk   ┊ 7009 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2022-08-19 14:33:13 ┊
┊ pvc-9fe0df48-e921-4e5e-8fed-21d2e76c3423 ┊ dedi1-node1.23-106-60-155.lon-01.uk ┊ 7010 ┊        ┊ Ok    ┊   INACTIVE ┊ 2022-08-19 14:33:50 ┊
┊ pvc-9fe0df48-e921-4e5e-8fed-21d2e76c3423 ┊ vm6-cplane1.23-106-61-231.lon-01.uk ┊ 7010 ┊ Unused ┊ Ok    ┊ TieBreaker ┊ 2022-08-19 14:36:49 ┊
┊ pvc-9fe0df48-e921-4e5e-8fed-21d2e76c3423 ┊ vm9-node2.23-106-61-193.lon-01.uk   ┊ 7010 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2022-08-19 14:37:05 ┊
┊ pvc-23a1dba9-25ab-44c0-a077-9604144d097c ┊ dedi1-node1.23-106-60-155.lon-01.uk ┊ 7006 ┊        ┊ Ok    ┊   INACTIVE ┊ 2022-08-19 14:28:58 ┊
┊ pvc-23a1dba9-25ab-44c0-a077-9604144d097c ┊ vm6-cplane1.23-106-61-231.lon-01.uk ┊ 7006 ┊ Unused ┊ Ok    ┊ TieBreaker ┊ 2022-08-19 14:29:44 ┊
┊ pvc-23a1dba9-25ab-44c0-a077-9604144d097c ┊ vm9-node2.23-106-61-193.lon-01.uk   ┊ 7006 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2022-08-19 14:30:29 ┊
┊ pvc-81da7546-1393-4931-9080-b90cd058709d ┊ dedi1-node1.23-106-60-155.lon-01.uk ┊ 7000 ┊        ┊ Ok    ┊   INACTIVE ┊ 2022-08-19 13:53:57 ┊
┊ pvc-81da7546-1393-4931-9080-b90cd058709d ┊ vm6-cplane1.23-106-61-231.lon-01.uk ┊ 7000 ┊ Unused ┊ Ok    ┊ TieBreaker ┊ 2022-08-19 13:54:07 ┊
┊ pvc-81da7546-1393-4931-9080-b90cd058709d ┊ vm9-node2.23-106-61-193.lon-01.uk   ┊ 7000 ┊ InUse  ┊ Ok    ┊   UpToDate ┊ 2022-08-19 13:54:23 ┊
┊ pvc-89bf316f-a382-4d2b-9b72-b0151fd78d1b ┊ dedi1-node1.23-106-60-155.lon-01.uk ┊ 7008 ┊        ┊ Ok    ┊   INACTIVE ┊ 2022-08-19 14:32:30 ┊
┊ pvc-89bf316f-a382-4d2b-9b72-b0151fd78d1b ┊ vm6-cplane1.23-106-61-231.lon-01.uk ┊ 7008 ┊ Unused ┊ Ok    ┊ TieBreaker ┊ 2022-08-19 14:31:51 ┊
┊ pvc-89bf316f-a382-4d2b-9b72-b0151fd78d1b ┊ vm9-node2.23-106-61-193.lon-01.uk   ┊ 7008 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2022-08-19 14:32:35 ┊
┊ pvc-210c18f6-9a43-4b94-8903-6d3767445044 ┊ dedi1-node1.23-106-60-155.lon-01.uk ┊ 7003 ┊        ┊ Ok    ┊   INACTIVE ┊ 2022-08-19 13:54:18 ┊
┊ pvc-210c18f6-9a43-4b94-8903-6d3767445044 ┊ vm6-cplane1.23-106-61-231.lon-01.uk ┊ 7003 ┊ Unused ┊ Ok    ┊ TieBreaker ┊ 2022-08-19 13:54:34 ┊
┊ pvc-210c18f6-9a43-4b94-8903-6d3767445044 ┊ vm9-node2.23-106-61-193.lon-01.uk   ┊ 7003 ┊ InUse  ┊ Ok    ┊   UpToDate ┊ 2022-08-19 13:55:03 ┊
┊ pvc-a568b69c-b5a2-48e0-b64a-c2c82c64baf3 ┊ dedi1-node1.23-106-60-155.lon-01.uk ┊ 7004 ┊        ┊ Ok    ┊   INACTIVE ┊ 2022-08-19 13:54:18 ┊
┊ pvc-a568b69c-b5a2-48e0-b64a-c2c82c64baf3 ┊ vm6-cplane1.23-106-61-231.lon-01.uk ┊ 7004 ┊ Unused ┊ Ok    ┊ TieBreaker ┊ 2022-08-19 13:54:41 ┊
┊ pvc-a568b69c-b5a2-48e0-b64a-c2c82c64baf3 ┊ vm9-node2.23-106-61-193.lon-01.uk   ┊ 7004 ┊ InUse  ┊ Ok    ┊   UpToDate ┊ 2022-08-19 13:55:36 ┊
┊ pvc-bcbc88d7-2dfa-462b-9f76-1ece7e945abc ┊ dedi1-node1.23-106-60-155.lon-01.uk ┊ 7005 ┊        ┊ Ok    ┊   INACTIVE ┊ 2022-08-19 14:28:10 ┊
┊ pvc-bcbc88d7-2dfa-462b-9f76-1ece7e945abc ┊ vm6-cplane1.23-106-61-231.lon-01.uk ┊ 7005 ┊ Unused ┊ Ok    ┊ TieBreaker ┊ 2022-08-19 14:28:22 ┊
┊ pvc-bcbc88d7-2dfa-462b-9f76-1ece7e945abc ┊ vm9-node2.23-106-61-193.lon-01.uk   ┊ 7005 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2022-08-19 14:28:39 ┊
┊ pvc-ccc2551c-f7d6-44ae-a148-7b567c5a5801 ┊ dedi1-node1.23-106-60-155.lon-01.uk ┊ 7007 ┊        ┊ Ok    ┊   INACTIVE ┊ 2022-08-19 14:30:15 ┊
┊ pvc-ccc2551c-f7d6-44ae-a148-7b567c5a5801 ┊ vm6-cplane1.23-106-61-231.lon-01.uk ┊ 7007 ┊ Unused ┊ Ok    ┊ TieBreaker ┊ 2022-08-19 14:31:09 ┊
┊ pvc-ccc2551c-f7d6-44ae-a148-7b567c5a5801 ┊ vm9-node2.23-106-61-193.lon-01.uk   ┊ 7007 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2022-08-19 14:31:38 ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

________________________________________________________
Executed in   87.24 secs      fish           external
   usr time  254.74 millis  783.00 micros  253.96 millis
   sys time  112.38 millis  421.00 micros  111.96 millis

@Rid
Copy link
Author

Rid commented Aug 19, 2022

Further info:

kubectl version | grep Server
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.1", GitCommit:"3ddd0f45aa91e2f30c70734b175631bec5b5825a", GitTreeState:"clean", BuildDate:"2022-05-24T12:18:48Z", GoVersion:"go1.18.2", Compiler:"gc", Platform:"linux/amd64"}
grep Image: /root/k8s-config/manifests/piraeus.yaml
  storkImage: docker.io/openstorage/stork:2.8.2
  schedulerImage: k8s.gcr.io/kube-scheduler-amd64
  pluginImage: quay.io/piraeusdatastore/piraeus-csi:v0.20.0
  csiAttacherImage: k8s.gcr.io/sig-storage/csi-attacher:v3.5.0
  csiLivenessProbeImage: k8s.gcr.io/sig-storage/livenessprobe:v2.7.0
  csiNodeDriverRegistrarImage: k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.5.1
  csiProvisionerImage: k8s.gcr.io/sig-storage/csi-provisioner:v3.2.1
  csiSnapshotterImage: k8s.gcr.io/sig-storage/csi-snapshotter:v6.0.1
  csiResizerImage: k8s.gcr.io/sig-storage/csi-resizer:v1.5.0
    controllerImage: quay.io/piraeusdatastore/piraeus-server:v1.19.1
    satelliteImage: quay.io/piraeusdatastore/piraeus-server:v1.19.1
    monitoringImage: quay.io/piraeusdatastore/drbd-reactor:v0.8.0
    kernelModuleInjectionImage: quay.io/piraeusdatastore/drbd9-jammy:v9.1.7

linstor err l contains no errors.

Ubuntu 22.04 LTS \n \l
Linux vm9-node2.23-106-61-193.lon-01.uk 5.15.0-46-generic #49-Ubuntu SMP Thu Aug 4 18:03:25 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

drbd enabled through injector

[  629.222020] drbd: module verification failed: signature and/or required key missing - tainting kernel
[  629.230562] drbd: initialized. Version: 9.1.7 (api:2/proto:110-121)
[  629.230564] drbd: GIT-hash: bfd2450739e3e27cfd0a2eece2cde3d94ad993ae build by @vm9-node2.23-106-61-193.lon-01.uk.appsolo.com, 2022-08-18 20:28:16
[  629.230564] drbd: registered as block device major 147

Also attached log from kubectl -n default logs --since=24h pod/piraeus-op-csi-controller-5f6cd85d8b-244hf --all-containers linstor.log

Attached output from linstor -m resource-definition list res.json.log

On the surviving storage node, a zfs list process seems to keep spawning every second or so using 100% CPU:

      1  520310  520310    5973 ?             -1 Sl       0   0:27 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id eeea8b4990656ec8e8576ad3d320b573d703b53723e4511eec2cd4d9c84b25a5 -address /run/containerd/containerd.sock
 520310  520341  520341  520341 ?             -1 Ss   65535   0:00  \_ /pause
 520310  521614  521614  521614 ?             -1 Ssl      0   0:01  \_ /usr/bin/k8s-await-election /usr/bin/piraeus-entry.sh startSatellite
 521614  521746  521614  521614 ?             -1 S        0   0:00  |   \_ /bin/sh /usr/bin/piraeus-entry.sh startSatellite
 521746  522206  521614  521614 ?             -1 Sl       0   3:04  |       \_ java -Xms32M -XX:MinHeapFreeRatio=40 -XX:MaxHeapFreeRatio=70 --illegal-access=permit -classpath /usr/share/linstor-server/lib/conf:/usr/share/linstor-server/lib/* com.linbit.linstor.core.Satellite --logs=/var/log/linstor-satellite --config
 522206  523278  521614  521614 ?             -1 S        0   0:00  |           \_ drbdsetup events2 all
 522206 2194352  521614  521614 ?             -1 R        0   0:05  |           \_ zfs list -H -p -o name,used,refer,volsize,type -t volume,snapshot
 520310  522064  522064  522064 ?             -1 Ssl      0   0:01  \_ /usr/sbin/drbd-reactor
 522064  522081  522064  522064 ?             -1 S        0   0:00      \_ drbdsetup events2 --full --poll
 522064  770682  770678  770271 ?             -1 Z        0   0:00      \_ [drbdsetup] <defunct>
 522064  770786  770782  770271 ?             -1 Z        0   0:00      \_ [drbdsetup] <defunct>

With the above example the volume attach appears to have failed indefinitely as it's been over 3 hours. Sometimes the volumes will attach after roughly 45 minutes.

@Rid
Copy link
Author

Rid commented Aug 22, 2022

Here are some further logs from kube-controller-manager with --v=4

I0822 13:04:24.426940       1 reconciler.go:325] "attacherDetacher.AttachVolume started" volume={VolumeToAttach:{MultiAttachErrorReported:false VolumeName:kubernetes.io/csi/linstor.csi.linbit.com^pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483 VolumeSpec:0xc0052f7650 NodeName:vm9-node2.23-106-61-193.lon-01.uk ScheduledPods:[&Pod{ObjectMeta:{nfs-taiga-staticdata-nfs-server-provisioner-0 nfs-taiga-staticdata-nfs-server-provisioner- team-100  bcccf001-6205-451e-89f8-bead2761be28 67233800 0 2022-08-19 15:48:51 +0000 UTC <nil> <nil> map[app:nfs-server-provisioner chart:nfs-server-provisioner-1.4.0 controller-revision-hash:nfs-taiga-staticdata-nfs-server-provisioner-5c75756788 heritage:Helm release:nfs-taiga-staticdata statefulset.kubernetes.io/pod-name:nfs-taiga-staticdata-nfs-server-provisioner-0] map[] [{apps/v1 StatefulSet nfs-taiga-staticdata-nfs-server-provisioner 554741f9-b120-46f2-8905-ebdd2bdb5607 0xc004267447 0xc004267448}] []  [{kube-controller-manager Update v1 2022-08-19 15:48:51 +0000 UTC FieldsV1 {"f:metadata":{"f:generateName":{},"f:labels":{".":{},"f:app":{},"f:chart":{},"f:controller-revision-hash":{},"f:heritage":{},"f:release":{},"f:statefulset.kubernetes.io/pod-name":{}},"f:ownerReferences":{".":{},"k:{\"uid\":\"554741f9-b120-46f2-8905-ebdd2bdb5607\"}":{}}},"f:spec":{"f:affinity":{".":{},"f:nodeAffinity":{".":{},"f:requiredDuringSchedulingIgnoredDuringExecution":{}},"f:podAntiAffinity":{".":{},"f:requiredDuringSchedulingIgnoredDuringExecution":{}}},"f:containers":{"k:{\"name\":\"nfs-server-provisioner\"}":{".":{},"f:args":{},"f:env":{".":{},"k:{\"name\":\"POD_IP\"}":{".":{},"f:name":{},"f:valueFrom":{".":{},"f:fieldRef":{}}},"k:{\"name\":\"POD_NAMESPACE\"}":{".":{},"f:name":{},"f:valueFrom":{".":{},"f:fieldRef":{}}},"k:{\"name\":\"SERVICE_NAME\"}":{".":{},"f:name":{},"f:value":{}}},"f:image":{},"f:imagePullPolicy":{},"f:name":{},"f:ports":{".":{},"k:{\"containerPort\":111,\"protocol\":\"TCP\"}":{".":{},"f:containerPort":{},"f:name":{},"f:protocol":{}},"k:{\"containerPort\":111,\"protocol\":\"UDP\"}":{".":{},"f:containerPort":{},"f:name":{},"f:protocol":{}},"k:{\"containerPort\":662,\"protocol\":\"TCP\"}":{".":{},"f:containerPort":{},"f:name":{},"f:protocol":{}},"k:{\"containerPort\":662,\"protocol\":\"UDP\"}":{".":{},"f:containerPort":{},"f:name":{},"f:protocol":{}},"k:{\"containerPort\":875,\"protocol\":\"TCP\"}":{".":{},"f:containerPort":{},"f:name":{},"f:protocol":{}},"k:{\"containerPort\":875,\"protocol\":\"UDP\"}":{".":{},"f:containerPort":{},"f:name":{},"f:protocol":{}},"k:{\"containerPort\":2049,\"protocol\":\"TCP\"}":{".":{},"f:containerPort":{},"f:name":{},"f:protocol":{}},"k:{\"containerPort\":2049,\"protocol\":\"UDP\"}":{".":{},"f:containerPort":{},"f:name":{},"f:protocol":{}},"k:{\"containerPort\":20048,\"protocol\":\"TCP\"}":{".":{},"f:containerPort":{},"f:name":{},"f:protocol":{}},"k:{\"containerPort\":20048,\"protocol\":\"UDP\"}":{".":{},"f:containerPort":{},"f:name":{},"f:protocol":{}},"k:{\"containerPort\":32803,\"protocol\":\"TCP\"}":{".":{},"f:containerPort":{},"f:name":{},"f:protocol":{}},"k:{\"containerPort\":32803,\"protocol\":\"UDP\"}":{".":{},"f:containerPort":{},"f:name":{},"f:protocol":{}}},"f:resources":{},"f:securityContext":{".":{},"f:capabilities":{".":{},"f:add":{}}},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{},"f:volumeMounts":{".":{},"k:{\"mountPath\":\"/export\"}":{".":{},"f:mountPath":{},"f:name":{}}}}},"f:dnsPolicy":{},"f:enableServiceLinks":{},"f:hostname":{},"f:restartPolicy":{},"f:schedulerName":{},"f:securityContext":{},"f:serviceAccount":{},"f:serviceAccountName":{},"f:subdomain":{},"f:terminationGracePeriodSeconds":{},"f:volumes":{".":{},"k:{\"name\":\"data\"}":{".":{},"f:name":{},"f:persistentVolumeClaim":{".":{},"f:claimName":{}}}}}} } {kubelet Update v1 2022-08-19 15:49:13 +0000 UTC FieldsV1 {"f:status":{"f:conditions":{"k:{\"type\":\"ContainersReady\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:message":{},"f:reason":{},"f:status":{},"f:type":{}},"k:{\"type\":\"Initialized\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:status":{},"f:type":{}},"k:{\"type\":\"Ready\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:message":{},"f:reason":{},"f:status":{},"f:type":{}}},"f:containerStatuses":{},"f:hostIP":{},"f:startTime":{}}} status}]},Spec:PodSpec{Volumes:[]Volume{Volume{Name:data,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:nil,GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:&PersistentVolumeClaimVolumeSource{ClaimName:data-nfs-taiga-staticdata-nfs-server-provisioner-0,ReadOnly:false,},RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:nil,StorageOS:nil,CSI:nil,Ephemeral:nil,},},Volume{Name:kube-api-access-d79lc,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:nil,GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:&ProjectedVolumeSource{Sources:[]VolumeProjection{VolumeProjection{Secret:nil,DownwardAPI:nil,ConfigMap:nil,ServiceAccountToken:&ServiceAccountTokenProjection{Audience:,ExpirationSeconds:*3607,Path:token,},},VolumeProjection{Secret:nil,DownwardAPI:nil,ConfigMap:&ConfigMapProjection{LocalObjectReference:LocalObjectReference{Name:kube-root-ca.crt,},Items:[]KeyToPath{KeyToPath{Key:ca.crt,Path:ca.crt,Mode:nil,},},Optional:nil,},ServiceAccountToken:nil,},VolumeProjection{Secret:nil,DownwardAPI:&DownwardAPIProjection{Items:[]DownwardAPIVolumeFile{DownwardAPIVolumeFile{Path:namespace,FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,Mode:nil,},},},ConfigMap:nil,ServiceAccountToken:nil,},},DefaultMode:*420,},StorageOS:nil,CSI:nil,Ephemeral:nil,},},},Containers:[]Container{Container{Name:nfs-server-provisioner,Image:k8s.gcr.io/sig-storage/nfs-provisioner:v3.0.0,Command:[],Args:[-provisioner=cluster.local/nfs-taiga-staticdata-nfs-server-provisioner -device-based-fsids=false],WorkingDir:,Ports:[]ContainerPort{ContainerPort{Name:nfs,HostPort:0,ContainerPort:2049,Protocol:TCP,HostIP:,},ContainerPort{Name:nfs-udp,HostPort:0,ContainerPort:2049,Protocol:UDP,HostIP:,},ContainerPort{Name:nlockmgr,HostPort:0,ContainerPort:32803,Protocol:TCP,HostIP:,},ContainerPort{Name:nlockmgr-udp,HostPort:0,ContainerPort:32803,Protocol:UDP,HostIP:,},ContainerPort{Name:mountd,HostPort:0,ContainerPort:20048,Protocol:TCP,HostIP:,},ContainerPort{Name:mountd-udp,HostPort:0,ContainerPort:20048,Protocol:UDP,HostIP:,},ContainerPort{Name:rquotad,HostPort:0,ContainerPort:875,Protocol:TCP,HostIP:,},ContainerPort{Name:rquotad-udp,HostPort:0,ContainerPort:875,Protocol:UDP,HostIP:,},ContainerPort{Name:rpcbind,HostPort:0,ContainerPort:111,Protocol:TCP,HostIP:,},ContainerPort{Name:rpcbind-udp,HostPort:0,ContainerPort:111,Protocol:UDP,HostIP:,},ContainerPort{Name:statd,HostPort:0,ContainerPort:662,Protocol:TCP,HostIP:,},ContainerPort{Name:statd-udp,HostPort:0,ContainerPort:662,Protocol:UDP,HostIP:,},},Env:[]EnvVar{EnvVar{Name:POD_IP,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:status.podIP,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:SERVICE_NAME,Value:nfs-taiga-staticdata-nfs-server-provisioner,ValueFrom:nil,},EnvVar{Name:POD_NAMESPACE,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},},Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:data,ReadOnly:false,MountPath:/export,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:kube-api-access-d79lc,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:&SecurityContext{Capabilities:&Capabilities{Add:[DAC_READ_SEARCH SYS_RESOURCE],Drop:[],},Privileged:nil,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,ProcMount:nil,WindowsOptions:nil,SeccompProfile:nil,},Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,},},RestartPolicy:Always,TerminationGracePeriodSeconds:*100,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:nfs-taiga-staticdata-nfs-server-provisioner,DeprecatedServiceAccount:nfs-taiga-staticdata-nfs-server-provisioner,NodeName:vm9-node2.23-106-61-193.lon-01.uk,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[]Sysctl{},WindowsOptions:nil,FSGroupChangePolicy:nil,SeccompProfile:nil,},ImagePullSecrets:[]LocalObjectReference{},Hostname:nfs-taiga-staticdata-nfs-server-provisioner-0,Subdomain:nfs-taiga-staticdata-nfs-server-provisioner,Affinity:&Affinity{NodeAffinity:&NodeAffinity{RequiredDuringSchedulingIgnoredDuringExecution:&NodeSelector{NodeSelectorTerms:[]NodeSelectorTerm{NodeSelectorTerm{MatchExpressions:[]NodeSelectorRequirement{NodeSelectorRequirement{Key:kubernetes.io/hostname,Operator:In,Values:[dedi1-node1.23-106-60-155.lon-01.uk vm9-node2.23-106-61-193.lon-01.uk],},},MatchFields:[]NodeSelectorRequirement{},},},},PreferredDuringSchedulingIgnoredDuringExecution:[]PreferredSchedulingTerm{},},PodAffinity:nil,PodAntiAffinity:&PodAntiAffinity{RequiredDuringSchedulingIgnoredDuringExecution:[]PodAffinityTerm{PodAffinityTerm{LabelSelector:&v1.LabelSelector{MatchLabels:map[string]string{},MatchExpressions:[]LabelSelectorRequirement{LabelSelectorRequirement{Key:release,Operator:In,Values:[taiga-staticdata],},},},Namespaces:[],TopologyKey:kubernetes.io/hostname,NamespaceSelector:nil,},},PreferredDuringSchedulingIgnoredDuringExecution:[]WeightedPodAffinityTerm{},},},SchedulerName:default-scheduler,InitContainers:[]Container{},AutomountServiceAccountToken:nil,Tolerations:[]Toleration{Toleration{Key:node.kubernetes.io/not-ready,Operator:Exists,Value:,Effect:NoExecute,TolerationSeconds:*20,},Toleration{Key:node.kubernetes.io/unreachable,Operator:Exists,Value:,Effect:NoExecute,TolerationSeconds:*20,},},HostAliases:[]HostAlias{},PriorityClassName:,Priority:*0,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[]PodReadinessGate{},RuntimeClassName:nil,EnableServiceLinks:*true,PreemptionPolicy:*PreemptLowerPriority,Overhead:ResourceList{},TopologySpreadConstraints:[]TopologySpreadConstraint{},EphemeralContainers:[]EphemeralContainer{},SetHostnameAsFQDN:nil,OS:nil,},Status:PodStatus{Phase:Pending,Conditions:[]PodCondition{PodCondition{Type:Initialized,Status:True,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2022-08-19 15:48:51 +0000 UTC,Reason:,Message:,},PodCondition{Type:Ready,Status:False,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2022-08-19 15:48:51 +0000 UTC,Reason:ContainersNotReady,Message:containers with unready status: [nfs-server-provisioner],},PodCondition{Type:ContainersReady,Status:False,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2022-08-19 15:48:51 +0000 UTC,Reason:ContainersNotReady,Message:containers with unready status: [nfs-server-provisioner],},PodCondition{Type:PodScheduled,Status:True,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2022-08-19 15:48:51 +0000 UTC,Reason:,Message:,},},Message:,Reason:,HostIP:23.106.61.193,PodIP:,StartTime:2022-08-19 15:48:51 +0000 UTC,ContainerStatuses:[]ContainerStatus{ContainerStatus{Name:nfs-server-provisioner,State:ContainerState{Waiting:&ContainerStateWaiting{Reason:ContainerCreating,Message:,},Running:nil,Terminated:nil,},LastTerminationState:ContainerState{Waiting:nil,Running:nil,Terminated:nil,},Ready:false,RestartCount:0,Image:k8s.gcr.io/sig-storage/nfs-provisioner:v3.0.0,ImageID:,ContainerID:,Started:*false,},},QOSClass:BestEffort,InitContainerStatuses:[]ContainerStatus{},NominatedNodeName:,PodIPs:[]PodIP{},EphemeralContainerStatuses:[]ContainerStatus{},},}]}}
E0822 13:04:24.958001       1 csi_attacher.go:722] kubernetes.io/csi: attachment for pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483 failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
I0822 13:04:24.958051       1 actual_state_of_world.go:357] Volume "kubernetes.io/csi/linstor.csi.linbit.com^pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483" is already added to attachedVolume list to node "vm9-node2.23-106-61-193.lon-01.uk", update device path ""
E0822 13:04:24.958155       1 nestedpendingoperations.go:335] Operation for "{volumeName:kubernetes.io/csi/linstor.csi.linbit.com^pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483 podName: nodeName:vm9-node2.23-106-61-193.lon-01.uk}" failed. No retries permitted until 2022-08-22 13:06:26.958119222 +0000 UTC m=+575.556584361 (durationBeforeRetry 2m2s). Error: AttachVolume.Attach failed for volume "pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483" (UniqueName: "kubernetes.io/csi/linstor.csi.linbit.com^pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483") from node "vm9-node2.23-106-61-193.lon-01.uk" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
I0822 13:04:24.958190       1 event.go:294] "Event occurred" object="team-100/nfs-taiga-staticdata-nfs-server-provisioner-0" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FailedAttachVolume" message="AttachVolume.Attach failed for volume \"pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483\" : rpc error: code = DeadlineExceeded desc = context deadline exceeded"
I0822 13:04:28.511467       1 actual_state_of_world.go:400] SetVolumeMountedByNode volume kubernetes.io/csi/linstor.csi.linbit.com^pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483 to the node "vm9-node2.23-106-61-193.lon-01.uk" mounted true
I0822 13:04:39.188610       1 pv_controller.go:254] synchronizing PersistentVolumeClaim[team-100/data-nfs-taiga-staticdata-nfs-server-provisioner-0]: phase: Bound, bound to: "pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483", bindCompleted: true, boundByController: true
I0822 13:04:39.188657       1 pv_controller.go:502] synchronizing bound PersistentVolumeClaim[team-100/data-nfs-taiga-staticdata-nfs-server-provisioner-0]: volume "pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483" found: phase: Bound, bound to: "team-100/data-nfs-taiga-staticdata-nfs-server-provisioner-0 (uid: 7e1063f2-b831-41a6-8c85-fee1fbf6c483)", boundByController: false
I0822 13:04:39.188669       1 pv_controller.go:519] synchronizing bound PersistentVolumeClaim[team-100/data-nfs-taiga-staticdata-nfs-server-provisioner-0]: claim is already correctly bound
I0822 13:04:39.188678       1 pv_controller.go:1023] binding volume "pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483" to claim "team-100/data-nfs-taiga-staticdata-nfs-server-provisioner-0"
I0822 13:04:39.188689       1 pv_controller.go:921] updating PersistentVolume[pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483]: binding to "team-100/data-nfs-taiga-staticdata-nfs-server-provisioner-0"
I0822 13:04:39.188723       1 pv_controller.go:933] updating PersistentVolume[pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483]: already bound to "team-100/data-nfs-taiga-staticdata-nfs-server-provisioner-0"
I0822 13:04:39.188738       1 pv_controller.go:869] updating PersistentVolume[pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483]: set phase Bound
I0822 13:04:39.188753       1 pv_controller.go:872] updating PersistentVolume[pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483]: phase Bound already set
I0822 13:04:39.188773       1 pv_controller.go:961] updating PersistentVolumeClaim[team-100/data-nfs-taiga-staticdata-nfs-server-provisioner-0]: binding to "pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483"
I0822 13:04:39.188802       1 pv_controller.go:1008] updating PersistentVolumeClaim[team-100/data-nfs-taiga-staticdata-nfs-server-provisioner-0]: already bound to "pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483"
I0822 13:04:39.188834       1 pv_controller.go:762] updating PersistentVolumeClaim[team-100/data-nfs-taiga-staticdata-nfs-server-provisioner-0] status: set phase Bound
I0822 13:04:39.188854       1 pv_controller.go:820] updating PersistentVolumeClaim[team-100/data-nfs-taiga-staticdata-nfs-server-provisioner-0] status: phase Bound already set
I0822 13:04:39.188874       1 pv_controller.go:1049] volume "pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483" bound to claim "team-100/data-nfs-taiga-staticdata-nfs-server-provisioner-0"
I0822 13:04:39.188908       1 pv_controller.go:1050] volume "pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483" status after binding: phase: Bound, bound to: "team-100/data-nfs-taiga-staticdata-nfs-server-provisioner-0 (uid: 7e1063f2-b831-41a6-8c85-fee1fbf6c483)", boundByController: false
I0822 13:04:39.188944       1 pv_controller.go:1051] claim "team-100/data-nfs-taiga-staticdata-nfs-server-provisioner-0" status after binding: phase: Bound, bound to: "pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483", bindCompleted: true, boundByController: true

The only thing which really stands out is:

I0822 13:04:24.958051       1 actual_state_of_world.go:357] Volume "kubernetes.io/csi/linstor.csi.linbit.com^pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483" is already added to attachedVolume list to node "vm9-node2.23-106-61-193.lon-01.uk", update device path ""

I'm not sure if this is causing or caused by the timeout issue, the parent function for this is https://github.com/kubernetes/kubernetes/blob/95ee5ab382d64cfe6c28967f36b53970b8374491/pkg/kubelet/volumemanager/cache/actual_state_of_world.go#L338

I'm unsure why the device path would be empty?

I was looking for similar issues with volume attachment with CSI plugins, longhorn/longhorn#2322 sounds similar but these volumes contain hardly any files so it cannot be a delay due to ownership changes.

I will continue trying to dig into this.

@rck
Copy link
Member

rck commented Aug 23, 2022

Hi @Rid , just a quick update: thanks for the detailed logs, with most devs on well deserved vacation it will take a bit till somebody looks into this, but they will eventually

@blampe
Copy link

blampe commented Aug 30, 2022

dedi1-node1.23-106-60-155.lon-01.uk ┊ SATELLITE ┊ 23.106.60.155:3367 (SSL) ┊ EVICTED

I wonder if this has something to do with auto-eviction?

@WanzenBug
Copy link
Member

I0819 16:05:29.747960       1 csi_handler.go:234] Error processing "csi-d9e810a254b0ffa73804d0480a395b9e575625dd0482256f9213db53b5fc908b": failed to attach: rpc error: code = Internal desc = ControllerPublishVolume failed for pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483: failed to check existing assignment: context deadline exceeded

That looks like the most promising lead to me. The ControllerPublishVolume call has a timeout of 1 minute AFAIK. So I am wondering why it exceeds that timeout. I think it only really checks that the volume is available on whatever node was requested, so there shouldn't really be any potential for a time out. Can you run with the chart value set to csi.logLevel=debug and check the logs of the csi-controller pod, specifically the linstor-csi-plugin container for more information?

@Rid
Copy link
Author

Rid commented Aug 31, 2022

dedi1-node1.23-106-60-155.lon-01.uk ┊ SATELLITE ┊ 23.106.60.155:3367 (SSL) ┊ EVICTED

I wonder if this has something to do with auto-eviction?

The node dedi1-node1.23-106-60-155.lon-01.uk is properly evicted as we're testing fail-over after a node outage. We've left the node offline to try and debug the issue (where PVs are timing out during volume attachment).

I0819 16:05:29.747960       1 csi_handler.go:234] Error processing "csi-d9e810a254b0ffa73804d0480a395b9e575625dd0482256f9213db53b5fc908b": failed to attach: rpc error: code = Internal desc = ControllerPublishVolume failed for pvc-7e1063f2-b831-41a6-8c85-fee1fbf6c483: failed to check existing assignment: context deadline exceeded

That looks like the most promising lead to me. The ControllerPublishVolume call has a timeout of 1 minute AFAIK. So I am wondering why it exceeds that timeout. I think it only really checks that the volume is available on whatever node was requested, so there shouldn't really be any potential for a time out. Can you run with the chart value set to csi.logLevel=debug and check the logs of the csi-controller pod, specifically the linstor-csi-plugin container for more information?

I'll increase the log value today, can you possibly link me to the controller code which does the check on the linstor side?

@WanzenBug
Copy link
Member

can you possibly link me to the controller code which does the check on the linstor side?

That's the entry on the csi plugin side: https://github.com/piraeusdatastore/linstor-csi/blob/master/pkg/driver/driver.go#L613

And thats were the attach operation actually happens: https://github.com/piraeusdatastore/linstor-csi/blob/master/pkg/client/linstor.go#L430

@Rid
Copy link
Author

Rid commented Aug 31, 2022

I've recreated the pvs with the debug logs enabled, listing the resources is taking a long time:

# time linstor r l -a
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                             ┊ Node                                            ┊ Port ┊ Usage  ┊ Conns                                                       ┊      State ┊ CreatedOn           ┊
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-3eb665ec-27ec-4875-8df9-106751f1a832 ┊ dedi1-node1.23-106-60-155.lon-01.uk ┊ 7008 ┊        ┊                                                             ┊    Unknown ┊ 2022-08-31 15:39:18 ┊
┊ pvc-3eb665ec-27ec-4875-8df9-106751f1a832 ┊ vm6-cplane1.23-106-61-231.lon-01.uk ┊ 7008 ┊ Unused ┊ Connecting(dedi1-node1.23-106-60-155.lon-01.uk) ┊ TieBreaker ┊ 2022-08-31 15:39:41 ┊
┊ pvc-3eb665ec-27ec-4875-8df9-106751f1a832 ┊ vm9-node2.23-106-61-193.lon-01.uk   ┊ 7008 ┊ Unused ┊ Connecting(dedi1-node1.23-106-60-155.lon-01.uk) ┊   UpToDate ┊ 2022-08-31 15:40:27 ┊
┊ pvc-3ec4cd7c-a369-4fe8-b0c3-68528ad1ef37 ┊ dedi1-node1.23-106-60-155.lon-01.uk ┊ 7007 ┊        ┊                                                             ┊    Unknown ┊ 2022-08-31 15:39:18 ┊
┊ pvc-3ec4cd7c-a369-4fe8-b0c3-68528ad1ef37 ┊ vm6-cplane1.23-106-61-231.lon-01.uk ┊ 7007 ┊ Unused ┊ Connecting(dedi1-node1.23-106-60-155.lon-01.uk) ┊ TieBreaker ┊ 2022-08-31 15:39:39 ┊
┊ pvc-3ec4cd7c-a369-4fe8-b0c3-68528ad1ef37 ┊ vm9-node2.23-106-61-193.lon-01.uk   ┊ 7007 ┊ Unused ┊ Connecting(dedi1-node1.23-106-60-155.lon-01.uk) ┊   UpToDate ┊ 2022-08-31 15:39:56 ┊
┊ pvc-95f607ac-89fb-43d3-81a1-2ad89bdfa438 ┊ dedi1-node1.23-106-60-155.lon-01.uk ┊ 7004 ┊        ┊                                                             ┊    Unknown ┊ 2022-08-31 15:38:34 ┊
┊ pvc-95f607ac-89fb-43d3-81a1-2ad89bdfa438 ┊ vm6-cplane1.23-106-61-231.lon-01.uk ┊ 7004 ┊ Unused ┊ Connecting(dedi1-node1.23-106-60-155.lon-01.uk) ┊ TieBreaker ┊ 2022-08-31 15:38:53 ┊
┊ pvc-95f607ac-89fb-43d3-81a1-2ad89bdfa438 ┊ vm9-node2.23-106-61-193.lon-01.uk   ┊ 7004 ┊ Unused ┊ Connecting(dedi1-node1.23-106-60-155.lon-01.uk) ┊   UpToDate ┊ 2022-08-31 15:39:30 ┊
┊ pvc-8932ceaf-ebc6-4c7f-855f-b3c8da2a9564 ┊ dedi1-node1.23-106-60-155.lon-01.uk ┊ 7009 ┊        ┊                                                             ┊    Unknown ┊ 2022-08-31 15:39:19 ┊
┊ pvc-8932ceaf-ebc6-4c7f-855f-b3c8da2a9564 ┊ vm6-cplane1.23-106-61-231.lon-01.uk ┊ 7009 ┊ Unused ┊ Connecting(dedi1-node1.23-106-60-155.lon-01.uk) ┊ TieBreaker ┊ 2022-08-31 15:39:44 ┊
┊ pvc-8932ceaf-ebc6-4c7f-855f-b3c8da2a9564 ┊ vm9-node2.23-106-61-193.lon-01.uk   ┊ 7009 ┊ Unused ┊ Connecting(dedi1-node1.23-106-60-155.lon-01.uk) ┊   UpToDate ┊ 2022-08-31 15:40:27 ┊
┊ pvc-28069ce0-b3a0-4f46-94db-ae5f2eda5d4d ┊ dedi1-node1.23-106-60-155.lon-01.uk ┊ 7006 ┊        ┊                                                             ┊    Unknown ┊ 2022-08-31 15:38:34 ┊
┊ pvc-28069ce0-b3a0-4f46-94db-ae5f2eda5d4d ┊ vm6-cplane1.23-106-61-231.lon-01.uk ┊ 7006 ┊ Unused ┊ Connecting(dedi1-node1.23-106-60-155.lon-01.uk) ┊ TieBreaker ┊ 2022-08-31 15:38:59 ┊
┊ pvc-28069ce0-b3a0-4f46-94db-ae5f2eda5d4d ┊ vm9-node2.23-106-61-193.lon-01.uk   ┊ 7006 ┊ Unused ┊ Connecting(dedi1-node1.23-106-60-155.lon-01.uk) ┊   UpToDate ┊ 2022-08-31 15:39:30 ┊
┊ pvc-d2b15e29-2594-40b3-90ae-9a14c0fda0bc ┊ dedi1-node1.23-106-60-155.lon-01.uk ┊ 7003 ┊        ┊                                                             ┊    Unknown ┊ 2022-08-31 15:38:34 ┊
┊ pvc-d2b15e29-2594-40b3-90ae-9a14c0fda0bc ┊ vm6-cplane1.23-106-61-231.lon-01.uk ┊ 7003 ┊ Unused ┊ Connecting(dedi1-node1.23-106-60-155.lon-01.uk) ┊ TieBreaker ┊ 2022-08-31 15:38:50 ┊
┊ pvc-d2b15e29-2594-40b3-90ae-9a14c0fda0bc ┊ vm9-node2.23-106-61-193.lon-01.uk   ┊ 7003 ┊ Unused ┊ Connecting(dedi1-node1.23-106-60-155.lon-01.uk) ┊   UpToDate ┊ 2022-08-31 15:39:08 ┊
┊ pvc-fbce07ea-aa30-4093-acba-82661d040903 ┊ dedi1-node1.23-106-60-155.lon-01.uk ┊ 7010 ┊        ┊                                                             ┊    Unknown ┊ 2022-08-31 15:39:19 ┊
┊ pvc-fbce07ea-aa30-4093-acba-82661d040903 ┊ vm6-cplane1.23-106-61-231.lon-01.uk ┊ 7010 ┊ Unused ┊ Connecting(dedi1-node1.23-106-60-155.lon-01.uk) ┊ TieBreaker ┊ 2022-08-31 15:39:46 ┊
┊ pvc-fbce07ea-aa30-4093-acba-82661d040903 ┊ vm9-node2.23-106-61-193.lon-01.uk   ┊ 7010 ┊ Unused ┊ Connecting(dedi1-node1.23-106-60-155.lon-01.uk) ┊   UpToDate ┊ 2022-08-31 15:40:27 ┊
┊ pvc-fc12a115-ace3-49d2-8fb1-a3cb8b1a3b38 ┊ dedi1-node1.23-106-60-155.lon-01.uk ┊ 7005 ┊        ┊                                                             ┊    Unknown ┊ 2022-08-31 15:38:34 ┊
┊ pvc-fc12a115-ace3-49d2-8fb1-a3cb8b1a3b38 ┊ vm6-cplane1.23-106-61-231.lon-01.uk ┊ 7005 ┊ Unused ┊ Connecting(dedi1-node1.23-106-60-155.lon-01.uk) ┊ TieBreaker ┊ 2022-08-31 15:38:56 ┊
┊ pvc-fc12a115-ace3-49d2-8fb1-a3cb8b1a3b38 ┊ vm9-node2.23-106-61-193.lon-01.uk   ┊ 7005 ┊ Unused ┊ Connecting(dedi1-node1.23-106-60-155.lon-01.uk) ┊   UpToDate ┊ 2022-08-31 15:39:30 ┊
┊ pvc-ff369c1b-3abd-4e90-9481-7aeb376626e0 ┊ dedi1-node1.23-106-60-155.lon-01.uk ┊ 7000 ┊        ┊                                                             ┊    Unknown ┊ 2022-08-31 15:37:33 ┊
┊ pvc-ff369c1b-3abd-4e90-9481-7aeb376626e0 ┊ vm6-cplane1.23-106-61-231.lon-01.uk ┊ 7000 ┊ Unused ┊ Connecting(dedi1-node1.23-106-60-155.lon-01.uk) ┊ TieBreaker ┊ 2022-08-31 15:37:50 ┊
┊ pvc-ff369c1b-3abd-4e90-9481-7aeb376626e0 ┊ vm9-node2.23-106-61-193.lon-01.uk   ┊ 7000 ┊ Unused ┊ Connecting(dedi1-node1.23-106-60-155.lon-01.uk) ┊   UpToDate ┊ 2022-08-31 15:38:00 ┊
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

________________________________________________________
Executed in   55.88 secs      fish           external
   usr time  276.36 millis    0.00 millis  276.36 millis
   sys time   61.65 millis    1.15 millis   60.51 millis

After the pv's were attached this went down to about 9 seconds.

All pods were stuck for 15m then created, we did a crictl prune and deleted a bunch of exited containers, this reduced the zfs datasets from over 1300 down to 658 reducing the amount of time it takes to do a zfs list by a second.

The system we're creating will have thousands of datasets though (as each system will have thousands of containers and we're running ZFS on root)

I've attached the logs of the linstor-csi-plugin from the time when the pvs were created. debug-logs.txt

Analysing the logs shows the initial ControllerPublishVolume failed due to connection being refused when the node goes offline, we can then see it switches after a few minutes to:

time="2022-08-31T16:04:04Z" level=error msg="method failed" func=github.com/piraeusdatastore/linstor-csi/pkg/driver.Driver.Run.func1 file="/src/pkg/driver/driver.go:1220" error="rpc error: code = Internal desc = ControllerPublishVolume failed for pvc-ff369c1b-3abd-4e90-9481-7aeb376626e0: context deadline exceeded" linstorCSIComponent=driver method=/csi.v1.Controller/ControllerPublishVolume nodeID=vm7-cplane2.23-106-61-232.lon-01.uk.appsolo.com provisioner=linstor.csi.linbit.com req="volume_id:\"pvc-ff369c1b-3abd-4e90-9481-7aeb376626e0\" node_id:\"vm9-node2.23-106-61-193.lon-01.uk.appsolo.com\" volume_capability:<mount:<fs_type:\"xfs\" > access_mode:<mode:MULTI_NODE_MULTI_WRITER > > volume_context:<key:\"linstor.csi.linbit.com/mount-options\" value:\"\" > volume_context:<key:\"linstor.csi.linbit.com/post-mount-xfs-opts\" value:\"\" > volume_context:<key:\"linstor.csi.linbit.com/remote-access-policy\" value:\"true\" > volume_context:<key:\"linstor.csi.linbit.com/uses-volume-context\" value:\"true\" > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1661955201588-8081-linstor.csi.linbit.com\" > " resp="<nil>" version=v0.20.0-d514e41db7cdcb580769cc69f1c1ef2b8a5def5d

Unfortunately there doesn't appear to be any information about why this is timing out. However we can deduce that the timeout is happening here:

err = d.Assignments.Attach(ctx, req.GetVolumeId(), req.GetNodeId(), req.GetReadonly(), existingVolume.UseQuorum)

I believe the timeout may be occurring on this request:

ress, err := s.client.Resources.GetResourceView(ctx, &lapi.ListOpts{Resource: []string{volId}})

As it takes 5 seconds to get a response:

# time curl -v -k -H 'Accept: application/json' 'https://piraeus-op-cs.default.svc:3371/v1/view/resources?limit=0&offset=0&resources=pvc-3eb665ec-27ec-4875-8df9-106751f1a832'
*   Trying 10.74.207.153:3371...
* Connected to piraeus-op-cs.default.svc (10.74.207.153) port 3371 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: CN=piraeus-op-cs
*  start date: Aug 31 12:46:10 2022 GMT
*  expire date: Aug 28 12:46:10 2032 GMT
*  issuer: CN=piraeus-op-ca
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
> GET /v1/view/resources?limit=0&offset=0&resources=pvc-3eb665ec-27ec-4875-8df9-106751f1a832 HTTP/1.1
> Host: piraeus-op-cs.default.svc:3371
> User-Agent: curl/7.74.0
> Accept: application/json
>
* Empty reply from server
* Connection #0 to host piraeus-op-cs.default.svc left intact
curl: (52) Empty reply from server

real	0m5.054s
user	0m0.020s
sys	0m0.020s

Strangely getting the resources and resource definitions via curl gives an empty reply, is this normal?

# curl -k -X GET https://piraeus-op-cs.default.svc:3371/v1/view/resources
curl: (52) Empty reply from server

Here the resources view times out after 23 secs:

time="2022-08-31T16:05:07Z" level=debug msg="curl -X 'GET' -H 'Accept: application/json' 'https://piraeus-op-cs.default.svc:3371/v1/view/resources?limit=0&offset=0&resources=pvc-3eb665ec-27ec-4875-8df9-106751f1a832'"
time="2022-08-31T16:05:30Z" level=error msg="method failed" func=github.com/piraeusdatastore/linstor-csi/pkg/driver.Driver.Run.func1 file="/src/pkg/driver/driver.go:1220" error="rpc error: code = Internal desc = ControllerPublishVolume failed for pvc-3eb665ec-27ec-4875-8df9-106751f1a832: context canceled" linstorCSIComponent=driver method=/csi.v1.Controller/ControllerPublishVolume nodeID=vm7-cplane2.23-106-61-232.lon-01.uk.appsolo.com provisioner=linstor.csi.linbit.com 

Here on the same resources view it times out after just 1 second:

time="2022-08-31T16:06:29Z" level=debug msg="curl -X 'GET' -H 'Accept: application/json' 'https://piraeus-op-cs.default.svc:3371/v1/view/resources?limit=0&offset=0&resources=pvc-3eb665ec-27ec-4875-8df9-106751f1a832'"
time="2022-08-31T16:06:30Z" level=error msg="method failed" func=github.com/piraeusdatastore/linstor-csi/pkg/driver.Driver.Run.func1 file="/src/pkg/driver/driver.go:1220" error="rpc error: code = Internal desc = ControllerPublishVolume failed for pvc-3eb665ec-27ec-4875-8df9-106751f1a832: context deadline exceeded" linstorCSIComponent=driver method=/csi.v1.Controller/ControllerPublishVolume nodeID=vm7-cplane2.23-106-61-232.lon-01.uk.appsolo.com provisioner=linstor.csi.linbit.com req="volume_id:\"pvc-3eb665ec-27ec-4875-8df9-106751f1a832\" node_id:\"vm9-node2.23-106-61-193.lon-01.uk.appsolo.com\" volume_capability:<mount:<fs_type:\"xfs\" > access_mode:<mode:MULTI_NODE_MULTI_WRITER > > volume_context:<key:\"linstor.csi.linbit.com/mount-options\" value:\"\" > volume_context:<key:\"linstor.csi.linbit.com/post-mount-xfs-opts\" value:\"\" > volume_context:<key:\"linstor.csi.linbit.com/remote-access-policy\" value:\"true\" > volume_context:<key:\"linstor.csi.linbit.com/uses-volume-context\" value:\"true\" > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1661955201588-8081-linstor.csi.linbit.com\" > " resp="<nil>" version=v0.20.0-d514e41db7cdcb580769cc69f1c1ef2b8a5def5d

In the case below the resource definition request times out after 1 minute:

time="2022-08-31T16:06:31Z" level=debug msg="getting assignments" func="github.com/piraeusdatastore/linstor-csi/pkg/client.(*Linstor).Status" file="/src/pkg/client/linstor.go:1626" linstorCSIComponent=client volume=pvc-3eb665ec-27ec-4875-8df9-106751f1a832
time="2022-08-31T16:06:31Z" level=debug msg="curl -X 'GET' -H 'Accept: application/json' 'https://piraeus-op-cs.default.svc:3371/v1/resource-definitions/pvc-3eb665ec-27ec-4875-8df9-106751f1a832/resources'"
time="2022-08-31T16:07:30Z" level=error msg="method failed" func=github.com/piraeusdatastore/linstor-csi/pkg/driver.Driver.Run.func1 file="/src/pkg/driver/driver.go:1220" error="rpc error: code = Internal desc = ControllerPublishVolume failed for pvc-3eb665ec-27ec-4875-8df9-106751f1a832: failed to check existing assignment: context canceled" linstorCSIComponent=driver method=/csi.v1.Controller/ControllerPublishVolume nodeID=vm7-cplane2.23-106-61-232.lon-01.uk.appsolo.com provisioner=linstor.csi.linbit.com req="volume_id:\"pvc-3eb665ec-27ec-4875-8df9-106751f1a832\" node_id:\"vm9-node2.23-106-61-193.lon-01.uk.appsolo.com\" volume_capability:<mount:<fs_type:\"xfs\" > access_mode:<mode:MULTI_NODE_MULTI_WRITER > > volume_context:<key:\"linstor.csi.linbit.com/mount-options\" value:\"\" > volume_context:<key:\"linstor.csi.linbit.com/post-mount-xfs-opts\" value:\"\" > volume_context:<key:\"linstor.csi.linbit.com/remote-access-policy\" value:\"true\" > volume_context:<key:\"linstor.csi.linbit.com/uses-volume-context\" value:\"true\" > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1661955201588-8081-linstor.csi.linbit.com\" > " resp="<nil>" version=v0.20.0-d514e41db7cdcb580769cc69f1c1ef2b8a5def5d

@WanzenBug
Copy link
Member

WanzenBug commented Sep 1, 2022

Hmm, I'm not sure on the order of those log lines. It could be that there are multiple requests interleaved. We should probably start adding a request ID to the logs at some point.

The empty replies are probably because of the missing client certificates for mTLS.

I not sure why it would take LINSTOR that long to answer these requests. You said you have a lot of zfs datasets, but the actual ZVols for LINSTOR seem to be few in numbers. Maybe worth to ask on the linstor-server project directly.

And maybe we can move away from the "view" method in that case, not sure if that would help

@Rid
Copy link
Author

Rid commented Sep 1, 2022

The zfs list command itself takes 1.67 secs. I do think it would be better to do a list on the datasets in use by linstor rather than everything, especially where there are many more datasets outside of linstor.

as an example, zfs list -r rpool/team-100 takes just 14ms.

The link https://gitlab.at.linbit.com/linstor/linstor-server is not working for me, did you mean https://github.com/LINBIT/linstor-server?

@WanzenBug
Copy link
Member

The link https://gitlab.at.linbit.com/linstor/linstor-server is not working for me, did you mean https://github.com/LINBIT/linstor-server?

Yes, sorry, used the internal link there...

@Rid
Copy link
Author

Rid commented Sep 1, 2022

I think we may be able to speed it up by implementing LINBIT/linstor-server#309 (comment) (I will test it).

However we should also be able to tolerate a delay of a few seconds from the server. As it's possible in some cases a user could have thousands of volumes and snapshots used by linstor which could produce the same issue even with the speed up.

Can you let me know where the timeout value is set on the http requests? I can try increasing the value to also check if that solves the issue.

@WanzenBug
Copy link
Member

The timeout is created by the CSI sidecars. In this case I think it's here: https://github.com/kubernetes-csi/external-attacher/ (--timeout) which seems to default to 15s. I thought it was 1m, but that could have been another sidecar.

@Rid
Copy link
Author

Rid commented Sep 1, 2022

      1  762125  762125    6213 ?             -1 Sl       0   7:37 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id ed2cfaae4049056770ad09a6d58720a363aa6b89df66c56baed57ad3e6180877 -address /run/containerd/containerd.sock
 762125  762271  762271  762271 ?             -1 Ss   65535   0:00  \_ /pause
 762125  803731  803731  803731 ?             -1 Ssl      0   0:35  \_ /csi-attacher --v=5 --csi-address=/var/lib/csi/sockets/pluginproxy/./csi.sock --timeout=1m --leader-election=true --leader-election-namespace=default --worker-threads=30
 762125  804114  804114  804114 ?             -1 Ssl      0   0:36  \_ /livenessprobe --csi-address=/var/lib/csi/sockets/pluginproxy/./csi.sock
 762125  815470  815470  815470 ?             -1 Ssl      0   3:01  \_ /csi-provisioner --csi-address=/var/lib/csi/sockets/pluginproxy/./csi.sock --timeout=1m --default-fstype=ext4 --feature-gates=Topology=true --leader-election=true --leader-election-namespace=default --enable-capacity --extra-create-metadata --capacity-ownerref-level=2 --worker-threads=30
 762125  816603  816603  816603 ?             -1 Ssl      0   0:32  \_ /csi-snapshotter --timeout=1m --csi-address=/var/lib/csi/sockets/pluginproxy/./csi.sock --leader-election=true --leader-election-namespace=default --worker-threads=30
 762125  817633  817633  817633 ?             -1 Ssl      0   0:33  \_ /csi-resizer --v=5 --csi-address=/var/lib/csi/sockets/pluginproxy/./csi.sock --timeout=1m --handle-volume-inuse-error=false --leader-election=true --leader-election-namespace=default --workers=30
 762125  818013  818013  818013 ?             -1 Ssl      0   0:32  \_ /linstor-csi --csi-endpoint=unix:///var/lib/csi/sockets/pluginproxy/./csi.sock --node=vm9-node2.23-106-61-193.lon-01.uk.appsolo.com --linstor-endpoint=https://piraeus-op-cs.default.svc:3371 --log-level=debug

These are hardcoded here:

https://github.com/piraeusdatastore/piraeus-operator/blob/5959d817d0791641670ea639f4996c65edffc77a/pkg/controller/linstorcsidriver/linstorcsidriver_controller.go#L828-L910

It looks like the sidecar container is set to 1m, but the timeout is actually happening in the linstor-csi container, it doesn't appear to have any timeout flag set.

So either linstor-csi has a low ctx timeout, or the delay on those endpoints is not the root cause.

@blampe
Copy link

blampe commented Sep 3, 2022

Out of curiosity, did you provision your volumes before adding the HA parameters to your storage class? (E.g. by deleting it and then re-creating with new parameters.) This is something I just ran into -- none of my pods were getting handled correctly by HA because they were provisioned before I had added the HA params. I had to manually adjust their ResourceGroup to add the DRBD options.

@Rid
Copy link
Author

Rid commented Sep 6, 2022

Out of curiosity, did you provision your volumes before adding the HA parameters to your storage class? (E.g. by deleting it and then re-creating with new parameters.) This is something I just ran into -- none of my pods were getting handled correctly by HA because they were provisioned before I had added the HA params. I had to manually adjust their ResourceGroup to add the DRBD options.

We don't have any HA parameters on the storage class, the latest version of https://github.com/piraeusdatastore/piraeus-ha-controller is opt-out rather than opt-in like the previous version.

We can see that the pods are getting rescheduled on to the correct node properly, they're just not being attached due to a context timeout.

@Rid
Copy link
Author

Rid commented Sep 6, 2022

@WanzenBug I tried to follow the calls up from

func (s *Linstor) Attach(ctx context.Context, volId, node string, readOnly, volQuorum bool) error {
to find where ctx is set. It appears to be in the grpc code somewhere but it's not easy to find without a stack trace.

As you mentioned before, it's difficult to know which requests are timing out when due to there being no identifiers on the API calls. So it might be worth adding some additional debugging.

Would you be able to run curl -k -X GET https://piraeus-op-cs.default.svc:3371/v1/view/resources on a healthy system and let me know the output?

@WanzenBug
Copy link
Member

Sure. For completeness, I added some flags to curl to make it work with mTLS (the debug output in the logs does not handle that correctly), and also my deployment is named linstor-cs instead of piraeus-op-cs, but that will not influence the output. I've also cut out the DRBD hmac secret, just to be sure 😉

kubectl exec deployment/linstor-cs-controller -- curl --cacert /etc/linstor/client/ca.crt --key /etc/linstor/client/tls.key --cert /etc/linstor/client/tls.crt https://linstor-cs.storage.svc:3371/v1/view/resources | jq '.[].layer_object.drbd.drbd_resource_definition.secret = ""'

view-resources-example.txt

@Rid
Copy link
Author

Rid commented Sep 7, 2022

Thanks @WanzenBug the command works with the mTLS certs included, although it took 10 seconds to complete (4 seconds on the subsequent cached requests).

Surely 4-10 seconds should not cause a timeout?

failed to check existing assignment: context canceled

Is this error always emitted by calling the cancel method on the ctx, or can it also be emitted from deadline exceeded/timeout?

@WanzenBug
Copy link
Member

I think that is only emitted when the ctx is cancelled. That probably happens somewhere in the grpc server, when the RPC call itself is canceled/times out

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants