Promotion of secondaries delayed/not happening #172
Here's an example of the drbd state when adding 6 pvcs at once:
Eventually they are promoted; the only logs I can find relating to this are in
It appears I got the causation wrong: something else is causing the delay, and once that something resolves, the promotion goes through. I came to this conclusion because manually forcing the node to be primary does not speed up the attach.
Here's the linstor resource file:
I can successfully mount the volume manually:
Some commands are taking a long time to complete:
Further info:
drbd enabled through injector
Also attached are the relevant logs and command output. On the surviving storage node, a zfs list process seems to keep spawning every second or so, using 100% CPU:
With the above example, the volume attach appears to have stalled indefinitely; it's been over 3 hours now. Sometimes the volumes will attach after roughly 45 minutes.
Here are some further logs from
The only thing which really stands out is:
I'm not sure whether this is the cause or an effect of the timeout issue; the parent function for this is https://github.com/kubernetes/kubernetes/blob/95ee5ab382d64cfe6c28967f36b53970b8374491/pkg/kubelet/volumemanager/cache/actual_state_of_world.go#L338 and I'm unsure why the device path would be empty. I was looking for similar issues with volume attachment in CSI plugins; longhorn/longhorn#2322 sounds similar, but these volumes contain hardly any files, so it cannot be a delay due to ownership changes. I will continue trying to dig into this.
Hi @Rid, just a quick update: thanks for the detailed logs. With most devs on well-deserved vacation it will take a bit until somebody looks into this, but they will eventually.
I wonder if this has something to do with auto-eviction?
That looks like the most promising lead to me. The ControllerPublishVolume call has a timeout of 1 minute AFAIK, so I am wondering why it exceeds that timeout. I think it only really checks that the volume is available on whatever node was requested, so there shouldn't really be any potential for a timeout. Can you run with the chart value set to debug logging?
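For context on what that 1-minute timeout looks like mechanically, here is a minimal Go sketch, not the sidecar's actual code; the function names and durations are invented for illustration:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// attachVolume stands in for the ControllerPublishVolume RPC that the
// external-attacher sidecar issues to the CSI plugin over gRPC.
func attachVolume(ctx context.Context, volumeID, nodeID string) error {
	select {
	case <-time.After(5 * time.Second): // pretend LINSTOR answers slowly
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	// The sidecar wraps every RPC in a deadline (1 minute in the real
	// deployment; shortened here so the demo finishes quickly). If the
	// plugin, and LINSTOR behind it, takes longer, the call fails with
	// "context deadline exceeded" and is retried later.
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	if err := attachVolume(ctx, "pvc-1234", "node-a"); err != nil {
		fmt.Println("attach failed:", err)
	}
}
```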
I'll increase the log value today. Can you possibly link me to the controller code which does the check on the LINSTOR side?
That's the entry point on the CSI plugin side: https://github.com/piraeusdatastore/linstor-csi/blob/master/pkg/driver/driver.go#L613 and that's where the attach operation actually happens: https://github.com/piraeusdatastore/linstor-csi/blob/master/pkg/client/linstor.go#L430
I've recreated the PVs with debug logs enabled; listing the resources is taking a long time:
After the PVs were attached this went down to about 9 seconds. All pods were stuck for 15 minutes and were then created. We did a crictl prune and deleted a bunch of exited containers; this reduced the zfs datasets from over 1300 down to 658, shaving about a second off a zfs list. The system we're building will have thousands of datasets, though (each system will have thousands of containers and we're running ZFS on root). I've attached the logs of the linstor-csi-plugin from the time when the PVs were created: debug-logs.txt. Analysing the logs shows the initial
Unfortunately there doesn't appear to be any information about why this is timing out. However, we can deduce that the timeout is happening here: linstor-csi/pkg/driver/driver.go, line 653 (at commit 4ee8dce).
I believe the timeout may be occurring on this request: linstor-csi/pkg/client/linstor.go, line 436 (at commit 4ee8dce), as it takes 5 seconds to get a response:
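One way to confirm where the time goes is to time the REST call against the controller directly. A rough Go sketch; the host name, plain-HTTP scheme, and the default REST port 3370 are assumptions about the deployment:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	// Assumed controller address; adjust to your deployment.
	url := "http://linstor-controller:3370/v1/view/resources"

	start := time.Now()
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)

	// If elapsed here regularly approaches the sidecar timeout,
	// the slow LINSTOR response is the likely culprit.
	fmt.Printf("status=%d bytes=%d elapsed=%s\n",
		resp.StatusCode, len(body), time.Since(start))
}
```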
Strangely, getting the resources and resource definitions via curl gives an empty reply; is this normal?
Here the resources view times out after 23 secs:
Here on the same resources view it times out after just 1 second:
In the case below the resource definition request times out after 1 minute:
Hmm, I'm not sure about the order of those log lines; it could be that there are multiple requests interleaved. We should probably start adding a request ID to the logs at some point. The empty replies are probably because of the missing client certificates for mTLS. I'm not sure why it would take LINSTOR that long to answer these requests. You said you have a lot of zfs datasets, but the actual ZVols for LINSTOR seem to be few in number. It may be worth asking on the linstor-server project directly. And maybe we can move away from the "view" method in that case; not sure if that would help.
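To illustrate the request-ID idea (this is not the project's logging code; every name here is made up): a small middleware that tags each incoming request and its log lines with an ID would make interleaved requests distinguishable:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"math/rand"
	"net/http"
)

type ctxKey struct{}

// withRequestID assigns a random ID to each request and stores it in
// the context so downstream handlers can tag their log lines with it.
func withRequestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := fmt.Sprintf("%08x", rand.Uint32())
		ctx := context.WithValue(r.Context(), ctxKey{}, id)
		log.Printf("[req %s] %s %s", id, r.Method, r.URL.Path)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/v1/view/resources", func(w http.ResponseWriter, r *http.Request) {
		// Pull the ID back out so this log line correlates with the
		// middleware's line even when requests interleave.
		id, _ := r.Context().Value(ctxKey{}).(string)
		log.Printf("[req %s] listing resources", id)
		fmt.Fprintln(w, "[]")
	})
	log.Fatal(http.ListenAndServe(":8080", withRequestID(mux)))
}
```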
The link https://gitlab.at.linbit.com/linstor/linstor-server is not working for me; did you mean https://github.com/LINBIT/linstor-server?
Yes, sorry, I used the internal link there...
I think we may be able to speed it up by implementing LINBIT/linstor-server#309 (comment) (I will test it). However, we should also be able to tolerate a delay of a few seconds from the server, since in some cases a user could have thousands of volumes and snapshots managed by LINSTOR, which would produce the same issue even with the speed-up. Can you let me know where the timeout value is set on the HTTP requests? I can try increasing it to check whether that solves the issue.
The timeout is created by the CSI sidecars. In this case I think it's here: https://github.com/kubernetes-csi/external-attacher/
These are hardcoded here. It looks like the sidecar container's timeout is set to 1m, but the timeout is actually happening in the plugin itself, so either timeout could be the one that needs raising.
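For reasoning about which layer fires first: if the plugin derives its context from the sidecar's, the earliest deadline wins regardless of which layer set it. A tiny illustration with invented durations:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

func main() {
	// Outer deadline from the sidecar (1m here), inner deadline from
	// the plugin (30s): the effective deadline is the earlier one.
	outer, cancelOuter := context.WithTimeout(context.Background(), time.Minute)
	defer cancelOuter()
	inner, cancelInner := context.WithTimeout(outer, 30*time.Second)
	defer cancelInner()

	d, _ := inner.Deadline()
	fmt.Println("effective deadline in:", time.Until(d).Round(time.Second))
}
```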
Out of curiosity, did you provision your volumes before adding the HA parameters to your storage class? (E.g. by deleting it and then re-creating with new parameters.) This is something I just ran into -- none of my pods were getting handled correctly by HA because they were provisioned before I had added the HA params. I had to manually adjust their ResourceGroup to add the DRBD options. |
We don't have any HA parameters on the storage class, the latest version of https://github.com/piraeusdatastore/piraeus-ha-controller is opt-out rather than opt-in like the previous version. We can see that the pods are getting rescheduled on to the correct node properly, they're just not being attached due to a context timeout. |
@WanzenBug I tried to follow the calls up from linstor-csi/pkg/client/linstor.go, line 430 (at commit 4ee8dce), to find where the ctx is set. It appears to be in the grpc code somewhere, but it's not easy to find without a stack trace.
As you mentioned before, it's difficult to know which requests are timing out, and when, because there are no identifiers on the API calls, so it might be worth adding some additional debugging. Would you be able to run
Sure. For completeness, I added some flags to curl to make it work with mTLS (the debug output in the logs does not handle that correctly), and also my deployment is named
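A rough Go equivalent of that mTLS-enabled curl call, in case anyone wants to script the check; the certificate paths, host name, and the HTTPS port 3371 are assumptions about the deployment:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	// Assumed paths: wherever the chart mounts the client cert secret.
	cert, err := tls.LoadX509KeyPair("/certs/tls.crt", "/certs/tls.key")
	if err != nil {
		panic(err)
	}
	caPEM, err := os.ReadFile("/certs/ca.crt")
	if err != nil {
		panic(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	// Present the client cert and trust the controller's CA,
	// mirroring curl's --cert/--key/--cacert flags.
	client := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				Certificates: []tls.Certificate{cert},
				RootCAs:      pool,
			},
		},
	}

	resp, err := client.Get("https://linstor-controller:3371/v1/view/resources")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, len(body))
}
```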
Thanks @WanzenBug, the command works with the mTLS certs included, although it took 10 seconds to complete (4 seconds on subsequent cached requests). Surely 4-10 seconds should not cause a timeout?
Is this error always emitted by calling the cancel method on the ctx, or can it also be emitted from a deadline exceeded/timeout?
I think that is only emitted when the ctx is cancelled. That probably happens somewhere in the grpc server, when the RPC call itself is canceled or times out.
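For what it's worth, the two conditions surface as distinct errors on the side that owns the context, while a gRPC server handler simply sees its ctx become done in either case. A minimal illustration with the standard library:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

func main() {
	// Deadline exceeded: the ctx expires on its own.
	ctx1, cancel1 := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel1()
	<-ctx1.Done()
	fmt.Println(errors.Is(ctx1.Err(), context.DeadlineExceeded)) // true

	// Cancelled: someone calls cancel() explicitly.
	ctx2, cancel2 := context.WithCancel(context.Background())
	cancel2()
	fmt.Println(errors.Is(ctx2.Err(), context.Canceled)) // true
}
```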
We are testing a system with 2 worker nodes and 1 further node acting as a tie-breaker.
This issue is twofold:
According to https://linbit.com/drbd-user-guide/drbd-guide-9_0-en/#s-automatic-promotion, automatic promotion is now baked into DRBD; however, https://linbit.com/drbd-user-guide/drbd-guide-9_0-en/#s-feature-failover-clusters says this should be done with drbd-reactor, and further https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/#s-linstor_ha says that for HA, auto-promote should be disabled. I can see drbd-reactor is running in the drbd-prometheus-exporter container on the satellites, but it doesn't seem to be configured with the promoter plug-in.
The resources are defined as follows:
and the storage class is:
Satellites are configured with zfs pools like so:
So I'm not sure whether auto-promote is enabled by default (the man page says it is), and if so, how to configure it to promote faster?
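For reference, auto-promote is a DRBD 9 resource option and is enabled by default. LINSTOR generates the resource files itself (typically under /var/lib/linstor.d/ on the satellites), so the snippet below is only a hand-written sketch of where the knob lives, not a file to edit on a LINSTOR-managed node:

```
resource r0 {
  options {
    # Default in DRBD 9: the resource is promoted to Primary
    # automatically when a process opens the device for writing,
    # and demoted again on the last close.
    auto-promote yes;
  }
  # ... volumes/connections as generated by LINSTOR ...
}
```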