[bitnami/redis-cluster] Cluster init script hangs when external access is enabled in an EKS cluster using AWS classic load balancers #16242
Comments
Having the same exact issue.
Sorry for the delay here, we are going to work on reproducing this issue to get more information.
I just tried to reproduce the issue in a local environment using minikube and everything worked as expected: I waited for the IPs to be ready, upgraded the deployment with those IPs, and after that the pods became available. I also confirmed that the logs and the `cluster info` output are correct (roughly the flow sketched below).
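Something along these lines; the release name and the exact flags are assumptions based on the chart's documented external-access procedure:

```bash
# 1. Install with external access enabled; one LoadBalancer service
#    per pod is created (release name "redis-cluster" is an assumption)
helm install redis-cluster bitnami/redis-cluster \
  --set cluster.externalAccess.enabled=true

# 2. Wait until every per-pod service has an external address assigned
kubectl get svc -l app.kubernetes.io/instance=redis-cluster --watch

# 3. Upgrade the release, feeding those addresses back into the chart
#    (repeat the loadBalancerIP entry for each node)
helm upgrade redis-cluster bitnami/redis-cluster \
  --set cluster.externalAccess.enabled=true \
  --set "cluster.externalAccess.service.loadBalancerIP[0]=<addr-0>" \
  --set "cluster.externalAccess.service.loadBalancerIP[1]=<addr-1>" \
  --set "cluster.externalAccess.service.loadBalancerIP[2]=<addr-2>"
```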
I am also having the same issue. When I test the connection to all the pods from redis-0 it works, even using the IPs from the logs.
As I mentioned above, I couldn't reproduce the issue in my environment. Could you please deploy the solution in a different environment, making sure you deploy the latest version of the chart and that your environment uses a stable version of all the components included in the cluster?
Hey - I just want to make sure that everybody realizes that the issue is happening in an EKS cluster. The behavior is very specific to Amazon/AWS and their handling of load balancers. Comparing against a minikube cluster is not an apples-to-apples comparison.
Correct, it seems to be a specific problem with the Amazon/AWS cluster networking configuration. We will try to reproduce the issue in the same environment to get more information, but in the meantime, could you try to debug the issue on your side? You can try to connect to the different nodes from one of the pods and confirm what the issue may be (see the sketch below).
If the domain is resolved properly but the node simply can't be reached, I think we should contact AWS to find out more about what's happening with the cluster.
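A rough sketch of that kind of check from inside one of the pods (the pod name and DNS name below are placeholders):

```bash
# Open a shell in one of the cluster pods (pod name is a placeholder)
kubectl exec -it redis-cluster-0 -- bash

# Inside the pod: confirm the load balancer DNS name resolves
LB="my-elb.example.amazonaws.com"   # placeholder
getent hosts "$LB"

# Confirm the Redis ports are reachable through the load balancer,
# using only bash built-ins (telnet/nc may not be in the image)
timeout 3 bash -c ": < /dev/tcp/$LB/6379"  && echo "6379 reachable"
timeout 3 bash -c ": < /dev/tcp/$LB/16379" && echo "16379 reachable"
```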
AWS support was engaged prior to submitting this issue. This is a summary of their findings: as part of our troubleshooting, we were able to resolve the load balancer DNS names and telnet to each of them on ports 6379 and 16379 from a test pod inside the cluster.
Therefore, I don't believe it to be an issue with DNS resolution or network connectivity.
If the domains and IPs are accessible and resolve correctly, we need to confirm you can connect to the different nodes with the Redis client itself; something along the lines of the sketch below.
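Presumably `redis-cli` is the client in question; a hedged example with placeholder addresses and password:

```bash
# Ping every node through its load balancer address
# (hostnames and password are placeholders)
export REDIS_PASSWORD="..."
for host in lb-0.example.com lb-1.example.com lb-2.example.com \
            lb-3.example.com lb-4.example.com lb-5.example.com; do
  redis-cli -h "$host" -p 6379 -a "$REDIS_PASSWORD" ping
done
```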
I was able to connect to the nodes that way as well. After some digging, it appears that there are a few existing issues relating to problems with using hostnames:
Are you sure this functionality works for you in EKS clusters? I'm curious what steps were taken to get this to succeed. I am able to reproduce this consistently using the steps in my initial post.
Here is our new workaround: a Helm-templated startup script that resolves each load balancer hostname to an IP and announces the hostname as the preferred endpoint.
```yaml
- |
  # Backwards compatibility change
  if ! [[ -f /opt/bitnami/redis/etc/redis.conf ]]; then
      cp /opt/bitnami/redis/etc/redis-default.conf /opt/bitnami/redis/etc/redis.conf
  fi
  # Derive this pod's ordinal from its name (e.g. "redis-cluster-3" -> "3")
  pod_index=($(echo "$POD_NAME" | tr "-" "\n"))
  pod_index="${pod_index[-1]}"
  # Helm renders the loadBalancerIP list as "[host0 host1 ...]"; strip the brackets
  hosts=($(echo "{{ .Values.cluster.externalAccess.service.loadBalancerIP }}" | cut -d [ -f2 | cut -d ] -f 1))
  # Resolve this pod's load balancer hostname to an IP for the announce address
  ip=$(getent hosts ${hosts[$pod_index]} | awk '{ print $1 }')
  export REDIS_CLUSTER_ANNOUNCE_IP="${ip}"
  # Announce the hostname too, and prefer it over the (changeable) ELB IP
  export REDIS_CLUSTER_ANNOUNCE_HOSTNAME="${hosts[$pod_index]}"
  export REDIS_CLUSTER_PREFERRED_ENDPOINT_TYPE=hostname
  export REDIS_NODES="${hosts[@]}"
  {{- if .Values.cluster.init }}
  rm -rf /bitnami/redis/data
  # Only the first pod drives cluster creation
  if [[ "$pod_index" == "0" ]]; then
      export REDIS_CLUSTER_CREATOR="yes"
      export REDIS_CLUSTER_REPLICAS="{{ .Values.cluster.replicas }}"
  fi
  {{- end }}
  /opt/bitnami/scripts/redis-cluster/entrypoint.sh /opt/bitnami/scripts/redis-cluster/run.sh
```
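Once the pods restart with this script in place, one way to check that the announce settings took effect (the secret, key, and pod names below assume chart defaults for a release named `redis-cluster`):

```bash
# Fetch the generated password and dump the cluster topology
export REDIS_PASSWORD=$(kubectl get secret redis-cluster \
  -o jsonpath="{.data.redis-password}" | base64 -d)
kubectl exec redis-cluster-0 -- \
  redis-cli -a "$REDIS_PASSWORD" cluster nodes
# Each entry should now reference a load balancer hostname
# rather than an internal pod IP
```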
Thank you @yo-ga. Did you finally open an issue in the official Redis repo so that it can be fixed on their side and the changes picked up in new releases?
Thank you both. Would you like to open a PR to apply those changes, @yo-ga? We will be happy to review it!
Hi @corico44, the above is the workaround approach. Also, I would need to confirm that all the operations are covered, including adding nodes and failover. It would take some time, if you can wait for the patch.
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
Thank you for submitting the associated Pull Request. Our team will review and provide feedback. Once the PR is merged, the issue will automatically close. Your contribution is greatly appreciated!
Hi, I am facing the same issue.
Name and Version
bitnami/redis-cluster 8.4.4
What architecture are you using?
None
What steps will reproduce the bug?
I've been following this guide to enable external access to redis-cluster in an Amazon EKS cluster: https://docs.bitnami.com/tutorials/deploy-redis-cluster-tmc-helm-chart/#step-5-deploy-the-bitnami-redis-cluster-helm-chart-by-enabling-external-access
The first step completes successfully, and I can see that the load balancers are created with ports 6379 and 16379 exposed.
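For context, those per-pod LoadBalancer services can be listed with something like the following (the release name and label are assumptions based on the chart's usual conventions):

```bash
# List the per-pod LoadBalancer services and their external addresses
# (release name "redis-cluster" is an assumption)
kubectl get svc -l app.kubernetes.io/instance=redis-cluster -o wide
```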
After adding the load balancer addresses to the `cluster.externalAccess.service.loadBalancerIP` array and performing the helm upgrade, the statefulset is created and, after some time, all 6 nodes appear to come up and report healthy.
However, on further inspection of the logs on the `-0` pod, it appears that the cluster init script is hanging on the following message:
In addition, `cluster info` reports that the cluster status is `fail` and that not all 6 nodes have joined successfully.
I have engaged AWS cloud support to rule out connectivity issues, and we were able to successfully telnet into all of the load balancers on ports `6379` and `16379` from a test pod within the k8s cluster.
Are you using any custom parameters or values?
What is the expected behavior?
The cluster boots up successfully and reports that the cluster status is ok. All 6 nodes are joined to the cluster and external clients are able to connect, auth, read and write keys.
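For illustration, an external smoke test along these lines would exercise that path (the address and password are placeholders):

```bash
# Hypothetical external-client check through one of the load balancers;
# -c follows cluster MOVED/ASK redirects
redis-cli -c -h lb-0.example.com -p 6379 -a "$REDIS_PASSWORD" set mykey myvalue
redis-cli -c -h lb-0.example.com -p 6379 -a "$REDIS_PASSWORD" get mykey
```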
What do you see instead?
Cluster init script hangs on the message:
And `cluster info` reports that the cluster status is `fail`. Not all 6 nodes have joined successfully.
Additional information
EKS cluster k8s version - 1.24.10
AWS load balancer controller version - 2.4.5