[bitnami/redis-cluster] update job fails to get new node IP address #3876

Closed
tpolekhin opened this issue Oct 2, 2020 · 15 comments
Labels
stale 15 days without activity

Comments

@tpolekhin

Which chart:
redis-cluster-3.2.4.tgz

Describe the bug
The wait_for_dns_lookup bash function from https://github.com/bitnami/bitnami-docker-redis-cluster/blob/master/6.0/debian-10/prebuildfs/opt/bitnami/scripts/libnet.sh#L33 fails to return the correct IP address of the new pod.
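
For context, here is a rough sketch of the two helpers involved, reconstructed from the trace further down (this is not the verbatim libnet.sh source, so details may differ):

dns_lookup() {
    local host="${1:?host is missing}"
    # Prints the resolved address, or nothing at all if the lookup fails
    getent ahosts "$host" | awk '/STREAM/ {print $1 }'
}

wait_for_dns_lookup() {
    local host="${1:?host is missing}"
    local retries="${2:-5}"
    local sleep_time="${3:-1}"
    local i=1
    # Retry until the name resolves at least once
    while (( i <= retries )); do
        [[ -n "$(dns_lookup "$host")" ]] && break
        sleep "$sleep_time"
        (( i+=1 ))
    done
    # The IP the caller receives comes from this second, independent lookup,
    # which can still come back empty if DNS resolution is flaky
    dns_lookup "$host"
}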

To Reproduce

  1. helm install redis-cluster bitnami/redis-cluster --set 'cluster.replicas=0'
  2. helm upgrade redis-cluster bitnami/redis-cluster --set 'cluster.nodes=6,cluster.replicas=0,cluster.init=false,cluster.update.addNodes=true,cluster.update.currentNumberOfNodes=3'
  3. tail the logs of the cluster-update pod

Expected behavior
All new pods are discovered and added to the existing cluster.

Version of Helm and Kubernetes:

  • Output of helm version:
version.BuildInfo{Version:"v3.3.3", GitCommit:"55e3ca022e40fe200fbc855938995f40b2a68ce0", GitTreeState:"dirty", GoVersion:"go1.15.2"}
  • Output of kubectl version:
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T23:30:39Z", GoVersion:"go1.14.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.12+IKS", GitCommit:"d09005b98837bb6061c0f643a27383c02b003205", GitTreeState:"clean", BuildDate:"2020-09-16T21:47:16Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

Additional context
I've added set -x to the update job script to trace the issue.
Here's the output of pod/redis-cluster-cluster-update-9x4nt.
You can see that the DNS lookup succeeds once, which breaks out of the wait loop,
but the next dns_lookup call, the one used to return the IP address, fails to find any IP address.

++ sleep 5
++ (( i+=1  ))
++ (( i <= retries  ))
++ check_host redis-cluster-3.redis-cluster-headless
+++ dns_lookup redis-cluster-3.redis-cluster-headless
+++ local host=redis-cluster-3.redis-cluster-headless
+++ getent ahosts redis-cluster-3.redis-cluster-headless
+++ awk '/STREAM/ {print $1 }'
++ [[ 172.21.166.44 == '' ]]
++ true
++ return_value=0
++ break
++ return 0
++ dns_lookup redis-cluster-3.redis-cluster-headless
++ local host=redis-cluster-3.redis-cluster-headless
++ getent ahosts redis-cluster-3.redis-cluster-headless
++ awk '/STREAM/ {print $1 }'
+ new_node_ip=
++ redis-cli -h '' -p 6379 ping
Could not connect to Redis at :6379: Name or service not known
+ [[ '' != \P\O\N\G ]]
+ echo 'Node  not ready, waiting for all the nodes to be ready...'
+ sleep 5
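
So the retry loop inside wait_for_dns_lookup saw the name resolve once, but the final dns_lookup that actually prints the IP returned nothing, leaving new_node_ip empty and making every redis-cli -h '' ping fail. Purely as an illustration (this is not the chart's actual update-job template), a more defensive loop would re-resolve the name on every iteration instead of trusting a single lookup:

node="redis-cluster-3.redis-cluster-headless"
new_node_ip=""
# Keep re-resolving until we both have an IP and the node answers PING
until [[ -n "$new_node_ip" && "$(redis-cli -h "$new_node_ip" -p 6379 ping)" == "PONG" ]]; do
    new_node_ip="$(dns_lookup "$node")"
    echo "Node ${node} not ready, waiting for all the nodes to be ready..."
    sleep 5
done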
@EswarRams

I also faced the same issue when I tried to do an upgrade, and I see another issue as well.
Did you try with 3 masters and 4 slaves by any chance? When I restart a master I see a slave become the master, but the new pod that comes up with a new IP doesn't join the cluster. A pod can restart at any time, so it should rejoin the cluster. Is that something you faced, or haven't you tried it yet?

@javsalgar
Contributor

Hi,

@rafariossaa is checking an issue with the upgrade, pinging him in case it is related.

@rafariossaa
Contributor

Hi,
I am looking into this. I will come back as soon as I have news.

@tpolekhin
Author

@EswarRams I did try the default setup with 3 masters and 3 followers and everything was okay for me. I deleted a master pod and a follower got promoted. When the old master came back it joined the cluster and became a follower of the new master.
Make sure you're running with persistence enabled, otherwise the config is lost when you kill a pod and it will not rejoin the cluster. Possibly an issue? I don't know what the developers' intent was here. Since there's an environment variable with all the cluster pods' DNS names, one would hope the PVC is not that important, but who knows.
Maybe the devs can comment on this.

@rafariossaa
Contributor

Hi,
@EswarRams, your issue seems to be different from this one, which is related to the deployment upgrade.
@tpolekhin, joining the cluster should not be related to persistence. As you indicated, the node names are provided to the pod in an environment variable.
Please, if you are experiencing a different issue, open a new issue and link the issues you think are related.

@tpolekhin
Author

@rafariossaa I've created a new issue as you suggested: #3933

@rafariossaa
Contributor

@tpolekhin Thanks.

@EswarRams

EswarRams commented Oct 8, 2020 via email

@stale

stale bot commented Oct 24, 2020

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

@stale stale bot added the stale 15 days without activity label Oct 24, 2020
@tpolekhin
Author

Hello,
I can see this issue received a stale label. Is anyone still looking at this, or were you unable to reproduce the issue?

@stale stale bot removed the stale 15 days without activity label Oct 26, 2020
@rafariossaa
Contributor

Hi,
Sorry for the delay.
On one hand, a new release of the chart was made that fixes some issues with deciding a node's role when it restarts (or when scaling), so it may be worth giving it a try and checking whether the issue persists.
I was also waiting for some feedback from @EswarRams.

@tpolekhin
Author

Hello @rafariossaa,
I'm currently stuck on another issue with the cluster upgrade: #4064.
Hopefully this issue will be resolved as well once that one is fixed.

@rafariossaa
Contributor

Hi,
Please let me know when #4064 is solved and you can continue with your upgrade.

@stale

stale bot commented Nov 21, 2020

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

@stale stale bot added the stale 15 days without activity label Nov 21, 2020
@stale

stale bot commented Nov 29, 2020

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.

@stale stale bot closed this as completed Nov 29, 2020