Support adaptive cluster topology refreshing and static refresh sources #240

Closed
mp911de opened this issue May 12, 2016 · 14 comments
Labels: type: feature A new feature
Milestone: Lettuce 4.2

@mp911de
Collaborator

mp911de commented May 12, 2016

Original Author: [email protected]
Original post: https://groups.google.com/forum/#!topic/lettuce-redis-client-users/jdJrR0PqJOg

We have been using Lettuce for a while now, and it always works like a charm, but as we try to scale Redis we are running into some issues.

Below is the scenario.

We have the following:
a 70-master-node cluster
one slave per master, i.e. 70 masters + 70 slaves
~60 application servers which push data into the Redis nodes
The 60 application servers are expected to increase to 100.

However, when the cluster configuration changes (for example, a node fails and its slave is promoted), this is not detected by Lettuce unless we refresh the cluster topology by calling RedisClusterClient.reloadPartitions() or start the background refresh job via ClusterClientOptions.
Since the cluster has a large number of nodes, there is a real possibility that a node may temporarily fail and its slave may become a master, so we decided to run the built-in job using ClusterClientOptions.
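For reference, this is roughly how we enable the built-in background job (a minimal sketch; it assumes the ClusterClientOptions builder exposes refreshClusterView and refreshPeriod as in the 4.x line, and the seed URI is a placeholder):

RedisClusterClient clusterClient = RedisClusterClient.create("redis://127.0.0.1:7000"); // placeholder seed URI

// Enables the background topology refresh job (ClusterTopologyRefreshTask), here every 60 seconds
ClusterClientOptions options = new ClusterClientOptions.Builder()
        .refreshClusterView(true)
        .refreshPeriod(60, TimeUnit.SECONDS)
        .build();

clusterClient.setOptions(options);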

We run the job via ClusterClientOptions, configured with a 60-second refresh period, but with 140 nodes in total the "cluster nodes" command takes around 30 ms (it is the slowest command; the other commands complete in microseconds), whereas it took about 100 microseconds on a 30-node cluster.
Also, ClusterTopologyRefreshTask hits each of the 140 nodes on every refresh.

Since all 60 servers hit all 140 nodes every 60 seconds, each Redis node receives on average one "cluster nodes" request per second (60 requests per 60 seconds), each taking about 30 ms.
That means each node spends roughly 3% of its time (30 ms out of every second) serving the "cluster nodes" command.

We are planning to double the size of the Redis cluster, and the number of application servers will also increase.
The former will make the "cluster nodes" command even slower, and the latter will push each node above one "cluster nodes" request per second.
The combined effect will definitely have a bad impact on Redis, considering the "cluster nodes" command is already around 1000 times slower than the other commands.

We dug into the Lettuce code and found that when ClusterTopologyRefreshTask refreshes its partitions, it executes the "cluster nodes" command on all partitions every 60 seconds, in our case 140 nodes, using TimedAsyncCommand.

One solution we came up with: iterate over the set of RedisURIs, make a synchronous call to fetch "cluster nodes", and use the first successful response to refresh the partitions. Only one Redis node would then be queried for the cluster topology, which would solve our case.
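Roughly, the idea would look like the following sketch (not the Lettuce internals, just the logic; it assumes the standalone sync API exposes clusterNodes(), and the seed URIs are placeholders):

// Query only the configured seed nodes and stop at the first node that answers
List<RedisURI> seedNodes = Arrays.asList(
        RedisURI.create("redis://10.0.0.1:7000"),   // placeholder seed nodes
        RedisURI.create("redis://10.0.0.2:7000"));

String clusterNodesOutput = null;
for (RedisURI seed : seedNodes) {
    RedisClient nodeClient = RedisClient.create(seed);
    try {
        clusterNodesOutput = nodeClient.connect().sync().clusterNodes(); // one "cluster nodes" call in total
        break;
    } catch (RedisException e) {
        // this seed is unreachable, try the next one
    } finally {
        nodeClient.shutdown();
    }
}
// clusterNodesOutput, if non-null, would then be used to refresh the partitions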

The above issue is only noticeable when the cluster is very large, because that is when the "cluster nodes" command becomes very slow.

@mp911de
Collaborator Author

mp911de commented May 12, 2016

Original Author: [email protected]

I think "ASK/MOVED" redirections, or a timeout exception from the node that was supposed to serve a particular slot, are fair indicators that the topology has changed (slots have moved, or a node is down), so the "cluster slots" or "cluster nodes" command can then be used to fetch the information needed to rebuild the cache.

Regarding which node should be used for getting the info: the users should give a list of Redis URIs without which the cluster will not work. E.g. if we have 5 masters and 10 slaves, the users should provide the IPs and ports of 1 master and its 2 slaves, because at least one of them should always be up if the cluster is up (although there is a partial-cluster use case where the cluster does not fail even though some of its slots are not served). Giving only one Redis URI is problematic anyway: if users give a single URI and depend on auto-discovery of the other nodes, it is possible that this Redis node fails while its slave stays up, leaving the cluster in a good state, but when the application server is restarted it will only try to connect to the failed node.
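For example, the cluster client can already be created with several seed URIs instead of a single one (a sketch; host names and ports are placeholders):

// Provide a "core" set of seed nodes instead of a single URI
List<RedisURI> seedNodes = Arrays.asList(
        RedisURI.create("redis://10.0.0.1:7000"),   // a master (placeholder addresses)
        RedisURI.create("redis://10.0.0.2:7000"),   // its first slave
        RedisURI.create("redis://10.0.0.3:7000"));  // its second slave

RedisClusterClient clusterClient = RedisClusterClient.create(seedNodes);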

So maybe the onus should be on the application developer to give the correct set of nodes.
If they give too few nodes, there is a chance that the cluster will not be discovered on application restart.
If they give too many nodes, we should choose the first node and use "cluster nodes" from it, provided that node reports the cluster state as ok in the "cluster info" command.
In the worst case, if the cluster is split and there are two separately running clusters (split-brain scenario), all bets are off whether the state is ok or not, and maybe we should just return the first one.

@mp911de
Collaborator Author

mp911de commented May 12, 2016

Putting the user in charge of a stable topology discovery sounds key to me. The user is the only one who can reliably say which parts of the cluster are the "core" components. So defining the seed nodes as the source for the topology is a good idea. Combined with signals from the client, e.g. a disconnect that persists for a couple of seconds, this could then trigger the topology refresh using the specified seed nodes.

@mp911de mp911de added this to the Lettuce 4.2 milestone May 12, 2016
@mp911de
Collaborator Author

mp911de commented May 13, 2016

@rahulbabbar The topology refresh interval can be adjusted, see Cluster topology refresh period

@RamirezTuco

@mp911de Yes, we are avoiding increasing the refresh period, because even a period of 5 minutes would mean losing out on data for 5 minutes on such a huge cluster. If the refreshes were event-based (MOVED/ASK/timeout), we would probably not need refresh tasks at all.

@mp911de mp911de added the type: feature A new feature label May 19, 2016
@mp911de
Collaborator Author

mp911de commented May 25, 2016

@rahulbabbar I deployed 4.2-SNAPSHOT to Sonatype OSS containing adaptive topology refreshing.

You'd configure it like this:

RedisClusterClient redisClient = RedisClusterClient.create("redis://password@localhost:7379");
ClusterTopologyRefreshOptions clusterTopologyRefreshOptions = new ClusterTopologyRefreshOptions.Builder()
        .enabled(true)
        .enableAllAdaptiveRefreshTriggers()
        .refreshPeriod(30, TimeUnit.MINUTES)
        .build();

ClusterClientOptions clusterClientOptions = new ClusterClientOptions.Builder()
        .topologyRefreshOptions(clusterTopologyRefreshOptions)
        .build();

redisClient.setOptions(clusterClientOptions);

Would be great if you could have a look and give it a try.

@RamirezTuco

Hi @mp911de

Just tested and everything seems to be working great. I checked that the cluster view was auto-refreshed when the following happened on the Redis server:

  1. A slave was promoted to a master.
  2. Slots were moved to other existing nodes.
  3. New nodes were added and slots were moved to them.

Also, I tested with dynamicRefreshSources set to false, and it works like a charm.

Just a couple of observations

  1. If the cluster state changes and a Lettuce async connection is used, the first call appears to succeed (following which the cluster view refreshes), but the value is not actually updated in Redis.
  2. In ClusterTopologyRefreshOptions, it is not possible to enable the cluster view refresh via adaptive refresh triggers if no refresh period is given.

I think we can definitely live with #1 (replaying all the commands, particularly on a cluster with a high write load, can be tough), and applications can easily overcome #2 by specifying a much longer refresh period.
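For example, the workaround for #2 is just a variation of the snippet above with a deliberately long period (a sketch; the 6-hour value is arbitrary):

// Keep adaptive triggers enabled, push the periodic refresh far out
ClusterTopologyRefreshOptions topologyRefreshOptions = new ClusterTopologyRefreshOptions.Builder()
        .enabled(true)
        .enableAllAdaptiveRefreshTriggers()
        .refreshPeriod(6, TimeUnit.HOURS)   // effectively rely on the adaptive triggers only
        .build();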

Eagerly awaiting the 4.2 release.

Thanks a lot.

@mp911de
Collaborator Author

mp911de commented May 26, 2016

First of all, thanks a lot for giving the change a try. I'm not sure I understood the first point about returning OK but not writing to Redis. I'd like to understand that issue; maybe there's something I can do about it.

I also thought about the second issue. Periodic topology updates and adaptive updates are linked right now, and I wasn't too happy with this either, so I'm going to decouple periodic and adaptive updates.

@RamirezTuco

Hi Mark,
Let's say there is a cluster with 3 masters and 3 slaves, and a running application using Lettuce is connected to it. At time t1, master m1 goes down, prompting s1 to become a master. However, the application is unaware of this. Now let's say the application hits Redis's m1 node via Lettuce's async connection and updates a value. Since the call is async, the application does not wait for it and reports OK to the user.
However, the hit to Redis times out, prompting a topology refresh in Lettuce, which succeeds.
Now, the value which the application tried to update in Redis (the write that prompted the topology refresh) is not reflected in Redis. If this is an issue, one way to fix it would be to replay the command.

But I think we can definitely live with it; even Redis does not guarantee 100% write success in the case of a master failure with a slave taking over (when a write does not propagate to the slave which became the master). Also, it is a very small price to pay for the speed which the async API provides.
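If an application does care about such lost writes, it can await the returned future instead of firing and forgetting (a sketch; connection, the key/value, and the 1-second timeout are placeholders):

// Fire-and-forget: the write may be lost if the master fails over at this moment
RedisFuture<String> future = connection.async().set("key", "value");

// If lost writes matter, wait for the result and replay the command on failure
try {
    String reply = future.get(1, TimeUnit.SECONDS);   // "OK" once the write reached Redis
} catch (Exception e) {
    // timed out or failed, e.g. during a failover: the application can replay the command
}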

@mp911de
Collaborator Author

mp911de commented May 27, 2016

@rahulbabbar Thanks for the details. So you're basically using fire-and-forget in that case, which brings its own benefits and pitfalls. This issue is no blocker for the topology update, so I'm going to merge the feature into master.

mp911de added a commit that referenced this issue May 27, 2016
Lettuce now supports adaptive topology refreshing with dynamic and static topology sources.
Dynamic topology sources mean that Lettuce connects to the discovered nodes (after obtaining the topology from the seed nodes) and queries those nodes for the topology. Dynamic sources are the default behavior, and this remains unchanged. Static sources use only the initial seed nodes as topology sources. Static topology sources reduce the number of topology queries, but they also retrieve the connected client count only for the seed nodes.

Adaptive cluster topology refreshing listens to cluster runtime events such as MOVED/ASK redirections and multiple reconnection attempts. Adaptive triggers perform an early topology refresh as soon as a trigger is activated. Adaptive topology refreshing is protected by a rate-limiting timeout as events can occur at a large scale if a cluster node is down or reconfigured.

Periodic topology refreshing and adaptive topology refreshing can be enabled/disabled independently from each other.
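A sketch of opting into static refresh sources with the new options (builder calls follow the snapshot example earlier in this thread; dynamicRefreshSources is the switch mentioned above):

// Use only the initial seed nodes as topology sources instead of all discovered nodes
ClusterTopologyRefreshOptions topologyRefreshOptions = new ClusterTopologyRefreshOptions.Builder()
        .enabled(true)
        .enableAllAdaptiveRefreshTriggers()
        .dynamicRefreshSources(false)        // static sources: query the seed nodes only
        .refreshPeriod(30, TimeUnit.MINUTES)
        .build();

ClusterClientOptions clusterClientOptions = new ClusterClientOptions.Builder()
        .topologyRefreshOptions(topologyRefreshOptions)
        .build();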
@mp911de
Collaborator Author

mp911de commented May 27, 2016

Done.

@mp911de mp911de closed this as completed May 27, 2016
@mp911de mp911de changed the title Scaling issues with 70 - 200 Redis Cluster nodes Support adaptive cluster topology refreshing and static refresh sources May 27, 2016
@RamirezTuco

Thanks

@RamirezTuco

Hi @mp911de
Tentatively when do you plan to release 4.2.0?

@mp911de
Collaborator Author

mp911de commented May 30, 2016

I have two tickets left for 4.2.0 (#253 and #252) which I need to solve. If no major bug pops up, I'd expect to release Lettuce 4.2.0 next week.

@steowens

steowens commented Nov 3, 2016

Evidently there is tribal knowledge here that is not obvious from reading your documentation. If you have a 70-node cluster, how does a URI that connects to localhost magically figure out where this cluster lives or which nodes to try to form a connection with?
