Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the support of the blocking migration for the cluster migrate command #1350

Closed
2 tasks done
ellutionist opened this issue Mar 24, 2023 · 17 comments · Fixed by #1418
Closed
2 tasks done

Add the support of the blocking migration for the cluster migrate command #1350

ellutionist opened this issue Mar 24, 2023 · 17 comments · Fixed by #1418
Assignees
Labels
feature type new feature

Comments

@ellutionist
Copy link
Contributor

Search before asking

  • I had searched in the issues and found no similar issues.

Motivation

For now the command to migrate slot clusterx migrate works asynchronously. It will return "OK" right after the migration has started. If the client wants to know the result of the migration, it has to constantly use "cluster info" to get the state.

However, once the migration has been completed, the target slot will be forbidden, and the client (cluster manager) has to actively change the topology as soon as possible, in order to make the cluster continue to work.

In summary, the current clusterx migrate has two shortcuts:

  1. The client cannot get the result soon enough.
  2. The "polling" approach to get the migration result is not efficient enough.

Solution

I propose to leave the old clusterx migrate command intact, and add a new clusterx bmirate command, which works like other blocking commands (e.g. blpop). The server will block the connection and return "OK" (if succeed) or other error message (if failed) to the client only after the migration has completed.

The command shall be like:

clusterx bmirate ${SLOT} ${TARGET_NODEID} ${TIMEOUT}

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@ellutionist ellutionist added the enhancement type enhancement label Mar 24, 2023
@xiaobiaozhao
Copy link
Contributor

#529

@xiaobiaozhao
Copy link
Contributor

I recommend that bmirate support bulk slot migration

@xiaobiaozhao xiaobiaozhao added feature type new feature and removed enhancement type enhancement labels Mar 24, 2023
@git-hulk
Copy link
Member

git-hulk commented Mar 24, 2023

Hi @ellutionist @xiaobiaozhao

How about adding the optional timeout for the current migrate command? For example:

  • timeout is missing or 0 means async migration which keep the same with the current behavior

  • timeout is -1(<0) means blocking until the process was finished

  • timeout is greater than 0 means the blocking time

@xiaobiaozhao
Copy link
Contributor

Hi @ellutionist @xiaobiaozhao

How about adding the optional timeout for the current migrate command? For example:

  • timeout is missing or 0 means async migration which keep the same with the current behavior
  • timeout is -1(<0) means blocking until the process was finished
  • timeout is greater than 0 means the blocking time

I agree.

@ellutionist
Copy link
Contributor Author

ellutionist commented Mar 26, 2023

Thanks to @git-hulk and @xiaobiaozhao . This is a good workaround to not create a new subcommand and to provide backward compatibility.

However, I have to mention this: the "OK" returned by migrate without timeout argument means "migration started". It differs from the "OK" returned by mirate with timeout which means "migration finished". Such inconsistency may cause confusion to our users, especially who is new to Kvrocks.

If this is acceptable, then I am ready to implement it.

@git-hulk
Copy link
Member

@ellutionist Thanks for your points. To be clear, we can also explicitly add the ASYNC|SYNC flag to identify the mode? And the timeout is only meaningful when the SYNC flag is exists.

How do you think?

@git-hulk
Copy link
Member

To see other folks have any comments. @xiaobiaozhao @PragmaTwice @torwig @ShooterIT @caipengbo

@git-hulk git-hulk changed the title Add blocking clusterx subcommand "bmigrate" Add the support of the blocking migration for the cluster migrate command Mar 26, 2023
@aleksraiden
Copy link
Contributor

As I think, an ASYNC|SYNC flag is best choice and will be using in some other commands, like BACKUP.

@torwig
Copy link
Contributor

torwig commented Mar 26, 2023

I respect the reasoning about introducing a separate command for a blocking version of the command: it's clearer, descriptive and consistent with some naming conventions of Redis (e. g. POP/BPOP).
On the other hand, if I use the blocking version of the MIGRATE command by specifying the proper options, it's obvious (at least for me) that it will block and the OK return value means "blocking operation is completed" (in other words, migration completed).
And one quick note: personally, I don't like commands with a bunch of options (three and more). They are harder to implement and test. So, maybe sometimes (but I'm not 100% sure that this case is the one), instead of introducing 2-3 additional options, we can create a new command.

@ellutionist
Copy link
Contributor Author

I agree with @torwig . A new subcommand is a more clear and more "redis" solution to me.

@git-hulk
Copy link
Member

Thanks, @ellutionist @torwig @xiaobiaozhao @aleksraiden.

So our consensus is to use bmigrate instead of adding the SYNC|ASYNC flag, it's also good to me.

@ShooterIT
Copy link
Member

I prefer to use SYNC|ASYNC flag instead of another command which has similar arguments check and logic.
in new redis version, they don't want to add more commands, such as flush command, it has ASYNC option.

For lpop and blpop, they operate data space, different commands have different flags so redis can distinguish which command can execute in scripts env easily.

@caipengbo
Copy link
Contributor

As I think, an ASYNC|SYNC flag is best choice and will be using in some other commands, like BACKUP.

Agree with it.

@LykxSassinator
Copy link
Contributor

I prefer to use SYNC|ASYNC flag instead of another command which has similar arguments check and logic. in new redis version, they don't want to add more commands, such as flush command, it has ASYNC option.

For lpop and blpop, they operate data space, different commands have different flags so redis can distinguish which command can execute in scripts env easily.

+1

@enjoy-binbin
Copy link
Member

vote for the SYNC|ASYNC flag

@ellutionist
Copy link
Contributor Author

Looks like the option of SYNC|ASYNC flag got more votes. If there is no further advice, I will implement the new feature as following:

clusterx migrate ${SLOT} ${NODE_ID} [SYNC|ASYNC] ${TIMEOUT}
  • If the ASYNC is given or the flag is missing, the command will act in the old way.
  • If the SYNC flag is given:
    • If the timeout is missing or "0", the command will block until the migration finish.
    • If the timeout comes as a positive integer, it means the max duration (of seconds) for which the command will block.

@PragmaTwice
Copy link
Member

I recommend that bmirate support bulk slot migration

I think it is unrelated to this PR. Besides, it is better to implement these two different features in two separate PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature type new feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants