Some possible new functionalities #1

Open
aeftef opened this issue Nov 15, 2019 · 1 comment

Comments

aeftef commented Nov 15, 2019

Great tool, congratulations and thanks for sharing.

Just a few improvement suggestions for open discussion.

  • Support parallel shard swapping, or at least parallel relocations on different nodes (no more than 1 active relocation on each node, but several concurrent swaps across the cluster).
  • Cluster routing allocation awareness support: if this functionality is configured on a cluster, shard movement is restricted by "zones". The relocation plans generated by the tool could detect this configuration and plan accordingly. It's quite similar to the AZ functionality...
  • Consider shard swapping involving more than 2 shard movements: sometimes the shard distribution is suboptimal and no more replica-primary pairs can be swapped, but a replica-replica swap can change the cluster state and allow further replica-primary swaps towards a better balance. (Maybe that means, in the game model, considering a depth of 2 movements and allowing replica-replica swaps.)
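
To make the first suggestion concrete, here is a minimal Python sketch of how an ordered swap plan could be grouped into batches that run concurrently while keeping at most one active relocation per node. The `Swap` type and `batch_swaps` function are hypothetical names for illustration and are not part of the tool. The sketch assumes each swap touches exactly two distinct nodes and that swaps sharing a node must keep their original order.

```python
from typing import Dict, List, NamedTuple


class Swap(NamedTuple):
    """Hypothetical planned swap between two distinct nodes."""
    label: str
    node_a: str
    node_b: str


def batch_swaps(plan: List[Swap]) -> List[List[Swap]]:
    """Group an ordered plan into batches with at most one swap per node.

    Swaps that share a node keep their relative order, because a swap is
    always placed in a later batch than any earlier swap on the same node.
    """
    next_free: Dict[str, int] = {}  # node -> first batch index where it is idle
    batches: List[List[Swap]] = []
    for swap in plan:
        idx = max(next_free.get(swap.node_a, 0), next_free.get(swap.node_b, 0))
        if idx == len(batches):
            batches.append([])
        batches[idx].append(swap)
        # Both nodes are busy during batch idx, so they are free from idx + 1.
        next_free[swap.node_a] = idx + 1
        next_free[swap.node_b] = idx + 1
    return batches
```

Each batch could then be submitted together, waiting for all of its relocations to finish before starting the next batch. Whether a real plan tolerates this kind of reordering depends on how the tool generates it, so treat this as an assumption.
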
Owner

hallh commented Nov 27, 2019

Glad it can be of use :)

I'll just go over them point-by-point:

  1. I intend to do this. Spot Instance interruptions are making us run the balancing a few times a month, and it's a pain to wait for it to complete.
  2. Yeah, this would be fairly simple to include alongside the AZ mapping. Could consider it.
  3. I did this at first, but the space of valid moves became so big that the MCTS algo simply needed too much time to run simulations for it to be worth it. There are probably optimisations that could be made, but the cost/benefit made me cut it entirely. Would accept a PR for it though.
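
As a rough illustration of the third point, the toy Python sketch below counts candidate swaps with and without replica-replica pairs. It is not the tool's actual move generator: `count_swaps` is a hypothetical helper, each shard copy is modelled only as a `(node, role)` pair, and real validity rules (for example, not placing two copies of the same shard on one node) are ignored.

```python
from itertools import combinations
from typing import Iterable, Tuple

Copy = Tuple[str, str]  # (node name, "primary" or "replica")


def count_swaps(copies: Iterable[Copy], allow_replica_replica: bool = False) -> int:
    """Count unordered pairs of shard copies on different nodes that may be swapped."""
    count = 0
    for (node_a, role_a), (node_b, role_b) in combinations(list(copies), 2):
        if node_a == node_b:
            continue  # a swap within a single node changes nothing
        roles = {role_a, role_b}
        if roles == {"primary", "replica"}:
            count += 1
        elif allow_replica_replica and roles == {"replica"}:
            count += 1
    return count
```

Because replicas usually outnumber primaries, allowing replica-replica swaps enlarges the set of moves at every step, and a search that looks two moves ahead grows roughly with the square of that number.
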

I'll keep this open for the first two points.
