Make the time consuming of split table more stable #22969

Closed
nolouch opened this issue Feb 26, 2021 · 0 comments
Labels
type/enhancement The issue or PR belongs to an enhancement.

nolouch commented Feb 26, 2021

Development Task

Split and scatter tasks run asynchronously, and the current batch size is too small (since #18191):

const splitBatchRegionLimit = 16

As a result, many split and scatter tasks can be in flight at the same time, which causes two problems:

  • Concurrent tasks conflict on the same region and report "epoch not match" errors, so extra time is spent on retries.

  • A split is very likely to hit a region that is still being scattered, which produces many regions with 4 replicas and makes the time taken by split table unstable. We need to ensure that all the regions involved are split in one pass and only then scattered in one pass.
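
To make the ordering problem concrete, here is a toy model in Go. It is only a sketch under stated assumptions, not the TiDB implementation: the `Op` type and the `planInterleaved`, `planTwoPhase`, and `splitsAfterScatter` helpers are hypothetical names made up for illustration, and only `splitBatchRegionLimit` comes from the issue. It compares a plan that scatters after every small batch with a plan that finishes all splits before a single scatter, and counts how many split steps are issued after scattering has already started.

```go
package main

// splitBatchRegionLimit mirrors the constant quoted in the issue.
const splitBatchRegionLimit = 16

// Op is one step handed to a hypothetical cluster client (illustration only).
type Op struct {
	Kind string   // "split" or "scatter"
	Keys [][]byte // split keys covered by this step
}

// batches cuts keys into consecutive chunks of at most size elements.
func batches(keys [][]byte, size int) [][][]byte {
	if size <= 0 {
		size = len(keys) // treat a non-positive size as "one batch"
	}
	var out [][][]byte
	for start := 0; start < len(keys); start += size {
		end := start + size
		if end > len(keys) {
			end = len(keys)
		}
		out = append(out, keys[start:end])
	}
	return out
}

// planInterleaved models the problematic ordering: each batch is split and
// scattered before the next batch is split, so later splits can land on
// regions that are still being scattered.
func planInterleaved(keys [][]byte, size int) []Op {
	var ops []Op
	for _, b := range batches(keys, size) {
		ops = append(ops, Op{Kind: "split", Keys: b}, Op{Kind: "scatter", Keys: b})
	}
	return ops
}

// planTwoPhase models the ordering the issue asks for: every split is issued
// first, and a single scatter over all keys follows.
func planTwoPhase(keys [][]byte, size int) []Op {
	var ops []Op
	for _, b := range batches(keys, size) {
		ops = append(ops, Op{Kind: "split", Keys: b})
	}
	if len(keys) > 0 {
		ops = append(ops, Op{Kind: "scatter", Keys: keys})
	}
	return ops
}

// splitsAfterScatter counts split steps issued after any scatter step. In this
// toy model, each such step is a chance to split a region mid-scatter.
func splitsAfterScatter(ops []Op) int {
	count, seenScatter := 0, false
	for _, op := range ops {
		switch op.Kind {
		case "scatter":
			seenScatter = true
		case "split":
			if seenScatter {
				count++
			}
		}
	}
	return count
}
```

With 40 split keys and a batch size of 16, the interleaved plan has two split steps that follow a scatter, while the two-phase plan has none. The model ignores real asynchrony and retries; it only illustrates why ordering all splits before any scatter removes the overlap.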
