Enhance the region balance of the 1M tables imported by lightning #8424

Open
HuSharp opened this issue Jul 22, 2024 · 3 comments
Assignees
Labels
type/development The issue belongs to a development tasks

Comments

HuSharp commented Jul 22, 2024

Development Task

Background

  • PD ensures that regions are scattered within each table, but it does not balance regions across tables; that relies on the balance-region scheduler.
  • Balance-Region will not schedule empty regions. This is hardcoded in:

    // isEmptyRegionAllowBalance returns true if the region is not empty or the number of regions is too small.
    func isEmptyRegionAllowBalance(cluster sche.SharedCluster, region *core.RegionInfo) bool {
        return region.GetApproximateSize() > core.EmptyRegionApproximateSize || cluster.GetTotalRegionCount() < core.InitClusterRegionThreshold
    }
  • When Lightning imports tables, each table's region key is encoded from its table ID, and table IDs are allocated consecutively (basically next to each other); see the sketch below. https://github.com/tikv/client-go/blob/6ba909c4ad2de65b5b36d0e5036d0a85f3154cc0/tikv/split_region.go#L241-L247
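
To make the last point concrete, here is a minimal Go sketch of why consecutive table IDs produce adjacent split keys. The encoding is a simplified stand-in rather than TiDB's actual tablecodec, and encodeTablePrefix is a hypothetical helper used only for illustration.

package main

import (
	"encoding/binary"
	"fmt"
)

// encodeTablePrefix is a simplified stand-in for the real table-key encoding:
// a 't' prefix followed by the table ID in big-endian form.
func encodeTablePrefix(tableID int64) []byte {
	key := make([]byte, 9)
	key[0] = 't'
	binary.BigEndian.PutUint64(key[1:], uint64(tableID))
	return key
}

func main() {
	// Table IDs allocated back-to-back during a Lightning import.
	for id := int64(1001); id <= 1004; id++ {
		fmt.Printf("table %d -> region split key %x\n", id, encodeTablePrefix(id))
	}
	// The keys differ only in the last byte, so the split keys of 1M tables
	// all fall into one narrow, contiguous key range.
}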

Problems faced

When Lightning imports 1 million tables (one table corresponds to one region), even if there are more than 3 stores, the consecutive region keys cause a large number of regions to accumulate on the first 3 stores. Since these regions are not rescheduled, those three stores have a high probability of OOM.


@River2000i

I'm interested in this issue.

@River2000i

/assign @River2000i


River2000i commented Jul 30, 2024

When Lightning imports tables, each table's region key is encoded from its table ID, and table IDs are allocated consecutively (basically next to each other). https://github.com/tikv/client-go/blob/6ba909c4ad2de65b5b36d0e5036d0a85f3154cc0/tikv/split_region.go#L241-L247

PD scatters regions based on the region_count of each store and schedules the new peer to the store with the fewest regions:

newPeer := r.selectNewPeer(context, group, peer, filters)

PD compares region_count within a group. Currently, a group is defined by the table ID (every table belongs to its own group):

targetLeader, leaderStorePickedCount := r.selectAvailableLeaderStore(group, region, leaderCandidateStores, r.ordinaryEngine)

Summary:

  1. If there are many newly split empty regions (with adjacent key ranges), they all split off on the same store. At the group level, scattering is based on the region_count within a group, so a region will not be scattered to a store that does not yet hold any region belonging to that group.
  2. At the cluster level, scattering is based on the region_count of the whole cluster. Regions end up balanced across the whole cluster, but not within a group, i.e. not at the table level.

Root cause:
selectNewPeer() picks the origin store the first time a region is scattered. It compares every store's region count in engineContext, but on the first call context.selectedPeer.Get() returns 0 for all stores, so the region is not moved off its origin store:

func (r *RegionScatterer) selectNewPeer(context engineContext, group string, peer *metapb.Peer, filters []filter.Filter) *metapb.Peer {
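
Below is a minimal, hypothetical Go sketch of the selection behaviour described above; it is not the actual PD implementation, and pickStore/selectedCount are illustrative stand-ins for selectNewPeer and context.selectedPeer.

package main

import "fmt"

// pickStore mimics "choose the store with the fewest already-selected peers
// for this group". selectedCount plays the role of context.selectedPeer.Get().
func pickStore(originStore uint64, candidates []uint64, selectedCount map[uint64]uint64) uint64 {
	best := originStore
	bestCount := selectedCount[originStore]
	for _, store := range candidates {
		// Only switch stores when another store has a strictly smaller count.
		// On the first scatter of a new group every count is 0, so the
		// origin store is never replaced.
		if selectedCount[store] < bestCount {
			best = store
			bestCount = selectedCount[store]
		}
	}
	return best
}

func main() {
	counts := map[uint64]uint64{1: 0, 2: 0, 3: 0, 4: 0, 5: 0}
	// First scatter for a brand-new group: all counts are 0, so the peer
	// stays on its origin store (store 1).
	fmt.Println(pickStore(1, []uint64{2, 3, 4, 5}, counts))
}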

If we want to scatter regions at the cluster level, we could call ScatterRegion with the same group for all tables; this could be exposed as an option for the caller.
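
As a rough illustration of that option, the sketch below scatters a batch of newly split regions under one shared group via the pd client, so PD balances them against each other across the whole cluster. It assumes the client's ScatterRegions/WithGroup API (exact signatures differ between client versions), and "lightning-import" is a hypothetical group name.

package scatterexample

import (
	"context"
	"log"

	pd "github.com/tikv/pd/client"
)

// scatterWithSharedGroup asks PD to scatter all the given regions under a
// single shared group instead of one group per table.
func scatterWithSharedGroup(ctx context.Context, cli pd.Client, regionIDs []uint64) error {
	// "lightning-import" is a hypothetical shared group name.
	resp, err := cli.ScatterRegions(ctx, regionIDs, pd.WithGroup("lightning-import"))
	if err != nil {
		return err
	}
	log.Printf("scatter finished percentage: %d%%", resp.GetFinishedPercentage())
	return nil
}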
