
Refine the load balance strategy of choosing TiFlash replica to read #1807

Open
leiysky opened this issue Apr 22, 2021 · 1 comment · Fixed by pingcap/tidb#26130
Labels
type/enhancement The issue or PR belongs to an enhancement.

Comments

@leiysky
Contributor

leiysky commented Apr 22, 2021

Background

In TiDB, there isn't a proper way to choose a follower peer or learner peer to read from, which may cause read-request hotspots.

Previously, we used a random approach to choose the TiFlash peer, which achieves load balance in terms of probability.
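
For illustration, here is a minimal sketch of that random strategy (not TiDB's actual code; `Store` and `pickTiFlashPeer` are hypothetical names). Picking uniformly at random spreads read load evenly across replicas in expectation:

```go
package replica

import (
	"errors"
	"math/rand"
)

// Store is a placeholder for a TiFlash store/peer that can serve a read.
type Store struct {
	ID   uint64
	Addr string
}

// pickTiFlashPeer chooses one replica uniformly at random, so every replica
// receives roughly the same share of read requests in expectation.
func pickTiFlashPeer(candidates []Store) (Store, error) {
	if len(candidates) == 0 {
		return Store{}, errors.New("no TiFlash replica available")
	}
	return candidates[rand.Intn(len(candidates))], nil
}
```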

However, TiDB has no way to tell whether a pure learner store (i.e. TiFlash) is down. Therefore, if a TiFlash node crashes during a query, TiDB may keep trying to read from the crashed node while backing off, which causes a long wait in MPP queries due to some design issues. pingcap/tidb#23589 fixed this problem in a brute-force way and introduced the hotspot issue in MPP mode.

We should find a way to solve both the load balance issue and the backoff issue.

Related work

PD is planning to design a mechanism that collects load information (e.g. read flow, QPS) of follower peers to help schedule hotspots introduced by stale reads. It requires TiDB to choose follower peers randomly.

A keep-alive mechanism is needed to support backoff, so that TiDB can tell whether a TiFlash store is still reachable.
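
As a rough sketch of that idea (the names `Prober` and `PickLive`, and the TCP-dial probe, are assumptions rather than TiDB or TiFlash APIs), a keep-alive loop could record which stores answered a recent probe, so that random selection and backoff only consider stores believed to be alive:

```go
package replica

import (
	"math/rand"
	"net"
	"sync"
	"time"
)

// Prober tracks which store addresses answered the most recent liveness probe.
type Prober struct {
	mu    sync.Mutex
	alive map[string]bool
}

// NewProber starts a background loop that probes every address at the given
// interval. A plain TCP dial stands in for a real health-check RPC.
func NewProber(addrs []string, interval time.Duration) *Prober {
	p := &Prober{alive: make(map[string]bool)}
	go func() {
		for {
			for _, addr := range addrs {
				conn, err := net.DialTimeout("tcp", addr, time.Second)
				p.mu.Lock()
				p.alive[addr] = err == nil
				p.mu.Unlock()
				if conn != nil {
					conn.Close()
				}
			}
			time.Sleep(interval)
		}
	}()
	return p
}

// PickLive picks a random address among stores that passed the last probe,
// falling back to any address if none is currently marked alive.
func (p *Prober) PickLive(addrs []string) string {
	if len(addrs) == 0 {
		return ""
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	var live []string
	for _, a := range addrs {
		if p.alive[a] {
			live = append(live, a)
		}
	}
	if len(live) == 0 {
		return addrs[rand.Intn(len(addrs))]
	}
	return live[rand.Intn(len(live))]
}
```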
