Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs/design: Infer the System Timezone of a TiDB cluster via TZ environment variable #7656

Merged
merged 16 commits into from
Sep 11, 2018

Conversation

zhexuany
Copy link
Contributor

What problem does this PR solve?

This PR solves pushing System to TiKV sides which causes performance delegation.

What is changed and how it works?

doc specifies how this change work.

Check List

Tests

  • No code

Code changes

  • No code
    Side effects

Related changes

@zhexuany
Copy link
Contributor Author

@shenli @coocood


The positive side of this change is resolving performance delegation issue to avoid any potential time-related calculation.
The value of `localStr` should be initialized by method `initLocalStr()`at `TiDB`'s bootstrap stage. `localStr` will be stored in `mysql.tidb` under `system_tz` column. If system timezone can not get valid IANA timezone name from `mysql.tidb`, `UTC` will be used as default timezone name.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no system_tz column, we just add a row with VARIABLE_NAME = 'system_tz'

docs/adding_tz_env.md Outdated Show resolved Hide resolved

## Abstract

When it comes to time-related calculation, it is hard for the distributed system. This proposal tries to resolve two problems: 1. timezone may be inconsistent across multiple `TiDB` instances, 2. performance degradation casued by pushing `System` down to `TiKV`. The impact of this proposal is changing the way of `TiDB` inferring system's timezone name. Before this proposal, the default timezone name pushed down to TiKV is `System` when session's timezone is not set. After this, TiDB evaluates system's timezone name via `TZ` environment variable and then pushes the real name down to `TiKV` rather than `System`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the global timezone is set to UTC and the session's timezone is not set explicitly, we will use UTC.

the default timezone name pushed down to TiKV is System when session's timezone is not set

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/evaluates/infers/

@@ -0,0 +1,55 @@
# Proposal: adding TZ as an environment variable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Infer the System Timezone of a TiDB cluster via TZ environment variable


After we solved the daylight saving time issue, we found the performance degradation of TiKV side. Thanks for the investigation done by engineers from TiKV. The root cause of such performance degradation is that TiKV infers `System` timezone name via a third party lib, which calls a syscall and costs a lot. In our internal benchmark system, after [this PR](https://github.com/pingcap/tidb/pull/6823), our codebase is 1000 times slower than before. We have to address this.

Another problem needs also to be addressed is the potentially incosistent timezone name across multiple `TiDB` instances. `TiDB` instances may reside at different timezone which could produce incorrect calculation when it comes to time-related calculation. Just getting `TiDB`'s sytem timezone could be be broken. We need find a way to ensure the uniqueness of global timezone name across multiple `TiDB`'s timezone name.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be be broken -> could be broken


In our case, which means both `TiDB` and `TiKV`, we need care the first and third formats. For answering why we do not need the second format, we need to review how Golang evaluates timezone. In `time` package, the method `LoadLocation` reads tzData from pre-specified sources(directories may contain tzData) and then builds `time. Location` from such tzData which already contains daylight saving time information.

In this proposal, we suggest setting `TZ` to a valid IANA timezone name which can be read from `TiDB` later. If `TiDB` can't get `TZ` or the supply of `TZ` is invalid, `TiDB` just falls back to evaluates the path of the soft link of `/etc/localtime`. In addition, an warning message telling the user you should set `TZ` properly will be printed. Setting `TZ` can be done in our `tidb-ansible` project, it is also can be done at user side by `export TZ="Asia/Shanghai"`. If both of them are failed, `TiDB` will use `UTC` as timezone name.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to evaluates -> to evaluate

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an warning -> a warning

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If both of them are failed, TiDB will use UTC as timezone name and print a warning log.

@zhexuany zhexuany changed the title docs: adding tz env docs: Infer the System Timezone of a TiDB cluster via TZ environment variable Sep 10, 2018

The std means the IANA timezone name; the offset means timezone offset; the dst indicates the leading timezone having daylight saving time.

In our case, which means both `TiDB` and `TiKV`, we need care the first and third formats. For answering why we do not need the second format, we need to review how Golang evaluates timezone. In `time` package, the method `LoadLocation` reads tzData from pre-specified sources(directories may contain tzData) and then builds `time. Location` from such tzData which already contains daylight saving time information.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time. Location -> time.Location

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a link for LoadLocation?


In our case, which means both `TiDB` and `TiKV`, we need care the first and third formats. For answering why we do not need the second format, we need to review how Golang evaluates timezone. In `time` package, the method `LoadLocation` reads tzData from pre-specified sources(directories may contain tzData) and then builds `time. Location` from such tzData which already contains daylight saving time information.

In this proposal, we suggest setting `TZ` to a valid IANA timezone name which can be read from `TiDB` later. If `TiDB` can't get `TZ` or the supply of `TZ` is invalid, `TiDB` just falls back to evaluates the path of the soft link of `/etc/localtime`. In addition, an warning message telling the user you should set `TZ` properly will be printed. Setting `TZ` can be done in our `tidb-ansible` project, it is also can be done at user side by `export TZ="Asia/Shanghai"`. If both of them are failed, `TiDB` will use `UTC` as timezone name.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which can be read from TiDB later -> which can be read by TiDB later


The positive side of this change is resolving performance degradation issue and ensuring the uniqueness of gloabl timezone name in multiple `TiDB` instances.

The negative side is just adding a config item which is a very small matter and the user probably does not care it if we can take care of it and more importantly guarantee the correctness.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a note here: We should mention and emphasise it in the document.

The implementation is relatively easy. We just get `TZ` environment from system and check whether it is valid or not. If it is invalid, TiDB evaluates the path of soft link of `/etc/localtime`.
In addition, a warning message needs to be printed indicating user has to set `TZ` variable. For example, if `/etc/localtime` links to /usr/share/zoneinfo/Asia/Shanghai, then timezone name should be `Asia/Shangahi`.

In order to ensure the uniqueness of global timezone across multiple `TiDB` instances, we need to write timezone name into `mysql.tidb` under column `system_tz`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When should we insert/update the value of system_tz ?

docs/adding_tz_env.md Outdated Show resolved Hide resolved
@zhexuany zhexuany changed the title docs: Infer the System Timezone of a TiDB cluster via TZ environment variable docs/design: Infer the System Timezone of a TiDB cluster via TZ environment variable Sep 11, 2018
@zhexuany zhexuany added component/docs priority/P1 The issue has P1 priority. priority/release-blocker This issue blocks a release. Please solve it ASAP. labels Sep 11, 2018
@shenli
Copy link
Member

shenli commented Sep 11, 2018

LGTM

@shenli
Copy link
Member

shenli commented Sep 11, 2018

@coocood PTAL

Copy link
Member

@coocood coocood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@coocood coocood added the status/LGT2 Indicates that a PR has LGTM 2. label Sep 11, 2018
@zhexuany zhexuany merged commit e0de9ca into pingcap:master Sep 11, 2018
@zhexuany zhexuany deleted the adding_a_design_doc branch September 11, 2018 05:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/docs priority/P1 The issue has P1 priority. priority/release-blocker This issue blocks a release. Please solve it ASAP. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants