Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ThreadPool] Solve ARM BIG.LITTLE heterogeneous multicores #4747

Merged
merged 1 commit into from
Feb 3, 2020

Conversation

FrozenGene
Copy link
Member

Follow up #4344 . Previous PR solves OpenCV + TVM slow performance and AutoTVM Python CPU affinity problem. However, previous PR has two problem:

  • doesn't solve ARM BIG.LITTLE heterogeneous multicores.
    For example, we have 2xA72 + 4xA53, previous pr will add all cpu to CPU SET even we call config_thread_pool to restrict TVM runs on 4 little cores. So, TVM_MASTER_THREAD could run A72 big core. This is not we want.
    Solution: we should restrict master in the little cpus or big cpus according to users's setting.
  • Not all Linux variant OS has pthread_atfork. For example, Alibaba's AliOS doesn't have this api.
    Solution: unified android and linux's way. We will call SetFullCpuAffinity if we don't have TVM_BIND_MASTER_THREAD.

@tqchen @yidawang @vinx13 @ajtulloch Could you help to review it? Thanks.

@u99127
Copy link
Contributor

u99127 commented Jan 20, 2020

Follow up #4344 . Previous PR solves OpenCV + TVM slow performance and AutoTVM Python CPU affinity problem. However, previous PR has two problem:

* doesn't solve ARM  BIG.LITTLE heterogeneous multicores.
  For example, we have 2xA72 + 4xA53, previous pr will add all cpu to CPU SET even we call `config_thread_pool` to restrict TVM runs on 4 little cores. So, TVM_MASTER_THREAD could run A72 big core. This is not we want.
  Solution: we should restrict master in the little cpus or big cpus according to users's setting.

This approach needs to be added as comments in the code explaining how this is expected to work. Further I suspect it is TVM_BIND_MASTER_THREAD that you refer to above ? What is the role of the TVM master thread ? Is it another thread that does compute or is it a "controlling" thread ?

* Not all Linux variant OS has `pthread_atfork`. For example, Alibaba's AliOS doesn't have this api.

pthread_atfork IIRC originally comes from the posix group from Issue 5 as documented here. Checking the other libcs suggest that all other mainstream Linux runtimes like glibc, musl and bionic support this and thus are we working around a limitation of AliOS here ?

  Solution: unified android and linux's way. We will call `SetFullCpuAffinity` if we don't have `TVM_BIND_MASTER_THREAD`.

@tqchen @yidawang @vinx13 @ajtulloch Could you help to review it? Thanks.

@FrozenGene
Copy link
Member Author

FrozenGene commented Jan 20, 2020

This approach needs to be added as comments in the code explaining how this is expected to work. Further I suspect it is TVM_BIND_MASTER_THREAD that you refer to above ? What is the role of the TVM master thread ? Is it another thread that does compute or is it a "controlling" thread ?

TVM_BIND_MASTER_THREAD is related with exclude_worker0 , the default value of exclude_worker0 is false. https://github.com/apache/incubator-tvm/blob/master/include/tvm/runtime/threading_backend.h#L50. As the comment said, if it is false, we will make main thread compute worker0 and worker0 will not be launched in a new thread. Normally main thread doesn't have much to do. If TVM_BIND_MASTER_THREAD env var is set, we will bind the main thread be cpu core of sorted_order_[0] or sorted_order_[-1] (if we are in the little mode)

So previous solution is to let us not bind the master thread to be one specific core so that our master thread could run any cores. But previous solution omit one situation that ARM CPU has BIG.LITTLE, we can not add all cpus to CPU SET. It will result in main thread could run big core (even users specify tvm run little cores). Also could run little cores (even users specify tvm run big cores).

This PR solves this by restrict tvm master thread's cpu set. That is to say: if we are in the little mode, we should restrict master thread run only little cores. if we are in the big mode, we should restrict master thread run only big cores. Note: This could work well on x86 cpu too. Because x86 cpu doesn't have BIG.Little. our code implementation will use big_count_ to calculate the cpu cores by default. So master thread could run any cores.

I will add comment to explain a bit in code.

pthread_atfork IIRC originally comes from the posix group from Issue 5 as documented here. Checking the other libcs suggest that all other mainstream Linux runtimes like glibc, musl and bionic support this and thus are we working around a limitation of AliOS here ?

AliOS like some versions of Android, doesn't provide pthread_atfork. However, the key to solve this problem is doesn't restrict master thread bind to one specific core. So PR's solution could work fine on Linux / Android too. pthread_atfork is not a key or must.

@FrozenGene FrozenGene force-pushed the thread_fix branch 2 times, most recently from d8bbfb8 to 3cee66e Compare January 20, 2020 11:22
@tqchen
Copy link
Member

tqchen commented Jan 20, 2020

Overall I like the idea of binding master thread to the set of cores needed rather than a single one, whichi means free migration of master thread, would be great if we can document this idea in the code, let others to check.

Given that pthread_atfork is indeed available in most cases, we should put that as a separate item(perhaps using macro to detect the existence) if we still decides to use it.

For example, we can keep the existing behavior of binding master thread on linux, but use the new behavior on android

@FrozenGene
Copy link
Member Author

Overall I like the idea of binding master thread to the set of cores needed rather than a single one, whichi means free migration of master thread, would be great if we can document this idea in the code, let others to check.

Newest code has provided one detail example to show which cores the master thread could run in the comment. Could check it and to see whether to have anything should supply.

Given that pthread_atfork is indeed available in most cases, we should put that as a separate item(perhaps using macro to detect the existence) if we still decides to use it.
For example, we can keep the existing behavior of binding master thread on linux, but use the new behavior on android

pthread_atfork is not a must for solving this problem in fact. The key is we should let master thread could do free migration on available cores. Previous pr wants to do but provide cpu cores set wrong as described. Existing behavior of linux is still has bug if we deploy it on one arm (has big little cpus) linux embed board. So we should resolve linux / android together in my opinion. :-)

src/runtime/threading_backend.cc Outdated Show resolved Hide resolved
src/runtime/threading_backend.cc Outdated Show resolved Hide resolved
@yidawang
Copy link
Contributor

FYI, we just notice that if we don't bind the master thread, which is the default behavior, the master thread may cause the contention issue as it is allocated to a seemingly idle logical thread under hyperthreading. So I would propose to at least have the master thread bound to the intended running cores by default. @anijain2305

@FrozenGene
Copy link
Member Author

FrozenGene commented Jan 25, 2020 via email

@yidawang
Copy link
Contributor

To clarify your meaning, as hyper thread will make the big_count_ is twice times compared with physical cores. For example, phisical cores are 4, hyper thread will make the concurrency number be 8 (big_count_ is 8 too). So for hyper thread, we wish master thread could run the physical cores (id: 0-3), not cpu id 4-7. 

------------------ Original ------------------ From: Yida Wang <[email protected]> Date: Sat,Jan 25,2020 3:21 AM To: apache/incubator-tvm <[email protected]> Cc: Zhao Wu <[email protected]>, Author <[email protected]> Subject: Re: [apache/incubator-tvm] [ThreadPool] Solve ARM BIG.LITTLE heterogeneous multicores (#4747) FYI, we just notice that if we don't bind the master thread, which is the default behavior, the master thread may cause the contention issue as it is allocated to a seemingly idle logical thread under hyperthreading. So I would propose to at least have the master thread bound to the intended running cores by default. @anijain2305 — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

Yes

@FrozenGene
Copy link
Member Author

FrozenGene commented Jan 25, 2020 via email

@FrozenGene FrozenGene force-pushed the thread_fix branch 2 times, most recently from 3249215 to f2e54c6 Compare February 3, 2020 02:50
@FrozenGene
Copy link
Member Author

@yidawang Would you like to review it again? Thanks.

Copy link
Contributor

@yidawang yidawang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor naming/explanatory issues. Other than that, I am good to sign off.

src/runtime/threading_backend.cc Show resolved Hide resolved
src/runtime/threading_backend.cc Outdated Show resolved Hide resolved
@FrozenGene FrozenGene force-pushed the thread_fix branch 2 times, most recently from d2368f9 to 38a4b53 Compare February 3, 2020 08:04
Copy link
Contributor

@yidawang yidawang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. In addition, it would be better if you don't squash locally to leave only one commit every time in the PR. Having multiple commits would be helpful for reviewers to track your incremental change. Thanks!

@tqchen tqchen merged commit bb1b7db into apache:master Feb 3, 2020
@tqchen
Copy link
Member

tqchen commented Feb 3, 2020

Thanks @FrozenGene @yidawang

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants