Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Support] Add parallel_for support to run a loop in parallel #6275

Merged
merged 11 commits into from
Aug 18, 2020

Conversation

jcf94
Copy link
Contributor

@jcf94 jcf94 commented Aug 14, 2020

As has mentioned in #5962.
We would like a runtime implementation of parallel loop to speed up some thread safe loops.

Take some API reference from https://docs.microsoft.com/en-us/cpp/parallel/concrt/parallel-algorithms?view=vs-2019#parallel_for. I've tried to add an implementation of parallel_for_each as well, but finding that the const reference style of tvm::Array seems not very compatible with this API.

cc @tqchen @merrymercy @FrozenGene @junrushao1994

@jroesch
Copy link
Member

jroesch commented Aug 14, 2020

cc @tkonolige

@tkonolige
Copy link
Contributor

I think I'm a little late to this discussion, but what is the reason for having our own thread pool/parallel_for implementation? OpenMP is already an optional dependency, we could use it when available.

I can understand that having our own implementation means we don't have to depend on another library, but it also means we need to maintain it and add features as we need them (though this doesn't seem like much code).

@tqchen
Copy link
Member

tqchen commented Aug 14, 2020

@jcf94 it would be great if we can simplify the parallel for implementation, e.g. std::thread has pretty low launching overhead, and we can likely drop the threadpool and start std::thread on each parallel_for round. As long as the function cost is not high, it should be a simpler implementation.

@tqchen tqchen self-assigned this Aug 14, 2020
@tqchen tqchen added the status: need update need update based on feedbacks label Aug 14, 2020
@jcf94
Copy link
Contributor Author

jcf94 commented Aug 15, 2020

@jcf94 it would be great if we can simplify the parallel for implementation, e.g. std::thread has pretty low launching overhead, and we can likely drop the threadpool and start std::thread on each parallel_for round. As long as the function cost is not high, it should be a simpler implementation.

I've tried a simpler implementation using std::thread, have a look if this is better. 😄

@tqchen
Copy link
Member

tqchen commented Aug 15, 2020

Thanks @jcf94 please look into the CI issue. Also please do a benchmark to see if the new implementation will/will not affect perf, since it is only fine for larger functions and we might still need pool for very fine grained parallelism

src/support/parallel_for.cc Show resolved Hide resolved
src/support/parallel_for.cc Outdated Show resolved Hide resolved
src/support/parallel_for.cc Outdated Show resolved Hide resolved
@jcf94
Copy link
Contributor Author

jcf94 commented Aug 17, 2020

Thanks @jcf94 please look into the CI issue. Also please do a benchmark to see if the new implementation will/will not affect perf, since it is only fine for larger functions and we might still need pool for very fine grained parallelism

Had some simple benchmark on the two implementations, the current one even works better in large loop size(since each threads' workload is pre-defined by the partitioner).
Currently Ansor's requirement on the parallel_for is just used on some thread independent tasks, this simple implementation is enough to work well. I'm not sure if some tasks with fine grained parallelism will benefit on a pool implementation.

@jcf94
Copy link
Contributor Author

jcf94 commented Aug 17, 2020

@merrymercy @tqchen I'm thinking that do we need a interface to pass the parapllel_for to Python? In my understanding, the python multi-threading may have some performance issues caused by GIL.

@FrozenGene
Copy link
Member

@merrymercy @tqchen I'm thinking that do we need a interface to pass the parapllel_for to Python? In my understanding, the python multi-threading may have some performance issues caused by GIL.

for this, multi processing maybe enough...

merrymercy
merrymercy previously approved these changes Aug 18, 2020
@merrymercy merrymercy dismissed their stale review August 18, 2020 09:04

one final comment unaddressed

@tqchen tqchen merged commit c1d347f into apache:master Aug 18, 2020
@tqchen
Copy link
Member

tqchen commented Aug 18, 2020

Thanks @jcf94 @merrymercy @jroesch @ @tkonolige !

@tqchen tqchen added status: accepted and removed status: need update need update based on feedbacks labels Aug 18, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020
@jcf94 jcf94 deleted the parallel_for branch August 28, 2020 06:37
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Sep 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants