
No optimization for Scala parallelized collections #25

Open
ochafik opened this issue Mar 16, 2015 · 4 comments
ochafik commented Mar 16, 2015

From @fdietze on November 10, 2011 16:21

It seems like there is no optimization for the parallelized collections in Scala.

This is optimized:
(0 until 1000).map

While this is not:
(0 until 1000).par.map
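
For illustration, here is a hand-written sketch of the kind of while-loop rewrite the plugin applies in the sequential Range case above (an approximation of the idea, not the plugin's actual output):

```scala
object RangeRewriteSketch {
  // Equivalent of (0 until n).map(i => i * i), expressed as the kind of
  // while loop the plugin can generate: it fills an array directly,
  // avoiding closure calls and Range iteration overhead.
  def squares(n: Int): Array[Int] = {
    val out = new Array[Int](n)
    var i = 0
    while (i < n) {
      out(i) = i * i // inlined body of the mapped function
      i += 1
    }
    out
  }

  def main(args: Array[String]): Unit =
    println(squares(1000)(10)) // prints 100
}
```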

What's the easiest way to get the parallelized collections optimized? The CL-Collections?

Thanks for this great compiler plugin. It helped me a lot in speeding up my existing project.

Copied from original issue: nativelibs4java/nativelibs4java#199


ochafik commented Mar 16, 2015

Hi fdietze,

Thanks for your feedback!

The plugin optimizes code that leaves room for optimization. This is the case for Range, where a rewrite into while loops can speed things up a lot. With parallel collections, though, it is not clear how to make the code run faster, since rewriting the calls into while loops is no longer an (easy) option.
ScalaCL collections can indeed provide some acceleration, but with some trade-offs: fewer operations are supported in an efficient way, and data copies to and from the collections can be very costly, so they should be done with care.

What kind of optimization do you have in mind ?

Cheers


ochafik commented Mar 16, 2015

From @fdietze on November 11, 2011 0:06

Hi ochafik,

thanks for your answer.

I'm thinking about something similar to what OpenMP does. Because we have loops with a fixed number of iterations, we can split the range into chunks of size (iterations / #cpus) and run them independently, with different threads and while loops. But I don't know if that's as trivial as the other transformations are...
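
To make the idea concrete, here is a sketch of that strategy in plain Scala: the range is cut into one chunk per core, and each chunk runs as an independent while loop on its own thread (illustrative code only, not something the plugin generates):

```scala
object ChunkedParallelSketch {
  // OpenMP-style parallel equivalent of (0 until n).map(i => i * i):
  // split the fixed iteration range into chunks of roughly n / #cpus
  // iterations and run each chunk as a plain while loop on a thread.
  def squares(n: Int): Array[Int] = {
    val out = new Array[Int](n)
    val cpus = Runtime.getRuntime.availableProcessors()
    val chunk = (n + cpus - 1) / cpus // ceiling division
    val threads = (0 until cpus).map { t =>
      val start = t * chunk
      val end = math.min(start + chunk, n)
      new Thread(new Runnable {
        def run(): Unit = {
          var i = start
          while (i < end) { // each chunk is an independent while loop
            out(i) = i * i
            i += 1
          }
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join()) // wait for all chunks to finish
    out
  }

  def main(args: Array[String]): Unit =
    println(squares(1000)(999)) // prints 998001
}
```

Chunks write to disjoint slices of the output array, so no synchronization beyond the final join is needed.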


ochafik commented Mar 16, 2015

Hi fdietze,

This does indeed seem far from trivial, especially without hints to the compiler (and my guess is that the overall gain, if any, would not justify the work).
I'm afraid I don't have the resources to explore this path at the moment, but feel free to explore it and post suggestions / status reports in this issue.

Cheers


ochafik commented Mar 16, 2015

For the record, here's a document that explains how OpenMP parallel loops work and what they look like:
http://bisqwit.iki.fi/story/howto/openmp/#LoopDirectiveFor
