
Task-pooling based parallelization & incremental output #252

Open
Profpatsch opened this issue Aug 18, 2017 · 3 comments

Comments

@Profpatsch
Contributor

When scheduling longer tasks (like audio file conversion), all cores should be used by the script.
Right now turtle only provides `parallel`, which schedules as many parallel processes as `fork` allows and returns the results only after every element has been processed.

I’d like to use something like `async-pool`; here’s a first try:

```haskell
finished <- liftIO $ Pool.withTaskGroup 4 $ \tg ->
  Pool.mapTasks tg $ map action infiles
select finished >>= printf s
```

It uses exactly four cores now, but the output is still deferred until the very end; I’m not sure if it’s possible to integrate that into the streaming abstraction?

@Profpatsch Profpatsch changed the title Task-Pooling-based parallelization & incremental output Task-pooling based parallelization & incremental output Aug 18, 2017
@Gabriella439
Owner

To be precise, `parallel` doesn't exactly block until every element has been processed. For example, if the first element completes first, then `parallel` will emit it immediately and not wait for the remaining elements to complete. The only restriction is that `parallel` returns the elements in order, which means that `parallel` cannot return the second element before the first element.

The reason I'm explaining that is to point out that if you want something that produces results earlier, you need to drop the requirement to preserve the original order. Usually you do this by having each action store its result in a buffer and then streaming the results from the buffer.

Going back to your original question: you can get the task-based approach to stream incrementally using the same trick. Have each task store its result in a buffer and then stream the results from the buffer.
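(Not from the thread — a minimal sketch of the buffer trick using only base: a `QSem` stands in for async-pool's task group, a `Chan` is the buffer, and the name `pooledStream` is made up; exception propagation is omitted.)

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Exception (bracket_)
import Control.Monad (forM_, replicateM_)

-- Run the actions with at most `n` in flight.  Each task writes its
-- result into the Chan the moment it finishes, so `consume` sees
-- results in completion order, not input order.
pooledStream :: Int -> (a -> IO ()) -> [IO a] -> IO ()
pooledStream n consume actions = do
  buffer <- newChan
  sem    <- newQSem n
  forM_ actions $ \act ->
    forkIO $ bracket_ (waitQSem sem) (signalQSem sem) $
      act >>= writeChan buffer
  -- Stream: block on the buffer once per task, consuming incrementally.
  replicateM_ (length actions) (readChan buffer >>= consume)
```

In a turtle script the consuming side could live inside a `Shell` via `liftIO`, but that wiring is left out here.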

@Profpatsch
Contributor Author

Profpatsch commented Aug 31, 2017

Ah, I understand how `parallel` can only produce one output per item; one is trapped in `IO` anyway, since `withAsync` enforces it.

What would you use for the buffer? A `TChan` from `stm`?
And if one wanted pooling, it would still have to be implemented with the `Async.Pool` `withAsync` anyway.

@Gabriella439
Owner

Yeah, it could be any buffer.
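(Purely as illustration, not from the thread: a `TChan` from `stm` does work as the buffer. A sketch, again using a plain `QSem` in place of `Async.Pool`'s `withAsync`, with the made-up name `runPooledTChan`.)

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Concurrent.STM (atomically, newTChanIO, readTChan, writeTChan)
import Control.Exception (bracket_)
import Control.Monad (forM_, replicateM)

-- Pooled tasks publish into the TChan; the reader blocks in
-- `atomically (readTChan chan)` until the next result arrives,
-- so results come back in completion order.
runPooledTChan :: Int -> [IO a] -> IO [a]
runPooledTChan poolSize actions = do
  chan <- newTChanIO
  sem  <- newQSem poolSize
  forM_ actions $ \act ->
    forkIO $ bracket_ (waitQSem sem) (signalQSem sem) $
      act >>= atomically . writeTChan chan
  replicateM (length actions) (atomically (readTChan chan))
```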
