
Task-pooling based parallelization & incremental output #252

Open
Profpatsch opened this issue Aug 18, 2017 · 3 comments

Comments

@Profpatsch
Contributor

When scheduling longer tasks (like audio file conversion), all cores should be used by the script.
Right now turtle only provides `parallel`, which schedules as many parallel processes as `fork` allows and returns the results only after every element has been processed.

I’d like to use something like `async-pool`; here’s a first try:

```haskell
finished <- liftIO $ Pool.withTaskGroup 4 $ \tg ->
  Pool.mapTasks tg $ map action infiles
select finished >>= printf s
```

It uses exactly four cores now, but the output is still deferred until the very end; I’m not sure if it’s possible to integrate that into the streaming abstraction?

@Profpatsch Profpatsch changed the title Task-Pooling-based parallelization & incremental output Task-pooling based parallelization & incremental output Aug 18, 2017
@Gabriella439
Owner

To be precise, `parallel` doesn't exactly block until every element has been processed. For example, if the first element completes first, then `parallel` will emit it immediately and not wait for the remaining elements to complete. The only restriction is that `parallel` returns the elements in order, which means that `parallel` cannot return the second element before the first element.

The reason I'm explaining that is to point out that if you want something that produces results earlier, you need to drop the requirement to preserve the original order. Usually you do this by having each action store its result in a buffer and then streaming the results from the buffer.

Going back to your original question: you can get the task-based approach to stream incrementally using the same trick. Have each task store its result in a buffer and then stream the results from the buffer.
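(Not from the thread — a minimal sketch of the buffer trick using only base: a `QSem` stands in for async-pool's task group, a `Chan` is the buffer, and the name `pooledStream` is made up; exception propagation is omitted.)

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Exception (bracket_)
import Control.Monad (forM_, replicateM_)

-- Run the actions with at most `n` in flight.  Each task writes its
-- result into the Chan the moment it finishes, so `consume` sees
-- results in completion order, not input order.
pooledStream :: Int -> (a -> IO ()) -> [IO a] -> IO ()
pooledStream n consume actions = do
  buffer <- newChan
  sem    <- newQSem n
  forM_ actions $ \act ->
    forkIO $ bracket_ (waitQSem sem) (signalQSem sem) $
      act >>= writeChan buffer
  -- Stream: block on the buffer once per task, consuming incrementally.
  replicateM_ (length actions) (readChan buffer >>= consume)
```

In a turtle script the consuming side could live inside a `Shell` via `liftIO`, but that wiring is left out here.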

@Profpatsch
Contributor Author

Profpatsch commented Aug 31, 2017

Ah, I understand how `parallel` can only produce one output per item; one is trapped in `IO` anyway, since `withAsync` enforces it.

What would you use for the buffer? A `TChan` from `stm`?
And if one wanted pooling, it would still have to be implemented with the `Async.Pool` `withAsync` anyway.

@Gabriella439
Owner

Yeah, it could be any buffer.
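(Purely as illustration, not from the thread: a `TChan` from `stm` does work as the buffer. A sketch, again using a plain `QSem` in place of `Async.Pool`'s `withAsync`, with the made-up name `runPooledTChan`.)

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Concurrent.STM (atomically, newTChanIO, readTChan, writeTChan)
import Control.Exception (bracket_)
import Control.Monad (forM_, replicateM)

-- Pooled tasks publish into the TChan; the reader blocks in
-- `atomically (readTChan chan)` until the next result arrives,
-- so results come back in completion order.
runPooledTChan :: Int -> [IO a] -> IO [a]
runPooledTChan poolSize actions = do
  chan <- newTChanIO
  sem  <- newQSem poolSize
  forM_ actions $ \act ->
    forkIO $ bracket_ (waitQSem sem) (signalQSem sem) $
      act >>= atomically . writeTChan chan
  replicateM (length actions) (atomically (readTChan chan))
```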
