Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-35176: [C++] Add support for disabling threading for emscripten #35471

Closed
wants to merge 53 commits into from

Conversation

joemarshall
Copy link
Contributor

@joemarshall joemarshall commented May 7, 2023

As previously discussed in #35176 this is a patch that adds an option ARROW_DISABLE_THREADING. When it is turned on, arrow threadpool and serial executors don't spawn threads, and instead run tasks in the main thread when futures are waited for.

It doesn't mess with threading in projects included as dependencies, e.g. multithreaded malloc implementations because if you're building for a non threaded environment, you can't use those anyway.

Basically where this is at is that it runs the test suite okay, and I think should work well enough to be a backend for pandas on emscripten/pyodide, so subject to reviews etc. it is worth merging (and would be jolly convenient for me if it was).

What this means is:

  1. It is possible to use arrow in non-threaded emscripten/webassembly environments (with some build patches specific to emscripten which I'll put in once this is in)
  2. Most of arrow just works, albeit slower in parts.

Things that don't work and probably won't:

  1. Server stuff that relies on threads. Not a massive problem I think because environments with threading restrictions are currently typically also restricted from making servers anyway (i.e. they are web browsers)
  2. Anything that relies on actually doing two things at once (for obvious reasons)

Things that don't work yet and could be fixed in future:

  1. use of asynchronous file/network APIs in emscripten which would mean I/O could work efficiently in one thread.
  2. asofjoin - right now the implementation relies on std::thread - it needs refactoring to work with threadpool like everything else in arrow, but I'm not sure I am expert enough in the codebase to do it well.

This is useful for emscripten, where a lot of the time one doesn't have any threads. Could also be useful for any embedded ports etc.
@pitrou
Copy link
Member

pitrou commented May 9, 2023

Hmm, I've started reviewing this PR but it adds a lot of code in various delicate internals (Future, ThreadPool, TaskGroup...) that makes me uneasy. Do we value emscripten/webassembly compat so much that it warrants the cost in maintainability? Would there be another way to support emscripten/webassembly?

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels May 10, 2023
removed debug file
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels May 10, 2023
@joemarshall
Copy link
Contributor Author

I think I've done the current review comments - so as i understand it I can run the test from tasks.yml by commenting in here like this?

@github-actions crossbow submit test-ubuntu-22.04-cpp-no-threading

@joemarshall
Copy link
Contributor Author

@github-actions crossbow submit test-ubuntu-22.04-cpp-no-threading

@github-actions
Copy link

Only contributors can submit requests to this bot. Please ask someone from the community for help with getting the first commit in.
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/4934990365

@pitrou
Copy link
Member

pitrou commented May 10, 2023

@github-actions crossbow submit test-ubuntu-22.04-cpp-no-threading

@github-actions
Copy link

Command '['git', '-C', '/tmp/tmplwp1r9ca/arrow', 'fetch', 'origin', 'pull/35471/head:main']' returned non-zero exit status 128.
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/4935043997

@kou
Copy link
Member

kou commented May 10, 2023

@joemarshall Could you create a feature branch for this instead of using the main branch?
@github-actions crossbow submit test-ubuntu-22.04-cpp-no-threading doesn't work with the main branch.

@joemarshall
Copy link
Contributor Author

@kou A feature branch in my repo? Or does someone need to make a feature branch in the main repo which I can target the pull request against?

@kou
Copy link
Member

kou commented May 12, 2023

A feature branch in my repo?

Yes.

@joemarshall
Copy link
Contributor Author

This is now in #35672 which is the same but from a differently named branch because that means that the CI stuff (crossbow submit commands) actually work, whereas they don't work if the name of the source branch is main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++] Add support for disablging threading for emscripten
4 participants