-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-35176: [C++] Add support for disabling threading for emscripten #35672
Conversation
This is useful for emscripten, where a lot of the time one doesn't have any threads. Could also be useful for any embedded ports etc.
@kou - I've done those tidying up fixes (use of auto, extraneous return statements etc.) Those two test cases that you flagged, I checked through them again, and I think I was wrong - they probably don't really make sense in single-threaded mode as they're testing essentially multithreaded behaviour. I've just skipped those tests in single threaded mode. The description at the top of the PR is up to date now (and has a brief changelog of what changed from initial PR), so hopefully good to go. thanks |
Could you replace the initial description directly instead of adding a new "Updated since initial PR:" section? |
Okay, it's up to date. |
@kou think we're good to merge now? |
@github-actions crossbow submit test-ubuntu-22.04-cpp-no-threading |
Revision: a4bfd80 Submitted crossbow builds: ursacomputing/crossbow @ actions-0e4eae872c
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
The failure is #37067 .
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit b668595. There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them. |
…ten (apache#35672) As previously discussed in apache#35176 this is a patch that adds an option `ARROW_ENABLE_THREADING`. When it is turned off, arrow threadpool and serial executors don't spawn threads, and instead run tasks in the main thread when futures are waited for. It doesn't mess with threading in projects included as dependencies, e.g. multithreaded malloc implementations because if you're building for a non threaded environment, you can't use those anyway. Basically where this is at is that it runs the test suite okay, and I think should work well enough to be a backend for pandas on emscripten/pyodide. What this means is: 1) It is possible to use arrow in non-threaded emscripten/webassembly environments (with some build patches specific to emscripten which I'll put in once this is in) 2) Most of arrow just works, albeit slower in parts. Things that don't work and probably won't: 1) Server stuff that relies on threads. Not a massive problem I think because environments with threading restrictions are currently typically also restricted from making servers anyway (i.e. they are web browsers) 2) Anything that relies on actually doing two things at once (for obvious reasons) Things that don't work yet and could be fixed in future: 1) use of asynchronous file/network APIs in emscripten which would mean I/O could work efficiently in one thread. 2) asofjoin - right now the implementation relies on std::thread - it needs refactoring to work with threadpool like everything else in arrow, but I'm not sure I am expert enough in the codebase to do it well. * Closes: apache#35176 Lead-authored-by: Joe Marshall <[email protected]> Co-authored-by: Sutou Kouhei <[email protected]> Co-authored-by: Weston Pace <[email protected]> Co-authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
As previously discussed in #35176 this is a patch that adds an option
ARROW_ENABLE_THREADING
. When it is turned off, arrow threadpool and serial executors don't spawn threads, and instead run tasks in the main thread when futures are waited for.It doesn't mess with threading in projects included as dependencies, e.g. multithreaded malloc implementations because if you're building for a non threaded environment, you can't use those anyway.
Basically where this is at is that it runs the test suite okay, and I think should work well enough to be a backend for pandas on emscripten/pyodide.
What this means is:
Things that don't work and probably won't:
Things that don't work yet and could be fixed in future: