Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-35176: [C++] Add support for disabling threading for emscripten #35672

Merged
merged 93 commits into from
Aug 9, 2023

Conversation

joemarshall
Copy link
Contributor

@joemarshall joemarshall commented May 18, 2023

As previously discussed in #35176 this is a patch that adds an option ARROW_ENABLE_THREADING. When it is turned off, arrow threadpool and serial executors don't spawn threads, and instead run tasks in the main thread when futures are waited for.

It doesn't mess with threading in projects included as dependencies, e.g. multithreaded malloc implementations because if you're building for a non threaded environment, you can't use those anyway.

Basically where this is at is that it runs the test suite okay, and I think should work well enough to be a backend for pandas on emscripten/pyodide.

What this means is:

  1. It is possible to use arrow in non-threaded emscripten/webassembly environments (with some build patches specific to emscripten which I'll put in once this is in)
  2. Most of arrow just works, albeit slower in parts.

Things that don't work and probably won't:

  1. Server stuff that relies on threads. Not a massive problem I think because environments with threading restrictions are currently typically also restricted from making servers anyway (i.e. they are web browsers)
  2. Anything that relies on actually doing two things at once (for obvious reasons)

Things that don't work yet and could be fixed in future:

  1. use of asynchronous file/network APIs in emscripten which would mean I/O could work efficiently in one thread.
  2. asofjoin - right now the implementation relies on std::thread - it needs refactoring to work with threadpool like everything else in arrow, but I'm not sure I am expert enough in the codebase to do it well.

This is useful for emscripten, where a lot of the time one doesn't have any threads. Could also be useful for any embedded ports etc.
@github-actions github-actions bot added Component: Parquet awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Aug 2, 2023
@joemarshall
Copy link
Contributor Author

@kou - I've done those tidying up fixes (use of auto, extraneous return statements etc.)

Those two test cases that you flagged, I checked through them again, and I think I was wrong - they probably don't really make sense in single-threaded mode as they're testing essentially multithreaded behaviour. I've just skipped those tests in single threaded mode.

The description at the top of the PR is up to date now (and has a brief changelog of what changed from initial PR), so hopefully good to go.

thanks
Joe

@kou
Copy link
Member

kou commented Aug 3, 2023

The description at the top of the PR is up to date now (and has a brief changelog of what changed from initial PR)

Could you replace the initial description directly instead of adding a new "Updated since initial PR:" section?
The latest decision is the most important thing for us.
If we want to see the history, we can use GitHub's history view UI. (See the "edited" pull down in the top of the description area.)

@joemarshall
Copy link
Contributor Author

The description at the top of the PR is up to date now (and has a brief changelog of what changed from initial PR)

Could you replace the initial description directly instead of adding a new "Updated since initial PR:" section? The latest decision is the most important thing for us. If we want to see the history, we can use GitHub's history view UI. (See the "edited" pull down in the top of the description area.)

Okay, it's up to date.

@joemarshall
Copy link
Contributor Author

@kou think we're good to merge now?

@kou
Copy link
Member

kou commented Aug 9, 2023

@github-actions crossbow submit test-ubuntu-22.04-cpp-no-threading

@github-actions
Copy link

github-actions bot commented Aug 9, 2023

Revision: a4bfd80

Submitted crossbow builds: ursacomputing/crossbow @ actions-0e4eae872c

Task Status
test-ubuntu-22.04-cpp-no-threading Github Actions

@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting change review Awaiting change review awaiting changes Awaiting changes labels Aug 9, 2023
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

The failure is #37067 .

@kou kou merged commit b668595 into apache:main Aug 9, 2023
35 of 36 checks passed
@kou kou removed the awaiting change review Awaiting change review label Aug 9, 2023
@github-actions github-actions bot added the awaiting merge Awaiting merge label Aug 9, 2023
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit b668595.

There were 2 benchmark results indicating a performance regression:

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…ten (apache#35672)

As previously discussed in apache#35176 this is a patch that adds an option `ARROW_ENABLE_THREADING`. When it is turned off, arrow threadpool and serial executors don't spawn threads, and instead run tasks in the main thread when futures are waited for.

It doesn't mess with threading in projects included as dependencies, e.g. multithreaded malloc implementations because if you're building for a non threaded environment, you can't use those anyway.

Basically where this is at is that it runs the test suite okay, and I think should work well enough to be a backend for pandas on emscripten/pyodide.

What this means is:
1) It is possible to use arrow in non-threaded emscripten/webassembly environments (with some build patches specific to emscripten which I'll put in once this is in)
2) Most of arrow just works, albeit slower in parts.

Things that don't work and probably won't:
1) Server stuff that relies on threads. Not a massive problem I think because environments with threading restrictions are currently typically also restricted from making servers anyway (i.e. they are web browsers)
2) Anything that relies on actually doing two things at once (for obvious reasons)

Things that don't work yet and could be fixed in future:
1) use of asynchronous file/network APIs in emscripten which would mean I/O could work efficiently in one thread.
2) asofjoin - right now the implementation relies on std::thread - it needs refactoring to work with threadpool like everything else in arrow, but I'm not sure I am expert enough in the codebase to do it well.
* Closes: apache#35176

Lead-authored-by: Joe Marshall <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Weston Pace <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
trxcllnt added a commit that referenced this pull request Jan 25, 2024
This PR ensures `-DARROW_ENABLE_THREADING` (added in #35672) is defined in the compile strings.

Fixes #39619
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++] Add support for disablging threading for emscripten
5 participants