[Core][Streaming Generator] Fix the perf regression from a serve handle bug fix #38171 #38280
Conversation
@@ -1012,7 +1013,7 @@ cdef class StreamingGeneratorExecutionContext:
        return self


-cpdef report_streaming_generator_output(
+cdef report_streaming_generator_output(
It doesn't have to be cpdef, and cdef is faster.
-        except StopIteration:
-            return True
-        except Exception as e:
+        if isinstance(output_or_exception, Exception):
If I raise the exception again like before, it causes the circular-reference issues.
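For context, a minimal standalone sketch (plain Python, not the raylet code) of the gotcha being described: an exception's traceback points back at the frame that raised it, so keeping the exception alive from inside that frame forms a reference cycle that only the cyclic GC can reclaim.

```python
import gc

def capture_exception():
    big_local = bytearray(10**6)  # stands in for a large generator output
    try:
        raise ValueError("boom")
    except ValueError as e:
        # e.__traceback__ references this frame; the frame's local `captured`
        # references e: a reference cycle.
        captured = e
        return captured

err = capture_exception()
del err
# The cycle keeps the frame (and big_local) alive until the cyclic GC runs.
print(gc.collect())  # typically nonzero: the collector had to break the cycle
```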
            # the output (which has nogil).
            done = await loop.run_in_executor(
                worker.core_worker.get_thread_pool_for_async_event_loop(),
                report_streaming_generator_output,
This is the core fix (dispatching the C++ code to another thread).
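A minimal asyncio sketch of that pattern (simplified, not the actual Ray internals; `report_output` and the module-level pool are stand-ins for the real calls): the blocking, GIL-releasing report call is dispatched to a dedicated executor so the event loop, and the async actor code running on it, stay unblocked.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the per-worker pool returned by
# get_thread_pool_for_async_event_loop() in the diff above.
_report_pool = ThreadPoolExecutor(max_workers=1)

def report_output(item: bytes) -> bool:
    # Placeholder for the C++ reporting/serialization call that releases the GIL.
    time.sleep(0.01)
    return False  # False: more items may follow

async def stream_outputs(items):
    loop = asyncio.get_running_loop()
    for item in items:
        # The event loop stays free while the report runs in _report_pool.
        done = await loop.run_in_executor(_report_pool, report_output, item)
        if done:
            break

asyncio.run(stream_outputs([b"a" * 1024, b"b" * 1024]))
```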
python/ray/_raylet.pyx
Outdated
    def get_thread_pool_for_async_event_loop(self):
        if self.thread_pool_for_async_event_loop is None:
            self.thread_pool_for_async_event_loop = ThreadPoolExecutor(
                max_workers=int(os.getenv("RAY_ASYNC_THREAD_POOL_SIZE", 1)))
Q: maybe I should always use 1 thread? I used multiple threads for some exploration. I think when there are lots of async tasks, multiple threads may have some perf benefit.
So is it thread-safe to use multiple threads? Before, this was done in the main thread, so there was effectively 1.
Hmm, actually a good point. I think it should be thread-safe, but I'm not 100% sure. Maybe I will just use 1 thread for now.
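For illustration, a small standalone sketch (not Ray code) of why a single-thread pool sidesteps the thread-safety question: with max_workers=1, submitted calls run one at a time and in submission order, matching the serialization guarantee the old main-thread path gave.

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=1)
results = []

def report(i):
    # No lock needed: only one worker thread ever runs this function.
    results.append(i)

futures = [pool.submit(report, i) for i in range(5)]
for f in futures:
    f.result()
print(results)  # [0, 1, 2, 3, 4] - executed one at a time, in FIFO order
```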
Failed tests seem unrelated.
Why are these changes needed?
Before #37972, we ran the reporting & serialization of outputs (in C++) in the main thread while all the async actor tasks ran in an async thread. However, after that PR, both run in the async thread.
This caused a regression for generator workloads with decently large outputs (200~2KB, e.g. Aviary), because the serialization code releases the GIL (nogil): previously we got real multi-threading, with the serialization code running in the main thread while the async actor code ran in the async thread.
This PR fixes the issue by dispatching the C++ code (reporting & serialization) to a separate thread again. I also found that when I used ThreadPoolExecutor, there were circular-reference issues that leaked objects when exceptions happened. I realized this was because a Python exception captures the local references of the frame it was raised in (creating reference cycles). I refactored that part of the code to avoid this and added a unit test for it.
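A hedged sketch of the shape of that refactor (heavily simplified; `DummyContext` and its methods are hypothetical stand-ins, not the raylet API): the exception is passed into the reporting helper as a value and checked with isinstance instead of being re-raised, so no fresh traceback captures the surrounding frame.

```python
class DummyContext:
    # Stand-in for StreamingGeneratorExecutionContext; names are hypothetical.
    def report_value(self, value):
        print("reported value:", value)

    def report_exception(self, exc):
        print("reported exception:", exc)

def report_streaming_generator_output(context, output_or_exception):
    if isinstance(output_or_exception, Exception):
        # Report the failure without re-raising, so no new traceback/frame
        # cycle is created in this helper.
        context.report_exception(output_or_exception)
        return True   # the generator is finished after an exception
    context.report_value(output_or_exception)
    return False      # more outputs may follow

ctx = DummyContext()
report_streaming_generator_output(ctx, b"chunk")
report_streaming_generator_output(ctx, RuntimeError("task failed"))
```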
Related issue number
Closes #38163
Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I have added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.