
[Core][ObjectRef] Change default to not record call stack during ObjectRef creation #18078

Merged: 8 commits into ray-project:master on Aug 27, 2021

Conversation

mwtian (Member) commented Aug 25, 2021

Why are these changes needed?

Recording the call stack accounts for 40–50% of the CPU overhead of ObjectRef creation in Python (profile). I wonder if we should make the feature default to false and ask users to enable it before running ray memory. We could also use sampling, or pass call-site constants, to optimize this. Another more limited change is to cache the call site only for contained object refs (#17882).
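The trade-off being discussed is essentially gating call-site capture behind an opt-in flag. Below is a minimal sketch of that pattern; the helper name and module-level flag are hypothetical illustrations rather than Ray's actual internals, though RAY_record_ref_creation_sites is the environment variable this PR uses as the switch.

```python
import os
import traceback

# Hypothetical sketch: gate call-site capture behind the env variable
# discussed in this PR. Ray's real code path and helper names differ.
RECORD_REF_CREATION_SITES = (
    os.environ.get("RAY_record_ref_creation_sites", "0") == "1")


def get_call_site() -> str:
    """Return the caller's call site, or "" when recording is disabled.

    Walking and formatting the stack is the expensive part, so skipping it
    by default removes that cost from every ObjectRef creation.
    """
    if not RECORD_REF_CREATION_SITES:
        return ""  # Rendered as "disabled" by tools such as `ray memory`.
    # extract_stack() lists frames oldest-first; [-2] is our caller.
    frame = traceback.extract_stack()[-2]
    return f"{frame.filename}:{frame.name}:{frame.lineno}"
```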

On m5.8xlarge, before this change:

single client get calls per second 35931.41 +- 209.84
single client put calls per second 35649.72 +- 84.53
multi client put calls per second 181739.57 +- 1654.46
single client get calls (Plasma Store) per second 10330.02 +- 211.09
single client put calls (Plasma Store) per second 6384.16 +- 23.45
multi client put calls (Plasma Store) per second 8295.33 +- 88.54
single client put gigabytes per second 20.78 +- 8.12
multi client put gigabytes per second 31.96 +- 3.78
single client get object containing 10k refs per second 6.53 +- 0.01
single client tasks sync per second 2002.61 +- 20.43
single client tasks async per second 14710.71 +- 284.48
multi client tasks async per second 38890.68 +- 500.38
1:1 actor calls sync per second 3297.3 +- 89.64
1:1 actor calls async per second 7181.34 +- 19.17
1:1 actor calls concurrent per second 6352.96 +- 41.57
1:n actor calls async per second 19833.41 +- 71.79
n:n actor calls async per second 46055.91 +- 6647.68
n:n actor calls with arg async per second 7162.26 +- 41.88
1:1 async-actor calls sync per second 2219.81 +- 61.12
1:1 async-actor calls async per second 4645.55 +- 51.89
1:1 async-actor calls with args async per second 3074.96 +- 65.11
1:n async-actor calls async per second 16257.97 +- 88.73
n:n async-actor calls async per second 29373.27 +- 2132.39

After this change:

single client get calls per second 34963.26 +- 116.83
single client put calls per second 62930.59 +- 451.86
multi client put calls per second 238268.13 +- 2668.02
single client get calls (Plasma Store) per second 11528.71 +- 59.86
single client put calls (Plasma Store) per second 6817.38 +- 151.37
multi client put calls (Plasma Store) per second 8280.51 +- 56.15
single client put gigabytes per second 20.92 +- 6.2
multi client put gigabytes per second 32.56 +- 3.88
single client get object containing 10k refs per second 16.7 +- 0.02
single client tasks sync per second 2052.0 +- 64.46
single client tasks async per second 18896.4 +- 110.65
multi client tasks async per second 38944.39 +- 540.41
1:1 actor calls sync per second 3417.26 +- 31.38
1:1 actor calls async per second 7113.32 +- 94.16
1:1 actor calls concurrent per second 6380.13 +- 64.04
1:n actor calls async per second 20064.44 +- 117.48
n:n actor calls async per second 46639.69 +- 6856.08
n:n actor calls with arg async per second 6974.33 +- 25.07
1:1 async-actor calls sync per second 2283.84 +- 29.82
1:1 async-actor calls async per second 4619.38 +- 107.56
1:1 async-actor calls with args async per second 3100.14 +- 44.4
1:n async-actor calls async per second 17291.48 +- 175.33
n:n async-actor calls async per second 29663.23 +- 1660.87

The "single client put calls" and "single client get object containing 10k refs" benchmarks show an 80–150% increase in ops/sec.
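For context, numbers like "single client put calls per second" are dominated by the ObjectRef-creation path this PR changes. A rough, self-contained approximation of that style of measurement (not the actual release-test harness used for the tables above) looks like this:

```python
import time

import ray

ray.init()

N = 100_000
start = time.perf_counter()
refs = [ray.put(i) for i in range(N)]  # Each ray.put creates an ObjectRef.
elapsed = time.perf_counter() - start
print(f"single client put calls per second: {N / elapsed:.2f}")

# Sanity-check that the stored objects round-trip correctly.
assert ray.get(refs[-1]) == N - 1
ray.shutdown()
```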

Related issue number

#17803

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@mwtian mwtian marked this pull request as ready for review August 25, 2021 16:03
ericl (Contributor) commented Aug 25, 2021

Can we add a message in ray memory telling users to set RAY_record_ref_creation_sites=1 for additional debugging information? Maybe we should group these under a RAY_DEBUG flag in general once @stephanie-wang adds her metadata extras too.

In general, we need to make sure user instructions are exposed in the right messages.

@ericl ericl added the @author-action-required label Aug 25, 2021
mwtian (Member, Author) commented Aug 25, 2021

Can we add a message in ray memory telling users to set RAY_record_ref_creation_sites=1 for additional debugging information? Maybe we should group these under a RAY_DEBUG flag in general once @stephanie-wang adds her metadata extras too.

In general, we need to make sure user instructions are exposed in the right messages.

Makes sense. Will add the necessary info.

mwtian (Member, Author) commented Aug 26, 2021

Added a message to ray memory along with other fixes.
Before:

======== Object references status: 2021-08-25 16:39:23.421086 ========
Grouping by node address...        Sorting by object size...        Display allentries per group...


--- Summary for node address: total_actor_handles ---
Mem Used by Objects  Local References  Pinned        Pending Tasks  Captured in Objects  Actor Handles
0.0 B                1, (-1.0 B)       0, (0.0 B)    0, (0.0 B)     0, (0.0 B)           0, (0.0 B)

--- Object references for node address: total_actor_handles ---
IP Address    PID    Type    Call Site               Size    Reference Type      Object Ref
192.168.0.234  20905  Driver  (task call)  | /Users/  ?       LOCAL_REFERENCE     a67dc375e60ddd1affffffffffffffffffffffff0100000001000000
                             mwtian/work/hello/hell
                             o-cluster.py:<module>:
                             11



--- Aggregate object store stats across all nodes ---
Plasma memory usage 0 MiB, 0 objects, 0.0% full, 0.0% needed

After:

======== Object references status: 2021-08-25 17:08:28.892803 ========
Grouping by node address...        Sorting by object size...        Display allentries per group...


--- Summary for node address: 192.168.0.234 ---
Mem Used by Objects  Local References  Pinned        Pending Tasks  Captured in Objects  Actor Handles
0.0 B                1, (-1.0 B)       0, (0.0 B)    0, (0.0 B)     0, (0.0 B)           0, (0.0 B)

--- Object references for node address: 192.168.0.234 ---
IP Address       PID    Type    Call Site               Size    Reference Type      Object Ref
192.168.0.234    27804  Driver  ?                       ?       LOCAL_REFERENCE     a67dc375e60ddd1affffffffffffffffffffffff0100000001000000

To record callsite information for each ObjectRef created, set env variable RAY_record_ref_creation_sites=1

--- Aggregate object store stats across all nodes ---
Plasma memory usage 0 MiB, 0 objects, 0.0% full, 0.0% needed
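To get call sites back for a debugging session, the flag has to be present in the environment before the Ray processes that create the ObjectRefs start. A minimal sketch, assuming that setting it from the driver before ray.init() is early enough; exporting RAY_record_ref_creation_sites=1 in the shell before launching the script is the safer route:

```python
import os

# Must be set before Ray starts so the flag is picked up at initialization.
os.environ["RAY_record_ref_creation_sites"] = "1"

import ray  # noqa: E402  (import deliberately placed after the env var is set)

ray.init()
ref = ray.put("hello")  # This creation site should now show up in `ray memory`.
```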

@mwtian mwtian removed the @author-action-required label Aug 26, 2021
ericl (Contributor) commented Aug 26, 2021

Looks good. Could we change the "?" to "disabled" for a bit of increased clarity?

ericl (Contributor) previously requested changes, Aug 26, 2021


You will also likely need to fix the test suite that checks the "ray memory" output.

Review comment on dashboard/memory_utils.py (outdated, resolved)
@ericl ericl added the @author-action-required label Aug 26, 2021
Co-authored-by: Eric Liang <[email protected]>
@mwtian mwtian removed the @author-action-required label Aug 27, 2021
@mwtian mwtian requested a review from ericl August 27, 2021 00:50
mwtian (Member, Author) commented Aug 27, 2021

You will also likely need to fix the test suite that checks the "ray memory" output.

test_memstat is fixed now.
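A rough sketch of the kind of check involved; this is not Ray's actual test_memstat, just an illustration that shells out to the real ray memory command and asserts the hint string quoted above is present when recording is disabled:

```python
import subprocess


def test_memory_hint_when_callsite_recording_disabled():
    # Assumes a local cluster is already running (e.g. `ray start --head`)
    # and was launched without RAY_record_ref_creation_sites set.
    output = subprocess.run(
        ["ray", "memory"], capture_output=True, text=True, check=True
    ).stdout
    assert "RAY_record_ref_creation_sites=1" in output
```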

ericl (Contributor) commented Aug 27, 2021

test_memstat seems to be failing in CI ^^

@ericl ericl added the @author-action-required label Aug 27, 2021
mwtian (Member, Author) commented Aug 27, 2021

Looks like test_memstat is fixed this time (at least on Linux).

@mwtian mwtian removed the @author-action-required label Aug 27, 2021
@ericl ericl merged commit 26679d6 into ray-project:master Aug 27, 2021
ericl (Contributor) commented Aug 27, 2021

Merged, thanks!
