
[Core][ObjectRef] Change default to not record call stack during ObjectRef creation #18078

Merged: 8 commits into ray-project:master on Aug 27, 2021

Conversation

mwtian (Member) commented Aug 25, 2021

Why are these changes needed?

Recording the call stack accounts for 40–50% of the CPU overhead of ObjectRef creation in Python (profile). I wonder if we should make the feature default to false and ask users to enable it before running ray memory. We could also use sampling, or pass call-site constants, to optimize this. Another more limited change is to cache the call site only for contained object refs (#17882).
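The trade-off being discussed is essentially gating call-site capture behind an opt-in flag. Below is a minimal sketch of that pattern; the helper name and module-level flag are hypothetical illustrations rather than Ray's actual internals, though RAY_record_ref_creation_sites is the environment variable this PR uses as the switch.

```python
import os
import traceback

# Hypothetical sketch: gate call-site capture behind the env variable
# discussed in this PR. Ray's real code path and helper names differ.
RECORD_REF_CREATION_SITES = (
    os.environ.get("RAY_record_ref_creation_sites", "0") == "1")


def get_call_site() -> str:
    """Return the caller's call site, or "" when recording is disabled.

    Walking and formatting the stack is the expensive part, so skipping it
    by default removes that cost from every ObjectRef creation.
    """
    if not RECORD_REF_CREATION_SITES:
        return ""  # Rendered as "disabled" by tools such as `ray memory`.
    # extract_stack() lists frames oldest-first; [-2] is our caller.
    frame = traceback.extract_stack()[-2]
    return f"{frame.filename}:{frame.name}:{frame.lineno}"
```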

On m5.8xlarge, before this change:

single client get calls per second 35931.41 +- 209.84
single client put calls per second 35649.72 +- 84.53
multi client put calls per second 181739.57 +- 1654.46
single client get calls (Plasma Store) per second 10330.02 +- 211.09
single client put calls (Plasma Store) per second 6384.16 +- 23.45
multi client put calls (Plasma Store) per second 8295.33 +- 88.54
single client put gigabytes per second 20.78 +- 8.12
multi client put gigabytes per second 31.96 +- 3.78
single client get object containing 10k refs per second 6.53 +- 0.01
single client tasks sync per second 2002.61 +- 20.43
single client tasks async per second 14710.71 +- 284.48
multi client tasks async per second 38890.68 +- 500.38
1:1 actor calls sync per second 3297.3 +- 89.64
1:1 actor calls async per second 7181.34 +- 19.17
1:1 actor calls concurrent per second 6352.96 +- 41.57
1:n actor calls async per second 19833.41 +- 71.79
n:n actor calls async per second 46055.91 +- 6647.68
n:n actor calls with arg async per second 7162.26 +- 41.88
1:1 async-actor calls sync per second 2219.81 +- 61.12
1:1 async-actor calls async per second 4645.55 +- 51.89
1:1 async-actor calls with args async per second 3074.96 +- 65.11
1:n async-actor calls async per second 16257.97 +- 88.73
n:n async-actor calls async per second 29373.27 +- 2132.39

After this change:

single client get calls per second 34963.26 +- 116.83
single client put calls per second 62930.59 +- 451.86
multi client put calls per second 238268.13 +- 2668.02
single client get calls (Plasma Store) per second 11528.71 +- 59.86
single client put calls (Plasma Store) per second 6817.38 +- 151.37
multi client put calls (Plasma Store) per second 8280.51 +- 56.15
single client put gigabytes per second 20.92 +- 6.2
multi client put gigabytes per second 32.56 +- 3.88
single client get object containing 10k refs per second 16.7 +- 0.02
single client tasks sync per second 2052.0 +- 64.46
single client tasks async per second 18896.4 +- 110.65
multi client tasks async per second 38944.39 +- 540.41
1:1 actor calls sync per second 3417.26 +- 31.38
1:1 actor calls async per second 7113.32 +- 94.16
1:1 actor calls concurrent per second 6380.13 +- 64.04
1:n actor calls async per second 20064.44 +- 117.48
n:n actor calls async per second 46639.69 +- 6856.08
n:n actor calls with arg async per second 6974.33 +- 25.07
1:1 async-actor calls sync per second 2283.84 +- 29.82
1:1 async-actor calls async per second 4619.38 +- 107.56
1:1 async-actor calls with args async per second 3100.14 +- 44.4
1:n async-actor calls async per second 17291.48 +- 175.33
n:n async-actor calls async per second 29663.23 +- 1660.87

The "single client put calls" and "single client get object containing 10k refs" benchmarks show an 80–150% increase in ops/sec.
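For context, numbers like "single client put calls per second" are dominated by the ObjectRef-creation path this PR changes. A rough, self-contained approximation of that style of measurement (not the actual release-test harness used for the tables above) looks like this:

```python
import time

import ray

ray.init()

N = 100_000
start = time.perf_counter()
refs = [ray.put(i) for i in range(N)]  # Each ray.put creates an ObjectRef.
elapsed = time.perf_counter() - start
print(f"single client put calls per second: {N / elapsed:.2f}")

# Sanity-check that the stored objects round-trip correctly.
assert ray.get(refs[-1]) == N - 1
ray.shutdown()
```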

Related issue number

#17803

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@mwtian mwtian marked this pull request as ready for review August 25, 2021 16:03
ericl (Contributor) commented Aug 25, 2021

Can we add a message in ray memory telling users to set RAY_record_ref_creation_sites=1 for additional debugging information? Maybe we should group these under a RAY_DEBUG flag in general once @stephanie-wang adds her metadata extras too.

In general, we need to make sure user instructions are exposed in the right messages.

@ericl ericl added the @author-action-required label Aug 25, 2021
mwtian (Member, Author) commented Aug 25, 2021

Can we add a message in ray memory telling users to set RAY_record_ref_creation_sites=1 for additional debugging information? Maybe we should group these under a RAY_DEBUG flag in general once @stephanie-wang adds her metadata extras too.

In general, we need to make sure user instructions are exposed in the right messages.

Makes sense. Will add the necessary info.

mwtian (Member, Author) commented Aug 26, 2021

Added a message to ray memory along with other fixes.
Before:

======== Object references status: 2021-08-25 16:39:23.421086 ========
Grouping by node address...        Sorting by object size...        Display allentries per group...


--- Summary for node address: total_actor_handles ---
Mem Used by Objects  Local References  Pinned        Pending Tasks  Captured in Objects  Actor Handles
0.0 B                1, (-1.0 B)       0, (0.0 B)    0, (0.0 B)     0, (0.0 B)           0, (0.0 B)

--- Object references for node address: total_actor_handles ---
IP Address    PID    Type    Call Site               Size    Reference Type      Object Ref
192.168.0.234  20905  Driver  (task call)  | /Users/  ?       LOCAL_REFERENCE     a67dc375e60ddd1affffffffffffffffffffffff0100000001000000
                             mwtian/work/hello/hell
                             o-cluster.py:<module>:
                             11



--- Aggregate object store stats across all nodes ---
Plasma memory usage 0 MiB, 0 objects, 0.0% full, 0.0% needed

After:

======== Object references status: 2021-08-25 17:08:28.892803 ========
Grouping by node address...        Sorting by object size...        Display allentries per group...


--- Summary for node address: 192.168.0.234 ---
Mem Used by Objects  Local References  Pinned        Pending Tasks  Captured in Objects  Actor Handles
0.0 B                1, (-1.0 B)       0, (0.0 B)    0, (0.0 B)     0, (0.0 B)           0, (0.0 B)

--- Object references for node address: 192.168.0.234 ---
IP Address       PID    Type    Call Site               Size    Reference Type      Object Ref
192.168.0.234    27804  Driver  ?                       ?       LOCAL_REFERENCE     a67dc375e60ddd1affffffffffffffffffffffff0100000001000000

To record callsite information for each ObjectRef created, set env variable RAY_record_ref_creation_sites=1

--- Aggregate object store stats across all nodes ---
Plasma memory usage 0 MiB, 0 objects, 0.0% full, 0.0% needed
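To get call sites back for a debugging session, the flag has to be present in the environment before the Ray processes that create the ObjectRefs start. A minimal sketch, assuming that setting it from the driver before ray.init() is early enough; exporting RAY_record_ref_creation_sites=1 in the shell before launching the script is the safer route:

```python
import os

# Must be set before Ray starts so the flag is picked up at initialization.
os.environ["RAY_record_ref_creation_sites"] = "1"

import ray  # noqa: E402  (import deliberately placed after the env var is set)

ray.init()
ref = ray.put("hello")  # This creation site should now show up in `ray memory`.
```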

@mwtian mwtian removed the @author-action-required label Aug 26, 2021
ericl (Contributor) commented Aug 26, 2021

Looks good. Could we change the "?" to "disabled" for a bit of increased clarity?

ericl (Contributor) previously requested changes, Aug 26, 2021


You will also likely need to fix the test suite that checks the "ray memory" output.

Review comment on dashboard/memory_utils.py (outdated, resolved)
@ericl ericl added the @author-action-required label Aug 26, 2021
Co-authored-by: Eric Liang <[email protected]>
@mwtian mwtian removed the @author-action-required label Aug 27, 2021
@mwtian mwtian requested a review from ericl August 27, 2021 00:50
mwtian (Member, Author) commented Aug 27, 2021

You will also likely need to fix the test suite that checks the "ray memory" output.

test_memstat is fixed now.
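A rough sketch of the kind of check involved; this is not Ray's actual test_memstat, just an illustration that shells out to the real ray memory command and asserts the hint string quoted above is present when recording is disabled:

```python
import subprocess


def test_memory_hint_when_callsite_recording_disabled():
    # Assumes a local cluster is already running (e.g. `ray start --head`)
    # and was launched without RAY_record_ref_creation_sites set.
    output = subprocess.run(
        ["ray", "memory"], capture_output=True, text=True, check=True
    ).stdout
    assert "RAY_record_ref_creation_sites=1" in output
```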

ericl (Contributor) commented Aug 27, 2021

test_memstat seems to be failing in CI ^^

@ericl ericl added the @author-action-required label Aug 27, 2021
mwtian (Member, Author) commented Aug 27, 2021

Looks like test_memstat is fixed this time (at least on Linux).

@mwtian mwtian removed the @author-action-required label Aug 27, 2021
@ericl ericl merged commit 26679d6 into ray-project:master Aug 27, 2021
ericl (Contributor) commented Aug 27, 2021

Merged, thanks!
