[Data] Refine estimate of object store memory usage from pending task outputs #43298

bveeramani · 2024-02-20T23:23:10Z

Why are these changes needed?

Ray Core buffers blocks produced by running tasks. To estimate the amount of object store usage from buffered blocks, we use the following formula:

ray/python/ray/data/_internal/execution/interfaces/op_runtime_metrics.py

Lines 189 to 191 in de439b4

    
    self.num_tasks_running
    * estimated_bytes_per_output
    * context._max_num_blocks_in_streaming_gen_buffer

This expression overestimates object store usage for two reasons:

- num_tasks_running represents the number of launched tasks, not the number of tasks that are actually running. This is especially an issue with ActorPoolMapOperator, because we launch multiple tasks per actor, but only one task runs at a time per actor.
- context._max_num_blocks_in_streaming_gen_buffer is the maximum number of buffered blocks, not the typical number of buffered blocks. Often, tasks only produce one output, so the number of buffered outputs is at most one.

This PR addresses the two issues described above.
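
To make the size of the overestimate concrete, here is a minimal sketch with made-up numbers. The function names, the pool of 4 actors with 2 launched tasks each, the 128 MiB output size, and the buffer values of 2 (maximum) and 1 (typical) are all illustrative assumptions, not values or code taken from this PR.

```python
MiB = 1024 * 1024


def old_estimate(num_tasks_launched: int, bytes_per_output: int,
                 max_buffered_blocks: int) -> int:
    """Original formula: every launched task is assumed to fill its buffer."""
    return num_tasks_launched * bytes_per_output * max_buffered_blocks


def refined_estimate(num_tasks_launched: int, num_active_actors: int,
                     bytes_per_output: int, buffered_blocks_per_task: int) -> int:
    """Hypothetical refinement: cap running tasks at one per active actor and
    use a typical (not maximum) number of buffered blocks per task."""
    num_running = min(num_tasks_launched, num_active_actors)
    return num_running * bytes_per_output * buffered_blocks_per_task


# Assumed scenario: 4 actors, 2 tasks launched per actor, 128 MiB per output.
launched, actors, per_output = 8, 4, 128 * MiB

old = old_estimate(launched, per_output, max_buffered_blocks=2)
new = refined_estimate(launched, actors, per_output, buffered_blocks_per_task=1)

print(old // MiB)  # 2048
print(new // MiB)  # 512
```

Under these assumptions the original formula reports four times as much pending output as the refined one: a factor of two from counting launched rather than running tasks, and another factor of two from using the maximum buffer size.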

Related issue number

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: Balaji Veeramani <[email protected]>

bveeramani · 2024-02-21T00:15:17Z

python/ray/data/_internal/execution/interfaces/op_runtime_metrics.py

+        num_tasks_running = self.num_tasks_running
+        if isinstance(self._op, ActorPoolMapOperator):
+            num_tasks_running = min(
+                num_tasks_running, self._op._actor_pool.num_active_actors()
+            )


I don't like how this couples OpRuntimeMetrics with ActorPoolMapOperator, but I'm not sure how else to implement it. Open to alternative implementations.

We could also add a method on the operator side to let it report the actual number, but that sounds a bit over-designed. The current way is fine as well.
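
For reference, the operator-side alternative discussed in this thread could look roughly like the sketch below. Every class and method name here is hypothetical and stands in for the real Ray Data classes; this is not the approach that was merged, only an illustration of how the isinstance check could be avoided.

```python
class OperatorSketch:
    """Hypothetical stand-in for a physical operator."""

    def num_tasks_actually_running(self, num_launched: int) -> int:
        # Default assumption: every launched task is running.
        return num_launched


class ActorPoolOperatorSketch(OperatorSketch):
    """Hypothetical stand-in for an actor-pool operator."""

    def __init__(self, num_active_actors: int):
        self._num_active_actors = num_active_actors

    def num_tasks_actually_running(self, num_launched: int) -> int:
        # Only one task runs at a time per actor.
        return min(num_launched, self._num_active_actors)


class MetricsSketch:
    """Hypothetical stand-in for the metrics object; it never inspects
    the operator's concrete type."""

    def __init__(self, op: OperatorSketch):
        self._op = op
        self.num_tasks_running = 0  # Number of launched tasks.

    def effective_num_tasks_running(self) -> int:
        return self._op.num_tasks_actually_running(self.num_tasks_running)


metrics = MetricsSketch(ActorPoolOperatorSketch(num_active_actors=4))
metrics.num_tasks_running = 8
print(metrics.effective_num_tasks_running())  # 4
```

The trade-off is the one raised above: the metrics class stays decoupled from any specific operator, at the cost of an extra method on every operator.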

Initial commit

7116d37

Signed-off-by: Balaji Veeramani <[email protected]>

bveeramani requested review from ericl, scv119, c21, amogkam, scottjlee, raulchen, stephanie-wang and omatthew98 as code owners February 20, 2024 23:23

bveeramani added 2 commits February 20, 2024 15:47

Update implementation

c75bc52

Signed-off-by: Balaji Veeramani <[email protected]>

Fix bug

7b91310

Signed-off-by: Balaji Veeramani <[email protected]>

bveeramani commented Feb 21, 2024

raulchen approved these changes Feb 21, 2024

bveeramani added 2 commits February 20, 2024 16:48

Add decorator

eb71140

Signed-off-by: Balaji Veeramani <[email protected]>

Fix test

948cb24

Signed-off-by: Balaji Veeramani <[email protected]>

bveeramani merged commit 38106e3 into ray-project:master Feb 21, 2024
9 checks passed

bveeramani deleted the improve-estimate branch February 21, 2024 22:28

bveeramani assigned raulchen Feb 21, 2024
