[CherryPick] [Doc] Cherry pick #29342 and #29469 #29633

Merged
merged 2 commits on Oct 25, 2022
2 changes: 1 addition & 1 deletion doc/source/ray-core/actors/patterns/index.rst
@@ -8,7 +8,7 @@ This section is a collection of common design patterns (and anti-patterns) for Ray actors
- New users trying to understand how to get started with Ray, and
- Advanced users trying to optimize their use of Ray actors

You may also be interested in visiting the design patterns section for :ref:`tasks <task-patterns>`.
You may also be interested in visiting the design patterns section for :ref:`tasks <core-patterns>`.

.. toctree::
    :maxdepth: -1
72 changes: 72 additions & 0 deletions doc/source/ray-core/doc_code/pattern_nested_tasks.py
@@ -0,0 +1,72 @@
# __pattern_start__
import ray
import time
from numpy import random


def partition(collection):
    # Use the last element as the pivot
    pivot = collection.pop()
    greater, lesser = [], []
    for element in collection:
        if element > pivot:
            greater.append(element)
        else:
            lesser.append(element)
    return lesser, pivot, greater


def quick_sort(collection):
    if len(collection) <= 200000:  # magic number
        return sorted(collection)
    else:
        lesser, pivot, greater = partition(collection)
        lesser = quick_sort(lesser)
        greater = quick_sort(greater)
        return lesser + [pivot] + greater


@ray.remote
def quick_sort_distributed(collection):
    # Tiny tasks are an antipattern.
    # Thus, in our example we have a "magic number" to
    # toggle when distributed recursion should be used vs
    # when the sorting should be done in place. The rule
    # of thumb is that the duration of an individual task
    # should be at least 1 second.
    if len(collection) <= 200000:  # magic number
        return sorted(collection)
    else:
        lesser, pivot, greater = partition(collection)
        lesser = quick_sort_distributed.remote(lesser)
        greater = quick_sort_distributed.remote(greater)
        return ray.get(lesser) + [pivot] + ray.get(greater)


for size in [200000, 4000000, 8000000]:
    print(f"Array size: {size}")
    unsorted = random.randint(1000000, size=(size)).tolist()
    s = time.time()
    quick_sort(unsorted)
    print(f"Sequential execution: {(time.time() - s):.3f}")
    s = time.time()
    ray.get(quick_sort_distributed.remote(unsorted))
    print(f"Distributed execution: {(time.time() - s):.3f}")
    print("--" * 10)

# Outputs:

# Array size: 200000
# Sequential execution: 0.040
# Distributed execution: 0.152
# --------------------
# Array size: 4000000
# Sequential execution: 6.161
# Distributed execution: 5.779
# --------------------
# Array size: 8000000
# Sequential execution: 15.459
# Distributed execution: 11.282
# --------------------

# __pattern_end__
2 changes: 1 addition & 1 deletion doc/source/ray-core/examples/batch_training.ipynb
@@ -292,7 +292,7 @@
"source": [
"The `task` Ray task contains all logic necessary to load a data batch, transform it and fit and evaluate models on it.\n",
"\n",
"You may notice that we have previously defined `fit_and_score_sklearn` as a Ray task as well and set it to be executed from inside `task`. This allows us to dynamically create a [tree of tasks](task-pattern-tree-of-tasks), ensuring that the cluster resources are fully utillized. Without this pattern, each `task` would need to be assigned several CPU cores for the model fitting, meaning that if certain models finish faster, then those CPU cores would stil stay occupied. Thankfully, Ray is able to deal with nested parallelism in tasks without the need for any extra logic, allowing us to simplify the code."
"You may notice that we have previously defined `fit_and_score_sklearn` as a Ray task as well and set it to be executed from inside `task`. This allows us to dynamically create a [tree of tasks](task-pattern-nested-tasks), ensuring that the cluster resources are fully utillized. Without this pattern, each `task` would need to be assigned several CPU cores for the model fitting, meaning that if certain models finish faster, then those CPU cores would stil stay occupied. Thankfully, Ray is able to deal with nested parallelism in tasks without the need for any extra logic, allowing us to simplify the code."
]
},
{
1 change: 1 addition & 0 deletions doc/source/ray-core/patterns/index.rst
@@ -8,6 +8,7 @@ This section is a collection of common design patterns and anti-patterns for writing Ray applications
.. toctree::
    :maxdepth: 1

    nested-tasks
    generators
    limit-pending-tasks
    limit-running-tasks
35 changes: 35 additions & 0 deletions doc/source/ray-core/patterns/nested-tasks.rst
@@ -0,0 +1,35 @@
.. _task-pattern-nested-tasks:

Pattern: Using nested tasks to achieve nested parallelism
=========================================================

In this pattern, a remote task can dynamically call other remote tasks (including itself) for nested parallelism.
This is useful when sub-tasks can be parallelized.

Keep in mind, though, that nested tasks come with their own cost: extra worker processes, scheduling overhead, bookkeeping overhead, etc.
To achieve speedup with nested parallelism, make sure each of your nested tasks does significant work. See :doc:`too-fine-grained-tasks` for more details.
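
As a minimal sketch of the idea (illustrative only; the ``inner``/``outer`` names below are hypothetical, not part of the Ray API or of the example file further down), an outer task launches inner tasks and aggregates their results:

.. code-block:: python

    import ray

    ray.init()


    @ray.remote
    def inner(x):
        # A nested task: launched from inside another running task.
        return x * x


    @ray.remote
    def outer(xs):
        # Launch all inner tasks before blocking, so they run in parallel.
        refs = [inner.remote(x) for x in xs]
        return sum(ray.get(refs))


    print(ray.get(outer.remote(list(range(4)))))  # 0 + 1 + 4 + 9 = 14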

Example use case
----------------

You want to quick-sort a large list of numbers.
By using nested tasks, you can sort the list in a distributed, parallel fashion.

.. figure:: ../images/tree-of-tasks.svg

    Tree of tasks


Code example
------------

.. literalinclude:: ../doc_code/pattern_nested_tasks.py
    :language: python
    :start-after: __pattern_start__
    :end-before: __pattern_end__

We call :ref:`ray.get() <ray-get-ref>` only after both ``quick_sort_distributed`` invocations have been launched.
This maximizes parallelism in the workload. See :doc:`ray-get-loop` for more details.
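
To make the contrast concrete, here is a sketch (a fragment, not part of the example file) that reuses ``quick_sort_distributed`` and the ``lesser``/``greater`` lists from the code above:

.. code-block:: python

    # Anti-pattern: blocking on each sub-task serializes the recursion;
    # the second sub-task cannot start until the first one finishes.
    lesser = ray.get(quick_sort_distributed.remote(lesser))
    greater = ray.get(quick_sort_distributed.remote(greater))

    # Pattern used above: launch both sub-tasks first, then block,
    # so the two halves are sorted in parallel.
    lesser_ref = quick_sort_distributed.remote(lesser)
    greater_ref = quick_sort_distributed.remote(greater)
    lesser, greater = ray.get(lesser_ref), ray.get(greater_ref)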

Notice in the execution times above that the non-distributed version is faster for smaller inputs. However, as the task execution
time increases (for example, because the lists to sort are larger), the distributed version is faster.
3 changes: 1 addition & 2 deletions doc/source/ray-core/tasks.rst
@@ -78,7 +78,7 @@ Ray enables arbitrary functions to be executed asynchronously on separate Python workers

.. _ray-object-refs:

Passing object refs to Ray tasks
---------------------------------------

In addition to values, `Object refs <objects.html>`__ can also be passed into remote functions. When the task gets executed, inside the function body **the argument will be the underlying value**. For example, take this function:
@@ -217,4 +217,3 @@ More about Ray Tasks
    tasks/generators.rst
    tasks/fault-tolerance.rst
    tasks/scheduling.rst
    tasks/patterns/index.rst
17 changes: 0 additions & 17 deletions doc/source/ray-core/tasks/patterns/index.rst

This file was deleted.

71 changes: 0 additions & 71 deletions doc/source/ray-core/tasks/patterns/map-reduce.rst

This file was deleted.
