
[OA] Fixes for Batch Inference Basics template #156

Merged
merged 8 commits into main on Mar 28, 2024

Conversation

scottjlee (Contributor) commented Mar 27, 2024

Address feedback / fixes from dogfooding batch LLM template:

  • Fix bug with the `text` vs `item` column from the `from_items()` call (see the sketch below)
  • Use the Mistral model by default, so users are not required to supply a HF token
  • Use single quotes instead of triple quotes for prompt data
  • Move the scaling sections to after step 4; vLLM requires GPUs, so we need to cover GPUs in the toy setup as well
  • Clean up title + headers
  • Add accelerator type (A10G or L4)

Alongside https://github.com/anyscale/product/pull/27262
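For reference, a minimal sketch of the column-naming behavior behind the first bullet (the example row is illustrative, not taken from the template):

```python
import ray

# from_items() on plain Python objects produces records keyed by "item", not "text",
# so downstream code must read row["item"] (or rename the column).
ds = ray.data.from_items(["What is the capital of France?"])
print(ds.take(1))  # [{'item': 'What is the capital of France?'}]
```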

Signed-off-by: Scott Lee <[email protected]>
Comment on lines 215 to 219
"## Scaling with GPUs\n",
"\n",
"Apply batch inference for all input data with the Ray Data [`map_batches`](https://docs.ray.io/en/latest/data/api/doc/ray.data.Dataset.map_batches.html) method. When using vLLM, LLM instances require GPUs; here, we will demonstrate how to configure Ray Data to scale the number of LLM instances and GPUs needed.\n",
"\n",
"To use GPUs for inference in the Workspace, we can specify `num_gpus` and `concurrency` in the `ds.map_batches()` call below to indicate the number of LLM instances and the number of GPUs per LLM instance, respectively. For example, with `concurrency=4` and `num_gpus=1`, we have 4 LLM instances, each using 1 GPU, so we need 4 GPUs total."
scottjlee (Contributor Author) commented:
Note: since vLLM requires GPUs, I had to put this section before "Scaling to a larger dataset", since we will need GPUs even for the toy setup.

scottjlee marked this pull request as ready for review March 27, 2024 23:13
@@ -262,7 +212,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Apply batch inference for all input data with the Ray Data [`map_batches`](https://docs.ray.io/en/latest/data/api/doc/ray.data.Dataset.map_batches.html) method. Here, you can easily configure Ray Data to scale the number of LLM instances and compute (number of GPUs to use)."
"## Scaling with GPUs\n",
Contributor commented:
Since this is under step 4, use "###".

@@ -371,7 +387,25 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Summary\n",
"## Submitting an Anyscale Job\n",
Contributor commented:
Remove this for now, the jobs tutorial isn't ready yet. When it is, we can link to that instead of repeating the same content in each template.

Contributor commented:
Also, this will be "ray job submit" within workspaces.

"cell_type": "markdown",
"metadata": {},
"source": [
"## Scaling to a larger dataset\n",
Contributor commented:
Similar comment below, if these are all under section 4 they need to be one level deeper as headings.

" # Specify the number of GPUs required per LLM instance.\n",
" num_gpus=num_gpus_per_instance,\n",
" num_gpus=1,\n",
ericl (Contributor) commented Mar 27, 2024:
When I ran this, I got

"""
raise ValueError(
ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla T4 GPU has compute capability 7.5. You can use float16 instead by explicitly setting the `dtype` flag in CLI, for example: --dtype=half.
"""

Similar to #148, I think you need to set `accelerator_type: A10G` and/or make a function that returns A10G or L4 depending on AWS or GCP.

Or, set the dtype=half.

scottjlee (Contributor Author) commented:
Ah, my bad; I was testing this on a custom workspace with A10s already configured, so that makes sense. Adding a similar function as in #148 which gets A10G/L4 depending on the cloud platform (sketched below).
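A rough sketch of what such a helper could look like (the environment-variable-based cloud detection below is an assumption for illustration; the actual template and #148 may detect the cloud differently):

```python
import os

def get_accelerator_type() -> str:
    """Return a GPU accelerator type available on the current cloud:
    A10G on AWS, L4 on GCP."""
    # Hypothetical detection mechanism; a real implementation might query
    # Anyscale/cloud metadata instead of an environment variable.
    provider = os.environ.get("CLOUD_PROVIDER", "aws").lower()
    return "A10G" if provider == "aws" else "L4"

# The result could then be forwarded via the Ray remote args of map_batches, e.g.:
# ds.map_batches(LLMPredictor, concurrency=4, num_gpus=1,
#                accelerator_type=get_accelerator_type(), ...)
```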

ericl (Contributor) commented Mar 27, 2024

Please ping when it runs correctly in OA, still doesn't work for me

Signed-off-by: Scott Lee <[email protected]>
scottjlee (Contributor Author) commented:
@ericl tested on OA workspace (link) with serverless, ready for another look. Thanks!

scottjlee requested a review from ericl March 28, 2024 00:18
ericl (Contributor) left a comment:

Nice, works e2e now, thanks!

Btw, I couldn't access the workspace you linked, probably because it wasn't in the staging dogfood org; for sharing workspaces, you probably want to use the "Try new UI" function in staging.

ericl merged commit 20613b8 into main on Mar 28, 2024
1 check passed
anmscale pushed a commit that referenced this pull request Jun 22, 2024
[OA] Fixes for Batch Inference Basics template