Commit 15d473b: format

rxsalad committed Nov 6, 2024
1 parent d4be44d

Showing 4 changed files with 13 additions and 9 deletions.

4 changes: 4 additions & 0 deletions dictionaries/salad-cloud.txt

@@ -151,3 +151,7 @@ GGUF
 mlabonne
 Nemotron
 ollama
+nvcc
+NVCC
+rxjupyterlab
+rxjupyterdata

12 changes: 6 additions & 6 deletions guides/llm/llm-general.mdx

@@ -86,8 +86,8 @@ Windows WSL:
 Longer context lengths not only increase waiting time and negatively impact user experience, but may also lead to the
 server response timeout errors at the load balancer in front of the inference servers, which has a maximum timeout limit
 of 100 seconds. Enabling token streaming on the servers allows tokens to be returned one by one, rather than waiting for
-the entire response to be generated. This feature shows the generation progress in real-time, significantly enhancing
-the user experience, and helping to avoid the timeout errors.
+the entire response to be generated. This feature shows the generation progress in real-time, significantly enhancing
+the user experience, and helping to avoid the timeout errors.
 
 When more VRAM is available, batched inference can significantly increase throughput by effectively leveraging GPU cache
 and parallel processing, while only slightly increasing the processing time. Here is the test data from the same PC:
@@ -129,11 +129,11 @@ container gateway can map a public URL to this IPv6 port. Optionally, you can en
 using an API token.
 
 To support LLM inference efficiently, the container gateway can be configured to use the Least Connections algorithm and
-forward concurrent requests to the inference servers in a container group. The server response timeout setting controls
-how long the container gateway will wait for a response from an instance after sending a request, with a maximum limit
+forward concurrent requests to the inference servers in a container group. The server response timeout setting controls
+how long the container gateway will wait for a response from an instance after sending a request, with a maximum limit
 of 100 seconds. This timeout affects the maximum length of generated text (for non-streaming) and the number of requests
-that can queue locally on inference servers. For more information on load balancing options and how to adjust these
-settings to fit your needs, please refer to [this guide](/products/sce/gateway/load-balancer-options).
+that can queue locally on inference servers. For more information on load balancing options and how to adjust these
+settings to fit your needs, please refer to [this guide](/products/sce/gateway/load-balancer-options).
 
 **To use this solution effectively, system requirements and capabilities should be clearly defined and planned to
 properly configure the inference servers.** Deploying the Readiness Probes is also essential to ensure that requests are
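
To illustrate the token streaming described in the llm-general.mdx hunks above, here is a minimal client-side sketch. It assumes the inference server exposes an OpenAI-compatible `/v1/chat/completions` endpoint (as servers such as vLLM or Ollama typically do); the URL and model name are placeholders, not values from this commit.

```python
import json
import requests

# Hypothetical gateway URL and model name; replace with your deployment's values.
URL = "https://example.salad.cloud/v1/chat/completions"

payload = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Explain token streaming briefly."}],
    "stream": True,  # ask the server to return tokens as they are generated
}

# stream=True keeps reading the open HTTP connection, so tokens arrive
# continuously instead of the gateway waiting (up to its 100-second limit)
# for one complete response.
with requests.post(URL, json=payload, stream=True, timeout=100) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        line = line.decode("utf-8")
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":  # OpenAI-style end-of-stream marker
            break
        chunk = json.loads(data)
        if not chunk.get("choices"):
            continue
        delta = chunk["choices"][0]["delta"].get("content", "")
        print(delta, end="", flush=True)
```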

2 changes: 1 addition & 1 deletion tutorials/docker-run.mdx

@@ -402,7 +402,7 @@ preferred method in most cases, as it directs requests to the least busy instanc

 By default, the container gateway sends multiple requests to an instance simultaneously. However, it can be configured
 to send only one request to an instance at a time, with subsequent requests waiting in the gateway until the current one
-is completed. For more information on load balancing options and how to adjust these settings to fit your needs, please
+is completed. For more information on load balancing options and how to adjust these settings to fit your needs, please
 refer to [this guide](/products/sce/gateway/load-balancer-options).
 
 Avoid sending very large requests (hundreds of MB or more) to instances through the container gateway, as this can

4 changes: 2 additions & 2 deletions tutorials/jupyterlab.mdx

@@ -46,7 +46,7 @@ back to the cloud.
 <img src="/products/sce/images/jupyterlab/bfebac8-tech_doc_1.jpg" />
 
 Under the hood, we employ the inotifywait command-line tool that uses the inotify Linux kernel subsystem to watch for
-changes in the /root/data directory. Every time files are manually saved through the JupyerLab menu, or automatically
+changes in the /root/data directory. Every time files are manually saved through the JupyterLab menu, or automatically
 saved by the JupyterLab’s autosave feature, the inotifywait command captures events such as create, delete or modify.
 Subsequently, the script triggers synchronization. All three public cloud platforms offer sync commands that can make
 the contents under the source the same as the content under the destination by calculating and copying only the
@@ -215,7 +215,7 @@ name, and project ID to the container.
 
 <img src="/products/sce/images/jupyterlab/7cdc53a-tech_doc_16.jpg" />
 
-# Run JupyerLab over SaladCloud
+# Run JupyterLab over SaladCloud
 
 To run a JupyterLab instance on SaladCloud, you can log in the SaladCloud Console and deploy the JupyterLab instance by
 selecting 'Deploy a Container Group' with the following parameters:
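
To make the sync mechanism from the first jupyterlab.mdx hunk concrete, here is a rough sketch of an inotifywait-driven loop. The bucket name and the choice of `aws s3 sync` are illustrative assumptions only (the tutorial notes that all three public cloud platforms offer equivalent sync commands), and the tutorial's actual script may differ.

```python
import subprocess

WATCH_DIR = "/root/data"
DEST = "s3://my-bucket/jupyterlab-data"  # hypothetical destination bucket

# -m: monitor indefinitely, -r: watch recursively, -e: events to report.
proc = subprocess.Popen(
    ["inotifywait", "-m", "-r", "-e", "create,delete,modify", WATCH_DIR],
    stdout=subprocess.PIPE,
    text=True,
)

for event in proc.stdout:
    # Each event line looks like: "/root/data/ MODIFY notebook.ipynb"
    print("change detected:", event.strip())
    # The sync command copies only the differences, making the destination
    # match the source; --delete also removes files deleted locally.
    subprocess.run(["aws", "s3", "sync", WATCH_DIR, DEST, "--delete"], check=True)
```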
