diff --git a/docs/source/models/vlm.rst b/docs/source/models/vlm.rst
index 70ac82e2005b9..de55a1a099192 100644
--- a/docs/source/models/vlm.rst
+++ b/docs/source/models/vlm.rst
@@ -5,6 +5,9 @@ Using VLMs
 
 vLLM provides experimental support for Vision Language Models (VLMs). This document shows you how to run and serve these models using vLLM.
 
+.. important::
+    We are actively iterating on VLM support. Expect breaking changes to VLM usage and development in upcoming releases without prior deprecation.
+
 Engine Arguments
 ----------------
 
@@ -39,6 +42,10 @@ To initialize a VLM, the aforementioned arguments must be passed to the ``LLM``
         image_feature_size=576,
     )
 
+.. important::
+    We will remove most of the vision-specific arguments in a future release as they can be inferred from the HuggingFace configuration.
+
+
 To pass an image to the model, note the following in :class:`vllm.inputs.PromptStrictInputs`:
 
 * ``prompt``: The prompt should have a number of ``<image>`` tokens equal to ``image_feature_size``.
@@ -63,6 +70,9 @@ To pass an image to the model, note the following in :class:`vllm.inputs.PromptS
 
 A code example can be found in `examples/llava_example.py <https://github.com/vllm-project/vllm/blob/main/examples/llava_example.py>`_.
 
+.. important::
+    We will remove the need to format image tokens in a future release. Afterwards, the input text will follow the same format as that for the original HuggingFace model.
+
 Online OpenAI Vision API Compatible Inference
 ----------------------------------------------
 
@@ -89,6 +99,9 @@ Below is an example on how to launch the same ``llava-hf/llava-1.5-7b-hf`` with
     --image-feature-size 576 \
     --chat-template template_llava.jinja
 
+.. important::
+    We will remove most of the vision-specific arguments in a future release as they can be inferred from the HuggingFace configuration.
+
 To consume the server, you can use the OpenAI client like in the example below:
 
 .. code-block:: python
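For readers following the section being patched: the offline flow amounts to repeating the ``<image>`` placeholder ``image_feature_size`` times in the prompt and handing the image to ``LLM.generate`` alongside it. The sketch below only illustrates that flow and is not the doc's own example; apart from ``image_feature_size=576``, the engine arguments, the ``ImagePixelData`` wrapper, and the ``multi_modal_data`` field are assumptions modeled on the referenced ``examples/llava_example.py`` and may differ from the actual file.

.. code-block:: python

    from PIL import Image

    from vllm import LLM
    from vllm.multimodal.image import ImagePixelData  # assumed import path

    # Only image_feature_size=576 is visible in the hunk above; the remaining
    # vision-specific values below are assumptions for LLaVA-1.5.
    llm = LLM(
        model="llava-hf/llava-1.5-7b-hf",
        image_input_type="pixel_values",
        image_token_id=32000,
        image_input_shape="1,3,336,336",
        image_feature_size=576,
    )

    # The prompt must contain image_feature_size (here 576) <image> tokens.
    prompt = "<image>" * 576 + "\nUSER: What is in this image?\nASSISTANT:"

    image = Image.open("stop_sign.jpg")  # any local RGB image

    outputs = llm.generate({
        "prompt": prompt,
        "multi_modal_data": ImagePixelData(image),  # assumed wrapper type
    })
    print(outputs[0].outputs[0].text)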
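The last hunk stops at the ``.. code-block:: python`` directive, so the client example itself is outside the diff. As a minimal sketch of consuming the server with the official ``openai`` client, assuming the server from the command above is reachable locally (the base URL, API key, and image URL are placeholders, not values taken from the documentation):

.. code-block:: python

    from openai import OpenAI

    # Point the client at the vLLM OpenAI-compatible server started above.
    # The port and API key here are assumptions, not documented values.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="llava-hf/llava-1.5-7b-hf",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/image.jpg"}},
            ],
        }],
    )
    print(response.choices[0].message.content)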