doc update

Signed-off-by: paul-gibbons <[email protected]>
paul-gibbons · May 2, 2024 · 80af9a4
1 parent 14837e6
commit 80af9a4
Showing 1 changed file with 4 additions and 0 deletions.
diff --git a/docs/source/multimodal/mllm/video_neva.rst b/docs/source/multimodal/mllm/video_neva.rst
@@ -20,11 +20,15 @@ Video Neva Configuration
     media_type: video
     splice_single_frame: null
     num_frames: 8
+    image_token_len: 256
     image_folder: null
     video_folder: null
 
 - ``media_type``: If set to `video`, NeVa's dataloader performs additional preprocessing steps to represent the input video as a series of image frames.
 - ``splice_single_frame``: Can be set to `first`, `middle`, or `last`, in which case only the single frame at that position in the video is selected.
+- ``image_token_len``: The NeVa dataloader calculates `image_token_len` based on the height and width of the preprocessed image frame and the patch size of the CLIP model being used. 
+.. code-block:: python
+image_token_len = (224 // 14) * (224 // 14)  # 16 * 16 = 256
 - ``num_frames``: The number of image frames used to represent the video.
 - ``video_folder``: The directory where the video files are located. It follows the same format as NeVa's `image_folder`.
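
The ``image_token_len`` arithmetic in this change can be sketched as a small standalone helper. This is only an illustration, not NeVa's actual dataloader code: the function name ``compute_image_token_len`` and its parameters are hypothetical, and the per-video total at the end assumes every frame contributes the same number of tokens, which the diff does not state.

```python
def compute_image_token_len(height: int, width: int, patch_size: int) -> int:
    """Count the patches in one preprocessed frame (hypothetical helper).

    Mirrors the formula from the docs: patches along the height
    multiplied by patches along the width.
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be a positive integer")
    return (height // patch_size) * (width // patch_size)


# Values from the documented example: a 224x224 frame with patch size 14.
tokens_per_frame = compute_image_token_len(224, 224, 14)
print(tokens_per_frame)  # 256

# Assumption (not stated in the diff): each of the num_frames frames
# contributes tokens_per_frame tokens to the sequence.
num_frames = 8
print(num_frames * tokens_per_frame)  # 2048
```

If the preprocessed frame size or the CLIP patch size differs from the example, the configured ``image_token_len`` would presumably need to change to match; for instance, a 336x336 frame with patch size 14 gives 24 * 24 = 576.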