Raspawar/add vlm support #16751

raspawar · 2024-10-30T11:59:10Z

Adds support for vision-language models (VLMs): models that accept images and text as input and produce text.

These are akin to the models in https://platform.openai.com/docs/guides/vision, with three notable differences:

Images can be passed with img tags in the regular text content.
Images can be passed as NVCF asset IDs.
Not all model endpoints support all features. For example, server-side download of images is not available with adept/fuyu-8b, google/deplot, microsoft/kosmos-2, and google/paligemma; some model endpoints restrict image size; some models support only a single image; some models do not support GIF or WebP; and kosmos-2 does not support streaming.
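To make the first two differences concrete, here is a minimal Python sketch of building text content with inline image tags, plus a lookup of the per-model restrictions listed above. All helper names (`img_tag_from_bytes`, `img_tag_from_asset_id`, `must_inline_image_urls`, `supports_streaming`, `build_text_content`) are hypothetical and are not the integration's API. The `data:<mime>;base64,` form follows the standard data-URL convention; the `asset_id` spelling of an NVCF asset reference is an assumption.

```python
import base64


# Hypothetical helper: inline raw image bytes as an <img> tag with a base64 data URL.
def img_tag_from_bytes(image_bytes: bytes, mime: str = "image/png") -> str:
    payload = base64.b64encode(image_bytes).decode("ascii")
    return f'<img src="data:{mime};base64,{payload}" />'


# Hypothetical helper: reference an already-uploaded NVCF asset.
# ASSUMPTION: the "asset_id" spelling inside the data URL is illustrative only.
def img_tag_from_asset_id(asset_id: str, mime: str = "image/png") -> str:
    return f'<img src="data:{mime};asset_id,{asset_id}" />'


# Per-model restrictions, copied from the PR description.
NO_SERVER_SIDE_DOWNLOAD = {
    "adept/fuyu-8b",
    "google/deplot",
    "microsoft/kosmos-2",
    "google/paligemma",
}
NO_STREAMING = {"microsoft/kosmos-2"}


def must_inline_image_urls(model: str) -> bool:
    """True if the client must fetch an image URL itself and send base64 data."""
    return model in NO_SERVER_SIDE_DOWNLOAD


def supports_streaming(model: str) -> bool:
    """False for endpoints that the PR says cannot stream (kosmos-2)."""
    return model not in NO_STREAMING


def build_text_content(prompt: str, *image_tags: str) -> str:
    # Images travel inside the ordinary text content, after the prompt.
    return " ".join([prompt, *image_tags])


content = build_text_content("Describe the image.", img_tag_from_bytes(b"abc"))
print(content)  # Describe the image. <img src="data:image/png;base64,YWJj" />
```

Keeping the restrictions in plain sets makes the client-side fallback (download, then inline as base64) a single membership check per request.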

cc: @sumitkbh @mattf @dglogo


mattf

Please address the comments.

Failing tests:

FAILED tests/test_multi_modal_nvidia.py::test_vlm_asset_id[invoke-content0-microsoft/phi-3-vision-128k-instruct] - TypeError: sequence item 0: expected str instance, function found
FAILED tests/test_multi_modal_nvidia.py::test_vlm_asset_id[stream-content0-microsoft/phi-3-vision-128k-instruct] - TypeError: sequence item 0: expected str instance, function found
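The `TypeError` above is what Python raises when `str.join` receives a non-string item, in this case a function object. The test source is not quoted in this thread, so the following is only a hypothetical reproduction of one common cause (referencing a function instead of calling it); `make_prompt` and `join_parts` are illustrative names, not code from the PR.

```python
# Hypothetical reproduction; the real test code is not shown in this thread.
def make_prompt() -> str:
    return "Describe the chart."


def join_parts(parts: list) -> str:
    # str.join requires every item to be a str.
    return " ".join(parts)


buggy = [make_prompt, "What is in this image?"]  # function object, not its result
try:
    join_parts(buggy)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # sequence item 0: expected str instance, function found

fixed = [make_prompt(), "What is in this image?"]  # call the function first
print(join_parts(fixed))  # Describe the chart. What is in this image?
```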

...i_modal_llms/llama-index-multi-modal-llms-nvidia/llama_index/multi_modal_llms/nvidia/base.py

logan-markewich · 2024-11-07T15:31:47Z

...ons/multi_modal_llms/llama-index-multi-modal-llms-nvidia/tests/data/nvidia-picasso-large.png

tbh, the way CI/CD works, including extra files like this is a huge pain.

I would just create an image in memory, like a black or white square, or download the images at runtime.

Got it, will do.
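A minimal sketch of the in-memory approach suggested above, using only the standard library so the test data needs no checked-in files. `solid_square_png` is a hypothetical helper name; it writes an 8-bit grayscale PNG (signature, IHDR, IDAT, IEND chunks, each with a CRC-32) of a single gray level.

```python
import base64
import struct
import zlib


def _png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    # Each PNG chunk: 4-byte length, 4-byte type, data, CRC-32 over type + data.
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def solid_square_png(size: int = 8, gray: int = 0) -> bytes:
    """Return PNG bytes for a size x size square of one gray level (0=black, 255=white)."""
    signature = b"\x89PNG\r\n\x1a\n"
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", size, size, 8, 0, 0, 0, 0)
    # Each scanline starts with filter type 0 (None), then one byte per pixel.
    scanline = b"\x00" + bytes([gray]) * size
    idat = zlib.compress(scanline * size)
    return (
        signature
        + _png_chunk(b"IHDR", ihdr)
        + _png_chunk(b"IDAT", idat)
        + _png_chunk(b"IEND", b"")
    )


black = solid_square_png(16, 0)
white = solid_square_png(16, 255)
# Base64 form, ready to embed in a data URL inside a test prompt.
white_b64 = base64.b64encode(white).decode("ascii")
```

If Pillow is already a dependency (a later commit adds one), `Image.new("L", (16, 16), 0)` saved into an `io.BytesIO` gives the same result with less code.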

raspawar · 2024-11-11T13:58:14Z

@logan-markewich can you PTAL? I don't know why the coverage report is also counting the test cases.

logan-markewich · 2024-11-12T03:11:37Z

llama-index-integrations/multi_modal_llms/llama-index-multi-modal-llms-nvidia/README.md

Can we have an actual README? Something with the install command and general usage (probably similar to the notebook).

logan-markewich

Just one small request.

raspawar added 9 commits October 29, 2024 17:08

add code for image url

2ae6061

fix filetype bug

4ac9c77

add notebook

f650f0f

add latest code for vlm

15cefb7

remove poetry lock file

46f0f35

add not impl and fix url

a29171d

add stream complete code and utils

8fa3a30

add logic to read from local files

c8fb46e

add notebook for multi-modal examples

3960d74

dosubot bot added the size:XL label (this PR changes 500-999 lines, ignoring generated files) Oct 30, 2024

raspawar and others added 4 commits October 30, 2024 18:17

rename pkg name and add test cases

55a86f2

Merge branch 'main' into raspawar/add_vlm_support

f21979a

add test cases and images

70fd6b2

Merge branch 'raspawar/add_vlm_support' of https://github.com/raspawa…r/llama_index into raspawar/add_vlm_support

5c5a3ba

dosubot bot added the size:XXL label (this PR changes 1000+ lines, ignoring generated files) and removed the size:XL label (500-999 lines) Oct 30, 2024

mattf suggested changes Oct 30, 2024

raspawar added 6 commits November 1, 2024 03:07

fix test cases

87591e5

code cleanup

b452eca

add code for chat, stream_chat methods

3cefae3

add async methods for nvidia vlm cls

5b5c106

add async code snippets

7a506c6

add test cases for async methods

ffe8fc2

raspawar requested a review from mattf November 5, 2024 11:39

mattf suggested changes Nov 5, 2024

raspawar added 3 commits November 5, 2024 20:17

remove custom_headers

a1fc0f4

fix build file

2b54d95

add aiohttp as pyproject dependency

19a027b

logan-markewich reviewed Nov 7, 2024

raspawar and others added 4 commits November 11, 2024 12:34

skip local imgs and add utils test cases

65f18c3

add more test cases, fix local images

d30a4b8

add pil dependency

1363931

Merge branch 'main' into raspawar/add_vlm_support

d979be1

remove lock

bac2932

logan-markewich reviewed Nov 12, 2024

update readme file

f5bddfa
