Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raspawar/add vlm support #16751

Merged
merged 28 commits into from
Nov 14, 2024
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
2ae6061
add code for image url
raspawar Oct 29, 2024
4ac9c77
fix filetype bug
raspawar Oct 29, 2024
f650f0f
add notebook
raspawar Oct 29, 2024
15cefb7
add latest code for vlm
raspawar Oct 29, 2024
46f0f35
remove poetry lock file
raspawar Oct 29, 2024
a29171d
add not impl and fix url
raspawar Oct 30, 2024
8fa3a30
add stream complete code and utils
raspawar Oct 30, 2024
c8fb46e
add logic to read from local files
raspawar Oct 30, 2024
3960d74
add notebook for multi-modal examples
raspawar Oct 30, 2024
55a86f2
rename pkg name and add test cases
raspawar Oct 30, 2024
f21979a
Merge branch 'main' into raspawar/add_vlm_support
raspawar Oct 30, 2024
70fd6b2
add test cases and images
raspawar Oct 30, 2024
5c5a3ba
Merge branch 'raspawar/add_vlm_support' of https://github.com/raspawa…
raspawar Oct 30, 2024
87591e5
fix test cases
raspawar Oct 31, 2024
b452eca
code cleanup
raspawar Oct 31, 2024
3cefae3
add code for chat, stream_chat methods
raspawar Nov 4, 2024
5b5c106
add async methods for nvidia vlm cls
raspawar Nov 5, 2024
7a506c6
add async code snippets
raspawar Nov 5, 2024
ffe8fc2
add test cases for async methods
raspawar Nov 5, 2024
a1fc0f4
remove custom_headers
raspawar Nov 5, 2024
2b54d95
fix build file
raspawar Nov 5, 2024
19a027b
add aiohttp as pyproject dependency
raspawar Nov 5, 2024
65f18c3
skip local imgs and add utils test cases
raspawar Nov 11, 2024
d30a4b8
add more test cases, fix local images
raspawar Nov 11, 2024
1363931
add pil dependency
raspawar Nov 11, 2024
d979be1
Merge branch 'main' into raspawar/add_vlm_support
raspawar Nov 11, 2024
bac2932
remove lock
logan-markewich Nov 12, 2024
f5bddfa
update readme file
raspawar Nov 12, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
302 changes: 302 additions & 0 deletions docs/docs/examples/multi_modal/nvidia_multi_modal.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,302 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/multi_modal/nvidia_multi_modal.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
"\n",
"# Multi-Modal LLM using NVIDIA endpoints for image reasoning\n",
"\n",
"In this notebook, we show how to use NVIDIA MultiModal LLM class/abstraction for image understanding/reasoning.\n",
"\n",
"We also show several functions we are now supporting for NVIDIA LLM:\n",
"* `complete` (both sync and async): for a single prompt and list of images\n",
"* `stream complete` (both sync and async): for steaming output of complete"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet llama-index-multi-modal-llms-nvidia llama-index-embeddings-nvidia llama-index-readers-file"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"# del os.environ['NVIDIA_API_KEY'] ## delete key and reset\n",
"if os.environ.get(\"NVIDIA_API_KEY\", \"\").startswith(\"nvapi-\"):\n",
" print(\"Valid NVIDIA_API_KEY already in environment. Delete to reset\")\n",
"else:\n",
" nvapi_key = getpass.getpass(\"NVAPI Key (starts with nvapi-): \")\n",
" assert nvapi_key.startswith(\n",
" \"nvapi-\"\n",
" ), f\"{nvapi_key[:5]}... is not a valid key\"\n",
" os.environ[\"NVIDIA_API_KEY\"] = nvapi_key"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from llama_index.multi_modal.nvidia import NVIDIAMultiModal\n",
"import base64\n",
"from llama_index.core.schema import ImageDocument\n",
"from PIL import Image\n",
"import requests\n",
"from io import BytesIO\n",
"import matplotlib.pyplot as plt\n",
"from llama_index.core.multi_modal_llms.generic_utils import load_image_urls\n",
"\n",
"llm = NVIDIAMultiModal()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize `OpenAIMultiModal` and Load Images from URLs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"image_urls = [\n",
" \"https://res.cloudinary.com/hello-tickets/image/upload/c_limit,f_auto,q_auto,w_1920/v1640835927/o3pfl41q7m5bj8jardk0.jpg\",\n",
" \"https://www.visualcapitalist.com/wp-content/uploads/2023/10/US_Mortgage_Rate_Surge-Sept-11-1.jpg\",\n",
" \"https://www.sportsnet.ca/wp-content/uploads/2023/11/CP1688996471-1040x572.jpg\",\n",
" # Add yours here!\n",
"]\n",
"\n",
"img_response = requests.get(image_urls[0])\n",
"img = Image.open(BytesIO(img_response.content))\n",
"plt.imshow(img)\n",
"\n",
"image_url_documents = load_image_urls(image_urls)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Complete a prompt with a bunch of images"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"response = llm.complete(\n",
" prompt=f\"What is this image?\",\n",
" image_documents=image_url_documents,\n",
")\n",
"\n",
"print(response)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Steam Complete a prompt with a bunch of images"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"stream_complete_response = llm.stream_complete(\n",
" prompt=f\"What is this image?\",\n",
" image_documents=image_url_documents,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for r in stream_complete_response:\n",
" print(r.text, end=\"\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"response_acomplete = await llm.acomplete(\n",
" prompt=\"Describe the images as an alternative text\",\n",
" image_documents=image_url_documents,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Passing an image as a base64 encoded string"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"imgr_content = base64.b64encode(\n",
" requests.get(\n",
" \"https://helloartsy.com/wp-content/uploads/kids/cats/how-to-draw-a-small-cat/how-to-draw-a-small-cat-step-6.jpg\"\n",
" ).content\n",
").decode(\"utf-8\")\n",
"\n",
"response = llm.complete(\n",
" prompt=\"List models in image\",\n",
" image_documents=[ImageDocument(image=imgr_content, mimetype=\"jpeg\")],\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(response)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Passing an image as an NVCF asset\n",
"If your image is sufficiently large or you will pass it multiple times in a chat conversation, you may upload it once and reference it in your chat conversation\n",
"\n",
"See https://docs.nvidia.com/cloud-functions/user-guide/latest/cloud-function/assets.html for details about how upload the image."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"content_type = \"image/jpg\"\n",
"description = \"example-image-from-lc-nv-ai-e-notebook\"\n",
"\n",
"create_response = requests.post(\n",
" \"https://api.nvcf.nvidia.com/v2/nvcf/assets\",\n",
" headers={\n",
" \"Authorization\": f\"Bearer {os.environ['NVIDIA_API_KEY']}\",\n",
" \"accept\": \"application/json\",\n",
" \"Content-Type\": \"application/json\",\n",
" },\n",
" json={\"contentType\": content_type, \"description\": description},\n",
")\n",
"create_response.raise_for_status()\n",
"\n",
"upload_response = requests.put(\n",
" create_response.json()[\"uploadUrl\"],\n",
" headers={\n",
" \"Content-Type\": content_type,\n",
" \"x-amz-meta-nvcf-asset-description\": description,\n",
" },\n",
" data=img_response.content,\n",
")\n",
"upload_response.raise_for_status()\n",
"\n",
"asset_id = create_response.json()[\"assetId\"]\n",
"asset_id"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"response = llm.stream_complete(\n",
" prompt=f\"Describe the image\",\n",
" image_documents=[\n",
" ImageDocument(metadata={\"asset_id\": asset_id}, mimetype=\"png\")\n",
" ],\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Passing images from local files"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from llama_index.core import SimpleDirectoryReader\n",
"\n",
"# put your local directore here\n",
"image_documents = SimpleDirectoryReader(\"./images_wiki\").load_data()\n",
"\n",
"response = llm.complete(\n",
" prompt=\"Describe the images as an alternative text\",\n",
" image_documents=image_documents,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(response)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Loading
Loading