[ENH]: OpenCLIP EF device param #1806
Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving.

- Testing, Bugs, Errors, Logs, Documentation
- System Compatibility
- Quality
Ref: https://discord.com/channels/1073293645303795742/1214028592372252682

## Description of changes

*Summarize the changes made by this PR.*

- Improvements & Bug fixes
  - Added `device` optional param to OpenCLIP EF

## Test plan

*How are these changes tested?*

- [ ] Tests pass locally with `pytest` for python, `yarn test` for js

## Documentation Changes

> **NOTE:** It doesn't seem we have OpenCLIP EF docs. Will have to add.
Is this already working? It seems that when computing embeddings, the model is on GPU but the data is on CPU (code below).

```python
from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction
from chromadb.utils.data_loaders import ImageLoader
import os
import chromadb
from tqdm import tqdm
import numpy as np
from PIL import Image

embedding_function = OpenCLIPEmbeddingFunction(device='cuda')
data_loader = ImageLoader()
client = chromadb.PersistentClient(path="./.chroma")
collection = client.create_collection(
    name='multimodal_collection',
    embedding_function=embedding_function)

CHUNK_SIZE = 1
for idx in tqdm(range(0, len(os.listdir("wikiart")[:2]), CHUNK_SIZE)):
    ids = [str(i) for i in range(len(os.listdir("wikiart")))[idx:idx+CHUNK_SIZE]]
    imgs = [np.array(Image.open(f"wikiart/{img_name}")) for img_name in os.listdir("wikiart")[idx:idx+CHUNK_SIZE]]
    collection.add(ids=ids, images=imgs)
```

Error:
This should be fixed by loading the batch to GPU when creating the embedding. I can work on this, but wanted to make sure I'm not doing anything silly beforehand.
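To make the suggestion concrete, here is a minimal sketch of "loading the batch to GPU" before it reaches the model. The helper name `move_to_device` is hypothetical and is not part of chromadb or open_clip; it relies only on the assumption that the inputs (for example PyTorch tensors) expose a `.to(device)` method.

```python
from typing import Any


def move_to_device(batch: Any, device: str) -> Any:
    """Recursively move everything tensor-like in `batch` to `device`.

    Hypothetical helper (not chromadb or open_clip API). Any object with a
    callable `.to(device)` method, such as a torch.Tensor, is moved; dicts,
    lists and tuples are traversed; all other values are returned unchanged.
    """
    if hasattr(batch, "to") and callable(batch.to):
        return batch.to(device)
    if isinstance(batch, dict):
        return {key: move_to_device(value, device) for key, value in batch.items()}
    if isinstance(batch, (list, tuple)):
        # Rebuild the same container type so callers get back what they passed in.
        return type(batch)(move_to_device(value, device) for value in batch)
    return batch
```

Inside the embedding function, the idea would be to pass the preprocessed image tensor or the tokenized text through such a helper, using the same `device` the model was moved to, right before calling the model's encode method. Where exactly that call belongs in the OpenCLIP EF is for the PR to decide.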
Same issue here, specifically for the OpenClipEmbeddingModel. I ended up "patching" the `__call__` function of the `SimpleTokenizer` class in `open_clip\tokenizer.py` like this, and it worked:

This is surely only a brute-force hack for one particular case, and I'm pretty sure there is a more elegant and generally applicable solution ;-)