Introduce outlines.models.transformers_vision
#1052
Conversation
Force-pushed from e033200 to 6adb73b (Compare)
```python
assert re.fullmatch(pattern, res) is not None, res


@pytest.mark.parametrize("pattern", REGEX_PATTERNS)
@pytest.mark.skip(
```
There are a handful of open JSON validation issues. This is a good integration test case for addressing JSON generation failures in general, because it applies random models to structured JSON generation.
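For context, a hypothetical sketch of the kind of integration test described here (the fixture list and schema are illustrative assumptions, not the repository's actual code):

```python
import pytest
from pydantic import BaseModel

import outlines

# Hypothetical stand-in for the suite's list of model fixtures.
ALL_MODEL_FIXTURES = ["model_transformers", "model_transformers_vision"]


class Character(BaseModel):
    # Illustrative schema; any structured type exercises the JSON path.
    name: str
    age: int


@pytest.mark.parametrize("model_fixture", ALL_MODEL_FIXTURES)
def test_json_generation(request, model_fixture):
    model = request.getfixturevalue(model_fixture)
    generator = outlines.generate.json(model, Character)
    result = generator("Describe a character:", max_tokens=100)
    # outlines.generate.json returns an instance of the schema on success.
    assert isinstance(result, Character)
```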
Create convenience function to load a `PIL.Image` from URL
Suggested change: open the code fence with ```` ```python ```` instead of a bare ```` ``` ````.
```python
from pydantic import BaseModel
from typing import List, Optional


def img_from_url(url)
```
Function missing
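Presumably the completed helper matches the version used in the bug report later in this thread:

```python
from io import BytesIO
from urllib.request import urlopen

from PIL import Image


def img_from_url(url):
    # Fetch the raw bytes and decode them into an RGB PIL.Image.
    img_byte_stream = BytesIO(urlopen(url).read())
    return Image.open(img_byte_stream).convert("RGB")
```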
Very excited about this update! After pip installing this branch, I'm using the tiny `llava-hf/llava-interleave-qwen-0.5b-hf` model (since I'm running on my GPU-less laptop), which is also a `LlavaNextForConditionalGeneration` model. I was able to recreate the below error with `bczhou/tiny-llava-v1-hf` as well. Here's the full code snippet:

```python
import outlines
from outlines.models.transformers_vision import transformers_vision

model = transformers_vision(
    'llava-hf/llava-interleave-qwen-0.5b-hf'
)

from PIL import Image
from io import BytesIO
from urllib.request import urlopen

def img_from_url(url):
    img_byte_stream = BytesIO(urlopen(url).read())
    return Image.open(img_byte_stream).convert("RGB")

description_generator = outlines.generate.text(model)
description_generator(
    "<image> detailed description:",
    [img_from_url("https://upload.wikimedia.org/wikipedia/commons/2/25/Siam_lilacpoint.jpg")]
)
```

And the full error I get, with `transformers==4.43.3` and `torch==2.2.2`:

```
TypeError                                 Traceback (most recent call last)
Cell In[10], line 16
13 return Image.open(img_byte_stream).convert("RGB")
15 description_generator = outlines.generate.text(model)
---> 16 description_generator(
17 "<image> detailed description:",
18 [img_from_url("https://upload.wikimedia.org/wikipedia/commons/2/25/Siam_lilacpoint.jpg")]
19 )
File ~/miniconda3/envs/blendsql/lib/python3.9/site-packages/outlines/generate/api.py:555, in VisionSequenceGeneratorAdapter.__call__(self, prompts, media, max_tokens, stop_at, seed, **model_specific_params)
549 prompts, media = self._validate_prompt_media_types(prompts, media)
551 generation_params = self.prepare_generation_parameters(
552 max_tokens, stop_at, seed
553 )
--> 555 completions = self.model.generate(
556 prompts,
557 media,
558 generation_params,
559 self.logits_processor,
560 self.sampling_params,
561 **model_specific_params,
562 )
564 return self._format(completions)
File ~/miniconda3/envs/blendsql/lib/python3.9/site-packages/outlines/models/transformers_vision.py:56, in TransformersVision.generate(self, prompts, media, generation_parameters, logits_processor, sampling_parameters)
46 inputs = self.processor(prompts, media, padding=True, return_tensors="pt").to(
47 self.model.device
48 )
50 generation_kwargs = self._get_generation_kwargs(
51 prompts,
52 generation_parameters,
53 logits_processor,
54 sampling_parameters,
55 )
---> 56 generated_ids = self._generate_output_seq(prompts, inputs, **generation_kwargs)
58 # if single str input and single sample per input, convert to a 1D output
59 if isinstance(prompts, str):
60 # Should always be true until NotImplementedError above is fixed
File ~/miniconda3/envs/blendsql/lib/python3.9/site-packages/outlines/models/transformers.py:350, in Transformers._generate_output_seq(self, prompts, inputs, generation_config, **generation_kwargs)
346 def _generate_output_seq(
347 self, prompts, inputs, generation_config, **generation_kwargs
348 ):
349 input_ids = inputs["input_ids"]
--> 350 output_ids = self.model.generate(
351 **inputs, generation_config=generation_config, **generation_kwargs
352 )
354 # encoder-decoder returns output_ids only, decoder-only returns full seq ids
355 if self.model.config.is_encoder_decoder:
File ~/miniconda3/envs/blendsql/lib/python3.9/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
112 @functools.wraps(func)
113 def decorate_context(*args, **kwargs):
114 with ctx_factory():
--> 115 return func(*args, **kwargs)
File ~/miniconda3/envs/blendsql/lib/python3.9/site-packages/transformers/generation/utils.py:1989, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, **kwargs)
1981 input_ids, model_kwargs = self._expand_inputs_for_generation(
1982 input_ids=input_ids,
1983 expand_size=generation_config.num_return_sequences,
1984 is_encoder_decoder=self.config.is_encoder_decoder,
1985 **model_kwargs,
1986 )
1988 # 13. run sample (it degenerates to greedy search when `generation_config.do_sample=False`)
-> 1989 result = self._sample(
1990 input_ids,
1991 logits_processor=prepared_logits_processor,
1992 logits_warper=prepared_logits_warper,
1993 stopping_criteria=prepared_stopping_criteria,
1994 generation_config=generation_config,
1995 synced_gpus=synced_gpus,
1996 streamer=streamer,
1997 **model_kwargs,
1998 )
2000 elif generation_mode in (GenerationMode.BEAM_SAMPLE, GenerationMode.BEAM_SEARCH):
2001 # 11. prepare logits warper
2002 prepared_logits_warper = (
2003 self._get_logits_warper(generation_config, device=input_ids.device)
2004 if generation_config.do_sample
2005 else None
2006 )
File ~/miniconda3/envs/blendsql/lib/python3.9/site-packages/transformers/generation/utils.py:2932, in GenerationMixin._sample(self, input_ids, logits_processor, stopping_criteria, generation_config, synced_gpus, streamer, logits_warper, **model_kwargs)
2929 model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
2931 # forward pass to get next token
-> 2932 outputs = self(**model_inputs, return_dict=True)
2934 if synced_gpus and this_peer_finished:
2935 continue # don't waste resources running the code we don't need
File ~/miniconda3/envs/blendsql/lib/python3.9/site-packages/torch/nn/modules/module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
1509 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1510 else:
-> 1511 return self._call_impl(*args, **kwargs)
File ~/miniconda3/envs/blendsql/lib/python3.9/site-packages/torch/nn/modules/module.py:1520, in Module._call_impl(self, *args, **kwargs)
1515 # If we don't have any hooks, we want to skip the rest of the logic in
1516 # this function, and just call forward.
1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1518 or _global_backward_pre_hooks or _global_backward_hooks
1519 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520 return forward_call(*args, **kwargs)
1522 try:
1523 result = None
File ~/miniconda3/envs/blendsql/lib/python3.9/site-packages/transformers/models/llava_next/modeling_llava_next.py:766, in LlavaNextForConditionalGeneration.forward(self, input_ids, pixel_values, image_sizes, attention_mask, position_ids, past_key_values, inputs_embeds, vision_feature_layer, vision_feature_select_strategy, labels, use_cache, output_attentions, output_hidden_states, return_dict)
763 # 2. Merge text and images
764 if pixel_values is not None and input_ids.shape[1] != 1 and pixel_values.size(0) > 0:
765 # ! infer image_num_patches from image_sizes
--> 766 image_num_patches = [
767 image_size_to_num_patches(
768 image_size=imsize,
769 grid_pinpoints=self.config.image_grid_pinpoints,
770 patch_size=self.config.vision_config.image_size,
771 )
772 for imsize in image_sizes
773 ]
774 # figure out if pixel_values is concatenated or stacked
775 if pixel_values.dim() == 5:
776 # stacking when input is (batch_size, num_patches, num_channels, height, width)
TypeError: 'NoneType' object is not iterable
```

Good (?) news is that it's at the transformers level; I haven't had time to debug in much detail though. As a sanity check, I verified that the

Not opening a stand-alone issue since this feature isn't part of an official outlines release yet, but happy to create one if you'd prefer!
@parkervg I was able to reproduce your error. The issue was the model class it's trying to use by default. Could you please try setting the model and processor classes?
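A sketch of what that might look like, assuming the `model_class` / `processor_class` keyword arguments from this PR (the choice of `LlavaForConditionalGeneration` for this checkpoint is an assumption and would need verifying):

```python
from transformers import AutoProcessor, LlavaForConditionalGeneration

from outlines.models.transformers_vision import transformers_vision

# Assumption: the llava-interleave checkpoint should load with
# LlavaForConditionalGeneration rather than the LlavaNext class that the
# auto-detection picked in the traceback above.
model = transformers_vision(
    'llava-hf/llava-interleave-qwen-0.5b-hf',
    model_class=LlavaForConditionalGeneration,
    processor_class=AutoProcessor,
)
```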
We probably want to default to
Thanks for the guidance! I opened an issue and a corresponding PR here: #1077
Rendered Docs: https://github.com/lapp0/outlines/blob/multimodal-models/docs/reference/models/transformers_vision.md
Changes
- `models.transformers_vision`, which subclasses `models.transformers` and overrides its behavior so it applies `AutoProcessor` instead of `AutoTokenizer` to handle the text AND `PIL.Image` media (see the sketch below)
- `VisionSequenceGeneratorAdapter`, handling and validating the `media` argument
- `outlines.generate` updated to dispatch `TransformersVision` models to `VisionSequenceGeneratorAdapter`
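A minimal sketch of that subclassing idea (illustration only, not the PR's actual code; the class and method names are hypothetical, though the processor call mirrors the one visible in the traceback above):

```python
from outlines.models.transformers import Transformers


class TransformersVisionSketch(Transformers):
    """Illustrative subclass: swap the tokenizer-only path for a processor."""

    def __init__(self, model, tokenizer, processor):
        super().__init__(model, tokenizer)
        self.processor = processor

    def encode(self, prompts, media):
        # The processor encodes the text prompts and the PIL.Image media
        # together in a single call.
        return self.processor(prompts, media, padding=True, return_tensors="pt").to(
            self.model.device
        )
```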
Tests
- `tests/generate/test_api.py`: test `prompt`/`media` validation (a hypothetical sketch follows this list)
- `tests/generate/test_generate.py`: `model_transformers_vision` fixture. Tests pass locally, but are disabled because a model small enough for CI isn't available
- `outlines.generate` generators, to ensure dispatch for this new sequence generator is handled correctly
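A hypothetical sketch of the `prompt`/`media` validation test (the fixture name comes from the PR; the exact exception type is an assumption):

```python
import pytest

import outlines


def test_prompt_media_validation(model_transformers_vision):
    generator = outlines.generate.text(model_transformers_vision)
    # A str prompt should be paired with a list of images; passing a bare
    # string as media should be rejected by the adapter's validation.
    with pytest.raises(TypeError):
        generator("<image> detailed description:", "not-an-image-list")
```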