Update tiny model creation script #27674
Conversation
if "vocab_size" in model_tester_kwargs:
    if "text_kwargs" in inspect.signature(model_tester_class.__init__).parameters.keys():
        vocab_size = model_tester_kwargs.pop("vocab_size")
        model_tester_kwargs["text_kwargs"] = {"vocab_size": vocab_size}
This was originally in `convert_tokenizer`, guarded by `if config_class.__name__ in [...]`.
Move that logic here without using a list of specific config classes --> make it more generic, so we don't need to update this script frequently.
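To illustrate the generic routing described above, here is a minimal, self-contained sketch (the `DummyTextTester` class is a hypothetical stand-in for a real model tester): when a tester's `__init__` accepts `text_kwargs`, the `vocab_size` argument is moved under it instead of being passed at the top level, with no per-config-class list needed.

```python
import inspect


class DummyTextTester:
    """Hypothetical model tester whose text settings live under `text_kwargs`."""

    def __init__(self, text_kwargs=None):
        self.text_kwargs = text_kwargs or {}


def route_vocab_size(model_tester_class, model_tester_kwargs):
    # If the tester's signature takes `text_kwargs`, nest `vocab_size` under it
    # rather than passing it as a top-level keyword argument.
    if "vocab_size" in model_tester_kwargs:
        params = inspect.signature(model_tester_class.__init__).parameters
        if "text_kwargs" in params:
            vocab_size = model_tester_kwargs.pop("vocab_size")
            model_tester_kwargs["text_kwargs"] = {"vocab_size": vocab_size}
    return model_tester_kwargs


kwargs = route_vocab_size(DummyTextTester, {"vocab_size": 99})
```

Because the check is driven by `inspect.signature`, any tester that later adds a `text_kwargs` parameter is picked up automatically.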
# make sure this is long enough (some model tester has `20` for this attr.) to pass `text-generation`
# pipeline tests.
max_positions = []
for key in ["max_position_embeddings", "max_source_positions", "max_target_positions"]:
    if getattr(config, key, 0) > 0:
        max_positions.append(getattr(config, key))
    if getattr(config, "text_config", None) is not None:
        if getattr(config.text_config, key, None) is not None:
            max_positions.append(getattr(config.text_config, key))
if len(max_positions) > 0:
    max_position = max(200, min(max_positions))
    for key in ["max_position_embeddings", "max_source_positions", "max_target_positions"]:
        if getattr(config, key, 0) > 0:
            setattr(config, key, max_position)
        if getattr(config, "text_config", None) is not None:
            if getattr(config.text_config, key, None) is not None:
                setattr(config.text_config, key, max_position)
return config
This is to avoid the failure in `TextGenerationPipelineTests` at
`outputs = text_generator("This is a test" * 500, handle_long_generation="hole", max_new_tokens=20)`
(the failure happens if a model tester has `max_position_embeddings=20`).
Also takes `max_source_positions` and `max_target_positions` into account (e.g. Whisper).
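The bumping logic above can be exercised on a toy config. This simplified sketch (it skips the nested `text_config` branch of the real script) shows a Whisper-like config, which has `max_source_positions`/`max_target_positions` instead of `max_position_embeddings`, being raised from 20 to the 200 floor so the long-prompt pipeline test fits:

```python
from types import SimpleNamespace

POSITION_KEYS = ["max_position_embeddings", "max_source_positions", "max_target_positions"]


def bump_max_positions(config, floor=200):
    # Collect every position-length attribute the config actually defines ...
    found = [getattr(config, key) for key in POSITION_KEYS if getattr(config, key, 0) > 0]
    if not found:
        return config
    # ... and raise them all to at least `floor` (keeping them equal to each other).
    new_value = max(floor, min(found))
    for key in POSITION_KEYS:
        if getattr(config, key, 0) > 0:
            setattr(config, key, new_value)
    return config


# A Whisper-like toy config: only source/target position attributes, both 20.
cfg = SimpleNamespace(max_source_positions=20, max_target_positions=20)
bump_max_positions(cfg)
```

Using `max(floor, min(found))` means a config whose smallest limit is already above 200 is left effectively unchanged, while tiny test values like 20 are lifted.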
- elif (
+ # `FuyuConfig` saves data under both FuyuConfig and its `text_config`. This is not good, but let's just update
+ # every involved field to avoid potential failure.
+ if (
`elif` --> `if`: just make sure `config.text_config` is also updated (for Fuyu).
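A minimal sketch of that duplicated-field update, assuming a Fuyu-like config that mirrors fields on both the top level and `text_config` (the `set_everywhere` helper name is hypothetical, not part of the script):

```python
from types import SimpleNamespace


def set_everywhere(config, key, value):
    # Fuyu-like configs duplicate some fields on both the top-level config and
    # its `text_config`, so write to every place the field actually appears.
    if getattr(config, key, None) is not None:
        setattr(config, key, value)
    text_config = getattr(config, "text_config", None)
    if text_config is not None and getattr(text_config, key, None) is not None:
        setattr(text_config, key, value)
    return config


cfg = SimpleNamespace(vocab_size=1000, text_config=SimpleNamespace(vocab_size=1000))
set_everywhere(cfg, "vocab_size", 99)
```

Updating both copies keeps the config self-consistent regardless of which one downstream code reads.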
The documentation is not available anymore as the PR was closed or merged.
Thanks!
What does this PR do?
A few fixes or improvements I found necessary while working on #27388.
See the comments alongside the changes.