Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Confusing documentation of input_ids and past_key_values in model.forward #31795

Open
4 tasks
alex-hh opened this issue Jul 4, 2024 · 3 comments
Open
4 tasks
Assignees
Labels
WIP Label your PR/Issue with WIP for some long outstanding Issues/PRs that are work in progress

Comments

@alex-hh
Copy link

alex-hh commented Jul 4, 2024

System Info

Current documentation

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The docs (e.g. for mistral forward method) state that :

If past_key_values is used, optionally only the last decoder_input_ids have to be input (see past_key_values).

If past_key_values are used, the user can optionally input only the last input_ids (those that don’t have their past key value states given to this model) of shape (batch_size, 1) instead of all input_ids of shape (batch_size, sequence_length).

https://huggingface.co/docs/transformers/main/model_doc/mistral#transformers.MistralModel.forward

Expected behavior

It is my understanding that it is in fact not optional but obligatory to pass only the last input ids (those that don’t have their past key value states given to this model), as there is no handling of the case where full input ids are passed. C.f. https://discuss.huggingface.co/t/correct-input-ids-when-passing-past-key-values/92044

@zucchini-nlp
Copy link
Member

@alex-hh hey!

Yes, you're right! In case if we have a past_key_values, it's required to crop the inputs to the unprocessed tokens only. There are some differences in how the inputs are cropped if we're using SinkCache object, but the general rule is all new tokens that are not in the kv-cache yet.

Thanks for noting the discrepancy in docs, I will update docs regarding past_kv this week!

Copy link

github-actions bot commented Aug 8, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@zucchini-nlp
Copy link
Member

not stale

@zucchini-nlp zucchini-nlp added the WIP Label your PR/Issue with WIP for some long outstanding Issues/PRs that are work in progress label Aug 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
WIP Label your PR/Issue with WIP for some long outstanding Issues/PRs that are work in progress
Projects
None yet
Development

No branches or pull requests

2 participants