
style: max_new_tokens are set twice with the same value #5368

Merged (1 commit) on Jul 17, 2023

Conversation

@faaany (Contributor) commented Jul 14, 2023

Related Issues

In the following code snippet, we can see that for the text-generation task, max_new_tokens is set twice: first to self.max_length, and then overwritten by the value popped from max_length (falling back to self.max_length). The first assignment never takes effect, because the second one always overwrites it.

    if is_text_generation and "return_full_text" not in model_input_kwargs:
        model_input_kwargs["return_full_text"] = False
        # first assignment: has no effect, it is always overwritten below
        model_input_kwargs["max_new_tokens"] = self.max_length
    if stop_words:
        sw = StopWordsCriteria(tokenizer=self.pipe.tokenizer, stop_words=stop_words, device=self.pipe.device)
        model_input_kwargs["stopping_criteria"] = StoppingCriteriaList([sw])

    if "num_beams" in model_input_kwargs:
        num_beams = model_input_kwargs["num_beams"]
        if (
            "num_return_sequences" in model_input_kwargs
            and model_input_kwargs["num_return_sequences"] > num_beams
        ):
            num_return_sequences = model_input_kwargs["num_return_sequences"]
            logger.warning(
                "num_return_sequences %s should not be larger than num_beams %s, hence setting it equal to num_beams",
                num_return_sequences,
                num_beams,
            )
            model_input_kwargs["num_return_sequences"] = num_beams

    # max_new_tokens is used for text-generation and max_length for text2text-generation
    if is_text_generation:
        # second assignment: always overwrites the first one above
        model_input_kwargs["max_new_tokens"] = model_input_kwargs.pop("max_length", self.max_length)
    else:
        model_input_kwargs["max_length"] = self.max_length
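For background, the two kwargs behave differently in transformers generation: max_new_tokens caps only the newly generated tokens, while max_length caps the total sequence length (prompt included, for decoder-only models). A minimal sketch of the distinction; the model names and prompts are illustrative, not taken from this PR:

    from transformers import pipeline

    # text-generation (decoder-only): max_new_tokens limits only the continuation
    generator = pipeline("text-generation", model="gpt2")
    generator("Haystack is an open-source framework", max_new_tokens=20, return_full_text=False)

    # text2text-generation (encoder-decoder): max_length limits the generated output sequence
    text2text = pipeline("text2text-generation", model="google/flan-t5-small")
    text2text("translate English to German: How are you?", max_length=20)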

Proposed Changes:

Remove the first assignment; the second one always overwrites it anyway (sketched below).
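Concretely, the change amounts to deleting that single line, leaving the branch at the bottom of the snippet as the only place where max_new_tokens is set. A sketch of the patched block, with the unchanged middle lines elided:

    if is_text_generation and "return_full_text" not in model_input_kwargs:
        model_input_kwargs["return_full_text"] = False
        # (the max_new_tokens = self.max_length assignment is removed)
    ...
    # max_new_tokens is used for text-generation and max_length for text2text-generation
    if is_text_generation:
        model_input_kwargs["max_new_tokens"] = model_input_kwargs.pop("max_length", self.max_length)
    else:
        model_input_kwargs["max_length"] = self.max_length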

How did you test it?

N/A

Notes for the reviewer

N/A


@faaany faaany requested a review from a team as a code owner July 14, 2023 15:14
@faaany faaany requested review from anakin87 and removed request for a team July 14, 2023 15:14
@coveralls (Collaborator) commented

Pull Request Test Coverage Report for Build 5555588245

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 3 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.007%) to 45.481%

Files with Coverage Reduction                    New Missed Lines    %
nodes/prompt/invocation_layer/hugging_face.py    1                   92.19%
utils/context_matching.py                        2                   89.58%

Totals:
Change from base Build 5545669749: -0.007%
Covered Lines: 10538
Relevant Lines: 23170

💛 - Coveralls

@anakin87 (Member) left a comment

Nice catch @faaany...

Thank you for contributing once again!

@anakin87 anakin87 merged commit 09a1d3c into deepset-ai:main Jul 17, 2023
51 checks passed