[Paddle Inference]support miniGPT4's second part dy2st #6905
Thanks for your contribution!
Codecov Report
@@            Coverage Diff             @@
##           develop    #6905      +/-   ##
===========================================
- Coverage    59.87%   59.84%   -0.04%
===========================================
  Files          552      552
  Lines        81452    81499      +47
===========================================
  Hits         48772    48772
- Misses       32680    32727      +47
first_embeds = self.llama.embed_tokens(first_input_ids)
second_embeds = self.llama.embed_tokens(second_input_ids)
image_features = paddle.cast(image_features, dtype=first_embeds.dtype)
inputs_embeds = paddle.concat([first_embeds, image_features, second_embeds], axis=1)
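generation_utils.py is supposed to be model-agnostic, so it should not absorb much model-specific logic. The llama-related code should therefore not be embedded here; I suggest moving this logic into llama/modeling.py.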
Done, thanks. I have added a new class, LlamaForminiGPT4InferenceModel, in modeling.py.
@@ -159,12 +159,14 @@ def forward(
    cache_kvs=None,
    seq_len_encoder=None,
    seq_len_decoder=None,
    # past_key_values is useless,as it is replaced by kwargs["cache"], so confusion.
This parameter is here to stay aligned with HF, so using past_key_values is the preferred approach. I suggest deleting this comment.
input_ids,
eos_token_id,
input_ids=None,
inputs_embeds=None,
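The new inputs_embeds parameter needs to be placed after temperature; otherwise most of the code that calls the sample function will break, which makes this change very risky.
You can search globally for: .sample(
You will find that every call site passes its values as positional args.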
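To illustrate the risk, here is a minimal sketch. The three signatures below are simplified and hypothetical (the real sample function takes more parameters); they only show how a parameter inserted in the middle shifts positional arguments, while one appended after temperature with a default leaves old call sites untouched.

```python
# Hypothetical, simplified signatures; not the actual PaddleNLP ones.

def sample_old(input_ids, eos_token_id, top_p=None, temperature=None):
    """Signature before the change."""
    return dict(input_ids=input_ids, eos_token_id=eos_token_id,
                top_p=top_p, temperature=temperature)


def sample_inserted(input_ids=None, inputs_embeds=None, eos_token_id=None,
                    top_p=None, temperature=None):
    """New parameter inserted in the middle of the list."""
    return dict(input_ids=input_ids, inputs_embeds=inputs_embeds,
                eos_token_id=eos_token_id, top_p=top_p, temperature=temperature)


def sample_appended(input_ids=None, eos_token_id=None, top_p=None,
                    temperature=None, inputs_embeds=None):
    """New parameter appended after the existing ones."""
    return dict(input_ids=input_ids, inputs_embeds=inputs_embeds,
                eos_token_id=eos_token_id, top_p=top_p, temperature=temperature)


# An existing call site that passes everything positionally.
call_args = ([1, 2, 3], 2, 0.9, 1.0)

old = sample_old(*call_args)
shifted = sample_inserted(*call_args)  # every argument after input_ids moves one slot
safe = sample_appended(*call_args)     # old arguments keep their meaning

print("inserted:", shifted)
print("appended:", safe)
```

With the inserted variant, the old eos_token_id value ends up in inputs_embeds and the old top_p value ends up in eos_token_id, without any error at the call site.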
if input_ids is not None and inputs_embeds is not None:
    raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
elif input_ids is None and inputs_embeds is None:
    raise ValueError("You have to specify either input_ids or inputs_embeds")

# generate a fake input_ids according to inputs_embeds.
if input_ids is None and inputs_embeds is not None:
    input_ids = self.prepare_input_ids_for_generation(1, inputs_embeds)
if inputs_embeds is not None:
    batch, seq_len, hidden_dim = inputs_embeds.shape
    inputs_embeds = inputs_embeds.reshape([batch * seq_len, hidden_dim])
    model_kwargs["inputs_embeds"] = inputs_embeds
The llama model already supports the inputs_embeds argument, so you can simply put it into model_inputs and let it be passed through to the model.
@@ -83,6 +82,122 @@ def to_static(self, output_path: str, config: dict):
        model = paddle.jit.to_static(self.generate, input_spec=input_spec)
        paddle.jit.save(model, output_path)

    # this function make generate_with_image_features to static inference model.
    def generate_with_image_features_to_static(self, output_path: str, config: dict):
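I still suggest using the to_static method directly to avoid duplicated work. The adjustment is fairly simple: override to_static in the llama for causallm class, so that this model-specific logic does not leak into the generic functions.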
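A rough sketch of that suggestion, under stated assumptions: the class names, the input spec entries, and the _export helper below are hypothetical stand-ins (the real code would call paddle.jit.to_static and paddle.jit.save there). The point is only that the subclass overrides to_static, so the generic base class needs no image-specific method.

```python
class BaseInferenceModel:
    """Hypothetical stand-in for the generic, model-agnostic base class."""

    def generate(self, *inputs):
        return "text-only generation"

    def _export(self, fn, input_spec, output_path):
        # Placeholder for paddle.jit.to_static(fn, input_spec=...) + paddle.jit.save(...).
        return {"fn": fn.__name__, "input_spec": input_spec, "output_path": output_path}

    def to_static(self, output_path: str, config: dict):
        input_spec = ["input_ids", "cache_kvs"]  # illustrative names only
        return self._export(self.generate, input_spec, output_path)


class ImageTextInferenceModel(BaseInferenceModel):
    """Hypothetical model-specific subclass (e.g. the llama part of miniGPT4)."""

    def generate_with_image_features(self, *inputs):
        return "generation conditioned on image features"

    def to_static(self, output_path: str, config: dict):
        # Only the model-specific pieces change; the export path is reused.
        input_spec = ["image_features", "first_input_ids", "second_input_ids", "cache_kvs"]
        return self._export(self.generate_with_image_features, input_spec, output_path)


base_result = BaseInferenceModel().to_static("out/base", {"dtype": "float16"})
image_result = ImageTextInferenceModel().to_static("out/minigpt4", {"dtype": "float16"})
print(base_result["fn"], image_result["fn"])
```

Callers keep invoking to_static on whatever model they hold, and the export script does not need a second entry point.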
], # cache_kvs
]

model = paddle.jit.to_static(self.generate_with_image_features, input_spec=input_spec)
self.generate_with_image_features can be obtained via config.get("generate_method", self.generate) instead, so that callers can also configure it automatically from outside.
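A minimal sketch of this pattern, assuming a hypothetical ToyInferenceModel class and a string return value in place of the real paddle.jit.to_static call. Only the config.get("generate_method", self.generate) lookup comes from the comment above.

```python
class ToyInferenceModel:
    """Hypothetical model used only to show the lookup with a default."""

    def generate(self, *inputs):
        return "generate"

    def generate_with_image_features(self, *inputs):
        return "generate_with_image_features"

    def to_static(self, output_path: str, config: dict):
        # Fall back to the plain generate method when the caller does not choose one.
        generate_method = config.get("generate_method", self.generate)
        # The real code would pass generate_method to paddle.jit.to_static here.
        return generate_method.__name__


model = ToyInferenceModel()
default_choice = model.to_static("out/llama", {"dtype": "float16"})
custom_choice = model.to_static(
    "out/minigpt4",
    {"dtype": "float16", "generate_method": model.generate_with_image_features},
)
print(default_choice, custom_choice)
```

The export script then decides which method to trace by filling in the config, and to_static itself stays generic.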
llm/export_llama_for_minigpt4.py
Outdated
predictor.model.generate_with_image_features_to_static(
    get_infer_model_path(export_args.output_path, predictor_args.model_prefix), {"dtype": predictor_args.dtype}
)
predictor.model.config.save_pretrained(export_args.output_path)
predictor.tokenizer.save_pretrained(export_args.output_path)
generate_rank_mapping(os.path.join(export_args.output_path, "rank_mapping.csv"))
This file could actually be merged into export_model.py.
Also, please update to the latest develop branch; your code looks somewhat out of date.
da0a1de to bee52b6
Your changes here look very good, but there are a few small issues, noted below.
In addition, once #6923 is merged, I suggest using model_type instead of llm_for_img2txt.
llm/predictor.py
Outdated
if predictor_args.llm_for_img2txt:
    # we use llama for img2txt.
    from paddlenlp.experimental.transformers import (
        LlamaForminiGPT4InferenceModel as LlamaInferenceModel,
- LlamaForminiGPT4InferenceModel as LlamaInferenceModel,
+ LlamaForMiniGPT4InferenceModel as LlamaInferenceModel,
if input_ids is not None and inputs_embeds is not None:
    raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
elif input_ids is None and inputs_embeds is None:
    raise ValueError("You have to specify either input_ids or inputs_embeds")

# generate a fake input_ids according to inputs_embeds.
if input_ids is None and inputs_embeds is not None:
    input_ids = self.prepare_input_ids_for_generation(self.config.bos_token_id, inputs_embeds)
if inputs_embeds is not None:
    batch, seq_len, hidden_dim = inputs_embeds.shape
    inputs_embeds = inputs_embeds.reshape([batch * seq_len, hidden_dim])
    model_kwargs["inputs_embeds"] = inputs_embeds
This logic needs to be moved into the model's forward rather than living in generation_utils. For reference, see: https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/transformers/llama/modeling.py#L1189
experimental/transformers/llama/modeling.py currently has no corresponding check, so I suggest moving this part of the code there. Thank you very much.
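As a sketch of what such a check inside forward might look like: the ToyLlamaInferenceModel class and the returned string are hypothetical, and only the two error conditions and their messages are taken from the diff under review.

```python
class ToyLlamaInferenceModel:
    """Hypothetical model; only the input validation mirrors the reviewed diff."""

    def forward(self, input_ids=None, inputs_embeds=None):
        # Exactly one of input_ids / inputs_embeds must be provided.
        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
        if input_ids is None and inputs_embeds is None:
            raise ValueError("You have to specify either input_ids or inputs_embeds")
        # A real forward would embed input_ids here when inputs_embeds is missing.
        return "input_ids" if input_ids is not None else "inputs_embeds"


toy_model = ToyLlamaInferenceModel()
print(toy_model.forward(input_ids=[[1, 2, 3]]))
print(toy_model.forward(inputs_embeds=[[[0.1, 0.2]]]))

try:
    toy_model.forward()
except ValueError as err:
    print("rejected:", err)
```

Keeping the check in forward means every entry point into the model gets the same validation, and generation_utils does not need to know about it.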
Fixed. Please review, thanks.
LGTM
PR types
PR changes
Description
Added a new class, LlamaForminiGPT4InferenceModel, in modeling.py.
Modified some code so that the generate function in paddlenlp/experimental/transformers/generation_utils.py supports the case where input_ids is None and inputs_embeds is not None.
Users can export the static graph of the language model in miniGPT4 with the file PaddleNLP/llm/export_model.py, using this command:
python3.8 export_model.py --model_name_or_path /zhoukangkang/2023-06-06minigpt/whole_part/llama-13b-fp16/ --output_path /zhoukangkang/2023-06-06minigpt/whole_part/miniGPT4-second-part_kaiyuan_fp16 --dtype float16 --inference_model --model_prefix=llama --model_type=llama-img2txt --max_batch_size=2 > out.txt