
[ChatGlm] Adds support for the ChatGLM model #27883

Closed · wants to merge 35 commits

Conversation

ArthurZucker
Collaborator

What does this PR do?

Drafts support for ChatGLM.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@changwangss
Contributor

Hi~ o( ̄▽ ̄)ブ Could you tell me what your plans are? May I know when this PR can be merged? I have PRs that depend on this feature.

@ArthurZucker
Collaborator Author

@younesbelkada and I just came back from holidays; we are hoping for the end of the week, maybe later!

Comment on lines 540 to 541
"ChatGlmModel is using ChatGlmSdpaAttention, but `torch.nn.functional.scaled_dot_product_attention` does not support `output_attentions=True`. Falling back to the manual attention implementation, "
'but specifying the manual implementation will be required from Transformers version v5.0.0 onwards. This warning can be removed using the argument `attn_implementation="eager"` when loading the model.'
Collaborator Author

should be split into two lines
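
For context, the workaround the warning points to is requesting the eager attention implementation at load time. A minimal usage sketch (the checkpoint path is a placeholder):

```python
from transformers import AutoModelForCausalLM

# Requesting the eager implementation avoids the SDPA fallback when
# `output_attentions=True` is needed; the checkpoint path is a placeholder.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/chatglm-checkpoint",
    attn_implementation="eager",
)
```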

src/transformers/models/chatglm/modeling_chatglm.py (outdated)
@younesbelkada
Contributor

cc @ArthurZucker, could you give a first pass? 🙏

@ArthurZucker (Collaborator Author) left a comment

A few nits here and there, ready otherwise! cc @younesbelkada

@@ -0,0 +1,55 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Collaborator Author

Headers need an update


Tips:

- TODO conversion tips if needed
Collaborator Author

to remove?

@@ -0,0 +1,60 @@
# Copyright 2023 EleutherAI and The HuggingFace Inc. team. All rights reserved.
Collaborator Author

same here

rot_shape = cos.shape

# In the original ChatGLM repository the query and key states are manually
# reshaped into a shape `batch_size, num_q_heads, seq_len, head_dim // 2, 2` changing the order
Collaborator Author

don't you think that we can do this when converting the checkpoints?


+1, we can move this complexity into the conversion script, and simplify the implementation here.
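
A minimal sketch of the kind of weight permutation such a conversion script could apply, so that the modeling code can use the standard `rotate_half` RoPE path instead of the interleaved reshape. The function name and usage lines are illustrative assumptions, not the final implementation:

```python
import torch

def permute_rotary_weight(w: torch.Tensor, num_heads: int, head_dim: int) -> torch.Tensor:
    """Reorder the rows of a query/key projection so that interleaved RoPE
    pairs (0, 1), (2, 3), ... become half-split pairs (0, d/2), (1, 1 + d/2), ...,
    which is what the usual `rotate_half` formulation expects."""
    out_features, in_features = w.shape
    w = w.view(num_heads, head_dim // 2, 2, in_features)
    return w.transpose(1, 2).reshape(out_features, in_features)

# Hypothetical usage inside the conversion script:
# state_dict["q_proj.weight"] = permute_rotary_weight(q_weight, num_attention_heads, head_dim)
# state_dict["k_proj.weight"] = permute_rotary_weight(k_weight, num_key_value_heads, head_dim)
# If RoPE is only applied to part of each head, only those rows need reordering.
```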

Comment on lines +355 to +360
if not self.multi_query_attention:
qkv_size = 3 * self.hidden_size
self.num_key_value_heads = 1
else:
self.num_key_value_heads = config.multi_query_group_num
qkv_size = self.projection_size + 2 * self.hidden_size_per_attention_head * self.num_key_value_heads
Collaborator Author

We should just have a general implementation! Supporting GQA means you can cover MQA, MHA and GQA.
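
For reference, the general pattern used elsewhere in the library keeps a single `num_key_value_heads` attribute and expands the key/value heads at attention time, which covers MHA, GQA and MQA with one code path. A rough sketch of that helper:

```python
import torch

def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Expand key/value states from (batch, num_key_value_heads, seq_len, head_dim)
    to (batch, num_key_value_heads * n_rep, seq_len, head_dim).
    n_rep == 1 is plain MHA, num_key_value_heads == 1 is MQA,
    anything in between is GQA."""
    batch, num_kv_heads, seq_len, head_dim = hidden_states.shape
    if n_rep == 1:
        return hidden_states
    hidden_states = hidden_states[:, :, None, :, :].expand(
        batch, num_kv_heads, n_rep, seq_len, head_dim
    )
    return hidden_states.reshape(batch, num_kv_heads * n_rep, seq_len, head_dim)
```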


An alternative in my mind: we could probably split the QKV matrix in the conversion script as well, so that the inference code here does not need to handle splitting the fused QKV matrix. Separate Q, K and V projections are also friendlier to some platforms.

Collaborator Author

agreed!
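
A minimal sketch of that alternative: split the fused projection once in the conversion script. This assumes the fused layout is [Q; K; V] along the output dimension, matching the `qkv_size` computation above; the names are illustrative:

```python
import torch

def split_fused_qkv(
    fused_weight: torch.Tensor,
    num_attention_heads: int,
    num_key_value_heads: int,
    head_dim: int,
) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    """Split a fused query_key_value weight of shape
    ((num_attention_heads + 2 * num_key_value_heads) * head_dim, hidden_size)
    into separate q_proj / k_proj / v_proj weights."""
    q_size = num_attention_heads * head_dim
    kv_size = num_key_value_heads * head_dim
    return torch.split(fused_weight, [q_size, kv_size, kv_size], dim=0)

# The same split (dim=0) applies to the fused bias, if the checkpoint has one.
```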

Collaborator Author

I might have time to work on the addition next week, with a new cool feature that will help ease the integration! 🤗


Hey Arthur! How's everything going?

GLM has already released the GLM-4 model (its tokenizer is slightly different from ChatGLM-3's, but it is otherwise basically the same). Could we resume the integration effort? Thanks!

else:
raise ValueError(f"Unknown RoPE scaling type {scaling_type}")

def _split_heads(self, fused_qkv: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
Collaborator Author

we don't need the if/else here
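
If the module carries a single `num_key_value_heads` attribute as discussed above, `_split_heads` can be written with one code path for MHA, GQA and MQA. A rough sketch; the attribute names are assumptions, not the PR's actual fields:

```python
from typing import Tuple

import torch

def _split_heads(self, fused_qkv: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    # One code path: split the query block and the (possibly smaller)
    # key/value blocks by their widths, then reshape into heads.
    batch_size, seq_len, _ = fused_qkv.shape
    q_size = self.num_heads * self.head_dim
    kv_size = self.num_key_value_heads * self.head_dim
    query, key, value = fused_qkv.split([q_size, kv_size, kv_size], dim=-1)
    query = query.view(batch_size, seq_len, self.num_heads, self.head_dim)
    key = key.view(batch_size, seq_len, self.num_key_value_heads, self.head_dim)
    value = value.view(batch_size, seq_len, self.num_key_value_heads, self.head_dim)
    return query, key, value
```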

@huggingface deleted a comment from github-actions bot Feb 19, 2024
@amyeroberts added the WIP label Mar 14, 2024
@huggingface deleted a comment from github-actions bot Apr 16, 2024
@ArthurZucker
Collaborator Author

I think #33823 should fix this!

@ArthurZucker deleted the add-chat-glm branch October 5, 2024 13:24
Labels: WIP
Projects: None yet
6 participants