
feat: enable ModelLoaderHuggerFace to support loading models in fp16 for inference #555

Open · wants to merge 6 commits into main
Conversation

0x404 (Contributor) commented Sep 22, 2024

ModelLoaderHuggerFace currently only supports reading tensors from a checkpoint and loading them into the model, preserving each tensor's original dtype.

This PR adds an fp16_inference option that lets ModelLoaderHuggerFace cast the weights to fp16 at load time, so the loaded model can run fp16 inference.
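A minimal sketch of the idea behind the option, assuming a PyTorch-style state dict and OneFlow tensors; `load_state_dict_fp16` and its signature are illustrative, not the loader's actual internals:

```python
import oneflow as flow

def load_state_dict_fp16(model, state_dict, fp16_inference=False):
    # Hypothetical helper: copy checkpoint tensors into `model`,
    # optionally casting float weights to fp16 on the way in.
    if fp16_inference:
        # Cast only fp32 weights; leave integer buffers
        # (e.g. token/position ids) in their original dtype.
        state_dict = {
            k: v.to(flow.float16) if v.dtype == flow.float32 else v
            for k, v in state_dict.items()
        }
        # Align the model's parameters with the fp16 checkpoint.
        model = model.to(flow.float16)
    model.load_state_dict(state_dict, strict=False)
    return model
```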

0x404 and others added 6 commits September 22, 2024 08:38
ShawnXuan (Contributor) commented:
If the model is converted to fp16 only after it has been loaded, memory usage suddenly drops by a lot.
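One reading of this observation: when the cast happens after the full fp32 load, the fp32 parameters (4 bytes each) are materialized first, and the sharp drop occurs when the cast frees them. A small illustration, with `nn.Linear` standing in for the real model:

```python
import oneflow as flow
import oneflow.nn as nn

model = nn.Linear(4096, 4096)    # stand-in for the real model, loaded in fp32
model = model.to(flow.float16)   # memory falls by roughly half at this point
```

Casting tensor-by-tensor while reading the checkpoint, as in the sketch above, would keep peak memory near the fp16 footprint instead.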
