
[GPTQ Enhance] Add GPTQ int8 weight unpack function #184

Merged

Merged 6 commits into main on Mar 22, 2024

Conversation

Zhenzhong1
Contributor

Type of Change

Feature Enhancement

Description

Add an int8 weight unpack function to support int8 GPTQ models.
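
For context, a minimal sketch of what such an unpack step typically looks like, assuming the common AutoGPTQ packing where four 8-bit values are stored per int32 word in `qweight`; the function name and layout details below are illustrative, not taken from this PR's code:

```python
import numpy as np

def unpack_gptq_int8_weight(qweight: np.ndarray) -> np.ndarray:
    """Unpack int32-packed 8-bit GPTQ weights into a uint8 matrix.

    Assumes the common AutoGPTQ layout: qweight has shape
    (in_features // 4, out_features), with four 8-bit values packed
    little-endian into each int32 along the input (row) dimension.
    """
    bits = 8
    pack_factor = 32 // bits  # 4 values per int32 word
    shifts = np.arange(pack_factor, dtype=np.uint32) * bits  # 0, 8, 16, 24
    # Reinterpret as unsigned, broadcast to (rows, pack_factor, cols),
    # shift each word and keep only the low byte.
    words = qweight.astype(np.uint32)[:, None, :]
    unpacked = (words >> shifts[None, :, None]) & 0xFF
    # Collapse back to (in_features, out_features).
    return unpacked.reshape(-1, qweight.shape[1]).astype(np.uint8)
```

Downstream conversion would then typically dequantize per group with something like `scales * (w.astype(np.float32) - zeros)`; the exact zero-point convention varies across GPTQ implementations.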

Expected Behavior & Potential Risk

N/A

How has this PR been tested?

Manual tests.

Dependency Change?

N/A

@Zhenzhong1 Zhenzhong1 marked this pull request as draft March 21, 2024 07:41
@Zhenzhong1 Zhenzhong1 marked this pull request as ready for review March 21, 2024 07:56
@Zhenzhong1 Zhenzhong1 marked this pull request as draft March 21, 2024 07:56
@Zhenzhong1 Zhenzhong1 marked this pull request as ready for review March 22, 2024 01:34
@Zhenzhong1 Zhenzhong1 changed the title from "[GPTQ Enhance] Add int8 unpack weight to support int8 GPTQ" to "[GPTQ Enhance] Add unpack int8 weight for GPTQ" on Mar 22, 2024
neural_speed/convert/convert_quantized_llama.py (outdated, resolved)
neural_speed/convert/common.py (outdated, resolved)
neural_speed/convert/common.py (resolved)
Contributor

@a32543254 a32543254 left a comment


LGTM

@VincyZhang VincyZhang merged commit ed6e8ad into main Mar 22, 2024
11 checks passed
@Zhenzhong1 Zhenzhong1 changed the title from "[GPTQ Enhance] Add unpack int8 weight for GPTQ" to "[GPTQ Enhance] Add GPTQ int8 weight unpack function" on Mar 25, 2024