Qwen nan fix #522

baoyf4244 · 2024-06-26T07:55:07Z

After quantization, Qwen2-72B cannot run multi-GPU inference with vLLM, so intermediate_size has to be zero-padded from 29568 to 29696. After padding, however, the NaN problem occurs again. This commit fixes it.
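For readers wondering where 29696 comes from, here is a minimal sketch of the arithmetic. It assumes a quantization group size of 128 and tensor parallelism of up to 8 GPUs; neither value is stated in this PR. The helper names `padded_intermediate_size` and `shards_align` are hypothetical and exist only for illustration.

```python
def padded_intermediate_size(size: int, group_size: int = 128, max_tp: int = 8) -> int:
    """Round size up to the next multiple of group_size * max_tp (assumed values)."""
    multiple = group_size * max_tp
    return -(-size // multiple) * multiple  # ceiling division, then scale back up


def shards_align(size: int, group_size: int, tp: int) -> bool:
    """True if each of the tp shards holds a whole number of quantization groups."""
    return size % tp == 0 and (size // tp) % group_size == 0


original = 29568
padded = padded_intermediate_size(original)
print(padded, padded - original)  # 29696 128

# Compare shard alignment before and after padding for common GPU counts.
for tp in (1, 2, 4, 8):
    print(tp, shards_align(original, 128, tp), shards_align(padded, 128, tp))
```

Under these assumptions the original size only aligns on a single GPU, while the padded size aligns for 1, 2, 4, and 8 GPUs. The 128 extra channels are the ones filled with zeros.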

Details are in QwenLM/Qwen2.5#578. The issue was reported for GPTQ, but it also occurs with AWQ.
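The following is a rough sketch of how zero-padded channels can produce NaN when weights are normalized per group, and how a small epsilon in the denominator avoids it. It is an illustration using NumPy, not the actual diff in this PR; the function `group_normalize`, the toy group size of 4, and the `eps` value are all assumptions.

```python
import numpy as np


def group_normalize(weight: np.ndarray, group_size: int, eps: float = 0.0) -> np.ndarray:
    """Divide |w| by the per-group maximum of |w| (hypothetical helper)."""
    groups = np.abs(weight).reshape(-1, group_size)
    group_max = groups.max(axis=1, keepdims=True)
    # Silence NumPy's runtime warnings so the 0/0 case is visible in the result only.
    with np.errstate(divide="ignore", invalid="ignore"):
        return groups / (group_max + eps)


# The second group stands in for zero-padded channels.
weight = np.array([0.5, -2.0, 1.0, 4.0, 0.0, 0.0, 0.0, 0.0], dtype=np.float32)

naive = group_normalize(weight, group_size=4)              # 0/0 in the padded group
guarded = group_normalize(weight, group_size=4, eps=1e-6)  # padded group stays 0

print(np.isnan(naive).any())    # True
print(np.isnan(guarded).any())  # False
```

An all-zero group has a maximum of zero, so the unguarded division evaluates 0/0. With the epsilon, the padded group normalizes to zeros and the non-zero group changes only negligibly.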

baoyf4244 added 3 commits June 24, 2024 16:19

change weight scaling formulation, fix nan problem when quantize Qwen2-72B model

987d604

After quantization, Qwen2-72B can not inference using multi gpu with vllm, so we need to pad intermediate_size from 29568 to 29696. But after padding, the nan problem occurs again. This commit fix it.

0b1df3a

Merge remote-tracking branch 'origin/qwen_nan_fix' into qwen_nan_fix

0b2abc5

casper-hansen merged commit 35d23db into casper-hansen:main Jun 30, 2024

baoyf4244 deleted the qwen_nan_fix branch July 5, 2024 10:37