Bug report: tensor on different GPU when using default device_map #194
Do you have a traceback? I have tested multi-GPU. The .cuda() method is supposed to move the tensor to the current GPU, as opposed to .to(), which specifies which GPU.
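For context, a minimal standalone sketch of the distinction being made here (assuming a machine with at least two GPUs): .cuda() with no argument targets PyTorch's current CUDA device, which defaults to cuda:0, while .to() takes an explicit device.

```python
import torch

x = torch.randn(4, 4)

# .cuda() with no argument moves the tensor to the *current* CUDA device
# (torch.cuda.current_device(), which is cuda:0 unless changed via torch.cuda.set_device).
x_current = x.cuda()

# .to() targets an explicit device (this line assumes a second GPU is present).
x_gpu1 = x.to("cuda:1")

print(x_current.device, x_gpu1.device)  # e.g. cuda:0 cuda:1
```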
Here is my case: at init, I have a block dispatched to GPU 1. When .cuda() is called at this line: https://github.com/casper-hansen/AutoAWQ/blob/main/awq/quantize/quantizer.py#L73 the block's parameters are moved to GPU 0 (inferred: the input tensor seems to still be on GPU 1). Even though in _get_input_feat in quantizer.py the inps are also moved to the device of the block's parameters (which is now GPU 0), after running https://github.com/casper-hansen/AutoAWQ/blob/main/awq/quantize/quantizer.py#L355 it runs into an error. I have checked: it is because the layernorm parameters (in this block) are on GPU 0, but the input tensor is on GPU 1.
Maybe the real question is why .cuda() moves a block from GPU 1 to GPU 0 rather than doing nothing. And what does "current device" mean in the PyTorch docs?
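The scenario described above can be reproduced with a small illustrative sketch (not the actual AutoAWQ code): a block dispatched to cuda:1 is silently pulled to cuda:0 by .cuda(), while the cached inputs stay on cuda:1, so calling the layernorm fails with a device mismatch.

```python
import torch
import torch.nn as nn

# A stand-in for a transformer block that the device_map placed on GPU 1.
block = nn.LayerNorm(128).to("cuda:1")
inps = torch.randn(2, 128, device="cuda:1")   # cached input activations on GPU 1

block = block.cuda()   # no argument -> parameters end up on the current device, cuda:0
# block(inps)          # RuntimeError: inputs on cuda:1 vs parameters on cuda:0

# One workaround: always move the inputs to wherever the block's parameters actually are.
device = next(block.parameters()).device
out = block(inps.to(device))
```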
This may just come down to how the model is initialized. Which version of AutoAWQ are you using? I pushed a big change for multi-GPU yesterday to the main branch.
git log shows the last commit is "Fix multi-GPU loading and inference", which is your version from yesterday. The model I am trying to support is chatglm2-6b; I don't know whether this line in the chatglm2-6b modeling code causes the problem. In any case, after my workaround for this issue the model was quantized to int4 and inference seems good. Thanks for your work. I tried the official llm-awq first, and its kernel implementation seems to have some bugs; your kernel implementation runs smoothly right now 👍
I just fixed this! #196 I found that .cuda() was the problem, just as you said. I thought everything would have been moved correctly, but .cuda() causes everything to end up on cuda:0 (I do not understand why, though).
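The shape of the fix, as I read the description above, is to stop relying on .cuda() and instead resolve the target device from the block itself; a hedged sketch only, not the actual code in #196:

```python
import torch

def move_for_quantization(block, inps):
    # Resolve the device the block was dispatched to instead of assuming cuda:0.
    device = next(block.parameters()).device
    if device.type == "cpu":
        # Assumption for this sketch: fall back to the first GPU for CPU-offloaded blocks.
        device = torch.device("cuda:0")
    return block.to(device), inps.to(device)
```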
Can you share the code you added for the chatglm2-6b model?
In quantizer.py, the .cuda() method uses GPU 0. When a block is placed on another GPU, like GPU 1, this leads to the error.
Have you tested on multiple GPUs?
Or is one GPU enough for quantization: move one block from CPU to GPU 0, then move it back to CPU after the quantization of that block is done?
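A minimal sketch of that single-GPU idea, assuming a hypothetical per-block quantize_block helper: each block is moved to cuda:0 on its own, quantized, and then returned to CPU, so peak GPU memory stays bounded to one block at a time.

```python
import torch

def quantize_blocks_single_gpu(blocks, quantize_block):
    for block in blocks:
        block.to("cuda:0")        # bring only this block onto the GPU
        quantize_block(block)     # hypothetical per-block quantization step
        block.to("cpu")           # send it back to CPU once done
        torch.cuda.empty_cache()  # release the freed GPU memory
```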