Is there any perplexity data for using 16bit vs 32bit memory? #1593
-
I'm talking about --memory-f32. It seems like the general consensus is that there's no noticeable difference for actual models. In fact, based on the quantization section in the README, there's virtually no difference between 16bit and 32bit.
-
I did some of my own testing. The conclusion looks to be that there is no effective difference; it doesn't seem like there's a case for ever using --memory-f32. Not sure if it makes a difference, but the test ran with cuBLAS enabled and some layers offloaded.

Running the perplexity calculation on LLaMA 7B Q4_0:

32bit memory
[1]4.4544,[2]4.9400,[3]5.8279,[4]6.4844,[5]6.5856,[6]6.5088,[7]6.6927,[8]6.8060,[9]7.1427,[10]7.3866
[...]
[207]6.1957,[208]6.2042,[209]6.2087,[210]6.2146,[211]6.2247,[212]6.2317,[213]6.2420,[214]6.2449,[215]6.2478,[216]6.2612

16bit memory
[1]4.4544,[2]4.9400,[3]5.8279,[4]6.4844,[5]6.5856,[6]6.5088,[7]6.6927,[8]6.8060,[9]7.1427,[10]7.3866
[...]
[207]6.1957,[208]6.2042,[209]6.2087,[210]6.2146,[211]6.2247,[212]6.2317,[213]6.2420,[214]6.2449,[215]6.2478,[216]6.2612

I also did a very short test with the Q8_0 version, just to check whether the difference was being lost in noise from the lower-quality quantization:

32bit memory
[1]4.2284,[2]4.7007,[3]5.5711,[4]6.1757,[5]6.2967,[6]6.2677,[7]6.4631,[8]6.5548,[9]6.8742,[10]7.1204,[11]7.3161,[12]7.3371,[13]7.2474,[14]7.2943,[15]7.5318,[16]7.1632,[17]7.0561,[18]7.0044,[19]6.6580,[20]6.6455

16bit memory
[1]4.2285,[2]4.7009,[3]5.5714,[4]6.1760,[5]6.2969,[6]6.2679,[7]6.4635,[8]6.5551,[9]6.8744,[10]7.1206,[11]7.3163,[12]7.3373,[13]7.2475,[14]7.2944,[15]7.5320,[16]7.1634,[17]7.0563,[18]7.0046,[19]6.6582,[20]6.6457
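To quantify "no effective difference": here is a small Python sketch (not part of the original test run, it just re-reads the Q8_0 chunk values posted above) showing that the two runs never differ by more than 0.0004 perplexity per chunk.

```python
# Compare the two Q8_0 runs quoted above; values copied verbatim from the post.
f32 = [4.2284, 4.7007, 5.5711, 6.1757, 6.2967, 6.2677, 6.4631, 6.5548, 6.8742, 7.1204,
       7.3161, 7.3371, 7.2474, 7.2943, 7.5318, 7.1632, 7.0561, 7.0044, 6.6580, 6.6455]
f16 = [4.2285, 4.7009, 5.5714, 6.1760, 6.2969, 6.2679, 6.4635, 6.5551, 6.8744, 7.1206,
       7.3163, 7.3373, 7.2475, 7.2944, 7.5320, 7.1634, 7.0563, 7.0046, 6.6582, 6.6457]

deltas = [abs(a - b) for a, b in zip(f32, f16)]
print(f"max delta:  {max(deltas):.4f}")                # 0.0004
print(f"mean delta: {sum(deltas) / len(deltas):.5f}")  # 0.00021
```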
-
I guess the next question would be: is there any reason to keep the --memory-f32 option at all? There should probably at least be an indication for the user that it doesn't increase quality in any measurable way but uses twice as much memory.
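To put a rough number on "twice as much memory": the extra cost is in the KV cache. A back-of-the-envelope sketch, assuming the original LLaMA 7B shape (32 layers, 4096 embedding width, no grouped-query attention) and the full 2048-token context:

```python
# KV-cache size estimate. Assumptions: LLaMA 7B (n_layer = 32, n_embd = 4096),
# full 2048-token context; K and V each hold n_layer * n_ctx * n_embd elements.
n_layer, n_embd, n_ctx = 32, 4096, 2048
elements = 2 * n_layer * n_ctx * n_embd  # factor 2 for K and V

for name, bytes_per_elem in (("f16", 2), ("f32", 4)):
    print(f"{name}: {elements * bytes_per_elem / 2**30:.1f} GiB")
# f16: 1.0 GiB
# f32: 2.0 GiB
```

So on 7B at full context the flag costs roughly an extra gigabyte for no measurable perplexity gain, and the gap grows with context length and model size.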
-
I have been using --memory-f32 and the output has seemed subjectively better to me.
-
The fact that there's no measurable difference in perplexity makes me inclined to say this is most likely confirmation bias/placebo effect. If there's a real difference, it should be measurable.