
Is there any perplexity data for using 16bit vs 32bit memory? #1593

Answered by KerfuffleV2
KerfuffleV2 asked this question in Q&A

I did some of my own testing. The conclusion looks to be that there is no effective difference. It doesn't seem like there's a case for ever using --memory-f32.

Not sure if it makes a difference, but the test was run with cuBLAS enabled and some layers offloaded.
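Roughly, the two runs looked like this. This is a minimal sketch using the llama.cpp perplexity example; the model path, prompt file, and -ngl value are placeholders, and exact flag spellings can vary between builds:

```sh
# f16 KV cache (the default)
./perplexity -m models/7B/ggml-model-q4_0.bin -f wiki.test.raw -ngl 20

# Same run with the f32 KV cache
./perplexity -m models/7B/ggml-model-q4_0.bin -f wiki.test.raw -ngl 20 --memory-f32
```

The bracketed numbers below are the running perplexity reported after each evaluated chunk.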

Running the perplexity calculation on LLaMA 7B Q4_0:

32-bit memory

[1]4.4544,[2]4.9400,[3]5.8279,[4]6.4844,[5]6.5856,[6]6.5088,[7]6.6927,[8]6.8060,[9]7.1427,[10]7.3866

[...]

[207]6.1957,[208]6.2042,[209]6.2087,[210]6.2146,[211]6.2247,[212]6.2317,[213]6.2420,[214]6.2449,[215]6.2478,[216]6.2612

16-bit memory

[1]4.4544,[2]4.9400,[3]5.8279,[4]6.4844,[5]6.5856,[6]6.5088,[7]6.6927,[8]6.8060,[9]7.1427,[10]7.3866

[...]

[207]6.1957,[208]6.2042,…

Answer selected by ggerganov