
fix: repack for marlin when single scale is provided #2414

Closed
wants to merge 2 commits

Conversation

drbh (Collaborator) commented on Aug 13, 2024:

This PR adjusts the conditional for repacking FP8 weights for Marlin so that it runs when a single scale is provided. This avoids an IndexError when scales contains only a single value.

Not related to: #2388

@@ -39,7 +39,8 @@ def __init__(
         log_once(logger.info, "GPU does not support FP8, using Marlin FP8 kernel")
 
         scales = scales.unsqueeze(0)
-        if scales.shape[1] == 1:
+        # repack weights for Marlin if a single scale is provided
+        if scales.size(0) == 1:
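A minimal sketch of the failure mode, assuming scales arrives as a 0-dim per-tensor scale (the exact shapes in TGI may differ):

import torch

# Per-tensor FP8 quantization produces a single scalar scale.
scales = torch.tensor(0.5)

scales = scales.unsqueeze(0)  # 0-dim tensor -> shape (1,)

try:
    scales.shape[1]  # old check: dimension 1 does not exist for a 1-D tensor
except IndexError as e:
    print(e)  # tuple index out of range

print(scales.size(0) == 1)  # new check: True, no exception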
A collaborator commented on the changed line:

Suggested change:
-        if scales.size(0) == 1:
+        if scales.shape[0] == 1:

Can you explain where this change is coming from?
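(For context: the suggested change is purely stylistic, since in PyTorch Tensor.size(dim) and tensor.shape[dim] return the same value. A quick check:)

import torch

t = torch.ones(1, 4)
assert t.size(0) == t.shape[0] == 1  # size(dim) and shape[dim] agree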

drbh (author) commented on Aug 14, 2024:

Apologies for not including an example above. Currently, if you attempt to quantize an unquantized model with Marlin FP8, the line above throws when Marlin is used to repack:

text-generation-launcher --model-id meta-llama/Meta-Llama-3-8B --quantize fp8

With the change above, the model loads and generates as expected:

curl 127.0.0.1:3000/generate -X POST -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' -H 'Content-Type: application/json'
# {"generated_text":" Deep learning is a subset of machine learning that is inspired by the structure and function of the human brain"}

Narsil (Collaborator) commented on Aug 14, 2024:

This change doesn't seem to fix neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8 for me.

Narsil commented on Aug 29, 2024:

I'm confused: this still doesn't fix the neuralmagic model.

text-generation-launcher --model-id meta-llama/Meta-Llama-3-8B --quantize fp8

is currently working on main. This might have been fixed by something else?

Can we introduce a failing test before fixing this?
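A sketch of what such a failing test might look like, with the conditional pulled out into a hypothetical helper (needs_repack is illustrative, not TGI's actual API):

import torch

def needs_repack(scales: torch.Tensor) -> bool:
    # Hypothetical extraction of the conditional under discussion,
    # as it stood before this PR.
    scales = scales.unsqueeze(0)
    return scales.shape[1] == 1  # raises IndexError for a 0-dim scale

def test_single_scale_triggers_repack():
    scales = torch.tensor(0.5)  # per-tensor FP8 scale
    # Fails today with an IndexError; should pass once the check
    # inspects an existing dimension (e.g. scales.size(0) == 1).
    assert needs_repack(scales)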

Narsil commented on Oct 1, 2024:

Closing as stale, feel free to reopen.

Narsil closed this on Oct 1, 2024.