Mistral fused modules #90

casper-hansen · 2023-10-02T17:57:06Z

Mistral seems to work well with fused modules: the outputs are nearly identical to those produced without fused layers, and generation is much faster.

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, TextStreamer

quant_path = "TheBloke/Mistral-7B-OpenOrca-AWQ"

# Load model
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True, safetensors=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Convert prompt to tokens
prompt_template = """\
<|im_start|>system
You are MistralOrca, a large language model trained by Alignment Lab AI. Write out your reasoning step-by-step to be sure you get the right answers!<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant"""

tokens = tokenizer(
    prompt_template.format(prompt="Why is ice cream so good, yes so good?"), 
    return_tensors='pt'
).input_ids.cuda()

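# Generate output, streaming decoded text to stdout as it is produced
generation_output = model.generate(
    tokens,
    streamer=streamer,
    max_new_tokens=512,
    eos_token_id=32000  # stop token id; presumably <|im_end|> in this tokenizer
)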
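
One way to check the "nearly identical" claim is to run the same prompt with `fuse_layers=True` and with `fuse_layers=False` and compare the generated token ids. The sketch below is not part of this PR: the helper names `first_divergence` and `agreement` are hypothetical, the token ids are made up, and it assumes deterministic (greedy) decoding so that any difference comes from the fused kernels rather than from sampling.

```python
from typing import Optional, Sequence


def first_divergence(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Return the first index where the sequences differ, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One sequence is a strict prefix of the other.
        return min(len(a), len(b))
    return None


def agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions, over the longer sequence, with the same token id."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / longest


# Made-up ids standing in for generation_output[0].tolist() from each run.
fused_ids = [1, 415, 4372, 302, 8436, 13]
unfused_ids = [1, 415, 4372, 302, 9035, 13]

print("first divergence at index:", first_divergence(fused_ids, unfused_ids))
print("position-wise agreement:", agreement(fused_ids, unfused_ids))
```

Because decoding is autoregressive, a single differing token usually changes everything after it, so the index of the first divergence tends to be more informative than the overall agreement ratio.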

Benchmarks on low-end GPU:

GPU: NVIDIA RTX A5000
Model: TheBloke/Mistral-7B-OpenOrca-AWQ
Version: GEMM

| Batch Size | Prefill Length | Decode Length | Prefill tokens/s | Decode tokens/s | Memory (VRAM) |
|---|---|---|---|---|---|
| 1 | 32 | 32 | 333.275 | 104.39 | 4.27 GB (18.04%) |
| 1 | 64 | 64 | 1200.73 | 104.237 | 4.28 GB (18.09%) |
| 1 | 128 | 128 | 1756.7 | 104.056 | 4.29 GB (18.13%) |
| 1 | 256 | 256 | 1943.99 | 103.138 | 4.31 GB (18.21%) |
| 1 | 512 | 512 | 1918.02 | 101.209 | 4.35 GB (18.37%) |
| 1 | 1024 | 1024 | 1871.62 | 97.5771 | 4.83 GB (20.40%) |
| 1 | 2048 | 2048 | 1693.59 | 89.9602 | 6.42 GB (27.10%) |
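
To put the throughput numbers in terms of wall-clock time, the rough sketch below divides each token count by its reported rate. It assumes that tokens/s is a plain average over the run (time = tokens / rate) and ignores any fixed overhead; the helper name `latency_seconds` is hypothetical and not part of the benchmark script.

```python
# (prefill_len, decode_len, prefill_tokens_per_s, decode_tokens_per_s),
# copied from the benchmark rows above (batch size 1).
ROWS = [
    (32, 32, 333.275, 104.39),
    (64, 64, 1200.73, 104.237),
    (128, 128, 1756.7, 104.056),
    (256, 256, 1943.99, 103.138),
    (512, 512, 1918.02, 101.209),
    (1024, 1024, 1871.62, 97.5771),
    (2048, 2048, 1693.59, 89.9602),
]


def latency_seconds(prefill_len, decode_len, prefill_tps, decode_tps):
    """Estimate (prefill_seconds, decode_seconds) as token count / throughput."""
    return prefill_len / prefill_tps, decode_len / decode_tps


for row in ROWS:
    prefill_s, decode_s = latency_seconds(*row)
    print(f"{row[0]:>5} tokens: prefill ~{prefill_s:.3f} s, decode ~{decode_s:.2f} s")
```

Under these assumptions decoding dominates at every length: prefill processes the whole prompt in parallel, while decode produces one token per forward pass.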

casper-hansen added 2 commits October 2, 2023 17:32

Mistral fused modules

d5bb4ec

Support safetensors in benchmark

92579e9

casper-hansen merged commit 11efba0 into main Oct 2, 2023

casper-hansen deleted the mistral_fused branch October 2, 2023 19:37
