Mixtral enablement. #120
Conversation
…t's moving. But the outputs don't make sense yet because the weights are not loaded yet.
…verter with qkv fusion.
…or loading pth file.
quantization
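For context on the "converter with qkv fusion" commit above: fusion here means concatenating the separate query/key/value projection weights into one tensor at conversion time, so the model can run a single matmul for all three projections. A minimal hypothetical sketch; the `fuse_qkv` helper and key names are illustrative, not the PR's actual converter code:

```python
import torch

def fuse_qkv(state_dict: dict, num_layers: int) -> dict:
    """Hypothetical sketch: replace per-layer wq/wk/wv weights with one fused wqkv tensor."""
    fused = dict(state_dict)
    for i in range(num_layers):
        prefix = f"layers.{i}.attention."
        wq = fused.pop(prefix + "wq.weight")
        wk = fused.pop(prefix + "wk.weight")
        wv = fused.pop(prefix + "wv.weight")
        # Concatenate along the output dimension so a single matmul produces q, k, and v,
        # which the model later splits back apart using the known q/k/v sizes.
        fused[prefix + "wqkv.weight"] = torch.cat([wq, wk, wv], dim=0)
    return fused
```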
Please make sure the name is mixtral and not mistral. We might add Mistral 7B (the non-MoE version) later, so it would be confusing.
README.md
Outdated
## Run weight safetensor convert

```bash
export input_ckpt_dir=Original llama weights directory
export output_ckpt_dir=The output directory
-export model_name="llama-3" # or "llama-2", "gemma"
+export model_name="llama-3" # or "llama-2", "gemma", "mistral"
```
change this to mixtral
Thanks. I was confused about the name initially, and that's why there are mixes of mistral and mixtral. I also changed everything to Mixtral. Done.
torch.empty(config.num_experts, config.intermediate_size, config.dim)
)

def forward(self, x: Tensor, expert_indices: Tensor) -> Tensor:
I had a change to use different logic for longer seqlen and I pushed it to your branch; was that lost in the merge?
Same question for the quantized change.
This is the original model. Your changes are in model.py.
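For readers following the thread: the hunk above declares the stacked per-expert weights (shape `num_experts × intermediate_size × dim`) and the conditional feed-forward's `forward(x, expert_indices)`. Below is a minimal illustrative sketch of how such a routed forward can work, in the style of gpt-fast's MoE layer; the class name, the w2/w3 shapes, and the per-token expert count are assumptions, not necessarily the exact code in this PR:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import Tensor

class ConditionalFeedForward(nn.Module):
    """Illustrative MoE expert FFN: applies only the experts chosen for each token."""

    def __init__(self, num_experts: int, intermediate_size: int, dim: int):
        super().__init__()
        # Each projection is stacked across experts into one tensor.
        self.w1 = nn.Parameter(torch.empty(num_experts, intermediate_size, dim))
        self.w2 = nn.Parameter(torch.empty(num_experts, dim, intermediate_size))
        self.w3 = nn.Parameter(torch.empty(num_experts, intermediate_size, dim))

    def forward(self, x: Tensor, expert_indices: Tensor) -> Tensor:
        # x: [T, dim]; expert_indices: [T, A], where A is experts activated per token.
        w1 = self.w1[expert_indices]  # [T, A, intermediate_size, dim]
        w3 = self.w3[expert_indices]  # [T, A, intermediate_size, dim]
        w2 = self.w2[expert_indices]  # [T, A, dim, intermediate_size]
        # Gated SwiGLU per selected expert, then project back to dim.
        x1 = F.silu(torch.einsum("ti,taoi->tao", x, w1))
        x3 = torch.einsum("ti,taoi->tao", x, w3)
        return torch.einsum("tao,taio->tai", x1 * x3, w2)  # [T, A, dim]
```

The "different logic for longer seqlen" mentioned above presumably refers to the fact that gathering expert weights per token like this is reasonable during decode (a handful of tokens), while for long prefill sequences it is usually cheaper to loop over experts and process only the tokens routed to each one.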
Thanks for adding Mixtral, the code is clean and overall looks good!
"layers.{}.attention.wk.weight": "layers.{}.attention.wk.weight", | ||
"layers.{}.attention.wv.weight": "layers.{}.attention.wv.weight", | ||
"layers.{}.attention.wo.weight": "layers.{}.attention.wo.weight", | ||
"layers.{}.block_sparse_moe.w1": "layers.{}.block_sparse_moe.cond_ffn.w1", |
Looks like only these weight names are different; can we store only the differing names in the map?
Good point, removed
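To make the suggestion concrete: one way to store only the differing names is to keep a small override map and fall back to the identity for everything else. A hypothetical sketch; the helper and variable names are illustrative, not the PR's code:

```python
# Only the keys whose names actually differ between the checkpoint and the model
# (such as the block_sparse_moe entry quoted above) need an explicit entry.
_WEIGHT_NAME_OVERRIDES = {
    "layers.{}.block_sparse_moe.w1": "layers.{}.block_sparse_moe.cond_ffn.w1",
    # ...any other differing keys would be listed here...
}

def rename_weight(name: str, layer: int) -> str:
    """Return the model-side name for a checkpoint key, defaulting to the identity."""
    for src_tpl, dst_tpl in _WEIGHT_NAME_OVERRIDES.items():
        if name == src_tpl.format(layer):
            return dst_tpl.format(layer)
    # Keys such as layers.{}.attention.wq/wk/wv/wo keep the same name on both sides.
    return name
```

This keeps the map short and makes it obvious which names the Mixtral conversion actually rewrites.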
…tral checkpoints.
The Mixtral 8x7B model is working for both offline and online, in bf16 and int8. Let's get this in first so we can parallelize the work. Will add tests in the coming PRs.