-
Or is MLX the same as llama.cpp?
-
For the most part there is no real difference between MLX Community models and the original Hugging Face models when the precision is fp16, bf16, or fp32. In some cases the model may have a slightly different format, but in many cases they are identical. The main distinction is that the MLX Community is where we keep the quantized models (4-bit and 8-bit); the quantization format is quite specific to MLX. But there is no rule that quantized models must live in the MLX Community. That's just a convenient place to put them if the original model creator didn't publish MLX quantized models.
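For context, here is a minimal sketch of how such a quantized MLX model can be produced with the `mlx_lm` Python API. The repo id and output directory below are illustrative placeholders, not a specific recommendation:

```python
# Sketch: convert a Hugging Face model to MLX format and quantize it to 4-bit.
# Assumes `mlx-lm` is installed (pip install mlx-lm); paths are illustrative.
from mlx_lm import convert

convert(
    hf_path="Qwen/Qwen2-7B",       # original Hugging Face repo
    mlx_path="Qwen2-7B-4bit-mlx",  # local output directory for the MLX weights
    quantize=True,                 # apply MLX's quantization (4-bit by default)
)
```

The same thing can be done from the command line with `python -m mlx_lm.convert`, and the result can optionally be uploaded to a Hub repo such as one under mlx-community.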
-
So the flow is Qwen2/Qwen2-7B => `python -m mlx_lm.convert` => mlx-community/Qwen2-7B, is that right?
But then why can I also run `python -m mlx_lm.generate --model huggingface/llm/Qwen/Qwen2-7B/ --prompt "hello"` directly on the original model?
Also, when I tested mlx-examples/stable-diffusion, it downloaded sdxl-turbo, and the model format looked like Ollama's, which surprised me.
So, if I want to run Qwen2-7B, should I run the original model, or convert it to an MLX version like the ones in mlx-community/* and then run that?
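Both options work; here is a minimal sketch of each, assuming the `mlx_lm` Python API (`load`/`generate`). The repo ids are illustrative, and `load` will fetch from the Hugging Face Hub when given a repo id instead of a local path:

```python
# Sketch: two ways to run Qwen2-7B with mlx_lm (repo ids are illustrative).
from mlx_lm import load, generate

# Option 1: load the original (fp16/bf16) Hugging Face model directly;
# mlx_lm reads the safetensors weights without a separate conversion step.
model, tokenizer = load("Qwen/Qwen2-7B")
print(generate(model, tokenizer, prompt="hello", max_tokens=64))

# Option 2: load a pre-quantized MLX model (smaller download, less memory).
model_q, tokenizer_q = load("mlx-community/Qwen2-7B-4bit")
print(generate(model_q, tokenizer_q, prompt="hello", max_tokens=64))
```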