[Llama] Add Llama3.2 Instruct 1B 3B to preset (#575)
This PR adds support for Llama-3.2 1B and 3B Instruct. Accordingly, we add the following
Llama-3.2 models to the prebuilt list:
- `Llama-3.2-1B-Instruct-q4f16_1-MLC`
- `Llama-3.2-1B-Instruct-q4f32_1-MLC`
- `Llama-3.2-1B-Instruct-q0f16-MLC`
- `Llama-3.2-1B-Instruct-q0f32-MLC`
- `Llama-3.2-3B-Instruct-q4f16_1-MLC`
- `Llama-3.2-3B-Instruct-q4f32_1-MLC`
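The model IDs above follow MLC's naming convention, where (as an assumption based on MLC conventions, not spelled out in this PR) `q<A>f<B>` denotes `A`-bit weight quantization with `B`-bit float compute, `q0` means unquantized weights, and a trailing `_N` selects a variant of the quantization scheme. A hypothetical helper sketching how these IDs decompose:

```typescript
// Hypothetical helper, not part of this PR: decode the quantization
// part of an MLC model_id of the form "<name>-q<A>f<B>[_<N>]-MLC".
interface QuantInfo {
  quantBits: number; // 0 is taken to mean unquantized weights
  floatBits: number; // float compute/activation precision
  variant?: number;  // scheme variant, e.g. the "_1" in q4f16_1
}

function parseQuantization(modelId: string): QuantInfo | null {
  const m = modelId.match(/-q(\d+)f(\d+)(?:_(\d+))?-MLC$/);
  if (!m) return null;
  return {
    quantBits: Number(m[1]),
    floatBits: Number(m[2]),
    variant: m[3] !== undefined ? Number(m[3]) : undefined,
  };
}

const info = parseQuantization("Llama-3.2-1B-Instruct-q4f16_1-MLC");
// → { quantBits: 4, floatBits: 16, variant: 1 }
```

The VRAM figures in the diff are consistent with this reading: the q0f32 1B entry (~4 bytes per weight) needs roughly four times the memory of the q4f32_1 entry.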
CharlieFRuan authored Sep 25, 2024
1 parent 0b5f405 commit db77ff5
Showing 1 changed file with 79 additions and 1 deletion.
80 changes: 79 additions & 1 deletion src/config.ts
@@ -308,7 +308,85 @@ export const functionCallingModelIds = [
export const prebuiltAppConfig: AppConfig = {
  useIndexedDBCache: false,
  model_list: [
    // Llama-3.2
    {
      model: "https://huggingface.co/mlc-ai/Llama-3.2-1B-Instruct-q4f32_1-MLC",
      model_id: "Llama-3.2-1B-Instruct-q4f32_1-MLC",
      model_lib:
        modelLibURLPrefix +
        modelVersion +
        "/Llama-3.2-1B-Instruct-q4f32_1-ctx4k_cs1k-webgpu.wasm",
      vram_required_MB: 1128.82,
      low_resource_required: true,
      overrides: {
        context_window_size: 4096,
      },
    },
    {
      model: "https://huggingface.co/mlc-ai/Llama-3.2-1B-Instruct-q4f16_1-MLC",
      model_id: "Llama-3.2-1B-Instruct-q4f16_1-MLC",
      model_lib:
        modelLibURLPrefix +
        modelVersion +
        "/Llama-3.2-1B-Instruct-q4f16_1-ctx4k_cs1k-webgpu.wasm",
      vram_required_MB: 879.04,
      low_resource_required: true,
      overrides: {
        context_window_size: 4096,
      },
    },
    {
      model: "https://huggingface.co/mlc-ai/Llama-3.2-1B-Instruct-q0f32-MLC",
      model_id: "Llama-3.2-1B-Instruct-q0f32-MLC",
      model_lib:
        modelLibURLPrefix +
        modelVersion +
        "/Llama-3.2-1B-Instruct-q0f32-ctx4k_cs1k-webgpu.wasm",
      vram_required_MB: 5106.26,
      low_resource_required: true,
      overrides: {
        context_window_size: 4096,
      },
    },
    {
      model: "https://huggingface.co/mlc-ai/Llama-3.2-1B-Instruct-q0f16-MLC",
      model_id: "Llama-3.2-1B-Instruct-q0f16-MLC",
      model_lib:
        modelLibURLPrefix +
        modelVersion +
        "/Llama-3.2-1B-Instruct-q0f16-ctx4k_cs1k-webgpu.wasm",
      vram_required_MB: 2573.13,
      low_resource_required: true,
      overrides: {
        context_window_size: 4096,
      },
    },
    {
      model: "https://huggingface.co/mlc-ai/Llama-3.2-3B-Instruct-q4f32_1-MLC",
      model_id: "Llama-3.2-3B-Instruct-q4f32_1-MLC",
      model_lib:
        modelLibURLPrefix +
        modelVersion +
        "/Llama-3.2-3B-Instruct-q4f32_1-ctx4k_cs1k-webgpu.wasm",
      vram_required_MB: 2951.51,
      low_resource_required: true,
      overrides: {
        context_window_size: 4096,
      },
    },
    {
      model: "https://huggingface.co/mlc-ai/Llama-3.2-3B-Instruct-q4f16_1-MLC",
      model_id: "Llama-3.2-3B-Instruct-q4f16_1-MLC",
      model_lib:
        modelLibURLPrefix +
        modelVersion +
        "/Llama-3.2-3B-Instruct-q4f16_1-ctx4k_cs1k-webgpu.wasm",
      vram_required_MB: 2263.69,
      low_resource_required: true,
      overrides: {
        context_window_size: 4096,
      },
    },
    // Llama-3.1
    {
      model: "https://huggingface.co/mlc-ai/Llama-3.1-8B-Instruct-q4f32_1-MLC",
      model_id: "Llama-3.1-8B-Instruct-q4f32_1-MLC-1k",
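Once this change is published, applications can work with the new entries through the exported `prebuiltAppConfig`. A minimal config sketch, assuming web-llm's exported `AppConfig` type and `prebuiltAppConfig` (the constant modified in this diff):

```typescript
import { AppConfig, prebuiltAppConfig } from "@mlc-ai/web-llm";

// Config fragment: restrict the prebuilt list to the Llama-3.2 entries
// added in this commit. Each entry keeps its model URL, wasm model_lib,
// and VRAM estimate from prebuiltAppConfig unchanged.
const llama32Config: AppConfig = {
  ...prebuiltAppConfig,
  model_list: prebuiltAppConfig.model_list.filter((m) =>
    m.model_id.startsWith("Llama-3.2-"),
  ),
};
```

The resulting config can then be supplied as the `appConfig` field of the engine options when creating an engine for one of these model IDs.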

2 comments on commit db77ff5

@flatsiedatsie
Awesome.

Out of curiosity: what does q0 mean? Unquantized?

@flatsiedatsie

The Hugging Face folder is called Llama-3.2-1B-Instruct-q4f16_0-MLC instead of Llama-3.2-1B-Instruct-q4f16_1-MLC?

Oh wait, no, they both exist.

What is the difference between 0 and 1 again?
