
[FIX] Padding infeatures/outfeatures for exllama, exllama v2, and marlin #98

Merged: 19 commits from padding into main, Jun 29, 2024
Conversation

Qubitium (Contributor) commented Jun 28, 2024

Resolves #100

Qubitium (Contributor, Author) commented Jun 28, 2024

Confirmed my suspicion that the padding code affects the shape of tensors saved to disk. pack() has been fixed so the saved size is correct (the original, unpadded shape), but we now hit load issues: the expanded/padded buffers are larger than the tensors on disk, so accelerate throws shape-mismatch errors during quantized model load. We will explore 2 methods to deal with this tomorrow.

  1. Plan A: monkey-patch accelerate so loading uses tensor indexing/slicing: as long as dst.size > src.size, copy the smaller tensor into the padded buffer with dst[:src.size] = src (a hedged sketch follows this list).
  2. Plan B: refactor the qlinear __init__ to allocate buffers at the original shape only, load from disk, then expand/pad the loaded tensors in post_init.
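
A minimal sketch of the Plan A copy in PyTorch; `copy_into_padded` is a hypothetical helper name, and accelerate's real loading path is more involved than a single assignment:

```python
import torch

def copy_into_padded(dst: torch.Tensor, src: torch.Tensor) -> None:
    # Copy a smaller on-disk tensor into a larger padded buffer:
    # the leading slice of dst receives src; the padding region is untouched.
    assert dst.dim() == src.dim(), "rank must match"
    assert all(d >= s for d, s in zip(dst.shape, src.shape)), \
        "padded buffer must be at least as large as the saved tensor"
    dst[tuple(slice(0, s) for s in src.shape)] = src
```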

Qubitium (Contributor, Author) commented
Update: We are going with Plan B.
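
A minimal sketch of the Plan B flow; `PaddedQuantLinear`, `pad_to`, and the plain float `weight` buffer are illustrative stand-ins for the real qlinear layers, which carry packed qweight/qzeros/scales and g_idx:

```python
import torch
import torch.nn as nn

class PaddedQuantLinear(nn.Module):
    """Hypothetical module illustrating Plan B: buffers are registered at
    the original (saved) shape so load_state_dict matches the checkpoint,
    and padding to a kernel-friendly size happens later in post_init()."""

    def __init__(self, in_features: int, out_features: int, pad_to: int = 32):
        super().__init__()
        self.original_infeatures = in_features
        self.original_outfeatures = out_features
        # Round each dimension up to the nearest multiple the kernel needs.
        self.infeatures = -(-in_features // pad_to) * pad_to
        self.outfeatures = -(-out_features // pad_to) * pad_to
        # Register at the ORIGINAL shape: this is what pack() wrote to disk.
        self.register_buffer("weight", torch.zeros(out_features, in_features))

    def post_init(self):
        # After the checkpoint is loaded, grow the buffer to the padded
        # shape, zero-filling the new rows and columns.
        if (self.outfeatures, self.infeatures) != tuple(self.weight.shape):
            padded = torch.zeros(
                self.outfeatures, self.infeatures,
                dtype=self.weight.dtype, device=self.weight.device,
            )
            padded[: self.original_outfeatures, : self.original_infeatures] = self.weight
            self.weight = padded  # replaces the registered buffer
```

The key design point is ordering: shapes must match the checkpoint at load time, so the resize can only happen after load_state_dict runs, which is exactly the hook post_init provides.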

Qubitium merged commit e526cce into main Jun 29, 2024
1 of 2 checks passed
Qubitium deleted the padding branch June 29, 2024 14:28
DeJoker pushed a commit to DeJoker/GPTQModel that referenced this pull request Jul 19, 2024
[FIX] Padding infeatures/outfeatures for exllama, exllama v2, and marlin (ModelCloud#98)

* fix padding

* fix padding

* store original in/out features

* fix bad var reference

* shorter var name

* limit bitblas convert to use 1 thread

* ruff

* fix qlinear_exllama pack

* revert qliner_marlin change

* cleanup code

* plan b: init with original shape, then model load, then do padding/resize in post_init

* fix g_idx post_init

* const var reformat to all caps

* fix ( -> [

* padding the x that passes in forward

* comments/todo

* comments

---------

Co-authored-by: LRL-ModelCloud <[email protected]>
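
One of the commits above pads the activation x on its way into forward so it matches the padded in_features; a minimal sketch under the same assumptions (original vs. padded sizes tracked on the module):

```python
import torch.nn.functional as F

def pad_input(x, original_infeatures: int, infeatures: int):
    # Zero-pad the last dimension of x from the model's original
    # in_features up to the kernel's padded in_features.
    if infeatures > original_infeatures:
        x = F.pad(x, (0, infeatures - original_infeatures))
    return x
```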
Closes #100: [BUG] Padding of infeatures/outfeatures and packing