[WIP, Kernel] (2/N) Machete - Integrate into GPTQMarlinLinearMethod and CompressedTensorsWNA16 #5
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these:
Commit messages in this branch:
- move heuristic into C++ code
- fix unit tests + format
- update for 3.5.1
- remove custom scheduler
- codespell
- cleanup comment
- cleanup diff
- review comments
- review comments
- review comment changes
- review comments
- fix codespell
- cleanup util logic
- make dim names for prepack layout more canonical
- missed refactor
- wip interleaving + recasting
- tweak tolerances
- comments plus interleaving
- format
- codespell
- review comments
- end2end first pass
- separate out kernels, format
- add machete as a gptq backend
- update to use ModelWeightParameter
- formatting
- update parameter.py
- refactor permute layout
- wip
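One of the commits above moves the weight registration over to vLLM's ModelWeightParameter. The sketch below is a minimal illustration, not this PR's actual code, of how a packed GPTQ-style weight might be declared with that class; the helper name `create_packed_qweight` is hypothetical, and the exact constructor keywords are an assumption based on vllm/model_executor/parameter.py.

```python
# Minimal sketch (hypothetical helper, not the PR's code): registering a
# packed GPTQ-style weight via vLLM's ModelWeightParameter abstraction.
import torch
from vllm.model_executor.parameter import ModelWeightParameter


def create_packed_qweight(input_size: int, output_size: int,
                          pack_factor: int, weight_loader):
    # GPTQ packs several low-bit values into each int32 element along the
    # input dimension, hence the division by pack_factor.
    # Assumption: ModelWeightParameter accepts data/input_dim/output_dim/
    # weight_loader keyword arguments.
    return ModelWeightParameter(
        data=torch.empty(input_size // pack_factor, output_size,
                         dtype=torch.int32),
        input_dim=0,
        output_dim=1,
        weight_loader=weight_loader,
    )
```

Declaring the input/output dimensions on the parameter itself lets the generic weight loader handle sharded loading, rather than each quantization method hard-coding that logic.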
End-to-end integration of Machete into GPTQMarlinLinearMethod and CompressedTensorsWNA16.
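To make the one-line description concrete, here is a hedged sketch of what "adding Machete as a GPTQ backend" can look like at the Python level. All class and helper names below are illustrative stubs under my own assumptions; the real integration dispatches to compiled Machete/Marlin CUDA ops rather than these placeholders.

```python
# Hypothetical sketch of kernel-backend selection for a weight-quantized
# linear method. Stub classes stand in for the real GEMM paths.
import torch


class MarlinLinearKernel:
    """Stub standing in for the existing Marlin GEMM path."""
    def apply_weights(self, layer, x: torch.Tensor) -> torch.Tensor:
        return x @ layer.weight  # placeholder for the real packed GEMM


class MacheteLinearKernel:
    """Stub standing in for the new Machete GEMM path."""
    def apply_weights(self, layer, x: torch.Tensor) -> torch.Tensor:
        return x @ layer.weight  # placeholder for the real packed GEMM


def machete_supported(weight_bits: int, group_size: int) -> bool:
    # Assumption for illustration only: cover the common WNA16 configs
    # (4-bit weights, channelwise or group-128 scales), else fall back.
    return weight_bits == 4 and group_size in (-1, 128)


def select_kernel(weight_bits: int, group_size: int):
    # Prefer Machete when the layer's quantization config supports it;
    # otherwise keep the existing Marlin path.
    if machete_supported(weight_bits, group_size):
        return MacheteLinearKernel()
    return MarlinLinearKernel()


# Example: a 4-bit, group-size-128 layer would take the Machete path.
kernel = select_kernel(weight_bits=4, group_size=128)
```

Hiding the choice behind one kernel interface like this is one way for both GPTQMarlinLinearMethod and CompressedTensorsWNA16 to pick up the new backend without duplicating dispatch logic.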