[CORE] Consolidate 6+ kernel boolean toggle args into a single Backend arg #68

ZX-ModelCloud · 2024-06-26T04:48:44Z

Resolves #59

The following args will be merged into a single arg, backend: Backend = Backend.AUTO:

use_triton: bool,
disable_exllama: bool = False,
disable_exllamav2: bool = False,
use_marlin: bool = False,
use_bitblas: bool = True,

Reason: These passive boolean toggles are confusing for users to combine correctly (the valid combinations form a matrix of conditions), and even project developers have run into multiple bugs because of them. We can't keep adding another boolean toggle every time we add a backend, kernel, or runtime; the API is becoming unmaintainable for project devs and unusable for end users.

Prelim design:

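from enum import Enum

class Backend(Enum):
    # integer values are illustrative; only the member names come from the design
    AUTO = 0  # choose the fastest one based on quant model compatibility
    CUDA_OLD = 1
    CUDA = 2
    TRITON_V2 = 3
    EXLLAMA = 4
    EXLLAMA_V2 = 5
    MARLIN = 6
    BITBLAS = 7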
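To illustrate how the old toggles could collapse into one value, here is a minimal sketch. The helper `backend_from_legacy_flags`, its defaults, and its precedence order are assumptions made for illustration; they are not taken from this PR or from the GPTQModel code base. Only the member names mirror the preliminary design above.

```python
from enum import Enum


class Backend(Enum):
    # Integer values are placeholders for this sketch.
    AUTO = 0
    CUDA_OLD = 1
    CUDA = 2
    TRITON_V2 = 3
    EXLLAMA = 4
    EXLLAMA_V2 = 5
    MARLIN = 6
    BITBLAS = 7


def backend_from_legacy_flags(
    use_triton: bool = False,
    disable_exllama: bool = False,
    disable_exllamav2: bool = False,
    use_marlin: bool = False,
    use_bitblas: bool = False,
) -> Backend:
    """Hypothetical shim: map the old boolean toggles to a single Backend.

    The precedence below (opt-in kernels first, then the opt-out ones)
    is an assumption for illustration, not the project's actual rule.
    """
    if use_bitblas:
        return Backend.BITBLAS
    if use_marlin:
        return Backend.MARLIN
    if use_triton:
        return Backend.TRITON_V2
    if not disable_exllamav2:
        return Backend.EXLLAMA_V2
    if not disable_exllama:
        return Backend.EXLLAMA
    return Backend.CUDA


# Five booleans allow 2**5 = 32 combinations; the enum has 8 explicit values.
print(backend_from_legacy_flags(use_marlin=True))
```

With a single enum argument, a caller states the intended kernel directly instead of relying on the interaction of several flags.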

gptqmodel/utils/importer.py

* Consolidate Backend
* change Backend.TRITON_V2 to Backend.TRITON
* According to quantize_config.format, determine when the Backend is packing the model.
* Auto choose the fastest one Backend based on quant model compatibility
* fix issue: Automatically select Backend, returns incorrect qlinear.
* cleanup
* cleanup
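The "auto choose the fastest Backend" step can be pictured as a walk over a preference list. The following is a rough sketch only: `resolve_backend`, the `PREFERENCE` order, and the `is_compatible` callback are hypothetical names and values chosen for illustration, and the real selection logic in gptqmodel/utils/importer.py may differ.

```python
from enum import Enum
from typing import Callable, Sequence


class Backend(Enum):
    # Integer values are placeholders for this sketch.
    AUTO = 0
    CUDA_OLD = 1
    CUDA = 2
    TRITON = 3
    EXLLAMA = 4
    EXLLAMA_V2 = 5
    MARLIN = 6
    BITBLAS = 7


# Hypothetical "fastest first" order; an assumption, not measured data.
PREFERENCE = [
    Backend.MARLIN,
    Backend.EXLLAMA_V2,
    Backend.EXLLAMA,
    Backend.TRITON,
    Backend.CUDA,
]


def resolve_backend(
    requested: Backend,
    preference: Sequence[Backend],
    is_compatible: Callable[[Backend], bool],
) -> Backend:
    """Return a concrete backend for the requested one.

    An explicit request is honored only if it is compatible; AUTO picks
    the first compatible entry from the preference list.
    """
    if requested is not Backend.AUTO:
        if not is_compatible(requested):
            raise ValueError(f"{requested.name} is not compatible with this model")
        return requested
    for candidate in preference:
        if is_compatible(candidate):
            return candidate
    raise ValueError("no compatible backend found")


# Toy compatibility check standing in for the quant-config inspection.
compatible = {Backend.TRITON, Backend.CUDA}
print(resolve_backend(Backend.AUTO, PREFERENCE, lambda b: b in compatible))
# prints "Backend.TRITON"
```

Passing the compatibility check in as a callback keeps the sketch free of any claims about which kernels support which quantization settings.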

ZX-ModelCloud added 4 commits June 26, 2024 04:47

Consolidate Backend

bb5ccb5

change Backend.TRITON_V2 to Backend.TRITON

a4bc833

According to quantize_config.format, determine when the Backend is packing the model.

96b962d

Auto choose the fastest one Backend based on quant model compatibility

bba21f6

Qubitium reviewed Jun 26, 2024

gptqmodel/utils/importer.py (outdated, resolved)

ZX-ModelCloud added 4 commits June 26, 2024 12:04

fix issue: Automatically select Backend, returns incorrect qlinear.

bda115c

cleanup

9596aeb

Merge branch 'main' into zx_consolidate_backend

d856a5f

cleanup

3a298a1

ZX-ModelCloud marked this pull request as ready for review June 27, 2024 05:11

Qubitium merged commit 5b724ac into main Jun 27, 2024
2 of 3 checks passed

Qubitium deleted the zx_consolidate_backend branch June 27, 2024 06:16

Qubitium changed the title ~~Consolidate Backend~~ [CORE] Consolidate 6+ kernel boolean toggle args into a single Backend arg Jun 27, 2024

DeJoker pushed a commit to DeJoker/GPTQModel that referenced this pull request Jul 19, 2024

revert checkpoint_format rename (ModelCloud#68)

7875209

ZX-ModelCloud commented Jun 26, 2024 • edited by Qubitium
