
Adding GPU quantization workflows and APIs #1

Closed
wants to merge 1 commit

Conversation

HDCharles (Contributor) commented Nov 7, 2023

Stack from ghstack (oldest at bottom):

Summary:
APIs and workflows used for quantization and pruning in the
segment-anything-fast and gpt-fast repos.

Test Plan: python /home/cdhernandez/local/ao/ao/quantization/test.py

Reviewers:

Subscribers:

Tasks:

Tags:
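The summary above references quantization APIs without showing what they do. As a rough illustration only (the function names below are hypothetical sketches for this writeup, not this PR's actual API), symmetric per-output-channel int8 weight quantization can be expressed like this:

```python
def quantize_per_channel_int8(weight):
    """Symmetric per-(output-)channel int8 quantization (illustrative sketch).

    weight: list of rows (one per output channel), each a list of floats.
    Returns (int_rows, scales) such that w ~= q * scale for each row.
    """
    int_rows, scales = [], []
    for row in weight:
        # One scale per output channel, chosen so the largest value maps to 127.
        max_abs = max(abs(v) for v in row) or 1.0
        scale = max_abs / 127.0
        q = [max(-128, min(127, round(v / scale))) for v in row]
        int_rows.append(q)
        scales.append(scale)
    return int_rows, scales


def dequantize_per_channel(int_rows, scales):
    """Recover approximate float weights from int8 values and per-row scales."""
    return [[q * s for q in row] for row, s in zip(int_rows, scales)]
```

The per-channel (rather than per-tensor) scales are what keep accuracy acceptable for linear-layer weights, since channels can differ widely in magnitude.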

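The summary also mentions pruning. A minimal unstructured magnitude-pruning sketch, again with hypothetical names rather than the PR's actual API:

```python
def magnitude_prune(weight, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of entries (illustrative sketch).

    weight: list of rows of floats; sparsity: fraction of entries to zero.
    """
    flat = sorted(abs(v) for row in weight for v in row)
    k = int(len(flat) * sparsity)
    # Entries at or below the k-th smallest magnitude are pruned to zero.
    threshold = flat[k - 1] if k > 0 else float("-inf")
    return [[0.0 if abs(v) <= threshold else v for v in row] for row in weight]
```

Real GPU-friendly pruning (as in segment-anything-fast) typically uses structured patterns such as 2:4 semi-structured sparsity rather than this unstructured form, so hardware kernels can exploit the zeros.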
HDCharles added a commit that referenced this pull request Nov 7, 2023
facebook-github-bot added the CLA Signed label (authors must sign the CLA before a PR can be reviewed) on Nov 7, 2023
HDCharles deleted the gh/HDCharles/1/base branch on November 7, 2023 at 17:31
HDCharles closed this in a753e3f on Nov 7, 2023
HDCharles deleted the gh/HDCharles/1/head branch on November 7, 2023 at 17:31
atalman added a commit that referenced this pull request Jun 27, 2024
jcaip added a commit that referenced this pull request Sep 6, 2024
* feat: starting layout implementation

fix: namespace of common modules

chore: remove not needed test file

fix: op name being registered

chore: can compile the cuda kernel

fix: segmentation fault

chore: wip - paste test code just to check if everything passes

feat: wip - adding layout. unpack not working

fix: circular import

feat: wip - can almost revert

feat: can unpack. just needs cleanup

chore: improve layout code

chore: wip - mm needs work

feat: wip - something seems wrong

fix: e2e test

feat: wip - add group param

fix: unpack weights

feat: marlin is implemented and correct

chore: rebase

chore: remove old import

feat: use int4 instead of dequantizing

chore: remove unused fn

feat: add checks and validation

feat: add new kernel and refactor code (#1)

* feat: wip - adding new kernel

* feat: wip - continue working on the unpack

* feat: wip - working on unpacking

* feat: remove old op

* feat: more code changes

* chore: remove old code

* feat: more code

* chore: more code changes

* chore: more code changes

* feat: add more documentation

* fix: dataclass

* feat: add more docs

* feat: remove assert

chore: block 8 bits

chore: update comment

feat: refactor dispatch

chore: add validation on group size

chore: wip - working on fixing unpack

feat: add small readme with sources

feat: add checks

feat: tests pass & can execute llama2

* compile kind of working

* fix: batching and layout outputs correct results

* fix: torch.compile

* wip

* feat: wip

* chore: cleanup

* chore: review

* chore: review v2

* update benchmarks + README

---------

Co-authored-by: Jesse Cai <[email protected]>
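The commit log above repeatedly deals with packing and unpacking int4 weights ("feat: can unpack. just needs cleanup", "feat: use int4 instead of dequantizing"). As a minimal sketch of the underlying idea only (two signed 4-bit values per byte; the actual Marlin kernel uses a far more elaborate tile layout for tensor-core access):

```python
def pack_int4(values):
    """Pack signed int4 values (-8..7) two per byte, low nibble first."""
    assert len(values) % 2 == 0, "pad to an even count before packing"
    out = []
    for lo, hi in zip(values[::2], values[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_int4(packed):
    """Invert pack_int4: recover signed int4 values from packed bytes."""
    vals = []
    for b in packed:
        for nib in (b & 0xF, b >> 4):
            # Sign-extend the 4-bit nibble back to a Python int.
            vals.append(nib - 16 if nib >= 8 else nib)
    return vals
```

Keeping weights packed and multiplying in int4 directly (instead of dequantizing to float first) is what the "use int4 instead of dequantizing" commit refers to: it halves memory traffic relative to int8 and avoids a separate dequantization pass.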
jcaip pushed a commit that referenced this pull request Sep 6, 2024
(same commit message as above)
andrewor14 pushed a commit that referenced this pull request Sep 6, 2024
(same commit message as above)
jainapurva added a commit that referenced this pull request Sep 6, 2024
* Lint fixes;

* Ruff auto-format
msaroufim added a commit that referenced this pull request Sep 6, 2024
msaroufim added a commit that referenced this pull request Sep 6, 2024
Revert "Lint fixes #1 torchao/dtypes (#827)"

This reverts commit 144445a.

Co-authored-by: Mark Saroufim <[email protected]>
HDCharles pushed a commit that referenced this pull request Sep 9, 2024
(same commit message as above)
jainapurva pushed a commit that referenced this pull request Sep 9, 2024
(same commit message as above)
jainapurva added a commit that referenced this pull request Sep 9, 2024
(same commit message as above)
jainapurva pushed a commit that referenced this pull request Sep 9, 2024
(same commit message as above)