
Adding GPU quantization workflows and APIs #1

Closed
wants to merge 1 commit

Conversation

HDCharles (Contributor) commented Nov 7, 2023

Stack from ghstack (oldest at bottom):

Summary:
APIs and workflows used for quantization and pruning in the
segment-anything-fast and gpt-fast repos.

Test Plan: python /home/cdhernandez/local/ao/ao/quantization/test.py

Reviewers:

Subscribers:

Tasks:

Tags:
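The summary above references quantization APIs without showing what they do. As a rough illustration only (the function names below are hypothetical sketches for this writeup, not this PR's actual API), symmetric per-output-channel int8 weight quantization can be expressed like this:

```python
def quantize_per_channel_int8(weight):
    """Symmetric per-(output-)channel int8 quantization (illustrative sketch).

    weight: list of rows (one per output channel), each a list of floats.
    Returns (int_rows, scales) such that w ~= q * scale for each row.
    """
    int_rows, scales = [], []
    for row in weight:
        # One scale per output channel, chosen so the largest value maps to 127.
        max_abs = max(abs(v) for v in row) or 1.0
        scale = max_abs / 127.0
        q = [max(-128, min(127, round(v / scale))) for v in row]
        int_rows.append(q)
        scales.append(scale)
    return int_rows, scales


def dequantize_per_channel(int_rows, scales):
    """Recover approximate float weights from int8 values and per-row scales."""
    return [[q * s for q in row] for row, s in zip(int_rows, scales)]
```

The per-channel (rather than per-tensor) scales are what keep accuracy acceptable for linear-layer weights, since channels can differ widely in magnitude.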

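The summary also mentions pruning. A minimal unstructured magnitude-pruning sketch, again with hypothetical names rather than the PR's actual API:

```python
def magnitude_prune(weight, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of entries (illustrative sketch).

    weight: list of rows of floats; sparsity: fraction of entries to zero.
    """
    flat = sorted(abs(v) for row in weight for v in row)
    k = int(len(flat) * sparsity)
    # Entries at or below the k-th smallest magnitude are pruned to zero.
    threshold = flat[k - 1] if k > 0 else float("-inf")
    return [[0.0 if abs(v) <= threshold else v for v in row] for row in weight]
```

Real GPU-friendly pruning (as in segment-anything-fast) typically uses structured patterns such as 2:4 semi-structured sparsity rather than this unstructured form, so hardware kernels can exploit the zeros.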
HDCharles added a commit that referenced this pull request Nov 7, 2023
facebook-github-bot added the CLA Signed label (authors must sign the CLA before a PR can be reviewed) on Nov 7, 2023
HDCharles deleted the gh/HDCharles/1/base branch on November 7, 2023 at 17:31
HDCharles closed this in a753e3f on Nov 7, 2023
HDCharles deleted the gh/HDCharles/1/head branch on November 7, 2023 at 17:31
atalman added a commit that referenced this pull request Jun 27, 2024
jcaip added a commit that referenced this pull request Sep 6, 2024
* feat: starting layout implementation

fix: namespace of common modules

chore: remove not needed test file

fix: op name being registered

chore: can compile the cuda kernel

fix: segmentation fault

chore: wip - paste test code just to check if everything passes

feat: wip - adding layout. unpack not working

fix: circular import

feat: wip - can almost revert

feat: can unpack. just needs cleanup

chore: improve layout code

chore: wip - mm needs work

feat: wip - something seems wrong

fix: e2e test

feat: wip - add group param

fix: unpack weights

feat: marlin is implemented and correct

chore: rebase

chore: remove old import

feat: use int4 instead of dequantizing

chore: remove unused fn

feat: add checks and validation

feat: add new kernel and refactor code (#1)

* feat: wip - adding new kernel

* feat: wip - continue working on the unpack

* feat: wip - working on unpacking

* feat: remove old op

* feat: more code changes

* chore: remove old code

* feat: more code

* chore: more code changes

* chore: more code changes

* feat: add more documentation

* fix: dataclass

* feat: add more docs

* feat: remove assert

chore: block 8 bits

chore: update comment

feat: refactor dispatch

chore: add validation on group size

chore: wip - working on fixing unpack

feat: add small readme with sources

feat: add checks

feat: tests pass & can execute llama2

* compile kind of working

* fix: batching and layout outputs correct results

* fix: torch.compile

* wip

* feat: wip

* chore: cleanup

* chore: review

* chore: review v2

* update benchmarks + README

---------

Co-authored-by: Jesse Cai <[email protected]>
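The commit log above repeatedly deals with packing and unpacking int4 weights ("feat: can unpack. just needs cleanup", "feat: use int4 instead of dequantizing"). As a minimal sketch of the underlying idea only (two signed 4-bit values per byte; the actual Marlin kernel uses a far more elaborate tile layout for tensor-core access):

```python
def pack_int4(values):
    """Pack signed int4 values (-8..7) two per byte, low nibble first."""
    assert len(values) % 2 == 0, "pad to an even count before packing"
    out = []
    for lo, hi in zip(values[::2], values[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_int4(packed):
    """Invert pack_int4: recover signed int4 values from packed bytes."""
    vals = []
    for b in packed:
        for nib in (b & 0xF, b >> 4):
            # Sign-extend the 4-bit nibble back to a Python int.
            vals.append(nib - 16 if nib >= 8 else nib)
    return vals
```

Keeping weights packed and multiplying in int4 directly (instead of dequantizing to float first) is what the "use int4 instead of dequantizing" commit refers to: it halves memory traffic relative to int8 and avoids a separate dequantization pass.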
jcaip pushed a commit that referenced this pull request Sep 6, 2024
(same commit message as above)
andrewor14 pushed a commit that referenced this pull request Sep 6, 2024
(same commit message as above)
jainapurva added a commit that referenced this pull request Sep 6, 2024
* Lint fixes;

* Ruff auto-format
msaroufim added a commit that referenced this pull request Sep 6, 2024
msaroufim added a commit that referenced this pull request Sep 6, 2024
Revert "Lint fixes #1 torchao/dtypes (#827)"

This reverts commit 144445a.

Co-authored-by: Mark Saroufim <[email protected]>
HDCharles pushed a commit that referenced this pull request Sep 9, 2024
(same commit message as above)
jainapurva pushed a commit that referenced this pull request Sep 9, 2024
(same commit message as above)
jainapurva added a commit that referenced this pull request Sep 9, 2024
(same commit message as above)
jainapurva pushed a commit that referenced this pull request Sep 9, 2024
(same commit message as above)