Safetensors and model sharding #47

casper-hansen · 2023-09-13T17:29:05Z

This PR introduces safetensors support and sharded saving of quantized model weights.

Resolves #36

Checks:

Quantize model to safetensors
Quantize model to torch weights
Quantize model to safetensors with shards
Quantize model to torch weights with shards
Load safetensors without reference to quant filename AND with reference to quant filename
Load torch weights without reference to quant filename AND with reference to quant filename
Load safetensors from a local directory and from a remote Hugging Face repository
Load torch weights from a local directory and from a remote Hugging Face repository
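
For readers unfamiliar with what "with shards" means in the checks above, here is a minimal sketch of how a checkpoint can be split into size-limited shards with an accompanying index. It is an illustration only, not the code from this PR: the function name `plan_shards`, the tensor names, and the byte sizes are all hypothetical, and the file naming and index layout are assumed to follow the common Hugging Face convention (`model-0000X-of-0000Y.safetensors` plus a `weight_map`).

```python
from typing import Dict, List, Tuple


def plan_shards(
    tensor_sizes: Dict[str, int], max_shard_bytes: int
) -> Tuple[List[List[str]], dict]:
    """Greedily pack tensors into shards no larger than max_shard_bytes.

    A tensor larger than the limit still gets a shard of its own.
    Hypothetical helper for illustration; not part of any library.
    """
    shards: List[List[str]] = [[]]
    current_bytes = 0
    for name, size in tensor_sizes.items():
        # Start a new shard only if the current one already holds something
        # and adding this tensor would exceed the limit.
        if shards[-1] and current_bytes + size > max_shard_bytes:
            shards.append([])
            current_bytes = 0
        shards[-1].append(name)
        current_bytes += size

    total = len(shards)
    weight_map = {}
    for i, names in enumerate(shards, start=1):
        # Assumed naming convention for shard files.
        filename = f"model-{i:05d}-of-{total:05d}.safetensors"
        for name in names:
            weight_map[name] = filename

    index = {
        "metadata": {"total_size": sum(tensor_sizes.values())},
        "weight_map": weight_map,
    }
    return shards, index


# Hypothetical tensor names and sizes in bytes.
sizes = {
    "embed.weight": 600,
    "layers.0.qweight": 300,
    "layers.0.scales": 200,
    "lm_head.weight": 500,
}
shards, index = plan_shards(sizes, max_shard_bytes=1000)
```

With these made-up sizes the first two tensors share one shard and the last two share another. A loader that is not given an explicit weights filename could consult such an index to find which file holds each tensor.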

casper-hansen added 9 commits September 13, 2023 16:11

Implement saving sharded weights + safetensors

97d38e2

Add safetensors support

db8ba32

Fix ignore patterns

7ba3125

Add generation example for safetensors

bb455d7

Unify using variable safetensors

219ccb3

Correct example

affd190

Add notes to example

720a1fc

Default to empty string for model file

435b3b4

Rename safetensors generation example

4ecd859

casper-hansen merged commit 8788fe1 into main Sep 15, 2023

casper-hansen deleted the safetensors branch September 15, 2023 11:23

TheBloke mentioned this pull request Sep 17, 2023

AWQ: vLLM cannot load AWQ models in Safetensors format vllm-project/vllm#1071

Closed
