Provide pruned version for weaker hardware #27

CommanderTvis · 2023-01-08T12:42:07Z

It would be really useful to have a pruned version of the model (like Balaboba) that can run on setups with weaker GPUs.

CommanderTvis · 2023-03-20T10:19:52Z

Also, quantization, possibly even down to 4 bits, may be feasible, as has been done successfully for LLaMA: https://github.com/ggerganov/llama.cpp
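For illustration, here is a minimal sketch of block-wise 4-bit weight quantization. It uses a simplified symmetric "absmax" scheme: one float scale per block of 32 weights, with each weight stored as a signed 4-bit code in [-7, 7] and two codes packed per byte. This is an assumption for clarity, not llama.cpp's exact on-disk format, and the function names (`quantize_4bit`, `dequantize_4bit`, `pack_nibbles`) are hypothetical.

```python
import random
from typing import List, Tuple

BLOCK_SIZE = 32  # assumed block size: one scale is stored per 32 weights


def quantize_4bit(weights: List[float], block_size: int = BLOCK_SIZE) -> Tuple[List[int], List[float]]:
    """Symmetric absmax quantization of each block to signed 4-bit codes in [-7, 7]."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        absmax = max(abs(w) for w in block)
        scale = absmax / 7 if absmax > 0 else 1.0  # largest magnitude maps to +/-7
        scales.append(scale)
        for w in block:
            q = round(w / scale)
            codes.append(max(-7, min(7, q)))  # clamp guards against float rounding
    return codes, scales


def dequantize_4bit(codes: List[int], scales: List[float], block_size: int = BLOCK_SIZE) -> List[float]:
    """Reconstruct approximate weights: code * scale of the block it belongs to."""
    return [q * scales[i // block_size] for i, q in enumerate(codes)]


def pack_nibbles(codes: List[int]) -> bytes:
    """Pack two 4-bit codes per byte (offset by 8 so each code fits in 0..15)."""
    if len(codes) % 2:
        codes = codes + [0]  # pad to an even count
    out = bytearray()
    for lo, hi in zip(codes[0::2], codes[1::2]):
        out.append((lo + 8) | ((hi + 8) << 4))
    return bytes(out)


if __name__ == "__main__":
    random.seed(0)
    w = [random.gauss(0, 0.02) for _ in range(64)]
    codes, scales = quantize_4bit(w)
    packed = pack_nibbles(codes)
    restored = dequantize_4bit(codes, scales)
    print(len(packed), "bytes of codes +", len(scales), "scales")  # 32 bytes of codes + 2 scales
    print("max abs error:", max(abs(a - b) for a, b in zip(w, restored)))
```

With this scheme each weight costs 4 bits plus a shared scale per block (versus 16 bits in fp16), and the per-weight reconstruction error is bounded by half the block's scale. Real implementations such as llama.cpp use their own block layouts and variants.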

blokhin · 2023-03-20T16:04:29Z

+1. Also, this distributed inference approach might be very applicable here: https://petals.ml
