Support for on-loading quantization #8

smspillaz · 2023-07-26T21:55:52Z

No description provided.

This can be used to pre-configure how we would like the model to be loaded. Right now it supports a configuration for quantization. A design principle of GGMLModelConfig is that it can also be used when the config is NULL. This is why the config_get functions return a gboolean indicating whether that part of the configuration differs from the default, and why they also handle the case where the config object itself is NULL.
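The NULL-tolerant getter pattern described above can be sketched roughly as follows. This is a plain C illustration, not the library's code: `ModelConfig`, `QuantType`, `model_config_get_quantization`, and `effective_quantization` are hypothetical stand-ins, and `bool` is used in place of `gboolean` so the snippet does not depend on GLib.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real GGMLModelConfig API may differ. */
typedef enum { QUANT_NONE, QUANT_Q4_0, QUANT_Q8_0 } QuantType;

typedef struct {
    bool      quantization_set;   /* true once the caller overrides the default */
    QuantType quantization_type;
} ModelConfig;

/*
 * Returns true only if the config overrides the default quantization.
 * A NULL config is valid and means "use the defaults everywhere".
 * *out_type is written only when the function returns true.
 */
bool model_config_get_quantization(const ModelConfig *config, QuantType *out_type)
{
    assert(out_type != NULL);

    if (config == NULL || !config->quantization_set)
        return false;

    *out_type = config->quantization_type;
    return true;
}

/* Caller-side pattern: start from the default, override only on true. */
QuantType effective_quantization(const ModelConfig *config)
{
    QuantType type = QUANT_NONE;
    QuantType requested;

    if (model_config_get_quantization(config, &requested))
        type = requested;

    return type;
}
```

With this shape, callers never need to check for NULL themselves: passing NULL and passing a config with nothing set both take the default path.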

…e ModelDescNode

We do this after loading the hyperparameters and starting to set up the ModelDesc. The ModelDesc contains all the information about the tensor types, and since we can now convert tensors on the fly during loading, this is the right place to edit those types once we know the desired quantization configuration.
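As a rough illustration of editing tensor types in a model description before any data is read, the sketch below retypes every descriptor whose name matches a regular expression. It assumes a flat array of descriptors and POSIX `regex.h`; `TensorDesc`, `TensorType`, and `retype_matching_tensors` are hypothetical names, and the real ModelDesc structure and quantize-regex helpers may look quite different.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the model description. */
typedef enum { TENSOR_F32, TENSOR_F16, TENSOR_Q4_0 } TensorType;

typedef struct {
    const char *name;   /* tensor name as it appears in the model file */
    TensorType  type;   /* type the tensor should have once loaded */
} TensorDesc;

/*
 * Rewrites the declared type of every tensor whose name matches one of the
 * POSIX extended regex patterns. Returns the number of descriptors changed,
 * or -1 if a pattern fails to compile (descriptors changed by earlier
 * patterns stay changed in that case).
 */
int retype_matching_tensors(TensorDesc *descs, size_t n_descs,
                            const char *const *patterns, size_t n_patterns,
                            TensorType new_type)
{
    int changed = 0;

    for (size_t p = 0; p < n_patterns; p++) {
        regex_t re;

        if (regcomp(&re, patterns[p], REG_EXTENDED | REG_NOSUB) != 0)
            return -1;

        for (size_t i = 0; i < n_descs; i++) {
            /* Skip tensors that already have the target type so that
             * overlapping patterns do not count the same tensor twice. */
            if (descs[i].type != new_type &&
                regexec(&re, descs[i].name, 0, NULL, 0) == 0) {
                descs[i].type = new_type;
                changed++;
            }
        }

        regfree(&re);
    }

    return changed;
}
```

Because only the description is edited here, the loader can later compare each tensor's on-disk type with its described type and convert where they differ.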

smspillaz added 16 commits July 26, 2023 22:44

ggml-tensor: Add ggml_tensor_get_data_type

b9c0622

ggml-context: Add ggml_context_new_tensor (generic version of 1d/2d/3…

04ebcc2

…d funcs)

ggml-model-desc: Add ggml_model_desc_map

4227298

ggml-tensor: Add ggml_tensor_new

1dfe88e

ggml-tensor: Fix doc comment

d824d11

ggml-tensor: Add assert about size of buffer.

e1347e8

ggml-model: Formatting adjustment

c4a5854

ggml-model: Compute context memory size from the model desc

09289f1

ggml-model: Convert data as-needed upon reading a model file

61da896

ggml-language-model: Add a GGMLModelConfig argument in language model…

3bdbc26

… constructor

ggml-quantize: Add helper functions for setting up quantization in th…

ab9facf

…e ModelDesc

ggml-gpt: Add a helper function to get the quantize regexes

783f581

testLoadGPT: Fix typo

e549447

tests/js: Add test for running with quantization

bba8014

smspillaz force-pushed the quantization-conversion-support branch from 6659075 to bba8014 on July 26, 2023 21:56

smspillaz added 6 commits July 27, 2023 03:40

ggml-quantize: Use g_ptr_array_new_full instead of new_null_terminated

a178d55

It's better supported on older platforms.

ggml-quantize: Ignore NULL regexes when unreffing

2dd0d11

testLoadGPT2: Allow for variations depending on system

60ceabe

Quantization is imprecise, so we could get slightly different answers depending on the architecture.
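To make the imprecision concrete, here is a minimal sketch of a symmetric 4-bit block quantizer and its round-trip error. It is a deliberately simplified scheme with hypothetical names (`QuantBlock`, `quantize_block`, `max_roundtrip_error`) and is not ggml's actual quantized layout. Values that sit close to a rounding boundary can land on a different integer if intermediate floating-point precision differs, which is one plausible source of the architecture-dependent answers mentioned here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 8

/* One block: a shared scale plus one small signed integer per value. */
typedef struct {
    float  scale;
    int8_t q[BLOCK_SIZE];   /* each entry lies in [-7, 7] */
} QuantBlock;

/* Rounds to the nearest integer, halves away from zero (avoids libm). */
static int round_to_int(float v)
{
    return (int) (v + (v >= 0.0f ? 0.5f : -0.5f));
}

/* Quantizes BLOCK_SIZE floats using scale = max|x| / 7. */
QuantBlock quantize_block(const float *x)
{
    QuantBlock block;
    float max_abs = 0.0f;

    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > max_abs)
            max_abs = a;
    }

    block.scale = max_abs / 7.0f;

    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        /* An all-zero block has scale 0, so store zeros directly. */
        block.q[i] = block.scale > 0.0f
                         ? (int8_t) round_to_int(x[i] / block.scale)
                         : 0;
    }

    return block;
}

/* Reconstructs the approximate value at index i. */
float dequantize_value(const QuantBlock *block, size_t i)
{
    return block->scale * (float) block->q[i];
}

/* Largest absolute difference between the input and its round trip. */
float max_roundtrip_error(const float *x)
{
    QuantBlock block = quantize_block(x);
    float worst = 0.0f;

    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        float diff = x[i] - dequantize_value(&block, i);
        if (diff < 0.0f)
            diff = -diff;
        if (diff > worst)
            worst = diff;
    }

    return worst;
}
```

In this scheme the round-trip error of each value is bounded by about half the block's scale, so a test that compares model output against a single exact expectation is fragile, while one that accepts a small set of outcomes is not.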

llm-writer-app: Adjust to change in load_defined_from_istream_async API

bd91dad

ggml-language-model: Support setting quantization flags also on async…

9db4417

… case

llm-writer-app: Allow configuring the quantization level

102980b

smspillaz merged commit ea09786 into master Jul 27, 2023
1 check passed
