
Specific int8 operators for conv and fully connected? #10151

Answered by FSSRepo
mikhilg10 asked this question in Q&A


Quantization is mostly used to reduce the size of the weights consumed by matrix multiplication (ggml_mul_mat); the remaining operations are performed in fp16 or fp32, depending on the case.


Answer selected by mikhilg10