ggml : fix arch check in bf16_to_fp32 #10164
Merged
ggerganov approved these changes on Nov 4, 2024
github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Nov 4, 2024
apicalshark added a commit to apicalshark/llama.cpp that referenced this pull request on Nov 7, 2024:
* metal : fix minor string leaks (ggml/1004)
* cmake : make it possible linking ggml as external lib (ggml/1003)
* sync : ggml
* CANN: adjust backend registry refactor. (ggerganov#10158)
  remove buffer->iface.get_name that used in cann as it was removed in backend registry refactor PR.
* metal : move dequantize templates to beginning of MSL source (#0)
* metal : simplify f16 and f32 dequant kernels (#0)
* cuda : clear error after changing peer access (ggerganov#10153)
* fix build break on arm64 linux (ggerganov#10166)
  This fixes the build break from the recent changes to move the CPU backend to separate files ggerganov#10144
* server : clarify /slots endpoint, add is_processing (ggerganov#10162)
  * server : clarify /slots endpoint, add is_processing
  * fix tests
* ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (ggerganov#10167)
* ggml : fix gelu tables initialization (ggerganov#10172)
* Q6_K AVX improvements (ggerganov#10118)
  * q6_k instruction reordering attempt
  * better subtract method
  * should be theoretically faster small improvement with shuffle lut, likely because all loads are already done at that stage
  * optimize bit fiddling
  * handle -32 offset separately. bsums exists for a reason!
  * use shift
  * Update ggml-quants.c
  * have to update ci macos version to 13 as 12 doesnt work now. 13 is still x86
* ggml : fix arch check in bf16_to_fp32 (ggerganov#10164)
* llama : add <|tool_call|> formatting to Granite template (ggerganov#10177)
  Branch: GraniteToolCallTemplate
  Signed-off-by: Gabe Goodhart
* metal : add quantized FA support (ggerganov#10149)
  * metal : add quantized FA (vec) support ggml-ci
  * metal : add quantized FA (non-vec) support
  * metal : fix support check ggml-ci
  * metal : clean-up
  * metal : clean-up (cont)
  * metal : fix shared memory calc + reduce smem + comments
  * metal : float-correctness
  * metal : minor [no ci]
* ggml : adjust is_first_call init value (ggerganov#10193) ggml-ci
* metal : fix from ptr buffer name (ggerganov#10189)
* server : remove hack for extra parallel slot (ggerganov#10187) ggml-ci
* metal : add BF16 support (ggerganov#8439)
  * ggml : add initial BF16 support ggml-ci
  * metal : add mul_mat_id BF16 support ggml-ci
  * metal : check for bfloat support on the Metal device ggml-ci
  * metal : better var names [no ci]
  * metal : do not build bfloat kernels when not supported ggml-ci
  * metal : try to fix BF16 support check ggml-ci
  * metal : this should correctly check bfloat support

---------
Signed-off-by: Gabe Goodhart
Co-authored-by: Plamen Minev
Co-authored-by: Yuri Khrustalev
Co-authored-by: Georgi Gerganov
Co-authored-by: leo-pony
Co-authored-by: Diego Devesa
Co-authored-by: snadampal
Co-authored-by: Xuan Son Nguyen
Co-authored-by: Eve
Co-authored-by: Gabe Goodhart
apicalshark added a commit to apicalshark/llama.cpp that referenced this pull request on Nov 8, 2024:
* Merge PR (#10) (#11) (#13)
  Merge
  ---------
  Signed-off-by: dependabot[bot]
  Co-authored-by: dennyxbox890
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump requests from 2.31.0 to 2.32.2 in the pip group across 1 directory
  Bumps the pip group with 1 update in the / directory: [requests](https://github.com/psf/requests).
  Updates `requests` from 2.31.0 to 2.32.2
  - [Release notes](https://github.com/psf/requests/releases)
  - [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
  - [Commits](psf/requests@v2.31.0...v2.32.2)
  ---
  updated-dependencies:
  - dependency-name: requests
    dependency-type: direct:production
    dependency-group: pip
  ...
  Signed-off-by: dependabot[bot]
* Temp (#15)
  * metal : fix minor string leaks (ggml/1004)
  * cmake : make it possible linking ggml as external lib (ggml/1003)
  * sync : ggml
  * CANN: adjust backend registry refactor. (ggerganov#10158)
    remove buffer->iface.get_name that used in cann as it was removed in backend registry refactor PR.
  * metal : move dequantize templates to beginning of MSL source (#0)
  * metal : simplify f16 and f32 dequant kernels (#0)
  * cuda : clear error after changing peer access (ggerganov#10153)
  * fix build break on arm64 linux (ggerganov#10166)
    This fixes the build break from the recent changes to move the CPU backend to separate files ggerganov#10144
  * server : clarify /slots endpoint, add is_processing (ggerganov#10162)
    * server : clarify /slots endpoint, add is_processing
    * fix tests
  * ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (ggerganov#10167)
  * ggml : fix gelu tables initialization (ggerganov#10172)
  * Q6_K AVX improvements (ggerganov#10118)
    * q6_k instruction reordering attempt
    * better subtract method
    * should be theoretically faster small improvement with shuffle lut, likely because all loads are already done at that stage
    * optimize bit fiddling
    * handle -32 offset separately. bsums exists for a reason!
    * use shift
    * Update ggml-quants.c
    * have to update ci macos version to 13 as 12 doesnt work now. 13 is still x86
  * ggml : fix arch check in bf16_to_fp32 (ggerganov#10164)
  * llama : add <|tool_call|> formatting to Granite template (ggerganov#10177)
    Branch: GraniteToolCallTemplate
    Signed-off-by: Gabe Goodhart
  * metal : add quantized FA support (ggerganov#10149)
    * metal : add quantized FA (vec) support ggml-ci
    * metal : add quantized FA (non-vec) support
    * metal : fix support check ggml-ci
    * metal : clean-up
    * metal : clean-up (cont)
    * metal : fix shared memory calc + reduce smem + comments
    * metal : float-correctness
    * metal : minor [no ci]
  * ggml : adjust is_first_call init value (ggerganov#10193) ggml-ci
  * metal : fix from ptr buffer name (ggerganov#10189)
  * server : remove hack for extra parallel slot (ggerganov#10187) ggml-ci
  * metal : add BF16 support (ggerganov#8439)
    * ggml : add initial BF16 support ggml-ci
    * metal : add mul_mat_id BF16 support ggml-ci
    * metal : check for bfloat support on the Metal device ggml-ci
    * metal : better var names [no ci]
    * metal : do not build bfloat kernels when not supported ggml-ci
    * metal : try to fix BF16 support check ggml-ci
    * metal : this should correctly check bfloat support
  ---------
  Signed-off-by: Gabe Goodhart
  Co-authored-by: Plamen Minev
  Co-authored-by: Yuri Khrustalev
  Co-authored-by: Georgi Gerganov
  Co-authored-by: leo-pony
  Co-authored-by: Diego Devesa
  Co-authored-by: snadampal
  Co-authored-by: Xuan Son Nguyen
  Co-authored-by: Eve
  Co-authored-by: Gabe Goodhart

---------
Signed-off-by: dependabot[bot]
Signed-off-by: Gabe Goodhart
Co-authored-by: dennyxbox890
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Plamen Minev
Co-authored-by: Yuri Khrustalev
Co-authored-by: Georgi Gerganov
Co-authored-by: leo-pony
Co-authored-by: Diego Devesa
Co-authored-by: snadampal
Co-authored-by: Xuan Son Nguyen
Co-authored-by: Eve
Co-authored-by: Gabe Goodhart
Fixes #10154