
Metal support for Swift #3078

Merged
6 commits merged into ggerganov:master on Sep 9, 2023

Conversation

@kchro3 (Contributor) commented Sep 8, 2023

I have a working demo of using Metal w/ a Swift Mac app with these changes. Hopefully, this is a welcome contribution!

llama_model_loader: loaded meta data with 19 key-value pairs and 363 tensors from /Users/jeffhara/.cache/lm-studio/models/TheBloke/MythoMax-L2-Kimiko-v2-13B-GGUF/mythomax-l2-kimiko-v2-13b.Q4_K_M.gguf (version GGUF V2 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q4_K     [  5120, 32000,     1,     1 ]
...B (+  400.00 MB per state)
...................................................................................................
llama_new_context_with_model: kv self size  =  400.00 MB
ggml_metal_init: allocating
2023-09-07 22:35:08.973604-0700 TestTypeaheadAI[60657:1063549] Metal GPU Frame Capture Enabled
2023-09-07 22:35:08.973931-0700 TestTypeaheadAI[60657:1063549] Metal API Validation Enabled
ggml_metal_init: found device: Apple M1 Pro
ggml_metal_init: picking default device: Apple M1 Pro
ggml_metal_init: loaded kernel_add                         0x600000fd0370 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_add_row                     0x600000fd0500 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul                         0x600000fd0690 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_row                     0x600000fd0820 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_scale                       0x600000fd44b0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_silu                        0x600000fd4640 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_relu                        0x600000fd47d0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_gelu                        0x600000fee7b0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_soft_max                    0x600000fee940 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_diag_mask_inf               0x600000feead0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_f16                0x600000feec60 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q4_0               0x600000feedf0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q4_1               0x600000feef80 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q8_0               0x600000fd09b0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q2_K               0x600000fd0b40 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q3_K               0x600000fd0cd0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q4_K               0x600000fd0e60 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q5_K               0x600000fd0ff0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q6_K               0x600000fd1180 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_rms_norm                    0x600000fd1310 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_norm                        0x600000fd14a0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_f16_f32             0x600000fd1630 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q4_0_f32            0x600000fd17c0 | th_max =  896 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q4_1_f32            0x600000fd1950 | th_max =  896 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q8_0_f32            0x600000fd1ae0 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q2_K_f32            0x600000fd1c70 | th_max =  640 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q3_K_f32            0x600000fd1e00 | th_max =  704 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q4_K_f32            0x600000fd1f90 | th_max =  576 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q5_K_f32            0x600000fd2120 | th_max =  576 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q6_K_f32            0x600000fd22b0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_f16_f32              0x600000fd2440 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q4_0_f32             0x600000fd25d0 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q8_0_f32             0x600000fd2760 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q4_1_f32             0x600000fd28f0 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q2_K_f32             0x600000fd2a80 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q3_K_f32             0x600000fd2c10 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q4_K_f32             0x600000fd2da0 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q5_K_f32             0x600000fd2f30 | th_max =  704 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q6_K_f32             0x600000fd30c0 | th_max =  704 | th_width =   32
ggml_metal_init: loaded kernel_rope                        0x600000fd3250 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_alibi_f32                   0x600000fd33e0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_cpy_f32_f16                 0x600000fd3610 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_cpy_f32_f32                 0x600000fd37a0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_cpy_f16_f16                 0x600000fd3930 | th_max = 1024 | th_width =   32
ggml_metal_init: recommendedMaxWorkingSetSize  = 10922.67 MB
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: maxTransferRate               = built-in GPU
llama_new_context_with_model: compute buffer total size =   91.47 MB
llama_new_context_with_model: max tensor size =   128.17 MB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  7501.56 MB, ( 7502.12 / 10922.67)
ggml_metal_add_buffer: allocated 'eval            ' buffer, size =     1.48 MB, ( 7503.61 / 10922.67)
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =   402.00 MB, ( 7905.61 / 10922.67)
ggml_metal_add_buffer: allocated 'alloc           ' buffer, size =    90.02 MB, ( 7995.62 / 10922.67)


 what is the capital of japan
The capital of Japan is Tokyo. [end of text]
0
ggml_metal_free: deallocating

@@ -140,12 +140,22 @@ @implementation GGMLMetalClass

ctx->d_queue = dispatch_queue_create("llama.cpp", DISPATCH_QUEUE_CONCURRENT);

#if 0
@kchro3 (Contributor Author) commented on the diff:

Thought it would be ok to replace this since it looked unfinished.

@kchro3 force-pushed the kchro3/llama-swift-metal-support branch from 615b02c to ce92d75 on September 8, 2023
Package.swift Outdated
],
publicHeadersPath: "spm-headers",
cSettings: [
.unsafeFlags(["-Wno-shorten-64-to-32"]),
.unsafeFlags(["-fno-objc-arc"]),
.define("GGML_SWIFT"),
.define("GGML_USE_METAL"),
@pkrmf commented Sep 8, 2023:

Isn't this change forcing everyone to use Metal? Can this be a flag defined by the consumer instead? For instance, I can't run llama.cpp with Metal enabled on my old MacBook with an AMD card.

@ggerganov (Owner) replied:

@kchro3 Let's address this comment and we can merge

@kchro3 (Contributor Author) replied Sep 8, 2023:

How does it look now? I don't have an old MacBook, but I was able to build it when I switched the if/else condition:

# in Package.swift
#if arch(x86_64)  // instead of arch(arm) || arch(arm64)
// demo that it's not using metal anymore
llm_load_tensors: ggml ctx size =    0.12 MB
llm_load_tensors: mem required  = 7500.97 MB (+  400.00 MB per state)
...................................................................................................
llama_new_context_with_model: kv self size  =  400.00 MB
llama_new_context_with_model: compute buffer total size =   75.47 MB


 what is the capital of japan
Token received in Swift:

Token received in Swift: The
Token received in Swift:  capital
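For reference, the arch-gated manifest described above could be sketched roughly like this (a sketch only, not the exact diff in this PR; the target layout and file lists are illustrative):

```swift
// Package.swift — sketch of gating Metal support by architecture.
// Target name and source lists are illustrative, not the PR's exact layout.
import PackageDescription

#if arch(arm) || arch(arm64)
// Apple Silicon: compile the Metal backend and enable GGML_USE_METAL.
let additionalSources: [String] = ["ggml-metal.m"]
let additionalSettings: [CSetting] = [
    .define("GGML_SWIFT"),
    .define("GGML_USE_METAL")
]
#else
// Intel (e.g. an older MacBook with an AMD GPU): plain CPU build.
let additionalSources: [String] = []
let additionalSettings: [CSetting] = []
#endif

let package = Package(
    name: "llama",
    targets: [
        .target(
            name: "llama",
            sources: ["ggml.c", "llama.cpp"] + additionalSources,
            cSettings: [.unsafeFlags(["-Wno-shorten-64-to-32"])] + additionalSettings
        )
    ]
)
```

Note that `#if arch(...)` in a manifest is evaluated for the machine resolving the package, which is why flipping the condition to `arch(x86_64)` let the author exercise the non-Metal path on an Apple Silicon machine.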

Package.swift Outdated
@@ -4,23 +4,27 @@ import PackageDescription

let package = Package(
name: "llama",
platforms: [.macOS(.v11)],
Review comment:

I haven't tried iOS, but I do wonder if we are OK limiting this package to macOS only. Can llama.cpp run on iOS, tvOS, or even watchOS, @ggerganov?

@ggerganov (Owner) replied:

It can run even on a refrigerator 😄

Jokes aside - I see no reason to limit this to just macOS

@kchro3 (Contributor Author) replied:

I'm not sure why, but the compiler was not happy if platforms was not included, although that could be a Metal thing?

@kchro3 (Contributor Author) added:

I don't have a way to test on watchOS or tvOS... I can set it to the minimum non-deprecated version, and if someone tries to cross that bridge, we can update it?

Review comment:

Do you mind sharing the compiler error?

@kchro3 (Contributor Author) replied Sep 9, 2023:

Hey folks, I would appreciate help on this. I'm seeing in my build logs that it's compiling the .metal file even if I put it in resources.

For example, if I do:

#if arch(arm) || arch(arm64)
let platforms: [SupportedPlatform]? = [
    .macOS(.v11),
    .iOS(.v11),
    .watchOS(.v4),
    .tvOS(.v11)
]
let exclude: [String] = []
let resources: [Resource]? = [
    .copy("ggml-metal.metal"),
    .copy("README.md")  // just to validate that files get copied
]
let additionalSources: [String] = ["ggml-metal.m"]
let additionalSettings: [CSetting] = [
    .unsafeFlags(["-fno-objc-arc"]),
    .define("GGML_SWIFT"),
    .define("GGML_USE_METAL")
]
#else

I still see the default.metallib in my resources:

ls /Users/.../Library/Developer/Xcode/DerivedData/.../Build/Products/Debug/....app/Contents/Resources/llama_llama.bundle/Contents/Resources/
README.md		default.metallib
[Screenshot: Resources folder showing README.md and default.metallib]

Could it be because the file is in the project root and the target path is "."? I tried copying it into a new directory and excluding the original, but it still compiled...

Review comment:

What if you exclude ggml-metal.metal like it is done in master right now, but then add it in the resources section? That should do it, I think.

@kchro3 (Contributor Author) replied:

https://developer.apple.com/documentation/packagedescription/target/exclude#discussion

I think it doesn't work because exclude takes precedence over resources. For example, I pushed a branch https://github.com/ggerganov/llama.cpp/pull/3091/files, and my build logs don't show the .metal file:

[Screenshot: build logs without the .metal file]
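Putting those findings together, the working declaration keeps the file out of exclude and lists it only under resources. A sketch (target name and file lists are illustrative):

```swift
// Sketch: ship ggml-metal.metal as a package resource.
// Because `exclude` takes precedence over `resources`, the file must
// appear only in `resources`; Xcode then compiles it into the
// default.metallib seen in the bundle listing above.
.target(
    name: "llama",                       // illustrative target name
    sources: ["ggml.c", "ggml-metal.m"], // illustrative source list
    resources: [
        .copy("ggml-metal.metal")
    ],
    cSettings: [
        .define("GGML_USE_METAL")
    ]
)
```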

@kchro3 (Contributor Author) added:

https://developer.apple.com/library/archive/documentation/3DDrawing/Conceptual/MTLBestPracticesGuide/FunctionsandLibraries.html

Just a sanity check, but this documentation says that .metal files get automatically compiled, and that seems to be the case from what I've tried. Is there a specific reason why we need to compile from source?

A collaborator replied:

> https://developer.apple.com/library/archive/documentation/3DDrawing/Conceptual/MTLBestPracticesGuide/FunctionsandLibraries.html
>
> Just a sanity check, but this documentation is saying that .metal files get automatically compiled, and that seems to be the case from what I've tried.

Got it, that looks like the better way when building with Xcode.
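For context on "automatically compiled": once Xcode has built the shader into default.metallib inside the package's resource bundle, the library can be loaded at runtime without the compile-from-source path. A hypothetical sketch (the bundle lookup is an assumption; ggml's actual loader is the Objective-C ggml_metal_init seen in the logs above):

```swift
import Foundation
import Metal

// Sketch: load the precompiled default.metallib from a resource bundle
// instead of compiling ggml-metal.metal from source at runtime.
func loadGGMLLibrary(from bundle: Bundle) throws -> MTLLibrary {
    guard let device = MTLCreateSystemDefaultDevice() else {
        throw NSError(domain: "ggml", code: 1,
                      userInfo: [NSLocalizedDescriptionKey: "no Metal device"])
    }
    // makeDefaultLibrary(bundle:) picks up default.metallib in the bundle.
    return try device.makeDefaultLibrary(bundle: bundle)
}
```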

@kchro3 (Contributor Author) commented Sep 9, 2023:

cc: @ggerganov, I think all comments are now addressed.

@jhen0409 (Collaborator) commented Sep 9, 2023:

After giving it a try in a new Xcode app project, I figured out the reason for the ggml-metal.metal compile errors on iOS:

  • iOS / tvOS 14.0 is required to use int64_t in Metal (ref)
  • Metal >= v2.3 is required (define MTL_LANGUAGE_REVISION = Metal23) (ref: search simd_sum) (not necessary)

Also, Metal may not be available on watchOS.
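Based on those findings, the platform minimums might end up looking like this (a sketch; versions follow the bullet points above, and watchOS is omitted since Metal may not be available there):

```swift
// Sketch: platform minimums implied by the int64_t-in-Metal requirement.
let platforms: [SupportedPlatform]? = [
    .macOS(.v11),
    .iOS(.v14),   // int64_t in Metal requires iOS 14
    .tvOS(.v14)   // ...and tvOS 14
    // no watchOS entry: Metal may not be available there
]
```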

Co-authored-by: Jhen-Jie Hong <[email protected]>
@ggerganov (Owner) reviewed:

@jhen0409
Can't give this a test atm - if all works merge it

@jhen0409 jhen0409 merged commit 21ac3a1 into ggerganov:master Sep 9, 2023
5 checks passed