
Metal support for Swift #3078

Merged
6 commits merged into ggerganov:master on Sep 9, 2023

Conversation

@kchro3 (Contributor) commented Sep 8, 2023

I have a working demo of using Metal w/ a Swift Mac app with these changes. Hopefully, this is a welcome contribution!

llama_model_loader: loaded meta data with 19 key-value pairs and 363 tensors from /Users/jeffhara/.cache/lm-studio/models/TheBloke/MythoMax-L2-Kimiko-v2-13B-GGUF/mythomax-l2-kimiko-v2-13b.Q4_K_M.gguf (version GGUF V2 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q4_K     [  5120, 32000,     1,     1 ]
...B (+  400.00 MB per state)
...................................................................................................
llama_new_context_with_model: kv self size  =  400.00 MB
ggml_metal_init: allocating
2023-09-07 22:35:08.973604-0700 TestTypeaheadAI[60657:1063549] Metal GPU Frame Capture Enabled
2023-09-07 22:35:08.973931-0700 TestTypeaheadAI[60657:1063549] Metal API Validation Enabled
ggml_metal_init: found device: Apple M1 Pro
ggml_metal_init: picking default device: Apple M1 Pro
ggml_metal_init: loaded kernel_add                         0x600000fd0370 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_add_row                     0x600000fd0500 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul                         0x600000fd0690 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_row                     0x600000fd0820 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_scale                       0x600000fd44b0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_silu                        0x600000fd4640 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_relu                        0x600000fd47d0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_gelu                        0x600000fee7b0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_soft_max                    0x600000fee940 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_diag_mask_inf               0x600000feead0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_f16                0x600000feec60 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q4_0               0x600000feedf0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q4_1               0x600000feef80 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q8_0               0x600000fd09b0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q2_K               0x600000fd0b40 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q3_K               0x600000fd0cd0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q4_K               0x600000fd0e60 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q5_K               0x600000fd0ff0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q6_K               0x600000fd1180 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_rms_norm                    0x600000fd1310 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_norm                        0x600000fd14a0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_f16_f32             0x600000fd1630 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q4_0_f32            0x600000fd17c0 | th_max =  896 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q4_1_f32            0x600000fd1950 | th_max =  896 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q8_0_f32            0x600000fd1ae0 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q2_K_f32            0x600000fd1c70 | th_max =  640 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q3_K_f32            0x600000fd1e00 | th_max =  704 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q4_K_f32            0x600000fd1f90 | th_max =  576 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q5_K_f32            0x600000fd2120 | th_max =  576 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q6_K_f32            0x600000fd22b0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_f16_f32              0x600000fd2440 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q4_0_f32             0x600000fd25d0 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q8_0_f32             0x600000fd2760 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q4_1_f32             0x600000fd28f0 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q2_K_f32             0x600000fd2a80 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q3_K_f32             0x600000fd2c10 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q4_K_f32             0x600000fd2da0 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q5_K_f32             0x600000fd2f30 | th_max =  704 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q6_K_f32             0x600000fd30c0 | th_max =  704 | th_width =   32
ggml_metal_init: loaded kernel_rope                        0x600000fd3250 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_alibi_f32                   0x600000fd33e0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_cpy_f32_f16                 0x600000fd3610 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_cpy_f32_f32                 0x600000fd37a0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_cpy_f16_f16                 0x600000fd3930 | th_max = 1024 | th_width =   32
ggml_metal_init: recommendedMaxWorkingSetSize  = 10922.67 MB
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: maxTransferRate               = built-in GPU
llama_new_context_with_model: compute buffer total size =   91.47 MB
llama_new_context_with_model: max tensor size =   128.17 MB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  7501.56 MB, ( 7502.12 / 10922.67)
ggml_metal_add_buffer: allocated 'eval            ' buffer, size =     1.48 MB, ( 7503.61 / 10922.67)
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =   402.00 MB, ( 7905.61 / 10922.67)
ggml_metal_add_buffer: allocated 'alloc           ' buffer, size =    90.02 MB, ( 7995.62 / 10922.67)


 what is the capital of japan
The capital of Japan is Tokyo. [end of text]
0
ggml_metal_free: deallocating

@@ -140,12 +140,22 @@ @implementation GGMLMetalClass

ctx->d_queue = dispatch_queue_create("llama.cpp", DISPATCH_QUEUE_CONCURRENT);

#if 0
@kchro3 (Contributor Author) commented on the diff:

Thought it would be ok to replace this since it looked unfinished.

@kchro3 force-pushed the kchro3/llama-swift-metal-support branch from 615b02c to ce92d75 on September 8, 2023
Package.swift Outdated
],
publicHeadersPath: "spm-headers",
cSettings: [
.unsafeFlags(["-Wno-shorten-64-to-32"]),
.unsafeFlags(["-fno-objc-arc"]),
.define("GGML_SWIFT"),
.define("GGML_USE_METAL"),
@pkrmf commented Sep 8, 2023:

Isn't this change forcing everyone to use Metal? Can this be a flag defined by the consumer instead? For instance, I can't run llama.cpp with Metal enabled on my old MacBook with an AMD card.

@ggerganov (Owner) replied:

@kchro3 Let's address this comment and we can merge

@kchro3 (Contributor Author) replied Sep 8, 2023:

How does it look now? I don't have an old MacBook, but I was able to build it when I switched the if/else condition:

# in Package.swift
#if arch(x86_64)  // instead of arch(arm) || arch(arm64)
// demo that it's not using metal anymore
llm_load_tensors: ggml ctx size =    0.12 MB
llm_load_tensors: mem required  = 7500.97 MB (+  400.00 MB per state)
...................................................................................................
llama_new_context_with_model: kv self size  =  400.00 MB
llama_new_context_with_model: compute buffer total size =   75.47 MB


 what is the capital of japan
Token received in Swift:

Token received in Swift: The
Token received in Swift:  capital
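For reference, the arch-gated manifest described above could be sketched roughly like this (a sketch only, not the exact diff in this PR; the target layout and file lists are illustrative):

```swift
// Package.swift — sketch of gating Metal support by architecture.
// Target name and source lists are illustrative, not the PR's exact layout.
import PackageDescription

#if arch(arm) || arch(arm64)
// Apple Silicon: compile the Metal backend and enable GGML_USE_METAL.
let additionalSources: [String] = ["ggml-metal.m"]
let additionalSettings: [CSetting] = [
    .define("GGML_SWIFT"),
    .define("GGML_USE_METAL")
]
#else
// Intel (e.g. an older MacBook with an AMD GPU): plain CPU build.
let additionalSources: [String] = []
let additionalSettings: [CSetting] = []
#endif

let package = Package(
    name: "llama",
    targets: [
        .target(
            name: "llama",
            sources: ["ggml.c", "llama.cpp"] + additionalSources,
            cSettings: [.unsafeFlags(["-Wno-shorten-64-to-32"])] + additionalSettings
        )
    ]
)
```

Note that `#if arch(...)` in a manifest is evaluated for the machine resolving the package, which is why flipping the condition to `arch(x86_64)` let the author exercise the non-Metal path on an Apple Silicon machine.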

Package.swift Outdated
@@ -4,23 +4,27 @@ import PackageDescription

let package = Package(
name: "llama",
platforms: [.macOS(.v11)],
Review comment:

I haven't tried iOS, but I do wonder if we are OK limiting this package to macOS only. Can llama.cpp run on iOS, tvOS, or even watchOS, @ggerganov?

@ggerganov (Owner) replied:

It can run even on a refrigerator 😄

Jokes aside - I see no reason to limit this to just macOS

@kchro3 (Contributor Author) replied:

I'm not sure why, but the compiler was not happy if platforms was not included, although that could be a Metal thing?

@kchro3 (Contributor Author) added:

I don't have a way to test on watchOS or tvOS... I can set it to the minimum non-deprecated version, and if someone tries to cross that bridge, we can update it?

Review comment:

Do you mind sharing the compiler error?

@kchro3 (Contributor Author) replied Sep 9, 2023:

Hey folks, I would appreciate help on this. I'm seeing in my build logs that it's compiling the .metal file even if I put it in resources.

For example, if I do:

#if arch(arm) || arch(arm64)
let platforms: [SupportedPlatform]? = [
    .macOS(.v11),
    .iOS(.v11),
    .watchOS(.v4),
    .tvOS(.v11)
]
let exclude: [String] = []
let resources: [Resource]? = [
    .copy("ggml-metal.metal"),
    .copy("README.md")  // just to validate that files get copied
]
let additionalSources: [String] = ["ggml-metal.m"]
let additionalSettings: [CSetting] = [
    .unsafeFlags(["-fno-objc-arc"]),
    .define("GGML_SWIFT"),
    .define("GGML_USE_METAL")
]
#else

I still see the default.metallib in my resources:

ls /Users/.../Library/Developer/Xcode/DerivedData/.../Build/Products/Debug/....app/Contents/Resources/llama_llama.bundle/Contents/Resources/
README.md		default.metallib
[Screenshot: Resources folder showing README.md and default.metallib]

Could it be because the file is in the project root and the target path is "."? I tried copying it into a new directory and excluding the original, but it still compiled...

Review comment:

What if you exclude ggml-metal.metal like it is done in master right now, but then add it in the resources section? That should do it, I think.

@kchro3 (Contributor Author) replied:

https://developer.apple.com/documentation/packagedescription/target/exclude#discussion

I think it doesn't work because exclude takes precedence over resources. For example, I pushed a branch https://github.com/ggerganov/llama.cpp/pull/3091/files, and my build logs don't show the .metal file:

[Screenshot: build logs without the .metal file]
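Putting those findings together, the working declaration keeps the file out of exclude and lists it only under resources. A sketch (target name and file lists are illustrative):

```swift
// Sketch: ship ggml-metal.metal as a package resource.
// Because `exclude` takes precedence over `resources`, the file must
// appear only in `resources`; Xcode then compiles it into the
// default.metallib seen in the bundle listing above.
.target(
    name: "llama",                       // illustrative target name
    sources: ["ggml.c", "ggml-metal.m"], // illustrative source list
    resources: [
        .copy("ggml-metal.metal")
    ],
    cSettings: [
        .define("GGML_USE_METAL")
    ]
)
```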

@kchro3 (Contributor Author) added:

https://developer.apple.com/library/archive/documentation/3DDrawing/Conceptual/MTLBestPracticesGuide/FunctionsandLibraries.html

Just a sanity check, but this documentation says that .metal files get automatically compiled, and that seems to be the case from what I've tried. Is there a specific reason why we need to compile from source?

A collaborator replied:

> https://developer.apple.com/library/archive/documentation/3DDrawing/Conceptual/MTLBestPracticesGuide/FunctionsandLibraries.html
>
> Just a sanity check, but this documentation is saying that .metal files get automatically compiled, and that seems to be the case from what I've tried.

Got it, that looks like the better way when building with Xcode.
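For context on "automatically compiled": once Xcode has built the shader into default.metallib inside the package's resource bundle, the library can be loaded at runtime without the compile-from-source path. A hypothetical sketch (the bundle lookup is an assumption; ggml's actual loader is the Objective-C ggml_metal_init seen in the logs above):

```swift
import Foundation
import Metal

// Sketch: load the precompiled default.metallib from a resource bundle
// instead of compiling ggml-metal.metal from source at runtime.
func loadGGMLLibrary(from bundle: Bundle) throws -> MTLLibrary {
    guard let device = MTLCreateSystemDefaultDevice() else {
        throw NSError(domain: "ggml", code: 1,
                      userInfo: [NSLocalizedDescriptionKey: "no Metal device"])
    }
    // makeDefaultLibrary(bundle:) picks up default.metallib in the bundle.
    return try device.makeDefaultLibrary(bundle: bundle)
}
```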

@kchro3 (Contributor Author) commented Sep 9, 2023:

cc: @ggerganov, I think all comments are now addressed.

@jhen0409 (Collaborator) commented Sep 9, 2023:

After giving it a try in a new Xcode app project, I figured out the reason for the ggml-metal.metal compile errors on iOS:

  • iOS / tvOS 14.0 is required to use int64_t in Metal (ref)
  • Metal >= v2.3 is required (define MTL_LANGUAGE_REVISION = Metal23) (ref: search simd_sum) (not necessary)

Also, Metal may not be available on watchOS.
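Based on those findings, the platform minimums might end up looking like this (a sketch; versions follow the bullet points above, and watchOS is omitted since Metal may not be available there):

```swift
// Sketch: platform minimums implied by the int64_t-in-Metal requirement.
let platforms: [SupportedPlatform]? = [
    .macOS(.v11),
    .iOS(.v14),   // int64_t in Metal requires iOS 14
    .tvOS(.v14)   // ...and tvOS 14
    // no watchOS entry: Metal may not be available there
]
```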

Co-authored-by: Jhen-Jie Hong <[email protected]>
@ggerganov (Owner) reviewed:

@jhen0409
Can't give this a test atm - if all works merge it

@jhen0409 jhen0409 merged commit 21ac3a1 into ggerganov:master Sep 9, 2023
5 checks passed