Add Moorethreads MUSA support #6697

Closed
wants to merge 3 commits into from

@dixyes commented Apr 16, 2024

MUSA is a CUDA-like SDK for the Moore Threads platform, similar to HIP/ROCm: https://developer.mthreads.com/musa/musa-sdk

So far this supports the Makefile build, with a simple, dirty CMake implementation.

Use MUSA_ARCH=21 for S80

The original MUSA header /usr/local/musa/include/internal/mublas-types.h breaks C++ compilation with gcc 12 and needs to be modified:

@@ -32,8 +32,8 @@
    Hence, only define __noinline__ when the code is being processed
    by a  MUSA compiler component.
 */   
-#define __noinline__ \
-        __attribute__((noinline))
+//#define __noinline__ \
+//        __attribute__((noinline))
 #endif /* __MUSACC__  || __MUSA_ARCH__ || __MUSA_LIBDEVICE__ */
         
 #define __forceinline__ \
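
A likely reason the patch above helps, assuming MUSA hits the same conflict that is known from CUDA/HIP toolchains: GCC 12's libstdc++ spells the attribute as `__attribute__((__noinline__))`, so a macro named `__noinline__` would be expanded inside it and produce a nested, unparsable attribute. The following is only a minimal sketch of that mechanism; `add_one` and `self_check` are hypothetical names, and the commented-out macro is a stand-in for the SDK header's definition.

```cpp
#include <cassert>

// Hypothetical stand-in for the SDK header's definition, left disabled here.
// If it were enabled, the declaration below would expand to
// __attribute__((__attribute__((noinline)))) and fail to parse.
// #define __noinline__ __attribute__((noinline))

// Standard-library-style spelling of the attribute, using the reserved
// double-underscore form (accepted by GCC and Clang).
__attribute__((__noinline__)) int add_one(int x) { return x + 1; }

// Small self-check: this translation unit compiles, and the check passes,
// only while __noinline__ is not defined as a macro.
inline void self_check() { assert(add_one(41) == 42); }
```

Uncommenting the `#define` should reproduce a compile error of the same kind, which is why commenting it out in the header is a workable, if blunt, workaround.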

github-actions bot commented Apr 16, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 423 iterations 🚀

Expand details for performance related PR only
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=11212.29ms p(95)=28275.6ms fails=, finish reason: stop=367 truncated=56
  • Prompt processing (pp): avg=125.18tk/s p(95)=555.78tk/s
  • Token generation (tg): avg=23.21tk/s p(95)=35.11tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=musa commit=ec3cc36dc835572d4e17cea727d831062da499bc

[Chart: llamacpp:prompt_tokens_seconds over time (1713256307 --> 1713256943), llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 423 iterations]

[Chart: llamacpp:predicted_tokens_seconds over time (1713256307 --> 1713256943), llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 423 iterations]

[Chart: llamacpp:kv_cache_usage_ratio over time (1713256307 --> 1713256943), llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 423 iterations]

[Chart: llamacpp:requests_processing over time (1713256307 --> 1713256943), llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 423 iterations]

@dixyes dixyes marked this pull request as ready for review April 18, 2024 02:31
@mofosyne mofosyne added the "Review Complexity : Medium" and "enhancement" labels May 9, 2024
@dixyes dixyes closed this Sep 3, 2024
dixyes commented Sep 3, 2024

link: #8383
