What's the difference of this compiler and HexagonSDK's? #26

Open
zchrissirhcz opened this issue Apr 7, 2024 · 6 comments

Comments

@zchrissirhcz

Hi, QUIC toolchain maintainers:

I installed Hexagon SDK 5.5.0.1 (via QPM) which contains hexagon-clang 8.7.06:

zz@localhost:~/soft/Qualcomm/Hexagon_SDK/5.5.0.1$ ./tools/HEXAGON_Tools/8.7.06/Tools/bin/hexagon-clang --version
QuIC LLVM Hexagon Clang version 8.7.06
Target: hexagon
Thread model: posix
InstalledDir: /home/zz/soft/Qualcomm/Hexagon_SDK/5.5.0.1/./tools/HEXAGON_Tools/8.7.06/Tools/bin

I would like to analyze some performance issues in C/C++ code and its disassembly. I noticed there is a hexagon-clang compiler in Compiler Explorer (https://godbolt.org/):

[Screenshot of the hexagon-clang compiler in Compiler Explorer]

What I am confused about is, are they the same or similar compiler?

@androm3da
Contributor

What I am confused about is, are they the same or similar compiler?

The Hexagon compiler in Compiler Explorer is the same one produced by the scripts in this repo, but it's different from the one in the Hexagon SDK. It differs in several ways: for example, the compiler in the SDK provides different passes. It might also be based on a different baseline LLVM/Clang version.

For example, 8.7.06 is based on llvm+clang 15.0.0:

$ readlink /local/mnt/workspace/Qualcomm/Hexagon_SDK/5.4.1.1/tools/HEXAGON_Tools/8.7.06/Tools/bin/hexagon-clang
clang-15

But ultimately they're similar in that they both produce executable code for Hexagon DSPs.
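As a complement to the `readlink` check above, the baseline can also be queried from inside a translation unit. This is a minimal sketch that relies only on the `__clang__`, `__clang_major__`, `__clang_minor__` and `__clang_patchlevel__` macros that Clang predefines; the names `ClangBaseline`, `clang_baseline` and `print_clang_baseline` are hypothetical, and this thread does not confirm which numbers the SDK's vendor build reports through these macros.

```cpp
#include <cstdio>

// Major/minor/patch of the Clang baseline, or all zeros when the compiler
// is not Clang-based. The __clang_*__ macros are predefined by Clang itself.
struct ClangBaseline {
    int major_version;
    int minor_version;
    int patch_level;
};

constexpr ClangBaseline clang_baseline()
{
#if defined(__clang__)
    return ClangBaseline{__clang_major__, __clang_minor__, __clang_patchlevel__};
#else
    return ClangBaseline{0, 0, 0};
#endif
}

// Prints the baseline so that two toolchains can be compared side by side.
inline void print_clang_baseline()
{
    constexpr ClangBaseline v = clang_baseline();
    std::printf("clang baseline: %d.%d.%d\n",
                v.major_version, v.minor_version, v.patch_level);
}
```

Calling `print_clang_baseline()` from a small test program built with each toolchain gives one line per compiler to compare.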

@zchrissirhcz
Author

@androm3da Thank you for the reply.

But ultimately they're similar in that they both produce executable code for Hexagon DSPs.

OK, so I can use Compiler Explorer for general-purpose assembly analysis, such as counting the number of instruction packets as a rough estimate of the program's performance, is that correct?

I also wonder whether they use the same stack size. I measured about 14000 bytes in a unit-test program on a v66 cDSP built with Hexagon SDK 5.5.0.1, which is far less than on x86-64 Linux (~8192 KB). This repo's hexagon-clang uses musl libc, and it seems musl libc uses a smaller stack size.

@androm3da
Contributor

OK, so I can use Compiler Explorer for general-purpose assembly analysis, such as counting the number of instruction packets as a rough estimate of the program's performance, is that correct?

You should expect the codegen performance of these two compilers to be different - at least with the current releases of each. This would mean that if you want to count the number of packets emitted for a given C program, you should expect differences in this count between the two.
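To make the packet comparison concrete, here is a rough sketch of a static packet counter. It assumes the assembly listing delimits each packet with `{` and `}` and uses `//` for comments; the function name `count_packets` is hypothetical. It counts packets in the listing, not packets executed at run time, so loops and branches are not accounted for.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Counts packets in a Hexagon assembly listing by counting closing braces.
// Text after "//" on a line is treated as a comment and ignored.
inline std::size_t count_packets(const std::string& listing)
{
    std::istringstream lines(listing);
    std::string line;
    std::size_t packets = 0;
    while (std::getline(lines, line)) {
        const std::size_t comment = line.find("//");
        if (comment != std::string::npos) {
            line.erase(comment);
        }
        packets += static_cast<std::size_t>(
            std::count(line.begin(), line.end(), '}'));
    }
    return packets;
}

// Usage example on a tiny hand-written listing with two packets.
inline void count_packets_example()
{
    const std::string listing =
        "\t{\n\t\tr0 = add(r0,r1)\n\t}\n"
        "\t{\n\t\tjumpr r31\n\t}\n";
    assert(count_packets(listing) == 2);
}
```

Running the same counter over the output of both compilers for one source file shows how far the two counts diverge.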

I also wonder whether they use the same stack size. I measured about 14000 bytes in a unit-test program on a v66 cDSP built with Hexagon SDK 5.5.0.1, which is far less than on x86-64 Linux (~8192 KB). This repo's hexagon-clang uses musl libc, and it seems musl libc uses a smaller stack size.

It's important to note that there are two targets usable with the toolchain built in this repo: the baremetal one, hexagon-unknown-none-elf, and the Linux one, hexagon-unknown-linux-musl. The baremetal one has the correct ABI for code that would run on QuRT OS. The Linux one cannot be used for programs that would run on QuRT.

Your question regarding stack size - are you asking about the typical size of an individual frame, or the size of the entire stack allocation? Linux programs would grow their stack dynamically. I don't recall the stack allocation size / behavior for QuRT but I might be able to look up this information. Deciding when to use the stack and how much of the stack to use - that is an aspect of the compiler's codegen performance and that would differ between the Hexagon SDK compiler and this toolchain's compiler.
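When a snippet is built for more than one of these targets, it can help to have the program report which one it was compiled for. The sketch below assumes that Clang predefines `__hexagon__` for Hexagon targets and `__linux__` for Linux triples; the function name `target_environment` and the returned strings are hypothetical, and the non-Linux branch cannot by itself tell baremetal from QuRT.

```cpp
// Describes the target this translation unit was compiled for,
// based on predefined macros (assumed: __hexagon__ and __linux__).
constexpr const char* target_environment()
{
#if defined(__hexagon__) && defined(__linux__)
    return "Hexagon, Linux target (for example hexagon-unknown-linux-musl)";
#elif defined(__hexagon__)
    return "Hexagon, non-Linux target (for example hexagon-unknown-none-elf)";
#else
    return "not a Hexagon target";
#endif
}
```

Printing `target_environment()` next to any stack measurement makes it clear which ABI and libc the number belongs to.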

@zchrissirhcz
Author

You should expect the codegen performance of these two compilers to be different - at least with the current releases of each. This would mean that if you want to count the number of packets emitted for a given C program, you should expect differences in this count between the two.

OK, so the two compilers are different, and the usual approach is to stick with one compiler when comparing {src1.cpp, src2.cpp} or {compile option 1, compile option 2}.

Your question regarding stack size - are you asking about the typical size of an individual frame, or the size of the entire stack allocation?

Yes, I am asking about the size of the entire stack allocation. The call chain is main() -> gemm() -> gemm_internal(), for matrix-matrix multiplication:

// cv::AutoBuffer is part of OpenCV
// whole class: https://github.com/opencv/opencv/blob/4.x/modules/core/include/opencv2/core/utility.hpp#L71-L151
// default fixed_size : https://github.com/opencv/opencv/blob/4.x/modules/core/include/opencv2/core/utility.hpp#L100
// stack allocated buffer: https://github.com/opencv/opencv/blob/4.x/modules/core/include/opencv2/core/utility.hpp#L150
/*
template<typename _Tp, size_t fixed_size = 1024/sizeof(_Tp)+8> class AutoBuffer
{
public:
    ...
    _Tp buf[(fixed_size > 0) ? fixed_size : 1];
};
*/

void gemm()
{
    cv::AutoBuffer buf1;
    ...
    gemm_internal();
    ...
}

void gemm_internal()
{
    cv::AutoBuffer buf2;
    ...
}

int main()
{
    float a[200*200];
    float b[200];
    randomize(a, b);
    float c[200];
    gemm(a, b, c, 200, 200, 200, 1);
}

As illustrated, both gemm() and gemm_internal() use a cv::AutoBuffer instance, which consumes stack memory. When the maximum allowed size of the entire stack is small, the pasted code may easily reach the limit and cause a segmentation fault at run time. On x86-64 Linux the allowed stack size is large, so this segmentation fault almost never happens.
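The stack budget for this call chain can be estimated at compile time with `sizeof`. The sketch below is a lower bound under stated assumptions: `AutoBufferLike` is a hypothetical stand-in that reproduces only the `buf` member from the quoted excerpt, the element type is assumed to be `float` (the snippet omits the template argument), and `float` is assumed to be 4 bytes. Saved registers, alignment padding and other locals are ignored.

```cpp
#include <cstddef>

static_assert(sizeof(float) == 4, "the estimate assumes a 4-byte float");

// Hypothetical stand-in: only the inline buffer of the quoted
// cv::AutoBuffer excerpt, with the same default fixed_size formula.
template <typename T, std::size_t fixed_size = 1024 / sizeof(T) + 8>
struct AutoBufferLike {
    T buf[(fixed_size > 0) ? fixed_size : 1];
};

// Locals of main(): a[200*200], b[200] and c[200].
constexpr std::size_t kMainLocals =
    sizeof(float[200 * 200]) + 2 * sizeof(float[200]);
// One buffer in gemm() and one in gemm_internal(); float is assumed.
constexpr std::size_t kGemmLocals = sizeof(AutoBufferLike<float>);
constexpr std::size_t kGemmInternalLocals = sizeof(AutoBufferLike<float>);
constexpr std::size_t kChainLowerBound =
    kMainLocals + kGemmLocals + kGemmInternalLocals;

// True when the estimated need fits in the given stack budget (bytes).
constexpr bool fits_in_stack(std::size_t needed, std::size_t budget)
{
    return needed <= budget;
}
```

Under these assumptions the two buffers together need 2112 bytes, while the arrays in main() alone need 161600 bytes, so `fits_in_stack(kChainLowerBound, 14000)` is false even before the buffers are counted. Moving the large arrays to static or heap storage would change the estimate far more than shrinking the buffers.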

@zchrissirhcz
Author

There is also a difference in integer types between the two compilers. I used the following snippet for compile-time testing and got different output:

#include <stdint.h>
#include <stdio.h>
#include <type_traits>

template<typename T> static inline T saturate_cast(uint32_t v)  { return T(v); }
template<typename T> static inline T saturate_cast(int32_t v)   { return T(v); }

int main()
{
    //int a = 233;
    //saturate_cast<uint8_t>(a);

    static_assert(std::is_same<int, int32_t>::value, "int is not int32_t");
    static_assert(std::is_same<int, long>::value, "int is not long");
    static_assert(!std::is_same<int32_t, long>::value, "int32_t is same as long");

    return 0;
}

Output from HexagonSDK 5.5.0.1's hexagon-clang:

<source>:13:5: error: static assertion failed due to requirement 'std::is_same<int, long>::value': int is not int32_t
    static_assert(std::is_same<int, int32_t>::value, "int is not int32_t");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:14:5: error: static assertion failed due to requirement 'std::is_same<int, long>::value': int is not long
    static_assert(std::is_same<int, long>::value, "int is not long");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:15:5: error: static assertion failed due to requirement '!std::is_same<long, long>::value': int32_t is same as long
    static_assert(!std::is_same<int32_t, long>::value, "int32_t is same as long");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3 errors generated.
Compiler returned: 1

Output from Compiler Explorer's hexagon-clang 16.0.5: (https://godbolt.org/z/v3nYrqnhq)

<source>:14:5: error: static assertion failed due to requirement 'std::is_same<int, long>::value': int is not long
    static_assert(std::is_same<int, long>::value, "int is not long");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
Compiler returned: 1
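The two outputs suggest that int32_t aliases long under the SDK compiler (its diagnostics print `std::is_same<long, long>`) and int under the Compiler Explorer one. The following sketch shows one way to report the alias at compile time and to avoid depending on it. The names `int32_underlying` and `cast_any_integral` are hypothetical; like the `saturate_cast` overloads above, the cast is a plain conversion and does not saturate.

```cpp
#include <cstdint>
#include <type_traits>

// Name of the builtin type that int32_t aliases on the current target.
constexpr const char* int32_underlying()
{
    return std::is_same<std::int32_t, int>::value  ? "int"
         : std::is_same<std::int32_t, long>::value ? "long"
                                                   : "other";
}

// A single template that accepts any integral source type, so overload
// resolution does not depend on whether int32_t is int or long.
template <typename T, typename S>
constexpr typename std::enable_if<std::is_integral<S>::value, T>::type
cast_any_integral(S v)
{
    return static_cast<T>(v);
}
```

With this form, `cast_any_integral<uint8_t>(a)` for an `int a` compiles on both toolchains, whereas a pair of overloads taking uint32_t and int32_t can become ambiguous for an int argument when int32_t is long.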

@androm3da
Contributor

When the maximum allowed size of the entire stack is small, the pasted code may easily reach the limit and cause a segmentation fault at run time.

Okay, I see -- so you're trying to do some static analysis of the maximum stack depth? To compare with the OS limitation(s) on stack size?

There is also a difference in integer types between the two compilers

Incidentally I had looked into this recently. Some differences between the Hexagon SDK compiler and this open source toolchain are expected. But this one may not be - I'll do a bit of digging.
