clang-cl retrieves TLS base address twice when __tls_guard has been set to true #113010

Open
mcfi opened this issue Oct 19, 2024 · 0 comments
Labels
clang-cl `clang-cl` driver. Don't use for other compiler parts

Comments

mcfi commented Oct 19, 2024

Consider this example: https://godbolt.org/z/nGK9cn6TW

struct A {
    int a;
};

extern __declspec(thread) struct A *a;

struct A*
getA(void)
{
    return a;
}

clang-cl 19.1.0 generates the following when targeting x64:

getA:                                   # @getA
	sub	rsp, 40
	.seh_stackalloc 40
	.seh_endprologue
	mov	eax, dword ptr [rip + _tls_index]
	mov	ecx, eax
	mov	rax, qword ptr gs:[88]
	mov	rax, qword ptr [rax + 8*rcx]
	mov	al, byte ptr [rax + __tls_guard@SECREL32]
	cmp	al, 0
	jne	.LBB0_2
# %bb.1:
	call	__dyn_tls_on_demand_init
.LBB0_2:
	mov	eax, dword ptr [rip + _tls_index]
	mov	ecx, eax
	mov	rax, qword ptr gs:[88]
	mov	rax, qword ptr [rax + 8*rcx]
	lea	rax, [rax + "?a@@3PEAUA@@EA"@SECREL32]
	mov	rax, qword ptr [rax]
	add	rsp, 40
	ret

and the following when targeting arm64:

getA:                                   // @getA
.seh_proc getA
// %bb.0:
	str	x30, [sp, #-16]!                // 8-byte Folded Spill
	.seh_save_reg_x	x30, 16
	.seh_endprologue
	ldr	x8, [x18, #88]
	adrp	x9, _tls_index
	ldr	w9, [x9, :lo12:_tls_index]
                                        // kill: def $x9 killed $w9
	ldr	x8, [x8, x9, lsl #3]
	add	x8, x8, :secrel_hi12:__tls_guard
	ldrb	w8, [x8, :secrel_lo12:__tls_guard]
	cbnz	w8, .LBB0_2
	b	.LBB0_1
.LBB0_1:
	bl	__dyn_tls_on_demand_init
	b	.LBB0_2
.LBB0_2:
	ldr	x8, [x18, #88]
	adrp	x9, _tls_index
	ldr	w9, [x9, :lo12:_tls_index]
                                        // kill: def $x9 killed $w9
	ldr	x8, [x8, x9, lsl #3]
	add	x8, x8, :secrel_hi12:"?a@@3PEAUA@@EA"
	ldr	x0, [x8, :secrel_lo12:"?a@@3PEAUA@@EA"]
	.seh_startepilogue
	ldr	x30, [sp], #16                  // 8-byte Folded Reload
	.seh_save_reg_x	x30, 16
	.seh_endepilogue
	ret

In both cases, the TLS base address is retrieved a second time even if __tls_guard is already true. __tls_guard is set just once by the TLS initialization code and remains true thereafter, so on every call after the first the second retrieval is redundant. Ideally, the x64 code generation should look like the following. The arm64 code can be optimized in a similar manner.

getA:                                   # @getA
	sub	rsp, 40
	.seh_stackalloc 40
	.seh_endprologue
.LBB0_1:
	mov	eax, dword ptr [rip + _tls_index]
	mov	ecx, eax
	mov	rax, qword ptr gs:[88]
	mov	rax, qword ptr [rax + 8*rcx]
	cmp	byte ptr [rax + __tls_guard@SECREL32], 0	# test the guard without clobbering the TLS base in rax
	je	.LBB0_2
	lea	rax, [rax + "?a@@3PEAUA@@EA"@SECREL32]
	mov	rax, qword ptr [rax]
	add	rsp, 40
	ret

.LBB0_2:
	call	__dyn_tls_on_demand_init
	jmp	.LBB0_1
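
The control flow requested above can also be sketched in portable C. This is only a model for illustration, not what clang-cl emits: struct tls_block, tls_base(), on_demand_init(), base_lookups, getA_current() and getA_proposed() are all hypothetical names. tls_base() stands in for the gs:[88] / _tls_index sequence, and on_demand_init() stands in for __dyn_tls_on_demand_init.

```c
#include <stdbool.h>
#include <stddef.h>

struct A {
    int a;
};

/* Hypothetical stand-in for one thread's TLS block. */
struct tls_block {
    bool tls_guard;   /* models __tls_guard */
    struct A *a;      /* models the thread-local variable `a` */
};

static struct tls_block block;      /* guard starts out false */
static struct A instance = { 42 };
static int base_lookups;            /* counts TLS base retrievals */

/* Models the gs:[88] / _tls_index sequence that yields the TLS base. */
struct tls_block *tls_base(void)
{
    ++base_lookups;
    return &block;
}

/* Models __dyn_tls_on_demand_init: initializes the block, sets the guard. */
void on_demand_init(void)
{
    block.a = &instance;
    block.tls_guard = true;
}

/* Shape of the current output: the base is fetched once for the guard
 * check and again for the variable, even when the guard is already set. */
struct A *getA_current(void)
{
    if (!tls_base()->tls_guard)
        on_demand_init();
    return tls_base()->a;
}

/* Shape of the requested output: the fast path reuses the base it used
 * for the guard check; only the one-time slow path loops back and
 * fetches the base again after initialization. */
struct A *getA_proposed(void)
{
    for (;;) {
        struct tls_block *base = tls_base();
        if (base->tls_guard)
            return base->a;
        on_demand_init();
    }
}
```

In this model, once the guard is set, each call to getA_proposed() performs one base lookup while getA_current() performs two. The first call to getA_proposed() still performs two, because it re-fetches the base after initialization, matching the jmp back to .LBB0_1.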