[lldb] Improve unwinding for discontinuous functions (#111409)

Currently, our unwinder assumes that the functions are continuous (or at least, that there are no functions which are "in the middle" of other functions). Neither of these assumptions is true for functions optimized by tools like propeller and (probably) bolt. While there are many things that go wrong for these functions, the biggest damage is caused by the unwind plan caching code, which currently takes the maximalist extent of the function and assumes that the unwind plan we get for that is going to be valid for all code inside that range. If a part of the function has been moved into a "cold" section, then the range of the function can be many megabytes, meaning that any function within that range will probably fail to unwind. We end up with this maximalist range because the unwinder asks for the Function object for its range. This is only one of the strategies for determining the range, but it is the first one -- and also the most incorrect one. The second choice would is asking the eh_frame section for the range of the function, and this one returns something reasonable here (the address range of the current function fragment) -- which it does because each fragment gets its own eh_frame entry (it has to, because they have to be continuous). With this in mind, this patch moves the eh_frame (and debug_frame) to the front of the queue. I think that preferring this range makes sense because eh_frame is one of the unwind plans that we return, and some others (augmented eh_frame) are based on it. In theory this could break some functions, where the debug info and eh_frame disagree on the extent of the function (and eh_frame is the one who's wrong), but I don't know of any such scenarios.
llvm · Oct 14, 2024 · a89e016 · a89e016
1 parent b5600c6
commit a89e016
Show file tree

Hide file tree

Showing 6 changed files with 379 additions and 7 deletions.
diff --git a/lldb/source/Commands/CommandObjectTarget.cpp b/lldb/source/Commands/CommandObjectTarget.cpp
@@ -3583,10 +3583,12 @@ class CommandObjectTargetModulesShowUnwind : public CommandObjectParsed {
       addr_t start_addr = range.GetBaseAddress().GetLoadAddress(target);
       if (abi)
         start_addr = abi->FixCodeAddress(start_addr);
+      range.GetBaseAddress().SetLoadAddress(start_addr, target);
 
       FuncUnwindersSP func_unwinders_sp(
           sc.module_sp->GetUnwindTable()
-              .GetUncachedFuncUnwindersContainingAddress(start_addr, sc));
+              .GetUncachedFuncUnwindersContainingAddress(range.GetBaseAddress(),
+                                                         sc));
       if (!func_unwinders_sp)
         continue;
 

diff --git a/lldb/source/Symbol/UnwindTable.cpp b/lldb/source/Symbol/UnwindTable.cpp
@@ -99,12 +99,6 @@ UnwindTable::GetAddressRange(const Address &addr, const SymbolContext &sc) {
       m_object_file_unwind_up->GetAddressRange(addr, range))
     return range;
 
-  // Check the symbol context
-  if (sc.GetAddressRange(eSymbolContextFunction | eSymbolContextSymbol, 0,
-                         false, range) &&
-      range.GetBaseAddress().IsValid())
-    return range;
-
   // Does the eh_frame unwind info has a function bounds for this addr?
   if (m_eh_frame_up && m_eh_frame_up->GetAddressRange(addr, range))
     return range;
@@ -113,6 +107,12 @@ UnwindTable::GetAddressRange(const Address &addr, const SymbolContext &sc) {
   if (m_debug_frame_up && m_debug_frame_up->GetAddressRange(addr, range))
     return range;
 
+  // Check the symbol context
+  if (sc.GetAddressRange(eSymbolContextFunction | eSymbolContextSymbol, 0,
+                         false, range) &&
+      range.GetBaseAddress().IsValid())
+    return range;
+
   return std::nullopt;
 }
 

diff --git a/lldb/test/Shell/Unwind/Inputs/basic-block-sections-with-dwarf.s b/lldb/test/Shell/Unwind/Inputs/basic-block-sections-with-dwarf.s
@@ -0,0 +1,256 @@
+# An example of a function which has been split into two parts. Roughly
+# corresponds to this C code.
+# int baz() { return 47; }
+# int bar() { return foo(0); }
+# int foo(int flag) { return flag ? bar() : baz(); }
+# int main() { return foo(1); }
+# The function bar has been placed "in the middle" of foo.
+
+        .text
+
+        .type   baz,@function
+baz:
+        .cfi_startproc
+        movl    $47, %eax
+        retq
+        .cfi_endproc
+.Lbaz_end:
+        .size   baz, .Lbaz_end-baz
+
+        .type   foo,@function
+foo:
+        .cfi_startproc
+        pushq   %rbp
+        .cfi_def_cfa_offset 16
+        .cfi_offset %rbp, -16
+        movq    %rsp, %rbp
+        .cfi_def_cfa_register %rbp
+        subq    $16, %rsp
+        movl    %edi, -8(%rbp)
+        cmpl    $0, -8(%rbp)
+        je      foo.__part.2
+        jmp     foo.__part.1
+        .cfi_endproc
+.Lfoo_end:
+        .size   foo, .Lfoo_end-foo
+
+foo.__part.1:
+        .cfi_startproc
+        .cfi_def_cfa %rbp, 16
+        .cfi_offset %rbp, -16
+        callq   bar
+        movl    %eax, -4(%rbp)
+        jmp     foo.__part.3
+.Lfoo.__part.1_end:
+        .size   foo.__part.1, .Lfoo.__part.1_end-foo.__part.1
+        .cfi_endproc
+
+bar:
+        .cfi_startproc
+# NB: Decrease the stack pointer to make the unwind info for this function
+# different from the surrounding foo function.
+        subq    $24, %rsp
+        .cfi_def_cfa_offset 32
+        xorl    %edi, %edi
+        callq   foo
+        addq    $24, %rsp
+        .cfi_def_cfa %rsp, 8
+        retq
+        .cfi_endproc
+.Lbar_end:
+        .size   bar, .Lbar_end-bar
+
+foo.__part.2:
+        .cfi_startproc
+        .cfi_def_cfa %rbp, 16
+        .cfi_offset %rbp, -16
+        callq   baz
+        movl    %eax, -4(%rbp)
+        jmp     foo.__part.3
+.Lfoo.__part.2_end:
+        .size   foo.__part.2, .Lfoo.__part.2_end-foo.__part.2
+        .cfi_endproc
+
+foo.__part.3:
+        .cfi_startproc
+        .cfi_def_cfa %rbp, 16
+        .cfi_offset %rbp, -16
+        movl    -4(%rbp), %eax
+        addq    $16, %rsp
+        popq    %rbp
+        .cfi_def_cfa %rsp, 8
+        retq
+.Lfoo.__part.3_end:
+        .size   foo.__part.3, .Lfoo.__part.3_end-foo.__part.3
+        .cfi_endproc
+
+
+        .globl  main
+        .type   main,@function
+main:
+        .cfi_startproc
+        movl    $1, %edi
+        callq   foo
+        retq
+        .cfi_endproc
+.Lmain_end:
+        .size   main, .Lmain_end-main
+
+        .section        .debug_abbrev,"",@progbits
+        .byte   1                               # Abbreviation Code
+        .byte   17                              # DW_TAG_compile_unit
+        .byte   1                               # DW_CHILDREN_yes
+        .byte   37                              # DW_AT_producer
+        .byte   8                               # DW_FORM_string
+        .byte   19                              # DW_AT_language
+        .byte   5                               # DW_FORM_data2
+        .byte   17                              # DW_AT_low_pc
+        .byte   1                               # DW_FORM_addr
+        .byte   85                              # DW_AT_ranges
+        .byte   35                              # DW_FORM_rnglistx
+        .byte   116                             # DW_AT_rnglists_base
+        .byte   23                              # DW_FORM_sec_offset
+        .byte   0                               # EOM(1)
+        .byte   0                               # EOM(2)
+        .byte   2                               # Abbreviation Code
+        .byte   46                              # DW_TAG_subprogram
+        .byte   0                               # DW_CHILDREN_no
+        .byte   17                              # DW_AT_low_pc
+        .byte   1                               # DW_FORM_addr
+        .byte   18                              # DW_AT_high_pc
+        .byte   1                               # DW_FORM_addr
+        .byte   3                               # DW_AT_name
+        .byte   8                               # DW_FORM_string
+        .byte   0                               # EOM(1)
+        .byte   0                               # EOM(2)
+        .byte   3                               # Abbreviation Code
+        .byte   46                              # DW_TAG_subprogram
+        .byte   1                               # DW_CHILDREN_yes
+        .byte   85                              # DW_AT_ranges
+        .byte   35                              # DW_FORM_rnglistx
+        .byte   64                              # DW_AT_frame_base
+        .byte   24                              # DW_FORM_exprloc
+        .byte   3                               # DW_AT_name
+        .byte   8                               # DW_FORM_string
+        .byte   0                               # EOM(1)
+        .byte   0                               # EOM(2)
+        .byte   4                               # Abbreviation Code
+        .byte   5                               # DW_TAG_formal_parameter
+        .byte   0                               # DW_CHILDREN_no
+        .byte   2                               # DW_AT_location
+        .byte   24                              # DW_FORM_exprloc
+        .byte   3                               # DW_AT_name
+        .byte   8                               # DW_FORM_string
+        .byte   73                              # DW_AT_type
+        .byte   19                              # DW_FORM_ref4
+        .byte   0                               # EOM(1)
+        .byte   0                               # EOM(2)
+        .byte   5                               # Abbreviation Code
+        .byte   36                              # DW_TAG_base_type
+        .byte   0                               # DW_CHILDREN_no
+        .byte   3                               # DW_AT_name
+        .byte   8                               # DW_FORM_string
+        .byte   62                              # DW_AT_encoding
+        .byte   11                              # DW_FORM_data1
+        .byte   11                              # DW_AT_byte_size
+        .byte   11                              # DW_FORM_data1
+        .byte   0                               # EOM(1)
+        .byte   0                               # EOM(2)
+        .byte   0                               # EOM(3)
+
+        .section        .debug_info,"",@progbits
+.Lcu_begin0:
+        .long   .Ldebug_info_end0-.Ldebug_info_start0 # Length of Unit
+.Ldebug_info_start0:
+        .short  5                               # DWARF version number
+        .byte   1                               # DWARF Unit Type
+        .byte   8                               # Address Size (in bytes)
+        .long   .debug_abbrev                   # Offset Into Abbrev. Section
+        .byte   1                               # Abbrev [1] DW_TAG_compile_unit
+        .asciz  "Hand-written DWARF"            # DW_AT_producer
+        .short  29                              # DW_AT_language
+        .quad   0                               # DW_AT_low_pc
+        .byte   1                               # DW_AT_ranges
+        .long   .Lrnglists_table_base0          # DW_AT_rnglists_base
+        .byte   2                               # Abbrev [2] DW_TAG_subprogram
+        .quad   baz                             # DW_AT_low_pc
+        .quad   .Lbaz_end                       # DW_AT_high_pc
+        .asciz  "baz"                           # DW_AT_name
+        .byte   2                               # Abbrev [2] DW_TAG_subprogram
+        .quad   bar                             # DW_AT_low_pc
+        .quad   .Lbar_end                       # DW_AT_high_pc
+        .asciz  "bar"                           # DW_AT_name
+        .byte   3                               # Abbrev [3] DW_TAG_subprogram
+        .byte   0                               # DW_AT_ranges
+        .byte   1                               # DW_AT_frame_base
+        .byte   86
+        .asciz  "foo"                           # DW_AT_name
+        .byte   4                               # Abbrev [4] DW_TAG_formal_parameter
+        .byte   2                               # DW_AT_location
+        .byte   145
+        .byte   120
+        .asciz  "flag"                          # DW_AT_name
+        .long   .Lint-.Lcu_begin0               # DW_AT_type
+        .byte   0                               # End Of Children Mark
+        .byte   2                               # Abbrev [2] DW_TAG_subprogram
+        .quad   main                            # DW_AT_low_pc
+        .quad   .Lmain_end                      # DW_AT_high_pc
+        .asciz  "main"                          # DW_AT_name
+.Lint:
+        .byte   5                               # Abbrev [5] DW_TAG_base_type
+        .asciz  "int"                           # DW_AT_name
+        .byte   5                               # DW_AT_encoding
+        .byte   4                               # DW_AT_byte_size
+        .byte   0                               # End Of Children Mark
+.Ldebug_info_end0:
+
+        .section        .debug_rnglists,"",@progbits
+        .long   .Ldebug_list_header_end0-.Ldebug_list_header_start0 # Length
+.Ldebug_list_header_start0:
+        .short  5                               # Version
+        .byte   8                               # Address size
+        .byte   0                               # Segment selector size
+        .long   2                               # Offset entry count
+.Lrnglists_table_base0:
+        .long   .Ldebug_ranges0-.Lrnglists_table_base0
+        .long   .Ldebug_ranges1-.Lrnglists_table_base0
+.Ldebug_ranges0:
+        .byte   6                               # DW_RLE_start_end
+        .quad   foo
+        .quad   .Lfoo_end
+        .byte   6                               # DW_RLE_start_end
+        .quad   foo.__part.1
+        .quad   .Lfoo.__part.1_end
+        .byte   6                               # DW_RLE_start_end
+        .quad   foo.__part.2
+        .quad   .Lfoo.__part.2_end
+        .byte   6                               # DW_RLE_start_end
+        .quad   foo.__part.3
+        .quad   .Lfoo.__part.3_end
+        .byte   0                               # DW_RLE_end_of_list
+.Ldebug_ranges1:
+        .byte   6                               # DW_RLE_start_end
+        .quad   baz
+        .quad   .Lbaz_end
+        .byte   6                               # DW_RLE_start_end
+        .quad   bar
+        .quad   .Lbar_end
+        .byte   6                               # DW_RLE_start_end
+        .quad   foo.__part.1
+        .quad   .Lfoo.__part.1_end
+        .byte   6                               # DW_RLE_start_end
+        .quad   foo.__part.2
+        .quad   .Lfoo.__part.2_end
+        .byte   6                               # DW_RLE_start_end
+        .quad   foo.__part.3
+        .quad   .Lfoo.__part.3_end
+        .byte   6                               # DW_RLE_start_end
+        .quad   foo
+        .quad   .Lfoo_end
+        .byte   6                               # DW_RLE_start_end
+        .quad   main
+        .quad   .Lmain_end
+        .byte   0                               # DW_RLE_end_of_list
+.Ldebug_list_header_end0:
+
+        .section        ".note.GNU-stack","",@progbits
diff --git a/lldb/test/Shell/Unwind/Inputs/linux-x86_64.yaml b/lldb/test/Shell/Unwind/Inputs/linux-x86_64.yaml
@@ -0,0 +1,26 @@
+--- !minidump
+Streams:
+  - Type:            ThreadList
+    Threads:
+      - Thread Id:       0x000074DD
+        Context:         0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000B0010000000000033000000000000000000000002020100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000040109600000000000100000000000000000000000000000068E7D0C8FF7F000068E7D0C8FF7F000097E6D0C8FF7F000010109600000000000000000000000000020000000000000088E4D0C8FF7F0000603FFF85C77F0000F00340000000000080E7D0C8FF7F000000000000000000000000000000000000E0034000000000007F0300000000000000000000000000000000000000000000801F0000FFFF00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF252525252525252525252525252525250000000000000000000000000000000000000000000000000000000000000000FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000FF00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
+        Stack:
+          Start of Memory Range: 0x00007FFFC8D0E000
+          Content:               DEADBEEFBAADF00D
+  - Type:            Exception
+    Thread ID:       0x000074DD
+    Exception Record:
+      Exception Code:  0x0000000B
+    Thread Context:  00000000
+  - Type:            SystemInfo
+    Processor Arch:  AMD64
+    Processor Level: 6
+    Processor Revision: 15876
+    Number of Processors: 40
+    Platform ID:     Linux
+    CSD Version:     'Linux 3.13.0-91-generic'
+    CPU:
+      Vendor ID:       GenuineIntel
+      Version Info:    0x00000000
+      Feature Info:    0x00000000
+...
diff --git a/lldb/test/Shell/Unwind/basic-block-sections-with-dwarf-static.test b/lldb/test/Shell/Unwind/basic-block-sections-with-dwarf-static.test
@@ -0,0 +1,65 @@
+# Test unwind info for functions which have been split into two or more parts.
+# In particular, check that the address ranges of these plans are correct, as
+# overly large ranges have caused a bug in the past.
+
+# REQUIRES: lld, target-x86_64
+
+# RUN: llvm-mc -triple=x86_64-pc-linux -filetype=obj \
+# RUN:   %S/Inputs/basic-block-sections-with-dwarf.s > %t.o
+# RUN: ld.lld %t.o -o %t
+## NB: This minidump exists only as a receptacle for the object file built
+## above. This is a workaround for the fact that "image show-unwind" does not
+## work without a Process object.
+# RUN: yaml2obj %S/Inputs/linux-x86_64.yaml > %t.core
+# RUN: %lldb -c %t.core %t -o "image load --file %t --slide 0" -s %s -o exit | \
+# RUN:   FileCheck --implicit-check-not="UNWIND PLANS" %s
+
+image show-unwind -n foo
+# CHECK: UNWIND PLANS for {{.*}}`foo
+#
+# CHECK:      Asynchronous (not restricted to call-sites) UnwindPlan is 'eh_frame CFI'
+# CHECK-NEXT: Synchronous (restricted to call-sites) UnwindPlan is 'eh_frame CFI'
+
+# CHECK:      Assembly language inspection UnwindPlan:
+# CHECK-NEXT: This UnwindPlan originally sourced from assembly insn profiling
+# CHECK-NEXT: This UnwindPlan is sourced from the compiler: no.
+# CHECK-NEXT: This UnwindPlan is valid at all instruction locations: yes.
+# CHECK-NEXT: This UnwindPlan is for a trap handler function: no.
+# CHECK-NEXT: Address range of this UnwindPlan: [{{.*}}.text + 6-0x0000000000000019)
+
+# CHECK:      eh_frame UnwindPlan:
+# CHECK-NEXT: This UnwindPlan originally sourced from eh_frame CFI
+# CHECK-NEXT: This UnwindPlan is sourced from the compiler: yes.
+# CHECK-NEXT: This UnwindPlan is valid at all instruction locations: no.
+# CHECK-NEXT: This UnwindPlan is for a trap handler function: no.
+# CHECK-NEXT: Address range of this UnwindPlan: [{{.*}}.text + 6-0x0000000000000019)
+# CHECK-NEXT: row[0]:    0: CFA=rsp +8 => rip=[CFA-8]
+# CHECK-NEXT: row[1]:    1: CFA=rsp+16 => rbp=[CFA-16] rip=[CFA-8]
+# CHECK-NEXT: row[2]:    4: CFA=rbp+16 => rbp=[CFA-16] rip=[CFA-8]
+# CHECK-EMPTY:
+
+image show-unwind -n foo.__part.1
+# CHECK: UNWIND PLANS for {{.*}}`foo.__part.1
+
+## As of this writing (Oct 2024), the async unwind plan is "assembly insn
+## profiling", which isn't ideal, because this "function" does not follow the
+## standard ABI. We end up choosing this plan because the eh_frame unwind plan
+## looks like the unwind plan for a regular function without the prologue
+## information.
+# CHECK: Synchronous (restricted to call-sites) UnwindPlan is 'eh_frame CFI'
+
+# CHECK:      Assembly language inspection UnwindPlan:
+# CHECK-NEXT: This UnwindPlan originally sourced from assembly insn profiling
+# CHECK-NEXT: This UnwindPlan is sourced from the compiler: no.
+# CHECK-NEXT: This UnwindPlan is valid at all instruction locations: yes.
+# CHECK-NEXT: This UnwindPlan is for a trap handler function: no.
+# CHECK-NEXT: Address range of this UnwindPlan: [{{.*}}.text + 25-0x0000000000000023)
+
+# CHECK:      eh_frame UnwindPlan:
+# CHECK-NEXT: This UnwindPlan originally sourced from eh_frame CFI
+# CHECK-NEXT: This UnwindPlan is sourced from the compiler: yes.
+# CHECK-NEXT: This UnwindPlan is valid at all instruction locations: no.
+# CHECK-NEXT: This UnwindPlan is for a trap handler function: no.
+# CHECK-NEXT: Address range of this UnwindPlan: [{{.*}}.text + 25-0x0000000000000023)
+# CHECK-NEXT: row[0]:    0: CFA=rbp+16 => rbp=[CFA-16] rip=[CFA-8]
+# CHECK-EMPTY: