Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[lldb] Improve unwinding for discontinuous functions #111409

Merged
merged 1 commit into from
Oct 14, 2024
Merged

Conversation

labath
Copy link
Collaborator

@labath labath commented Oct 7, 2024

Currently, our unwinder assumes that the functions are continuous (or at least, that there are no functions which are "in the middle" of other functions). Neither of these assumptions is true for functions optimized by tools like propeller and (probably) bolt.

While there are many things that go wrong for these functions, the biggest damage is caused by the unwind plan caching code, which currently takes the maximalist extent of the function and assumes that the unwind plan we get for that is going to be valid for all code inside that range. If a part of the function has been moved into a "cold" section, then the range of the function can be many megabytes, meaning that any function within that range will probably fail to unwind.

We end up with this maximalist range because the unwinder asks for the Function object for its range. This is only one of the strategies for determining the range, but it is the first one -- and also the most incorrect one. The second choice would is asking the eh_frame section for the range of the function, and this one returns something reasonable here (the address range of the current function fragment) -- which it does because each fragment gets its own eh_frame entry (it has to, because they have to be continuous).

With this in mind, this patch moves the eh_frame (and debug_frame) to the front of the queue. I think that preferring this range makes sense because eh_frame is one of the unwind plans that we return, and some others (augmented eh_frame) are based on it. In theory this could break some functions, where the debug info and eh_frame disagree on the extent of the function (and eh_frame is the one who's wrong), but I don't know of any such scenarios.

Currently, our unwinder assumes that the functions are continuous (or at
least, that there are no functions which are "in the middle" of other
functions). Neither of these assumptions is true for functions optimized
by tools like propeller and (probably) bolt.

While there are many things that go wrong for these functions, the
biggest damage is caused by the unwind plan caching code, which
currently takes the maximalist extent of the function and assumes that
the unwind plan we get for that is going to be valid for all code inside
that range. If a part of the function has been moved into a "cold"
section, then the range of the function can be many megabytes, meaning
that any function within that range will probably fail to unwind.

We end up with this maximalist range because the unwinder asks for the
Function object for its range. This is only one of the strategies for
determining the range, but it is the first one -- and also the most
incorrect one. The second choice would is asking the eh_frame section
for the range of the function, and this one returns something reasonable
here (the address range of the current function fragment) -- which it
does because each fragment gets its own eh_frame entry (it has to,
because they have to be continuous).

With this in mind, this patch moves the eh_frame (and debug_frame) to
the front of the queue. I think that preferring this range makes sense
because eh_frame is one of the unwind plans that we return, and some
others (augmented eh_frame) are based on it. In theory this could break
some functions, where the debug info and eh_frame disagree on the
extent of the function (and eh_frame is the one who's wrong), but I
don't know of any such scenarios.
@llvmbot
Copy link
Collaborator

llvmbot commented Oct 7, 2024

@llvm/pr-subscribers-lldb

Author: Pavel Labath (labath)

Changes

Currently, our unwinder assumes that the functions are continuous (or at least, that there are no functions which are "in the middle" of other functions). Neither of these assumptions is true for functions optimized by tools like propeller and (probably) bolt.

While there are many things that go wrong for these functions, the biggest damage is caused by the unwind plan caching code, which currently takes the maximalist extent of the function and assumes that the unwind plan we get for that is going to be valid for all code inside that range. If a part of the function has been moved into a "cold" section, then the range of the function can be many megabytes, meaning that any function within that range will probably fail to unwind.

We end up with this maximalist range because the unwinder asks for the Function object for its range. This is only one of the strategies for determining the range, but it is the first one -- and also the most incorrect one. The second choice would is asking the eh_frame section for the range of the function, and this one returns something reasonable here (the address range of the current function fragment) -- which it does because each fragment gets its own eh_frame entry (it has to, because they have to be continuous).

With this in mind, this patch moves the eh_frame (and debug_frame) to the front of the queue. I think that preferring this range makes sense because eh_frame is one of the unwind plans that we return, and some others (augmented eh_frame) are based on it. In theory this could break some functions, where the debug info and eh_frame disagree on the extent of the function (and eh_frame is the one who's wrong), but I don't know of any such scenarios.


Patch is 21.61 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/111409.diff

6 Files Affected:

  • (modified) lldb/source/Commands/CommandObjectTarget.cpp (+3-1)
  • (modified) lldb/source/Symbol/UnwindTable.cpp (+6-6)
  • (added) lldb/test/Shell/Unwind/Inputs/basic-block-sections-with-dwarf.s (+256)
  • (added) lldb/test/Shell/Unwind/Inputs/linux-x86_64.yaml (+26)
  • (added) lldb/test/Shell/Unwind/basic-block-sections-with-dwarf-static.test (+65)
  • (added) lldb/test/Shell/Unwind/basic-block-sections-with-dwarf.test (+23)
diff --git a/lldb/source/Commands/CommandObjectTarget.cpp b/lldb/source/Commands/CommandObjectTarget.cpp
index e950fb346c253b..9a4a17eaa124ce 100644
--- a/lldb/source/Commands/CommandObjectTarget.cpp
+++ b/lldb/source/Commands/CommandObjectTarget.cpp
@@ -3583,10 +3583,12 @@ class CommandObjectTargetModulesShowUnwind : public CommandObjectParsed {
       addr_t start_addr = range.GetBaseAddress().GetLoadAddress(target);
       if (abi)
         start_addr = abi->FixCodeAddress(start_addr);
+      range.GetBaseAddress().SetLoadAddress(start_addr, target);
 
       FuncUnwindersSP func_unwinders_sp(
           sc.module_sp->GetUnwindTable()
-              .GetUncachedFuncUnwindersContainingAddress(start_addr, sc));
+              .GetUncachedFuncUnwindersContainingAddress(range.GetBaseAddress(),
+                                                         sc));
       if (!func_unwinders_sp)
         continue;
 
diff --git a/lldb/source/Symbol/UnwindTable.cpp b/lldb/source/Symbol/UnwindTable.cpp
index da88b0c9c4ea19..42ab7ba95b9e6c 100644
--- a/lldb/source/Symbol/UnwindTable.cpp
+++ b/lldb/source/Symbol/UnwindTable.cpp
@@ -99,12 +99,6 @@ UnwindTable::GetAddressRange(const Address &addr, const SymbolContext &sc) {
       m_object_file_unwind_up->GetAddressRange(addr, range))
     return range;
 
-  // Check the symbol context
-  if (sc.GetAddressRange(eSymbolContextFunction | eSymbolContextSymbol, 0,
-                         false, range) &&
-      range.GetBaseAddress().IsValid())
-    return range;
-
   // Does the eh_frame unwind info has a function bounds for this addr?
   if (m_eh_frame_up && m_eh_frame_up->GetAddressRange(addr, range))
     return range;
@@ -113,6 +107,12 @@ UnwindTable::GetAddressRange(const Address &addr, const SymbolContext &sc) {
   if (m_debug_frame_up && m_debug_frame_up->GetAddressRange(addr, range))
     return range;
 
+  // Check the symbol context
+  if (sc.GetAddressRange(eSymbolContextFunction | eSymbolContextSymbol, 0,
+                         false, range) &&
+      range.GetBaseAddress().IsValid())
+    return range;
+
   return std::nullopt;
 }
 
diff --git a/lldb/test/Shell/Unwind/Inputs/basic-block-sections-with-dwarf.s b/lldb/test/Shell/Unwind/Inputs/basic-block-sections-with-dwarf.s
new file mode 100644
index 00000000000000..c405e51c227cb6
--- /dev/null
+++ b/lldb/test/Shell/Unwind/Inputs/basic-block-sections-with-dwarf.s
@@ -0,0 +1,256 @@
+# An example of a function which has been split into two parts. Roughly
+# corresponds to this C code.
+# int baz() { return 47; }
+# int bar() { return foo(0); }
+# int foo(int flag) { return flag ? bar() : baz(); }
+# int main() { return foo(1); }
+# The function bar has been placed "in the middle" of foo.
+
+        .text
+
+        .type   baz,@function
+baz:
+        .cfi_startproc
+        movl    $47, %eax
+        retq
+        .cfi_endproc
+.Lbaz_end:
+        .size   baz, .Lbaz_end-baz
+
+        .type   foo,@function
+foo:
+        .cfi_startproc
+        pushq   %rbp
+        .cfi_def_cfa_offset 16
+        .cfi_offset %rbp, -16
+        movq    %rsp, %rbp
+        .cfi_def_cfa_register %rbp
+        subq    $16, %rsp
+        movl    %edi, -8(%rbp)
+        cmpl    $0, -8(%rbp)
+        je      foo.__part.2
+        jmp     foo.__part.1
+        .cfi_endproc
+.Lfoo_end:
+        .size   foo, .Lfoo_end-foo
+
+foo.__part.1:
+        .cfi_startproc
+        .cfi_def_cfa %rbp, 16
+        .cfi_offset %rbp, -16
+        callq   bar
+        movl    %eax, -4(%rbp)
+        jmp     foo.__part.3
+.Lfoo.__part.1_end:
+        .size   foo.__part.1, .Lfoo.__part.1_end-foo.__part.1
+        .cfi_endproc
+
+bar:
+        .cfi_startproc
+# NB: Decrease the stack pointer to make the unwind info for this function
+# different from the surrounding foo function.
+        subq    $24, %rsp
+        .cfi_def_cfa_offset 32
+        xorl    %edi, %edi
+        callq   foo
+        addq    $24, %rsp
+        .cfi_def_cfa %rsp, 8
+        retq
+        .cfi_endproc
+.Lbar_end:
+        .size   bar, .Lbar_end-bar
+
+foo.__part.2:
+        .cfi_startproc
+        .cfi_def_cfa %rbp, 16
+        .cfi_offset %rbp, -16
+        callq   baz
+        movl    %eax, -4(%rbp)
+        jmp     foo.__part.3
+.Lfoo.__part.2_end:
+        .size   foo.__part.2, .Lfoo.__part.2_end-foo.__part.2
+        .cfi_endproc
+
+foo.__part.3:
+        .cfi_startproc
+        .cfi_def_cfa %rbp, 16
+        .cfi_offset %rbp, -16
+        movl    -4(%rbp), %eax
+        addq    $16, %rsp
+        popq    %rbp
+        .cfi_def_cfa %rsp, 8
+        retq
+.Lfoo.__part.3_end:
+        .size   foo.__part.3, .Lfoo.__part.3_end-foo.__part.3
+        .cfi_endproc
+
+
+        .globl  main
+        .type   main,@function
+main:
+        .cfi_startproc
+        movl    $1, %edi
+        callq   foo
+        retq
+        .cfi_endproc
+.Lmain_end:
+        .size   main, .Lmain_end-main
+
+        .section        .debug_abbrev,"",@progbits
+        .byte   1                               # Abbreviation Code
+        .byte   17                              # DW_TAG_compile_unit
+        .byte   1                               # DW_CHILDREN_yes
+        .byte   37                              # DW_AT_producer
+        .byte   8                               # DW_FORM_string
+        .byte   19                              # DW_AT_language
+        .byte   5                               # DW_FORM_data2
+        .byte   17                              # DW_AT_low_pc
+        .byte   1                               # DW_FORM_addr
+        .byte   85                              # DW_AT_ranges
+        .byte   35                              # DW_FORM_rnglistx
+        .byte   116                             # DW_AT_rnglists_base
+        .byte   23                              # DW_FORM_sec_offset
+        .byte   0                               # EOM(1)
+        .byte   0                               # EOM(2)
+        .byte   2                               # Abbreviation Code
+        .byte   46                              # DW_TAG_subprogram
+        .byte   0                               # DW_CHILDREN_no
+        .byte   17                              # DW_AT_low_pc
+        .byte   1                               # DW_FORM_addr
+        .byte   18                              # DW_AT_high_pc
+        .byte   1                               # DW_FORM_addr
+        .byte   3                               # DW_AT_name
+        .byte   8                               # DW_FORM_string
+        .byte   0                               # EOM(1)
+        .byte   0                               # EOM(2)
+        .byte   3                               # Abbreviation Code
+        .byte   46                              # DW_TAG_subprogram
+        .byte   1                               # DW_CHILDREN_yes
+        .byte   85                              # DW_AT_ranges
+        .byte   35                              # DW_FORM_rnglistx
+        .byte   64                              # DW_AT_frame_base
+        .byte   24                              # DW_FORM_exprloc
+        .byte   3                               # DW_AT_name
+        .byte   8                               # DW_FORM_string
+        .byte   0                               # EOM(1)
+        .byte   0                               # EOM(2)
+        .byte   4                               # Abbreviation Code
+        .byte   5                               # DW_TAG_formal_parameter
+        .byte   0                               # DW_CHILDREN_no
+        .byte   2                               # DW_AT_location
+        .byte   24                              # DW_FORM_exprloc
+        .byte   3                               # DW_AT_name
+        .byte   8                               # DW_FORM_string
+        .byte   73                              # DW_AT_type
+        .byte   19                              # DW_FORM_ref4
+        .byte   0                               # EOM(1)
+        .byte   0                               # EOM(2)
+        .byte   5                               # Abbreviation Code
+        .byte   36                              # DW_TAG_base_type
+        .byte   0                               # DW_CHILDREN_no
+        .byte   3                               # DW_AT_name
+        .byte   8                               # DW_FORM_string
+        .byte   62                              # DW_AT_encoding
+        .byte   11                              # DW_FORM_data1
+        .byte   11                              # DW_AT_byte_size
+        .byte   11                              # DW_FORM_data1
+        .byte   0                               # EOM(1)
+        .byte   0                               # EOM(2)
+        .byte   0                               # EOM(3)
+
+        .section        .debug_info,"",@progbits
+.Lcu_begin0:
+        .long   .Ldebug_info_end0-.Ldebug_info_start0 # Length of Unit
+.Ldebug_info_start0:
+        .short  5                               # DWARF version number
+        .byte   1                               # DWARF Unit Type
+        .byte   8                               # Address Size (in bytes)
+        .long   .debug_abbrev                   # Offset Into Abbrev. Section
+        .byte   1                               # Abbrev [1] DW_TAG_compile_unit
+        .asciz  "Hand-written DWARF"            # DW_AT_producer
+        .short  29                              # DW_AT_language
+        .quad   0                               # DW_AT_low_pc
+        .byte   1                               # DW_AT_ranges
+        .long   .Lrnglists_table_base0          # DW_AT_rnglists_base
+        .byte   2                               # Abbrev [2] DW_TAG_subprogram
+        .quad   baz                             # DW_AT_low_pc
+        .quad   .Lbaz_end                       # DW_AT_high_pc
+        .asciz  "baz"                           # DW_AT_name
+        .byte   2                               # Abbrev [2] DW_TAG_subprogram
+        .quad   bar                             # DW_AT_low_pc
+        .quad   .Lbar_end                       # DW_AT_high_pc
+        .asciz  "bar"                           # DW_AT_name
+        .byte   3                               # Abbrev [3] DW_TAG_subprogram
+        .byte   0                               # DW_AT_ranges
+        .byte   1                               # DW_AT_frame_base
+        .byte   86
+        .asciz  "foo"                           # DW_AT_name
+        .byte   4                               # Abbrev [4] DW_TAG_formal_parameter
+        .byte   2                               # DW_AT_location
+        .byte   145
+        .byte   120
+        .asciz  "flag"                          # DW_AT_name
+        .long   .Lint-.Lcu_begin0               # DW_AT_type
+        .byte   0                               # End Of Children Mark
+        .byte   2                               # Abbrev [2] DW_TAG_subprogram
+        .quad   main                            # DW_AT_low_pc
+        .quad   .Lmain_end                      # DW_AT_high_pc
+        .asciz  "main"                          # DW_AT_name
+.Lint:
+        .byte   5                               # Abbrev [5] DW_TAG_base_type
+        .asciz  "int"                           # DW_AT_name
+        .byte   5                               # DW_AT_encoding
+        .byte   4                               # DW_AT_byte_size
+        .byte   0                               # End Of Children Mark
+.Ldebug_info_end0:
+
+        .section        .debug_rnglists,"",@progbits
+        .long   .Ldebug_list_header_end0-.Ldebug_list_header_start0 # Length
+.Ldebug_list_header_start0:
+        .short  5                               # Version
+        .byte   8                               # Address size
+        .byte   0                               # Segment selector size
+        .long   2                               # Offset entry count
+.Lrnglists_table_base0:
+        .long   .Ldebug_ranges0-.Lrnglists_table_base0
+        .long   .Ldebug_ranges1-.Lrnglists_table_base0
+.Ldebug_ranges0:
+        .byte   6                               # DW_RLE_start_end
+        .quad   foo
+        .quad   .Lfoo_end
+        .byte   6                               # DW_RLE_start_end
+        .quad   foo.__part.1
+        .quad   .Lfoo.__part.1_end
+        .byte   6                               # DW_RLE_start_end
+        .quad   foo.__part.2
+        .quad   .Lfoo.__part.2_end
+        .byte   6                               # DW_RLE_start_end
+        .quad   foo.__part.3
+        .quad   .Lfoo.__part.3_end
+        .byte   0                               # DW_RLE_end_of_list
+.Ldebug_ranges1:
+        .byte   6                               # DW_RLE_start_end
+        .quad   baz
+        .quad   .Lbaz_end
+        .byte   6                               # DW_RLE_start_end
+        .quad   bar
+        .quad   .Lbar_end
+        .byte   6                               # DW_RLE_start_end
+        .quad   foo.__part.1
+        .quad   .Lfoo.__part.1_end
+        .byte   6                               # DW_RLE_start_end
+        .quad   foo.__part.2
+        .quad   .Lfoo.__part.2_end
+        .byte   6                               # DW_RLE_start_end
+        .quad   foo.__part.3
+        .quad   .Lfoo.__part.3_end
+        .byte   6                               # DW_RLE_start_end
+        .quad   foo
+        .quad   .Lfoo_end
+        .byte   6                               # DW_RLE_start_end
+        .quad   main
+        .quad   .Lmain_end
+        .byte   0                               # DW_RLE_end_of_list
+.Ldebug_list_header_end0:
+
+        .section        ".note.GNU-stack","",@progbits
diff --git a/lldb/test/Shell/Unwind/Inputs/linux-x86_64.yaml b/lldb/test/Shell/Unwind/Inputs/linux-x86_64.yaml
new file mode 100644
index 00000000000000..987462c2a0efc6
--- /dev/null
+++ b/lldb/test/Shell/Unwind/Inputs/linux-x86_64.yaml
@@ -0,0 +1,26 @@
+--- !minidump
+Streams:
+  - Type:            ThreadList
+    Threads:
+      - Thread Id:       0x000074DD
+        Context:         0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000B0010000000000033000000000000000000000002020100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000040109600000000000100000000000000000000000000000068E7D0C8FF7F000068E7D0C8FF7F000097E6D0C8FF7F000010109600000000000000000000000000020000000000000088E4D0C8FF7F0000603FFF85C77F0000F00340000000000080E7D0C8FF7F000000000000000000000000000000000000E0034000000000007F0300000000000000000000000000000000000000000000801F0000FFFF00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF252525252525252525252525252525250000000000000000000000000000000000000000000000000000000000000000FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000FF00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
+        Stack:
+          Start of Memory Range: 0x00007FFFC8D0E000
+          Content:               DEADBEEFBAADF00D
+  - Type:            Exception
+    Thread ID:       0x000074DD
+    Exception Record:
+      Exception Code:  0x0000000B
+    Thread Context:  00000000
+  - Type:            SystemInfo
+    Processor Arch:  AMD64
+    Processor Level: 6
+    Processor Revision: 15876
+    Number of Processors: 40
+    Platform ID:     Linux
+    CSD Version:     'Linux 3.13.0-91-generic'
+    CPU:
+      Vendor ID:       GenuineIntel
+      Version Info:    0x00000000
+      Feature Info:    0x00000000
+...
diff --git a/lldb/test/Shell/Unwind/basic-block-sections-with-dwarf-static.test b/lldb/test/Shell/Unwind/basic-block-sections-with-dwarf-static.test
new file mode 100644
index 00000000000000..09ea0203e3eeae
--- /dev/null
+++ b/lldb/test/Shell/Unwind/basic-block-sections-with-dwarf-static.test
@@ -0,0 +1,65 @@
+# Test unwind info for functions which have been split into two or more parts.
+# In particular, check that the address ranges of these plans are correct, as
+# overly large ranges have caused a bug in the past.
+
+# REQUIRES: lld, target-x86_64
+
+# RUN: llvm-mc -triple=x86_64-pc-linux -filetype=obj \
+# RUN:   %S/Inputs/basic-block-sections-with-dwarf.s > %t.o
+# RUN: ld.lld %t.o -o %t
+## NB: This minidump exists only as a receptacle for the object file built
+## above. This is a workaround for the fact that "image show-unwind" does not
+## work without a Process object.
+# RUN: yaml2obj %S/Inputs/linux-x86_64.yaml > %t.core
+# RUN: %lldb -c %t.core %t -o "image load --file %t --slide 0" -s %s -o exit | \
+# RUN:   FileCheck --implicit-check-not="UNWIND PLANS" %s
+
+image show-unwind -n foo
+# CHECK: UNWIND PLANS for {{.*}}`foo
+#
+# CHECK:      Asynchronous (not restricted to call-sites) UnwindPlan is 'eh_frame CFI'
+# CHECK-NEXT: Synchronous (restricted to call-sites) UnwindPlan is 'eh_frame CFI'
+
+# CHECK:      Assembly language inspection UnwindPlan:
+# CHECK-NEXT: This UnwindPlan originally sourced from assembly insn profiling
+# CHECK-NEXT: This UnwindPlan is sourced from the compiler: no.
+# CHECK-NEXT: This UnwindPlan is valid at all instruction locations: yes.
+# CHECK-NEXT: This UnwindPlan is for a trap handler function: no.
+# CHECK-NEXT: Address range of this UnwindPlan: [{{.*}}.text + 6-0x0000000000000019)
+
+# CHECK:      eh_frame UnwindPlan:
+# CHECK-NEXT: This UnwindPlan originally sourced from eh_frame CFI
+# CHECK-NEXT: This UnwindPlan is sourced from the compiler: yes.
+# CHECK-NEXT: This UnwindPlan is valid at all instruction locations: no.
+# CHECK-NEXT: This UnwindPlan is for a trap handler function: no.
+# CHECK-NEXT: Address range of this UnwindPlan: [{{.*}}.text + 6-0x0000000000000019)
+# CHECK-NEXT: row[0]:    0: CFA=rsp +8 => rip=[CFA-8]
+# CHECK-NEXT: row[1]:    1: CFA=rsp+16 => rbp=[CFA-16] rip=[CFA-8]
+# CHECK-NEXT: row[2]:    4: CFA=rbp+16 => rbp=[CFA-16] rip=[CFA-8]
+# CHECK-EMPTY:
+
+image show-unwind -n foo.__part.1
+# CHECK: UNWIND PLANS for {{.*}}`foo.__part.1
+
+## As of this writing (Oct 2024), the async unwind plan is "assembly insn
+## profiling", which isn't ideal, because this "function" does not follow the
+## standard ABI. We end up choosing this plan because the eh_frame unwind plan
+## looks like the unwind plan for a regular function without the prologue
+## information.
+# CHECK: Synchronous (restricted to call-sites) UnwindPlan is 'eh_frame CFI'
+
+# CHECK:      Assembly language inspection UnwindPlan:
+# CHECK-NEXT: This UnwindPlan originally sourced from assembly insn profiling
+# CHECK-NEXT: This UnwindPlan is sourced from the compiler: no.
+# CHECK-NEXT: This UnwindPlan is valid at all instruction locations: yes.
+# CHECK-NEXT: This UnwindPlan is for a trap handler function: no.
+# CHECK-NEXT: ...
[truncated]


FuncUnwindersSP func_unwinders_sp(
sc.module_sp->GetUnwindTable()
.GetUncachedFuncUnwindersContainingAddress(start_addr, sc));
.GetUncachedFuncUnwindersContainingAddress(range.GetBaseAddress(),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because image show-unwind wouldn't use the eh_frame address range without it (it requires a resolved address).

BTW. This would have been easier to investigate (and test) if this command used the actual cached unwind plans instead of creating them from scratch. This way I was left puzzled as to unwind went off track even though the unwind plans looked (mostly) reasonable. What would you say to:
a) adding an option to use/print the cached plans
b) making this option the default
c) deleting the non-cached option entirely?

@jasonmolenda
Copy link
Collaborator

This is interesting, sorry the uncached unwindplans made it harder to debug.

I've seen cold function outlining in our libraries but I've never looked at the generated DWARF - does the function die have a location list with the main function body and the .cold.1 function body as entries, discontiguous? I'll try to scrub around and find one of ours and see what our DWARF looks like tomorrow on a Darwin binary that does this, curious now.

From a symbol table point of view, if the symbol names haven't been stripped, the function symbol will have the range of the main function body only; the .cold.1 function would be a separate Symbol. In a stripped binary (no symbol name), I know ObjectFileMachO will use the eh_frame start addresses to create fake symbol table names & entries, I don't know if ObjectFileELF does that, but in that case a Symbol would be equivalent to eh_frame as a source of information.

As for why the image show-unwind is using the uncached version, it's a little hack to aid in debugging. You can turn on the verbose unwind log and do a show-unwind to see all the logging from UnwindAssemblyInstEmulation as it goes over the instructions. With the cached version, it wouldn't re-do this work so you'd miss it unless you turned on the verbose log at the beginning of the process execution (and skipped a lot of other functions that you're not interested in). You can adjust the logging level and re-do the command and it re-calculates it again, it's a bit handy like this.

I'd like to take a peek at the DWARF for a binary we have using this codegen out of curiosity, but the patch looks fine to me. It's clearly not great that this is this fragile, and I'd like to reconsider how these ranges are calculated so it's clearer what is intended. I wonder if the Dwarin debug info reflects these as separate functions, or maybe I've just gotten lucky that I never looked too closely at one.

@jasonmolenda
Copy link
Collaborator

Let me just take a peek at some Darwin binaries with this codegen & their debuginfo, and then i'll approve unless there's something I've misunderstood about what we're looking at. It really makes me think that we are going to need discontiguous ranges in the unwind plans to properly handle this kind of program.

@labath
Copy link
Collaborator Author

labath commented Oct 8, 2024

Thanks for the quick response, Jason.

I think it's quite possible that you haven't run into this situation before, because the final representation depends on what tool was used to split the functions. Judging by the name (cold.1), I think you're using the llvm "hot-cold-split" pass -- which does not generate this kind of output. Instead it creates a separate function, with its own dwarf description (DW_TAG_subprogram) and everything. This is good for unwinding, in that the "functions" stay continuous, but maybe not so good for other things (FWICS, the synthetic DW_TAG_subprogram does not contain any variable information).

The thing that produces this output is the -fbasic-block-sections flag (a.k.a "propeller"). Here we have a single DW_TAG_subprogram, which has a DW_AT_ranges attribute which all of the (discontinuous) parts of the function. However, this flag is pretty new, and does not work on darwin yet (probably because noone implemented it there). In principle, I don't think the propeller is doing anything wrong (DW_AT_ranges exists so that we could describe situations like this), but it does break some assumptions in lldb.

The problem begins in lldb_private::Function, which assumes that a single address range (m_address_range) is enough to describe it. The variable contains a comment ("The function address range that covers the widest range needed to contain all blocks") which could be interpreted to mean that one should expect it to also contain some unrelated code, but I don't know if that was the intention, and it's definitely not how the unwinder uses this information.

I've considered (and that's something I'd like to do independently of this patch) changing the lldb_private::Function interface to vend discontinuous ranges, but that still wouldn't directly help the unwinder, as we'd need to handle the discontinuity there as well. What it would allow us is to handle this situation with more finesse. We could e.g. check whether the function contains more than one address range, and then choose which range (and from which source) to use for caching. I think this is a viable path forward if you think this change is too broad.

From a symbol table point of view, if the symbol names haven't been stripped, the function symbol will have the range of the main function body only; the .cold.1 function would be a separate Symbol. In a stripped binary (no symbol name), I know ObjectFileMachO will use the eh_frame start addresses to create fake symbol table names & entries, I don't know if ObjectFileELF does that, but in that case a Symbol would be equivalent to eh_frame as a source of information.

ObjectFileELF does that too. The problems begin only when debug info is present because lldb_private::Function will claim the maximal range, as I've described above. However, this creation of synthetic symbols from unwind info is perhaps a good reason for why changing the order of range sources is safe(ish). (Although it could be a reason for not querying the eh_frame for range information at all)

Copy link
Collaborator

@jasonmolenda jasonmolenda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the background here, I can see what a problem a function with discontiguous ranges can be for some of our data structures. Looking at the assembly in your test case (which is going to be really helpful if we try to tackle this more accurately in the future), I see that the instruction scanner is also going to fail for the cold parts of the functions, which are tail-called (JMP'ed) to, if it tried to analyze the prologue setup based on the instructions in them. The eh_frame treats them as separate functions and hard-codes the correct stack setup from the base function in the cold blocks of code, that's one way to cheat this kind of thing, treating them as separate functions that have only contiguous address ranges.

This looks good to me for now, we'll probably need to handle this more thoroughly in the future not only in the unwind plans. :/

@labath
Copy link
Collaborator Author

labath commented Oct 14, 2024

Yeah, I plan to look into Function::GetAddressRange and see how we can make it do something reasonable for these kinds of functions (probably by having it return a list of ranges). That will likely involve looking at the users of that API as well, although I'm less than enthusiastic about making this work with the instruction scanner -- the reason for that (at least for our use case) I don't see much use for it. We already have tools (like profilers) which rely on the correctness of unwind info (even in the async case), so I don't think we gain much by using it. Clang already produces correct eh_frame for these discontinuous functions (for x86 at least, we're in the process of fixing aarch64), and hand-written functions are not likely to be split, and even if they are, unless the user writes correct debug info for them, we won't be able to detect that they are split. At that point the user might as well write the eh_frame section as well...

@labath labath merged commit a89e016 into llvm:main Oct 14, 2024
10 checks passed
@labath labath deleted the unwind branch October 14, 2024 16:56
@rastogishubham
Copy link
Contributor

Hi this patch broke Unwind/trap_frame_sym_ctx.test in greendragon, you can see the link to the failing build here:
https://green.lab.llvm.org/job/llvm.org/view/LLDB/job/lldb-cmake/6724/consoleFull#-1880437047d6fdb6cb-f376-4f2e-8bce-d31c7304698b

I will be reverting this change to make sure green dragon is up and running again

rastogishubham added a commit that referenced this pull request Oct 14, 2024
This reverts commit a89e016.

This is being reverted because it broke the test:

Unwind/trap_frame_sym_ctx.test

/Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/llvm-project/lldb/test/Shell/Unwind/trap_frame_sym_ctx.test:21:10: error: CHECK: expected string not found in input
 CHECK: frame #2: {{.*}}`main
@labath
Copy link
Collaborator Author

labath commented Oct 15, 2024

I'm sorry for breaking the bot. I'm afraid I'm going to need some help to understand to problem though. This test is x86 specific, and I don't have access to an x86 mac machine (only arm), and the test works fine on linux. Could you (or whoever is responsible for that bot) provide some details on the failure? The undwind log (log enable -v lldb unwind) of the test before and after the patch would probably be enough to understand what's going on.

@jasonmolenda
Copy link
Collaborator

Alex had a PR a month ago #106791 that got reverted from the same test failing on x86, I tried to repo the failure a few times on an intel mac and never succeeded. I'll try again with your patch when I have that set up. If this test didn't start/stop failing so clearly with these commits, I'd be very dubious of the test itself, to be honest.

@labath
Copy link
Collaborator Author

labath commented Oct 16, 2024

Thanks Jason. I have no doubt that the failure was triggered by this patch (the test tests unwinding, the patch touches unwinding), but the mechanism itself is very unclear to me. The functions in that patch have a valid eh_frame (and they don't have debug info), so I would expect that a change like this would be a noop in this case. I suspect there's something subtle going on, and that this could very well be a bug in some other lldb component, but it's hard tell what that is without more info. It could also easily depend on some system code that makes its way into the debugged process (process startup code and stuff). If you're unable to reproduce this on your machine, maybe we'll have to modify the test to dump loads of extra output to make sense of it.

DanielCChen pushed a commit to DanielCChen/llvm-project that referenced this pull request Oct 16, 2024
Currently, our unwinder assumes that the functions are continuous (or at
least, that there are no functions which are "in the middle" of other
functions). Neither of these assumptions is true for functions optimized
by tools like propeller and (probably) bolt.

While there are many things that go wrong for these functions, the
biggest damage is caused by the unwind plan caching code, which
currently takes the maximalist extent of the function and assumes that
the unwind plan we get for that is going to be valid for all code inside
that range. If a part of the function has been moved into a "cold"
section, then the range of the function can be many megabytes, meaning
that any function within that range will probably fail to unwind.

We end up with this maximalist range because the unwinder asks for the
Function object for its range. This is only one of the strategies for
determining the range, but it is the first one -- and also the most
incorrect one. The second choice would is asking the eh_frame section
for the range of the function, and this one returns something reasonable
here (the address range of the current function fragment) -- which it
does because each fragment gets its own eh_frame entry (it has to,
because they have to be continuous).

With this in mind, this patch moves the eh_frame (and debug_frame) to
the front of the queue. I think that preferring this range makes sense
because eh_frame is one of the unwind plans that we return, and some
others (augmented eh_frame) are based on it. In theory this could break
some functions, where the debug info and eh_frame disagree on the extent
of the function (and eh_frame is the one who's wrong), but I don't know
of any such scenarios.
DanielCChen pushed a commit to DanielCChen/llvm-project that referenced this pull request Oct 16, 2024
…1409)"

This reverts commit a89e016.

This is being reverted because it broke the test:

Unwind/trap_frame_sym_ctx.test

/Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/llvm-project/lldb/test/Shell/Unwind/trap_frame_sym_ctx.test:21:10: error: CHECK: expected string not found in input
 CHECK: frame llvm#2: {{.*}}`main
@jasonmolenda
Copy link
Collaborator

I was able to build github main on an Intel mac running the current OS, and the shell tests all pass with it. I had the same difficulty trying to repo a failure with this test with Alex's change, I could never get the failure the CI bots were hitting, but I didn't have the exact same macOS version as the CI bots installed, shrug maybe that's important. I'll try to figure that out and install it on an intel mac and try to repo that way, I really wanted to get back to Alex's change too.

@jasonmolenda
Copy link
Collaborator

(wrote that update too quickly while running out the door -- I can run the lldb-shell tests both with and without your patch, and they pass. I'm not able to repo the failure the CI bots saw.)

bricknerb pushed a commit to bricknerb/llvm-project that referenced this pull request Oct 17, 2024
Currently, our unwinder assumes that the functions are continuous (or at
least, that there are no functions which are "in the middle" of other
functions). Neither of these assumptions is true for functions optimized
by tools like propeller and (probably) bolt.

While there are many things that go wrong for these functions, the
biggest damage is caused by the unwind plan caching code, which
currently takes the maximalist extent of the function and assumes that
the unwind plan we get for that is going to be valid for all code inside
that range. If a part of the function has been moved into a "cold"
section, then the range of the function can be many megabytes, meaning
that any function within that range will probably fail to unwind.

We end up with this maximalist range because the unwinder asks for the
Function object for its range. This is only one of the strategies for
determining the range, but it is the first one -- and also the most
incorrect one. The second choice would is asking the eh_frame section
for the range of the function, and this one returns something reasonable
here (the address range of the current function fragment) -- which it
does because each fragment gets its own eh_frame entry (it has to,
because they have to be continuous).

With this in mind, this patch moves the eh_frame (and debug_frame) to
the front of the queue. I think that preferring this range makes sense
because eh_frame is one of the unwind plans that we return, and some
others (augmented eh_frame) are based on it. In theory this could break
some functions, where the debug info and eh_frame disagree on the extent
of the function (and eh_frame is the one who's wrong), but I don't know
of any such scenarios.
bricknerb pushed a commit to bricknerb/llvm-project that referenced this pull request Oct 17, 2024
…1409)"

This reverts commit a89e016.

This is being reverted because it broke the test:

Unwind/trap_frame_sym_ctx.test

/Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/llvm-project/lldb/test/Shell/Unwind/trap_frame_sym_ctx.test:21:10: error: CHECK: expected string not found in input
 CHECK: frame llvm#2: {{.*}}`main
@jasonmolenda
Copy link
Collaborator

FTR I have an intel mac running the same OS as the CI bots (LLVM host triple: x86_64-apple-darwin22.6.0 it's macOS 13.5 aka macOS Ventura from 2022), and am building github main so I can try to repo this failure and Alex's on this shell test. I don't really think I'm going to repo it, I haven't with any other macOS version. I won't be able to look at it until next week but it's as close to the CI bots as I can get, hopefully this repos the fail.

@labath
Copy link
Collaborator Author

labath commented Oct 18, 2024

Thank you for looking into this.

EricWF pushed a commit to efcs/llvm-project that referenced this pull request Oct 22, 2024
Currently, our unwinder assumes that the functions are continuous (or at
least, that there are no functions which are "in the middle" of other
functions). Neither of these assumptions is true for functions optimized
by tools like propeller and (probably) bolt.

While there are many things that go wrong for these functions, the
biggest damage is caused by the unwind plan caching code, which
currently takes the maximalist extent of the function and assumes that
the unwind plan we get for that is going to be valid for all code inside
that range. If a part of the function has been moved into a "cold"
section, then the range of the function can be many megabytes, meaning
that any function within that range will probably fail to unwind.

We end up with this maximalist range because the unwinder asks for the
Function object for its range. This is only one of the strategies for
determining the range, but it is the first one -- and also the most
incorrect one. The second choice would is asking the eh_frame section
for the range of the function, and this one returns something reasonable
here (the address range of the current function fragment) -- which it
does because each fragment gets its own eh_frame entry (it has to,
because they have to be continuous).

With this in mind, this patch moves the eh_frame (and debug_frame) to
the front of the queue. I think that preferring this range makes sense
because eh_frame is one of the unwind plans that we return, and some
others (augmented eh_frame) are based on it. In theory this could break
some functions, where the debug info and eh_frame disagree on the extent
of the function (and eh_frame is the one who's wrong), but I don't know
of any such scenarios.
EricWF pushed a commit to efcs/llvm-project that referenced this pull request Oct 22, 2024
…1409)"

This reverts commit a89e016.

This is being reverted because it broke the test:

Unwind/trap_frame_sym_ctx.test

/Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/llvm-project/lldb/test/Shell/Unwind/trap_frame_sym_ctx.test:21:10: error: CHECK: expected string not found in input
 CHECK: frame #2: {{.*}}`main
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants