Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[lldb][ELF] Return address class map changes from symbol table parsing methods #91585

Merged
merged 2 commits into from
May 10, 2024

Conversation

DavidSpickett
Copy link
Collaborator

@DavidSpickett DavidSpickett commented May 9, 2024

Instead of updating the member of the ObjectFileELF instance. This means that if one object file asks another to parse the symbol table, that first object's can update its address class map with the same changes that the other object did.

(I'm not returning a reference to the other object's m_address_class_map member because there may be other things in there not related to the symbol table being parsed)

This will fix the code added in
#90622 which broke the test Expr/TestStringLiteralExpr.test on 32 bit Arm Linux.

This happened because we had the program file, then asked for a better object file, which returned the same program file again. This creates a second ObjectFileELF for the same file, so when we tell the second instance to parse the symbol table it actually calls into the first instance, leaving the address class map of the second instance empty.

Which caused us to put an Arm breakpoint instuction at a Thumb return address and broke the ability to call mmap.

Instead of updating the member of the ObjectFileELF instance.
This means that if one object file asks another to parse the symbol table,
that first object's address class map is not left empty, which would
break Arm and MIPS targets that need this information.

This will fix the code added in
llvm#90622 which broke
the test `Expr/TestStringLiteralExpr.test` on 32 bit Arm Linux.

This happened because we had the program file, then asked for a better
object file, which returned the same program file again. This creates
a second ObjectFileELF for the same file, so when we tell the second
instance to parse the symbol table it actually calls into the first
instance, leaving the address class map of the second instance empty.

Which caused us to put an Arm breakpoint instuction at a Thumb
return address and broke the ability to call mmap.
@llvmbot
Copy link
Collaborator

llvmbot commented May 9, 2024

@llvm/pr-subscribers-lldb

Author: David Spickett (DavidSpickett)

Changes

Instead of updating the member of the ObjectFileELF instance. This means that if one object file asks another to parse the symbol table, that first object's address class map is not left empty, which would break Arm and MIPS targets that need this information.

This will fix the code added in
#90622 which broke the test Expr/TestStringLiteralExpr.test on 32 bit Arm Linux.

This happened because we had the program file, then asked for a better object file, which returned the same program file again. This creates a second ObjectFileELF for the same file, so when we tell the second instance to parse the symbol table it actually calls into the first instance, leaving the address class map of the second instance empty.

Which caused us to put an Arm breakpoint instuction at a Thumb return address and broke the ability to call mmap.


Full diff: https://github.com/llvm/llvm-project/pull/91585.diff

2 Files Affected:

  • (modified) lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp (+40-28)
  • (modified) lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.h (+11-10)
diff --git a/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp b/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
index 1646ee9aa34a6..c216cc2d054c8 100644
--- a/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
+++ b/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
@@ -2060,13 +2060,14 @@ static char FindArmAarch64MappingSymbol(const char *symbol_name) {
 #define IS_MICROMIPS(ST_OTHER) (((ST_OTHER)&STO_MIPS_ISA) == STO_MICROMIPS)
 
 // private
-unsigned ObjectFileELF::ParseSymbols(Symtab *symtab, user_id_t start_id,
-                                     SectionList *section_list,
-                                     const size_t num_symbols,
-                                     const DataExtractor &symtab_data,
-                                     const DataExtractor &strtab_data) {
+std::pair<unsigned, ObjectFileELF::FileAddressToAddressClassMap>
+ObjectFileELF::ParseSymbols(Symtab *symtab, user_id_t start_id,
+                            SectionList *section_list, const size_t num_symbols,
+                            const DataExtractor &symtab_data,
+                            const DataExtractor &strtab_data) {
   ELFSymbol symbol;
   lldb::offset_t offset = 0;
+  FileAddressToAddressClassMap address_class_map;
 
   static ConstString text_section_name(".text");
   static ConstString init_section_name(".init");
@@ -2213,18 +2214,18 @@ unsigned ObjectFileELF::ParseSymbols(Symtab *symtab, user_id_t start_id,
             switch (mapping_symbol) {
             case 'a':
               // $a[.<any>]* - marks an ARM instruction sequence
-              m_address_class_map[symbol.st_value] = AddressClass::eCode;
+              address_class_map[symbol.st_value] = AddressClass::eCode;
               break;
             case 'b':
             case 't':
               // $b[.<any>]* - marks a THUMB BL instruction sequence
               // $t[.<any>]* - marks a THUMB instruction sequence
-              m_address_class_map[symbol.st_value] =
+              address_class_map[symbol.st_value] =
                   AddressClass::eCodeAlternateISA;
               break;
             case 'd':
               // $d[.<any>]* - marks a data item sequence (e.g. lit pool)
-              m_address_class_map[symbol.st_value] = AddressClass::eData;
+              address_class_map[symbol.st_value] = AddressClass::eData;
               break;
             }
           }
@@ -2238,11 +2239,11 @@ unsigned ObjectFileELF::ParseSymbols(Symtab *symtab, user_id_t start_id,
             switch (mapping_symbol) {
             case 'x':
               // $x[.<any>]* - marks an A64 instruction sequence
-              m_address_class_map[symbol.st_value] = AddressClass::eCode;
+              address_class_map[symbol.st_value] = AddressClass::eCode;
               break;
             case 'd':
               // $d[.<any>]* - marks a data item sequence (e.g. lit pool)
-              m_address_class_map[symbol.st_value] = AddressClass::eData;
+              address_class_map[symbol.st_value] = AddressClass::eData;
               break;
             }
           }
@@ -2260,11 +2261,11 @@ unsigned ObjectFileELF::ParseSymbols(Symtab *symtab, user_id_t start_id,
             // conjunction with symbol.st_value to produce the final
             // symbol_value that we store in the symtab.
             symbol_value_offset = -1;
-            m_address_class_map[symbol.st_value ^ 1] =
+            address_class_map[symbol.st_value ^ 1] =
                 AddressClass::eCodeAlternateISA;
           } else {
             // This address is ARM
-            m_address_class_map[symbol.st_value] = AddressClass::eCode;
+            address_class_map[symbol.st_value] = AddressClass::eCode;
           }
         }
       }
@@ -2285,17 +2286,17 @@ unsigned ObjectFileELF::ParseSymbols(Symtab *symtab, user_id_t start_id,
       */
       if (arch.IsMIPS()) {
         if (IS_MICROMIPS(symbol.st_other))
-          m_address_class_map[symbol.st_value] = AddressClass::eCodeAlternateISA;
+          address_class_map[symbol.st_value] = AddressClass::eCodeAlternateISA;
         else if ((symbol.st_value & 1) && (symbol_type == eSymbolTypeCode)) {
           symbol.st_value = symbol.st_value & (~1ull);
-          m_address_class_map[symbol.st_value] = AddressClass::eCodeAlternateISA;
+          address_class_map[symbol.st_value] = AddressClass::eCodeAlternateISA;
         } else {
           if (symbol_type == eSymbolTypeCode)
-            m_address_class_map[symbol.st_value] = AddressClass::eCode;
+            address_class_map[symbol.st_value] = AddressClass::eCode;
           else if (symbol_type == eSymbolTypeData)
-            m_address_class_map[symbol.st_value] = AddressClass::eData;
+            address_class_map[symbol.st_value] = AddressClass::eData;
           else
-            m_address_class_map[symbol.st_value] = AddressClass::eUnknown;
+            address_class_map[symbol.st_value] = AddressClass::eUnknown;
         }
       }
     }
@@ -2392,24 +2393,27 @@ unsigned ObjectFileELF::ParseSymbols(Symtab *symtab, user_id_t start_id,
       dc_symbol.SetIsWeak(true);
     symtab->AddSymbol(dc_symbol);
   }
-  return i;
+  return {i, address_class_map};
 }
 
-unsigned ObjectFileELF::ParseSymbolTable(Symtab *symbol_table,
-                                         user_id_t start_id,
-                                         lldb_private::Section *symtab) {
+std::pair<unsigned, ObjectFileELF::FileAddressToAddressClassMap>
+ObjectFileELF::ParseSymbolTable(Symtab *symbol_table, user_id_t start_id,
+                                lldb_private::Section *symtab) {
   if (symtab->GetObjectFile() != this) {
     // If the symbol table section is owned by a different object file, have it
     // do the parsing.
     ObjectFileELF *obj_file_elf =
         static_cast<ObjectFileELF *>(symtab->GetObjectFile());
-    return obj_file_elf->ParseSymbolTable(symbol_table, start_id, symtab);
+    auto [num_symbols, address_class_map] =
+        obj_file_elf->ParseSymbolTable(symbol_table, start_id, symtab);
+    m_address_class_map.merge(address_class_map);
+    return {num_symbols, address_class_map};
   }
 
   // Get section list for this object file.
   SectionList *section_list = m_sections_up.get();
   if (!section_list)
-    return 0;
+    return {};
 
   user_id_t symtab_id = symtab->GetID();
   const ELFSectionHeaderInfo *symtab_hdr = GetSectionHeaderByIndex(symtab_id);
@@ -2435,7 +2439,7 @@ unsigned ObjectFileELF::ParseSymbolTable(Symtab *symbol_table,
     }
   }
 
-  return 0;
+  return {0, {}};
 }
 
 size_t ObjectFileELF::ParseDynamicSymbols() {
@@ -2972,8 +2976,12 @@ void ObjectFileELF::ParseSymtab(Symtab &lldb_symtab) {
   // while the reverse is not necessarily true.
   Section *symtab =
       section_list->FindSectionByType(eSectionTypeELFSymbolTable, true).get();
-  if (symtab)
-    symbol_id += ParseSymbolTable(&lldb_symtab, symbol_id, symtab);
+  if (symtab) {
+    auto [num_symbols, address_class_map] =
+        ParseSymbolTable(&lldb_symtab, symbol_id, symtab);
+    m_address_class_map.merge(address_class_map);
+    symbol_id += num_symbols;
+  }
 
   // The symtab section is non-allocable and can be stripped, while the
   // .dynsym section which should always be always be there. To support the
@@ -2986,8 +2994,12 @@ void ObjectFileELF::ParseSymtab(Symtab &lldb_symtab) {
     Section *dynsym =
         section_list->FindSectionByType(eSectionTypeELFDynamicSymbols, true)
             .get();
-    if (dynsym)
-      symbol_id += ParseSymbolTable(&lldb_symtab, symbol_id, dynsym);
+    if (dynsym) {
+      auto [num_symbols, address_class_map] =
+          ParseSymbolTable(&lldb_symtab, symbol_id, dynsym);
+      symbol_id += num_symbols;
+      m_address_class_map.merge(address_class_map);
+    }
   }
 
   // DT_JMPREL
diff --git a/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.h b/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.h
index bc8e34981a9d8..716bbe01638f3 100644
--- a/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.h
+++ b/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.h
@@ -285,18 +285,19 @@ class ObjectFileELF : public lldb_private::ObjectFile {
 
   /// Populates the symbol table with all non-dynamic linker symbols.  This
   /// method will parse the symbols only once.  Returns the number of symbols
-  /// parsed.
-  unsigned ParseSymbolTable(lldb_private::Symtab *symbol_table,
-                            lldb::user_id_t start_id,
-                            lldb_private::Section *symtab);
+  /// parsed and a map of address types (used by targets like Arm that have
+  /// an alternative ISA mode like Thumb).
+  std::pair<unsigned, FileAddressToAddressClassMap>
+  ParseSymbolTable(lldb_private::Symtab *symbol_table, lldb::user_id_t start_id,
+                   lldb_private::Section *symtab);
 
   /// Helper routine for ParseSymbolTable().
-  unsigned ParseSymbols(lldb_private::Symtab *symbol_table,
-                        lldb::user_id_t start_id,
-                        lldb_private::SectionList *section_list,
-                        const size_t num_symbols,
-                        const lldb_private::DataExtractor &symtab_data,
-                        const lldb_private::DataExtractor &strtab_data);
+  std::pair<unsigned, FileAddressToAddressClassMap>
+  ParseSymbols(lldb_private::Symtab *symbol_table, lldb::user_id_t start_id,
+               lldb_private::SectionList *section_list,
+               const size_t num_symbols,
+               const lldb_private::DataExtractor &symtab_data,
+               const lldb_private::DataExtractor &strtab_data);
 
   /// Scans the relocation entries and adds a set of artificial symbols to the
   /// given symbol table for each PLT slot.  Returns the number of symbols

Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change makes sense to me, but this part here got me worried:

then asked for a better object file, which returned the same program file again

Does this mean that we had two ObjectFileELF instances referring to the same file on disk floating around? That does not seem like a good idea, as the most common case is having the same file containing both the debug info and code and everything. Ideally, we would quickly determine that we have not been able to find a better match and avoid opening (parsing, mmaping, etc.) the same file twice.

@DavidSpickett
Copy link
Collaborator Author

We can compare against the path the plugin found, though it's surprising that the plugins return anything at all so maybe there is a bug there too.

It'll need a new test case, as TestStringLiteralExpr.test will no longer hit the code I'm changing in this PR, so I'll do it as a follow up.

@DavidSpickett DavidSpickett changed the title [lldb][ELF] Return address class map from symbol table parsing methods [lldb][ELF] Return address class map changes from symbol table parsing methods May 9, 2024
@DavidSpickett
Copy link
Collaborator Author

I realised that as it was, the calling object file would get its map updated but not the other object file. Which is potentially a regression if we choose to use one or the other file for a lookup. So now the methods update the member map, and return the changes they made to it, so the caller can do the same.

But now I've done that I'm wondering why this map doesn't live in the symbol table object itself, so I'm gonna try doing that.

@DavidSpickett
Copy link
Collaborator Author

Alternative: #91603

@labath
Copy link
Collaborator

labath commented May 10, 2024

I realised that as it was, the calling object file would get its map updated but not the other object file. Which is potentially a regression if we choose to use one or the other file for a lookup. So now the methods update the member map, and return the changes they made to it, so the caller can do the same.

But now I've done that I'm wondering why this map doesn't live in the symbol table object itself, so I'm gonna try doing that.

While I do see some advantages to the other approach, I'm not convinced it is better overall. My reasons for that are:

  • The Symtab class is generic, while the approach of marking address types via fake symtab symbols is (I think) an elf invention. In fact, ObjectFileELF goes out of its way to make sure these fake symbols don't end up in the Symtab object for everyone to see, which makes it a bit strange to have that patch add them in via a back door.
  • The address class queries are currently done on the object file, and the address class computation is done on the object file as well. The symtab class has nothing to do with them, and just serves as an outsourced map storage for the object file.

I don't think either of these are showstoppers, and I can live with the other approach if you feel strongly about it, but if it were up to me, I'd stick with this PR (you can put the map into both object files if you wish).

@DavidSpickett
Copy link
Collaborator Author

The Symtab class is generic, while the approach of marking address types via fake symtab symbols is (I think) an elf invention.

I didn't see any use of AddressClass::eCodeAlternateISA outside of ObjectFileELF so I think you're right.

I'll go with this PR then.

@DavidSpickett DavidSpickett merged commit a76518c into llvm:main May 10, 2024
4 checks passed
@DavidSpickett DavidSpickett deleted the address-map branch May 10, 2024 08:21
@DavidSpickett
Copy link
Collaborator Author

There are a couple in MachO actually but you're right, different mechanism being used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants