lldb test suite fails to JIT expressions after update to Ubuntu Jammy ubuntu3.3 #68987
Comments
Immediate action from me will be to make the buildbot silent.
@llvm/issue-subscribers-lldb Author: David Spickett (DavidSpickett)
@labath Do you remember what the intent was with 5535582? I do not know what these broadcasts are, or what harm broadcasting one about ld-linux twice would do. Is there a way I can write a test case to observe them?
@clayborg You were in this area recently, any ideas here? (I did confirm that your recent change 07c215e is not to blame here.)
Turns out we load the interpreter first in |
Fixes llvm#68987
Early on we load the interpreter (most commonly ld-linux) in LoadInterpreterModule. Then later, when we get the first DYLD rendezvous, we get a list of libraries that commonly includes ld-linux again.
Previously we would load this duplicate, see that it was a duplicate, and unload it. The problem was that this unloaded the section information of the first copy of ld-linux.
On platforms where you can place a breakpoint using only an address, this wasn't an issue. On ARM you have ARM and Thumb modes. We must know which mode the section we're breaking in uses, otherwise we'll go there in the wrong mode and SIGILL. This happened on ARM when lldb tried to call mmap during expression evaluation.
To fix this, I am making the assumption that the base address we see in the module prior to loading can be compared with what we know the interpreter base address is. Then we don't have to load the module to know we can ignore it.
This fixes the lldb test suite on Ubuntu versions where https://bugs.launchpad.net/ubuntu/+source/gdb/+bug/1927192 has been fixed, which was recently done on Jammy.
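The base address comparison described in the fix can be sketched roughly as follows. This is only an illustration in Python, not the actual lldb change: `RendezvousEntry`, `entries_to_load` and the addresses are hypothetical names and values made up for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class RendezvousEntry:
    """Hypothetical stand-in for one library reported by the rendezvous."""
    path: str
    base_addr: int


def entries_to_load(entries: Iterable[RendezvousEntry],
                    interpreter_base: int) -> List[RendezvousEntry]:
    """Skip any entry whose base address matches the interpreter's.

    The interpreter was already loaded earlier, so loading it again (and then
    unloading the duplicate) is what this check is meant to avoid.
    """
    return [e for e in entries if e.base_addr != interpreter_base]


# Made-up addresses, for illustration only.
interp_base = 0xF7FC0000
reported = [
    RendezvousEntry("libc.so.6", 0xF7E00000),
    RendezvousEntry("ld-linux-armhf.so.3", interp_base),
]
remaining = entries_to_load(reported, interp_base)
```

The point of comparing addresses rather than loaded modules is that nothing has to be loaded, and therefore nothing has to be unloaded, to decide that the entry is a duplicate.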
…ing them (#69932) Fixes #68987
TLDR: LLDB for some reason unloads important information about ld-linux, which prevents us from correctly calling mmap on Arm/Thumb.
Long report for background, I'll add the important follow up questions in a comment after this.
Since I updated our 32 bit Arm lldb bot container from:
To:
The Arm lldb bot has been failing with:
Tests fail with various things along the theme of:
First happened here, the changes are unrelated:
https://lab.llvm.org/buildbot/#/builders/17/builds/44298
For reasons I still cannot explain, the fixing of this Ubuntu issue (related to gdb) on Jammy caused this failure:
https://bugs.launchpad.net/ubuntu/+source/gdb/+bug/1927192
The funny thing is, back when this was fixed on Bionic we also saw this, figured it was a bug there, and moved the bot to Jammy. Now the GDB issue is fixed on Jammy too and lldb has broken again. So clearly we are in the wrong here.
The background of that bug is not so important once I explain what lldb is doing, though it is along the same lines: it was preventing gdb from putting breakpoints in ld-linux.
Reproducer:
Here's what happens.
lldb needs to call mmap in the target process as part of evaluating the expression. To do that it sets up the arguments according to the ABI and possibly changes the CPSR (program status register) to select ARM or Thumb mode.
And here is where the fun starts.
On 32 bit Arm we have ARM and Thumb code modes; Thumb is the compressed set. If we look at the mmap symbol that lldb chooses, it's in ./arm-linux-gnueabihf/ld-linux-armhf.so.3.
See that start address? It is 2 byte aligned, which means a Thumb mode function. At least, it not being 4 byte aligned means it's not ARM. I'm not 100% sure what Thumb requires.
Anyway, the point is that the usual trick of "bit 0 is set means Thumb" doesn't work here, since the symbol doesn't have the bottom bits set. This should be fine because we can look up the function's type from the symbol, or the type of the section (it is not important which here).
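For reference, the alignment and bit 0 rules can be written down as a small sketch. ARM (A32) instructions are 4 byte aligned, Thumb instructions are 2 byte aligned, and under the interworking convention a code address with bit 0 set selects Thumb. The helper names and addresses below are made up for illustration and are not lldb functions.

```python
def could_be_arm(addr: int) -> bool:
    """ARM (A32) instructions are 4 bytes wide and 4 byte aligned."""
    return addr % 4 == 0


def has_thumb_bit(addr: int) -> bool:
    """Interworking convention: bit 0 set in a code address selects Thumb."""
    return (addr & 1) == 1


def strip_thumb_bit(addr: int) -> int:
    """Address of the first instruction, with the mode bit cleared."""
    return addr & ~1


# Hypothetical addresses, for illustration only.
plain_thumb_addr = 0x1002       # 2 byte aligned, no mode bit present
interworking_addr = 0x1003      # the same function with bit 0 set
```

An address like `plain_thumb_addr` cannot be ARM code, but nothing in the value itself says "Thumb" either, which is why the mode has to come from the symbol or section instead.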
Great, do we do that? We try to, but it doesn't work. There are no loaded sections for ld-linux for us to find the mmap in.
In the output above, anything with a name after the file name is a section we know about.
So we have some for test.o and libc.so.6 but nothing for ld-linux.
This is why we are able to resolve the fake return address we give for mmap, which is the symbol _start from the libc. We do have a section for this, so when we resolve it the Thumb bit is placed correctly. For mmap we have no idea, so the address comes back unchanged and PrepareTrivialCall decides, well, it must be ARM then. It doesn't set the T bit in CPSR, jumps to mmap and immediately SIGILLs because we're trying to run Thumb code in ARM mode.
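A simplified model of why the missing sections matter is below. It assumes the address class is found purely by section lookup, which is a simplification of what lldb does. `LoadedSection`, `classify` and the addresses are hypothetical; only the eUnknown and eCodeAlternateISA class names come from lldb.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class AddressClass(Enum):
    UNKNOWN = "eUnknown"
    CODE = "eCode"                            # ARM in this sketch
    CODE_ALTERNATE_ISA = "eCodeAlternateISA"  # Thumb in this sketch


@dataclass
class LoadedSection:
    """Hypothetical record of a section with a known load address."""
    name: str
    start: int
    size: int
    address_class: AddressClass

    def contains(self, addr: int) -> bool:
        return self.start <= addr < self.start + self.size


def classify(addr: int, sections: List[LoadedSection]) -> AddressClass:
    """Return the class of the section containing addr, else UNKNOWN."""
    for section in sections:
        if section.contains(addr):
            return section.address_class
    return AddressClass.UNKNOWN


# Made-up layout: libc has a loaded section, ld-linux has none.
libc_text = LoadedSection("libc.so.6..text", 0x20000, 0x8000,
                          AddressClass.CODE_ALTERNATE_ISA)
ld_text = LoadedSection("ld-linux-armhf.so.3..text", 0x10000, 0x4000,
                        AddressClass.CODE_ALTERNATE_ISA)
addr_in_libc = 0x20010
addr_in_ld = 0x10010
```

With only `libc_text` loaded, `classify(addr_in_ld, [libc_text])` falls through to `UNKNOWN`, which is the situation that leads to the wrong mode being chosen for mmap.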
So, let's just load the sections for ld-linux, right?
Well, it turns out we actually do, then we throw them away. This patch fixes the whole issue (for Arm at least):
Now you can run expressions successfully. If we look at the memory regions again:
Now we have a region at 0x00000000f7fef000 that has a section associated with it. mmap is at 0xf7fd586c.
Before patch:
After patch:
Having a section allows us to resolve the address and set the correct mode. Its AddressClass goes from eUnknown to eCodeAlternateISA (alternate being Thumb here). PrepareTrivialCall sets CPSR.T correctly and it all works.
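To make the CPSR.T part concrete: the Thumb state bit is bit 5 of the CPSR. A minimal sketch of choosing it before a call is below. This is not the PrepareTrivialCall source, and `cpsr_for_call` is a hypothetical helper.

```python
CPSR_T_BIT = 1 << 5  # Thumb state bit in the 32 bit Arm CPSR


def cpsr_for_call(cpsr: int, is_thumb: bool) -> int:
    """Return a CPSR value with the T bit matching the callee's mode."""
    if is_thumb:
        return cpsr | CPSR_T_BIT
    return cpsr & ~CPSR_T_BIT


# 0x10 is user mode with the T bit clear (ARM state).
thumb_cpsr = cpsr_for_call(0x10, is_thumb=True)
arm_cpsr = cpsr_for_call(thumb_cpsr, is_thumb=False)
```

If the address class is unknown and the caller falls back to `is_thumb=False` for a Thumb function, the processor decodes Thumb bytes as ARM instructions, which matches the SIGILL described above.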
I have no idea why the test suite ever passed on previous versions of Jammy, and I am unable to test it. lldb appears to treat the older Jammy's ld-linux as ARM code, so I can't start a program because breaking inside of it doesn't work (again, no idea how the test suite managed to run).
This unloading of ld-linux happens on AArch64 also but there we have no reason to need the details of the mmap symbol. It's enough to know its address.
This unloading was added by 5535582: "The change in RefreshModules ensures we don't broadcast the loaded notification for the dynamic loader (ld.so) module more than once."
The reason lldb believes it has already loaded the ld-linux (or at least, told the user it has) is that m_interpreter_address in Dyld is set from AUXV_AT_BASE in DynamicLoaderPOSIXDYLD::EvalSpecialModulesStatus. This happens before any shared objects have been looked at.
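For context on where that base address comes from: on Linux the auxiliary vector is a list of (type, value) pairs, and the AT_BASE entry (type 7) holds the interpreter's base address. A rough sketch of reading it from a 32 bit little-endian auxv blob is below. `parse_auxv32` is a hypothetical helper and the sample data is synthetic. This is not how lldb itself parses it.

```python
import struct
from typing import Dict

AT_NULL = 0   # terminates the vector
AT_PAGESZ = 6
AT_BASE = 7   # base address of the program interpreter


def parse_auxv32(data: bytes) -> Dict[int, int]:
    """Parse 32 bit little-endian (type, value) pairs up to AT_NULL."""
    entries: Dict[int, int] = {}
    for offset in range(0, len(data) - 7, 8):
        key, value = struct.unpack_from("<II", data, offset)
        if key == AT_NULL:
            break
        entries[key] = value
    return entries


# Synthetic vector with a made-up interpreter base.
sample = struct.pack("<IIIIII", AT_PAGESZ, 4096, AT_BASE, 0xF7FC0000, AT_NULL, 0)
auxv = parse_auxv32(sample)
```

Because this value is available as soon as the process starts, the interpreter's address is known before any shared object list has been read.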
I also confirmed that there is only one point at which ld-linux is added. So at least for this distro, there are not multiple copies of it that we have to ignore.