Support parsing loaded elf? #557

Open
RisinT96 opened this issue Jul 3, 2023 · 18 comments

Comments

@RisinT96

RisinT96 commented Jul 3, 2023

Hello,

I have a use case where I want to parse an ELF that's already loaded into my process's memory. Because the ELF is loaded, it's missing sections, which the current ELF parser relies on heavily.

Some use could still be had from parsing the dynamic segment (PT_DYNAMIC) to find dynamic symbols, relocations, etc...

I have something similar implemented, using object's type system, but I would like to add that as a built-in feature of this project.

Is this something that could interest you?

Thank you!

@philipc
Contributor

philipc commented Jul 3, 2023

I'm interested in it. This sounds similar to #548. I had thought that this sort of thing wasn't possible for ELF, so it'd be good to see what you have done.

@RisinT96
Author

RisinT96 commented Jul 4, 2023

Cool.

Since loaded ELFs don't have sections, some API changes might be necessary.

Mostly returning Option in some cases.

Could also work without the changes, by returning Err/empty iterators where applicable.

Additionally, I believe we should somehow differentiate between parsing a regular and loaded elf.
My current way of detecting that an elf is "loaded" is when let sections = header.sections(endian, data) fails.
This might be an error that we want to catch if it happens when parsing a regular elf file.
What about adding a parse_loaded API that doesn't look for sections and so on, and keeping the original parse as it is?

What are your thoughts?

@philipc
Contributor

philipc commented Jul 4, 2023

Returning empty iterators seems ideal. It's what we already do for file formats that don't support some features. Errors should only be returned when something is actually wrong, not when a feature is simply not supported.

Adding parse_loaded sounds good. Don't use a parsing failure. #538 currently adds a parameter to parse, but I prefer a new method.
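
To make the shape of that proposal concrete, here is a minimal sketch using only the standard library. `Image`, `Section`, and both constructors are hypothetical stand-ins, not the `object` crate's types; `None` models a section table that could not be read.

```rust
/// Hypothetical section record; a stand-in, not the `object` crate's type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Section {
    pub name: String,
}

/// Hypothetical parsed image; a stand-in for a type like `ElfFile`.
pub struct Image {
    sections: Vec<Section>,
}

impl Image {
    /// Regular file: an unreadable section table (`None`) is reported as an error.
    pub fn parse(section_table: Option<Vec<Section>>) -> Result<Image, String> {
        match section_table {
            Some(sections) => Ok(Image { sections }),
            None => Err("could not read the section table".to_string()),
        }
    }

    /// Loaded image: never looks for sections, so their absence is not an error.
    pub fn parse_loaded() -> Image {
        Image { sections: Vec::new() }
    }

    /// Yields nothing for a loaded image instead of returning an error.
    pub fn sections(&self) -> std::slice::Iter<'_, Section> {
        self.sections.iter()
    }
}
```

With this split, code that iterates over `sections()` works unchanged on a loaded image and simply sees no sections, while `parse` keeps reporting a broken section table in a regular file.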

@RisinT96
Author

RisinT96 commented Jul 4, 2023

So section_by_index and symbol_by_index should probably be changed to return Result<Option<_>>.

@philipc
Contributor

philipc commented Jul 4, 2023

Hmm, not sure I like changing those APIs either. How about trying to do it without API changes, and then we can see what it looks like?

@RisinT96
Author

RisinT96 commented Jul 4, 2023

Sure

I'll have some time to work on this on the weekend.

@RisinT96
Author

RisinT96 commented Jul 5, 2023

I have an update,

When parsing the dynamic table and reading an entry that contains an address (for example DT_JMPREL, which should point to .rela.plt), I get different results on different machines:

  • On Android devices (both 32-bit and 64-bit) I'm getting an offset into the data reader/slice.
  • On Linux machines (checked on Ubuntu 20.04 amd64 and WSL2 Ubuntu 22.04 amd64) I'm getting the actual address of the section in the process's memory.

I believe it has something to do with Android having a different dynamic linker than regular Linux.

I'm wondering what would be the correct way to handle this.

Perhaps, since this is only useful when a process is analyzing itself/other processes on the same system, it would make sense to use some kind of cfg depending on the compiled target, and add a workaround for all Android targets.

Or use some kind of heuristic on the returned value, e.g. checking whether it exceeds the size of data.

I'm not sure if there's a sure way to know :/
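
The heuristic idea could be sketched roughly as follows. This is only an illustration: `dyn_value_to_offset` is a hypothetical helper, not part of the `object` crate, and it assumes the caller knows the image's base address and the length of the slice covering it.

```rust
/// Convert a pointer-valued dynamic entry (for example the `DT_JMPREL` value)
/// into an offset into the slice that covers the loaded image.
///
/// `base` is the address the image is loaded at and `len` is the slice length.
/// Heuristic and hypothetical: not part of the `object` crate.
pub fn dyn_value_to_offset(value: u64, base: u64, len: u64) -> Option<u64> {
    if value < len {
        // Fits inside the slice: assume it is already an offset.
        Some(value)
    } else {
        // Otherwise assume an absolute address and rebase it;
        // reject anything that still falls outside the slice.
        value.checked_sub(base).filter(|offset| *offset < len)
    }
}
```

The guess is ambiguous whenever `base` is smaller than `len`, because a value could then be read either way, so this cannot be a sure way to know.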

@philipc
Contributor

philipc commented Jul 5, 2023

This may be getting into the kind of reason why I thought this wasn't possible for ELF. If you're relying on undocumented internals of dynamic linkers, then that's an area that I'm not sure I want to support in the high level API of this crate. It may be better to adapt the low level API to suit your needs instead.

Can you provide code to show how you are using this?

@RisinT96
Author

RisinT96 commented Jul 5, 2023

You can find a very rough draft here

It successfully finds all the relevant dynamic sections.

It's useful for me when parsing the dynamic relocations and PLT of another .so loaded into my process. I can find its base address using dladdr, and I can find its size by parsing its ELF header: go over all program headers and find the segment with the largest vaddr + memsz.

This allows me to get a slice that encapsulates the entire elf.
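
A minimal sketch of that size calculation, using only the standard library. `Segment` is a hypothetical struct that mirrors the `p_vaddr` and `p_memsz` fields of a program header, and the sketch assumes the base address corresponds to virtual address 0 of the image.

```rust
/// The two program-header fields the calculation needs
/// (mirrors `p_vaddr` and `p_memsz`; hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct Segment {
    pub p_vaddr: u64,
    pub p_memsz: u64,
}

/// Size of the loaded image: the largest `p_vaddr + p_memsz` over all segments.
/// Returns `None` if there are no segments or if an end address overflows.
pub fn loaded_image_size(segments: &[Segment]) -> Option<u64> {
    let mut size: Option<u64> = None;
    for seg in segments {
        // Reject headers whose end address does not fit in 64 bits.
        let end = seg.p_vaddr.checked_add(seg.p_memsz)?;
        size = Some(size.map_or(end, |current| current.max(end)));
    }
    size
}
```

Turning the base address and this size into a byte slice would still need an `unsafe` call such as `std::slice::from_raw_parts`, which is only sound while the library stays mapped.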

@philipc
Contributor

philipc commented Jul 6, 2023

You can find a very rough draft here

Thanks.

I can find its base address using dladdr, and by parsing its ELF header (go over all program headers, find the segment with the largest vaddr + memsz) I can find its size.

Are you able to share the code for that too, and ideally an example of the sort of thing you want to do with the resulting parsed file? We'll need to have a test of some kind in this crate.

Getting back to your question in #557 (comment), my feeling is that the ideal solution would result in the compiled executable including code that only supports the compiled target, and thus can work reliably instead of using heuristics.

If I were to write this code myself, instead of using the higher level API (things like File, ElfFile, and ElfSymbolTable), I would use the lower level API (things like elf::ProgramHeader and read::elf::SymbolTable), and use cfgs in my own code to control behaviour that depends on the compiled target. That is, I would write code that looks similar to the ElfFile::parse_loaded that you have written, but that code would live in my own crate. This may involve improving the lower level API in object to make it easier to do that.

@RisinT96
Author

RisinT96 commented Jul 6, 2023

Are you able to share the code for that too, and ideally an example of the sort of thing you want to do with the resulting parsed file? We'll need to have a test of some kind in this crate.

I'll see what I can share

Getting back to your question in #557 (comment), my feeling is that the ideal solution would result in the compiled executable including code that only supports the compiled target, and thus can work reliably instead of using heuristics.

Yeah, that's what I thought as well: use cfgs that select offsets when compiling for an Android target and absolute addresses on Linux targets, and disable this entirely on untested targets.
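
A rough sketch of what that cfg split could look like. `dyn_value_to_offset_for_target` is a hypothetical name, and the per-target behaviour only encodes the observations reported in this thread, not a documented guarantee. Here "disabled" is modelled by returning `None`; omitting the function on untested targets would be another option.

```rust
// Interpret a pointer-valued dynamic entry as an offset from the image base.
// Hypothetical helper: the behaviour per target follows what was observed
// in this thread, not any documented contract of the dynamic linkers.

/// Android targets: the value was observed to already be an offset.
#[cfg(target_os = "android")]
pub fn dyn_value_to_offset_for_target(value: u64, _base: u64) -> Option<u64> {
    Some(value)
}

/// Linux targets: the value was observed to be an absolute address.
#[cfg(target_os = "linux")]
pub fn dyn_value_to_offset_for_target(value: u64, base: u64) -> Option<u64> {
    value.checked_sub(base)
}

/// Untested targets: report "unsupported" rather than guess.
#[cfg(not(any(target_os = "android", target_os = "linux")))]
pub fn dyn_value_to_offset_for_target(_value: u64, _base: u64) -> Option<u64> {
    None
}
```

Only one of the three definitions is compiled for a given target, so the resulting binary carries no runtime heuristic.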

If I were to write this code myself, instead of using the higher level API (things like File, ElfFile, and ElfSymbolTable), I would use the lower level API (things like elf::ProgramHeader and read::elf::SymbolTable), and use cfgs in my own code to control behaviour that depends on the compiled target. That is, I would write code that looks similar to the ElfFile::parse_loaded that you have written, but that code would live in my own crate. This may involve improving the lower level API in object to make it easier to do that.

That's basically what I already have implemented. Unfortunately, I can't share that code as is.

I was hoping to implement some of that functionality directly into this library for easier access for everyone.

@philipc
Contributor

philipc commented Jul 7, 2023

I think adding code in the lower level API that is similar to ElfFile::parse_loaded could be useful for others, but I'm not convinced that ElfFile itself should support this.

For your use case, are you doing things that are specific to ELF? If we added ElfFile::parse_loaded and PeFile::parse_loaded, would your application be able to work on Windows by changing the ElfFile::parse_loaded call to a PeFile::parse_loaded call? If your code can't be written to work on both, then there is no reason to use an abstraction layer.

@RisinT96
Author

RisinT96 commented Jul 7, 2023

I'm not familiar enough with PE, but from a short search it seems like PE's IAT is very similar to ELF's PLT, and can be modified to the same effect.

I imagine the dynamic relocations abstraction gives access to the same information.

So I think there's value to be had in the abstraction.

Although improving the low level access code will make the ElfFile code cleaner.

@bjorn3
Contributor

bjorn3 commented Jul 7, 2023

dl_iterate_phdr returns a program header for every loaded DSO. According to the man page you have to calculate absolute addresses of segments using info->dlpi_addr + info->dlpi_phdr[x].p_vaddr. Does this work on both glibc and bionic (android) and does it return absolute addresses in both cases?
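
For reference, the address arithmetic from the man page can be written as a small standalone helper. This sketch does not call `dl_iterate_phdr` itself; `segment_range` is a hypothetical function that takes the relevant fields as plain integers.

```rust
use std::ops::Range;

/// Absolute address range of one segment of a loaded object:
/// it starts at `dlpi_addr + p_vaddr` and spans `p_memsz` bytes.
/// Returns `None` if either addition overflows.
pub fn segment_range(dlpi_addr: u64, p_vaddr: u64, p_memsz: u64) -> Option<Range<u64>> {
    let start = dlpi_addr.checked_add(p_vaddr)?;
    let end = start.checked_add(p_memsz)?;
    Some(start..end)
}
```

For example, with illustrative values, a base of `0x7000_0000`, a `p_vaddr` of `0x3a0`, and a `p_memsz` of `0x110` give the range `0x7000_03a0..0x7000_04b0`.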

@RisinT96
Author

RisinT96 commented Jul 7, 2023

dl_iterate_phdr returns a program header for every loaded DSO. According to the man page you have to calculate absolute addresses of segments using info->dlpi_addr + info->dlpi_phdr[x].p_vaddr. Does this work on both glibc and bionic (android) and does it return absolute addresses in both cases?

I'm getting absolute addresses in both cases:

Linux:

Loaded so at address (0x00007ffff7fc1000): [linux-vdso.so.1]
	Found segment at (0x00007ffff7fc1000) size: 0xd55
	Found segment at (0x00007ffff7fc13a0) size: 0x110
	Found segment at (0x00007ffff7fc14b0) size: 0x54
	Found segment at (0x00007ffff7fc1504) size: 0x34

Android:

Loaded so at address (0x000000792201c000): [/system/bin/linker64]
        Found segment at (0x000000792201c040) size: 0x230
        Found segment at (0x000000792201c000) size: 0x37e5c
        Found segment at (0x0000007922054000) size: 0xe7970
        Found segment at (0x000000792213c000) size: 0x7838
        Found segment at (0x0000007922144840) size: 0xd475
        Found segment at (0x0000007922142cc8) size: 0x120
        Found segment at (0x000000792213c000) size: 0x8000
        Found segment at (0x00000079220340d8) size: 0x610c
        Found segment at (0x000000792201c000) size: 0x0
        Found segment at (0x000000792201c270) size: 0x20

Also 32-bit Android:

Loaded so at address (0xf33ba000): [/system/bin/linker]
        Found segment at (0xf33ba034) size: 0x160
        Found segment at (0xf33ba000) size: 0x1c270
        Found segment at (0xf33d7280) size: 0xaa650
        Found segment at (0xf34828d0) size: 0x3e50
        Found segment at (0xf3487720) size: 0xaf50
        Found segment at (0xf3486144) size: 0x90
        Found segment at (0xf34828d0) size: 0x4730
        Found segment at (0xf33d404c) size: 0x834
        Found segment at (0xf33ba000) size: 0x0
        Found segment at (0xf33ba194) size: 0x20
        Found segment at (0xf33ba90c) size: 0x4fb8

@RisinT96
Author

RisinT96 commented Jul 10, 2023

So I decided to start slow, added some lower level logic for parsing the dynamic segment.

This should work both if the segment was found using dl_iterate_phdr, or by manually parsing a loaded elf.

This also works with regular elf files (not loaded).

Let me know what you think.

@bjorn3
Contributor

bjorn3 commented Jul 10, 2023

Parsing the dynamic segment is as easy as calling ReadRef::read_slice::<Dyn64>() on a slice with the segment data (or Dyn32 for an ELF32 file). This gives you a &[Dyn64] with the individual entries. See the dynamic method on several of the types in the elf read module. You can then use the Dyn trait for convenience functions on the dynamic segment entries.

Edit: Just saw your PR. Looks like you were thinking about a high level parsing interface, while I was thinking about a low level one. A high level one is of course nicer for most use cases.
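
To illustrate what the low-level approach boils down to, here is a hand-rolled sketch that uses only the standard library rather than the `object` crate's `Dyn` types. It assumes a little-endian ELF64 dynamic segment, where each entry is a 64-bit tag followed by a 64-bit value; `DynEntry`, `parse_dynamic_le64`, and `find_dyn_value` are hypothetical names.

```rust
/// Tag that terminates the dynamic table.
pub const DT_NULL: u64 = 0;
/// Tag whose value points at the PLT relocations.
pub const DT_JMPREL: u64 = 23;

/// One dynamic entry: a tag and its value (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DynEntry {
    pub tag: u64,
    pub val: u64,
}

/// Read a little-endian u64 from exactly 8 bytes.
fn read_u64_le(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes);
    u64::from_le_bytes(buf)
}

/// Parse 16-byte entries until `DT_NULL` or the end of the data.
/// Trailing bytes that do not form a whole entry are ignored.
pub fn parse_dynamic_le64(data: &[u8]) -> Vec<DynEntry> {
    let mut entries = Vec::new();
    for chunk in data.chunks_exact(16) {
        let tag = read_u64_le(&chunk[0..8]);
        let val = read_u64_le(&chunk[8..16]);
        if tag == DT_NULL {
            break;
        }
        entries.push(DynEntry { tag, val });
    }
    entries
}

/// Value of the first entry with the given tag, if any.
pub fn find_dyn_value(entries: &[DynEntry], tag: u64) -> Option<u64> {
    entries.iter().find(|entry| entry.tag == tag).map(|entry| entry.val)
}
```

A higher-level interface would layer tag-specific lookups such as `find_dyn_value(&entries, DT_JMPREL)` on top of this, and would also need to handle ELF32 and big-endian data.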

@RisinT96
Author

Edit: Just saw your PR. Looks like you were thinking about a high level parsing interface, while I was thinking about a low level one. A high level one is of course nicer for most use cases.

Yeah, by low level I meant it was lower level than parsing the entire elf from scratch.

It's probably closer to high level than low level though 😅


3 participants