Add DWARF support to display currently executed C code (ELF file) #123

Open
jdupak opened this issue Apr 18, 2024 · 8 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed), thesis-topic (Possible topic for thesis)

Comments

@jdupak
Collaborator

jdupak commented Apr 18, 2024

No description provided.

@trdthg
Contributor

trdthg commented Jun 17, 2024

I think you are referring to

  1. Loading the ELF and the C source code from the user
  2. Displaying the C source code in an editor tab
  3. Finding the mapping between instructions and C source locations from the DWARF information in the ELF file
  4. Displaying it in a proper way, such as

Are there any problems?

@jdupak
Collaborator Author

jdupak commented Jun 17, 2024

Yes, there will be no C source code. You are loading only the ELF, and you need to extract the source code information from the debug info in the binary itself. So you will need to find a library that is not too big (unlike LLVM) and is cross-platform (including wasm) to read it.
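Before any DWARF parsing, a loader can check whether the ELF carries debug sections at all. The following is a minimal sketch using only the standard ELF header and section header layouts; the function names `section_names` and `has_line_info` are hypothetical, and it assumes a well-formed file without extended section numbering.

```python
import struct
from typing import List


def section_names(data: bytes) -> List[str]:
    """Return the names of all sections in an ELF image held in memory."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                      # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    order = "<" if data[5] == 1 else ">"     # EI_DATA: 1 = little-endian
    if is64:
        hdr_fmt, sh_fmt = order + "HHIQQQIHHHHHH", order + "IIQQQQIIQQ"
    else:
        hdr_fmt, sh_fmt = order + "HHIIIIIHHHHHH", order + "IIIIIIIIII"

    # Fields after the 16-byte e_ident; index 5 is e_shoff, 10..12 are
    # e_shentsize, e_shnum and e_shstrndx.
    fields = struct.unpack_from(hdr_fmt, data, 16)
    shoff, shentsize, shnum, shstrndx = fields[5], fields[10], fields[11], fields[12]

    def header(i: int):
        return struct.unpack_from(sh_fmt, data, shoff + i * shentsize)

    # In both layouts index 4 is sh_offset and index 5 is sh_size.
    str_off, str_size = header(shstrndx)[4], header(shstrndx)[5]
    strtab = data[str_off:str_off + str_size]

    names = []
    for i in range(shnum):
        name_off = header(i)[0]              # sh_name: offset into .shstrtab
        end = strtab.index(b"\0", name_off)
        names.append(strtab[name_off:end].decode("ascii"))
    return names


def has_line_info(data: bytes) -> bool:
    """True if the image has a .debug_line section (address-to-line table)."""
    return ".debug_line" in section_names(data)
```

If `has_line_info` returns false, the binary was most likely built without `-g` or was stripped, and no source mapping can be shown.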

@trdthg
Contributor

trdthg commented Jun 17, 2024

Thanks, then it will be very hard :I

@trdthg
Contributor

trdthg commented Jun 27, 2024

I'm going to try working on this issue, aiming for the functionality described above.

I'll put the extraction of the source code behind an interface (with options in the menu) and provide the following two implementations:

  • extract from ELF
  • directly load locally

It may look like this:

image

Since the latter is easier, I'll implement it first.

If I still have time, I might work on the former, but of course it can be left to others!
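The interface with two implementations could be sketched roughly as follows. All class and function names here are hypothetical and only illustrate the idea; the ELF-based provider is left as a stub because it depends on the DWARF library that is eventually chosen.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Iterable, Optional


class SourceProvider(ABC):
    """Hypothetical interface: return the text of a source file, or None."""

    @abstractmethod
    def load(self, source_path: str) -> Optional[str]:
        ...


class LocalFileProvider(SourceProvider):
    """The "directly load locally" variant: read the file below a root dir."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def load(self, source_path: str) -> Optional[str]:
        candidate = self.root / source_path
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8", errors="replace")
        return None


class ElfProvider(SourceProvider):
    """The "extract from ELF" variant; a stub in this sketch."""

    def load(self, source_path: str) -> Optional[str]:
        return None  # would be filled in once ELF-based extraction exists


def load_source(providers: Iterable[SourceProvider], source_path: str) -> Optional[str]:
    """Try each provider in order and return the first result."""
    for provider in providers:
        text = provider.load(source_path)
        if text is not None:
            return text
    return None
```

With this shape, the menu option only decides which providers are passed to `load_source` and in what order.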


For extraction, I tried to find and test some disassemblers (tested on x86), e.g. IDA and Ghidra. IDA disassembles quite well, but it's not open source. Ghidra is open source and works fine, but it's also quite a large project, not easy to use, and doesn't seem to have good support for DWARF 5 and RISC-V. (Not going in depth here, just sharing some progress and thoughts.)

@jdupak
Collaborator Author

jdupak commented Jun 27, 2024 via email

@trdthg
Contributor

trdthg commented Jun 28, 2024

I am not saying I will give up on reading the ELF; it is necessary and included in my plan. I will certainly look for a suitable library to read ELF.

  • The third step is to parse the mapping, which of course requires reading the ELF (DWARF).

The solution I described only temporarily simplifies the first step: how to get the C code.

  • Get it by loading the source code locally
    • it is at least necessary to parse source_code_filepath from the ELF
  • Get it by parsing the ELF, ...
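For the local-loading variant, the file path recorded in the debug info usually refers to the build machine. In DWARF, a compile unit's `DW_AT_name` may be relative to its `DW_AT_comp_dir`, so a lookup could first join the two and then fall back to user-supplied directories. A minimal sketch, where `resolve_source_path` and `search_roots` are hypothetical names:

```python
import os
from typing import Iterable, Optional


def resolve_source_path(comp_dir: str, name: str,
                        search_roots: Iterable[str] = ()) -> Optional[str]:
    """Find a local file for a source path recorded in the debug info.

    comp_dir and name stand for the DW_AT_comp_dir and DW_AT_name values
    of a compile unit; search_roots are directories chosen by the user.
    """
    candidates = []
    if os.path.isabs(name):
        candidates.append(name)
    else:
        candidates.append(os.path.join(comp_dir, name))

    # The build-time path often does not exist on this machine, so also
    # try the user's directories: first the relative path, then the bare
    # file name.
    for root in search_roots:
        if not os.path.isabs(name):
            candidates.append(os.path.join(root, name))
        candidates.append(os.path.join(root, os.path.basename(name)))

    for candidate in candidates:
        if os.path.isfile(candidate):
            return os.path.normpath(candidate)
    return None
```

Returning `None` lets the UI ask the user for a directory instead of failing silently.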

@ppisa
Member

ppisa commented Sep 30, 2024

I discussed the goal of using DWARF to map an instruction address to a source file line with Jan Hubicka at GNU Tools Cauldron, and he suggested looking at https://www.nongnu.org/libunwind/
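Independently of which library decodes the debug info, the address-to-line step itself is a range lookup in a table sorted by address: each row is valid from its own address up to the next row's address, and an end-of-sequence row closes a range. A minimal sketch with hypothetical names (`LineRow`, `build_index`, `lookup`), assuming the rows have already been decoded from the DWARF line table:

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional, Tuple


class LineRow(NamedTuple):
    address: int
    file: str
    line: int
    end_sequence: bool = False   # marks the first address past a sequence


Index = Tuple[List[int], List[LineRow]]


def build_index(rows: List[LineRow]) -> Index:
    """Sort rows by address and keep a parallel list of keys for bisect."""
    ordered = sorted(rows, key=lambda r: r.address)
    return [r.address for r in ordered], ordered


def lookup(index: Index, pc: int) -> Optional[LineRow]:
    """Return the row covering pc, or None if pc is outside every sequence."""
    addresses, rows = index
    i = bisect_right(addresses, pc) - 1   # last row with address <= pc
    if i < 0:
        return None
    row = rows[i]
    return None if row.end_sequence else row
```

For example, with rows at `0x200` (line 3) and `0x208` (line 4) followed by an end-of-sequence row at `0x210`, a program counter of `0x204` resolves to line 3, and `0x210` resolves to nothing.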

@trdthg
Contributor

trdthg commented Oct 12, 2024

I did some simple tests:

  • libunwind is clearly not enough to recover the C code; it can only extract some function information and registers
  • because we need to handle ELF files, we need to use libunwind-ptrace (this is not a problem)
  • I previously tried eliben/pyelftools (a Python library with a simpler, easier API) to parse DWARF info; it can provide precise details such as variable names, types, and corresponding line numbers.
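As an illustration of the pyelftools route, the sketch below walks the line programs of every compile unit and yields raw (address, file index, line) rows. It assumes pyelftools is installed; the function name `iter_line_rows` is hypothetical, and this is not the code used in the tests mentioned above.

```python
def iter_line_rows(elf_path):
    """Yield (address, file_index, line) for each row of each line program."""
    # Imported lazily so the module still loads where pyelftools is absent.
    from elftools.elf.elffile import ELFFile

    with open(elf_path, "rb") as f:
        elf = ELFFile(f)
        if not elf.has_dwarf_info():
            return                      # built without -g, or stripped
        dwarf = elf.get_dwarf_info()
        for cu in dwarf.iter_CUs():
            program = dwarf.line_program_for_CU(cu)
            if program is None:         # this CU has no line table
                continue
            for entry in program.get_entries():
                state = entry.state
                # Entries without state are commands that emit no row;
                # end_sequence rows only mark the end of an address range.
                if state is None or state.end_sequence:
                    continue
                yield state.address, state.file, state.line
```

The file index still has to be translated to a name through the line program header's file table, and the index base differs between DWARF versions (1-based before version 5), so that step is left out here.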

I discussed it with a friend, who thinks that with only the ELF file it is impossible to get back the C code without decompilation.

But there may be a way to build a map between variable info (name, type, and line number from DWARF) and its real value using libunwind, ptrace, and DWARF.
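One simple case of such a mapping is a local variable whose DWARF location is a frame-base-relative offset (`DW_OP_fbreg`): its address is the frame base plus a signed offset, and the value is read from memory with the size given by its type. A minimal sketch under loud assumptions: little-endian integer variables only, a flat memory snapshot, and hypothetical names throughout.

```python
def read_stack_variable(memory: bytes, memory_base: int, frame_base: int,
                        fbreg_offset: int, size: int, signed: bool = True) -> int:
    """Read an integer variable located at frame_base + fbreg_offset.

    memory is a snapshot starting at address memory_base; fbreg_offset is
    the signed operand of DW_OP_fbreg; size comes from the variable's type.
    """
    address = frame_base + fbreg_offset
    start = address - memory_base
    if start < 0 or start + size > len(memory):
        raise ValueError("variable lies outside the memory snapshot")
    # Assumption: little-endian target (as on typical RISC-V setups).
    return int.from_bytes(memory[start:start + size], "little", signed=signed)
```

Real location expressions can be far more involved (registers, location lists, optimized-out variables), so this only covers unoptimized stack locals.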

There is a blog post that describes some similar ideas; I haven't put it into practice yet.

Some reference materials

And this issue generally looks like it requires implementing a decompiler.
