Type inference and correspondence between representations #39

Closed
surovic opened this issue Oct 6, 2019 · 2 comments
Labels: decomp (Related to LLVM IR to C decompiler), enhancement (New feature or request)

Comments

@surovic
Contributor

surovic commented Oct 6, 2019

While doing type translation and type casting of C expressions, I ran into a lot of trouble with the differing semantics of operations between Z3, LLVM IR and C. For example, C only allows numeric constants to have the types int, long and long long, in their signed and unsigned versions. LLVM IR routinely contains numeric constants of i1 and i8 types, which would naturally map to C types like char, for which C has no integer literal form. Another example is the conflation of C pointers and integers into bitvector sorts when Z3 is involved: it's impossible to tell whether a 64-bit-wide Z3_BV_SORT is a char* or a long long.
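To make the constant-typing mismatch concrete, here is a minimal C++ sketch. It only shows that integer literals never have a type narrower than int, so an i8 constant has to be spelled with a cast. The helper name I8One is hypothetical and used purely for illustration.

```cpp
#include <type_traits>

// Integer literals, suffixed or not, are never narrower than int.
static_assert(std::is_same<decltype(1), int>::value, "plain literal is int");
static_assert(std::is_same<decltype(1U), unsigned int>::value,
              "U suffix gives unsigned int");
static_assert(std::is_same<decltype(1L), long>::value, "L suffix gives long");
static_assert(std::is_same<decltype(1ULL), unsigned long long>::value,
              "ULL suffix gives unsigned long long");

// Hypothetical helper: an LLVM `i8 1` constant has no literal form of its
// own, so the narrow type has to come from an explicit cast.
constexpr unsigned char I8One() { return static_cast<unsigned char>(1); }

static_assert(sizeof(I8One()) == 1, "the cast result is one byte wide");
```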

The issue becomes even more complex when the typing of expressions is involved. LLVM IR has every instruction (a value) explicitly typed, and this type can differ from what the result type of an equivalent C expression would be.

My proposal would be to directly translate only variable and constant types between representations (Z3, IR, C). Expression types would be inferred using the type semantics of the given representation, without referring to the expression types of any other representation. However, the result types of expressions should correspond between the representations.

For example, if the IR instruction and i8 %a, 1, where %a is an i32, yields an i8, the equivalent C expression must yield an i8-equivalent type, namely unsigned char. So the equivalent C expression would be (unsigned char)(a & 1U).
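The cast in that example matters because of the usual arithmetic conversions. A small sketch in C++ (the function name LowBit is hypothetical, not part of any existing code): without the cast the expression has type unsigned int, and only the cast brings it back to the i8-equivalent type.

```cpp
#include <type_traits>

// Hypothetical helper mirroring the example expression (unsigned char)(a & 1U).
inline unsigned char LowBit(int a) {
  // `a` is converted to unsigned int before the `&`, so the uncast
  // expression is 32 bits wide on typical targets, not 8.
  static_assert(std::is_same<decltype(a & 1U), unsigned int>::value,
                "uncast result is unsigned int");
  // The explicit cast restores the i8-equivalent result type.
  return static_cast<unsigned char>(a & 1U);
}
```

The inferred C type (unsigned int) and the IR type (i8) disagree until the cast is injected, which is the correspondence the proposal asks for.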

The correspondence check can be implemented using glog CHECK() macros.
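One possible shape for such a check, as a hedged sketch: the predicate below compares only the bit width of a C expression type against an IR integer width. The name CorrespondsToIRInt is an assumption made for illustration, and a real implementation would wrap the predicate in whichever CHECK()-style assertion macro the project uses.

```cpp
#include <climits>
#include <type_traits>

// Hypothetical predicate: true when CExprType is an integer type whose
// width in bits equals the width of the IR integer type (e.g. 8 for i8).
template <typename CExprType>
constexpr bool CorrespondsToIRInt(unsigned ir_bits) {
  return std::is_integral<CExprType>::value &&
         sizeof(CExprType) * CHAR_BIT == ir_bits;
}
```

For example, CorrespondsToIRInt<unsigned char>(8) holds on platforms with 8-bit bytes, while the uncast type of a & 1U does not correspond to i8.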

surovic self-assigned this Oct 6, 2019
surovic changed the title from "Type inference and correspondence" to "Type inference and correspondence between representations" Oct 6, 2019
surovic added the decomp (Related to LLVM IR to C decompiler) and enhancement (New feature or request) labels Nov 7, 2019
@pgoodman
Collaborator

I think that you should pre-define a bunch of "built-in" types / typedefs, e.g. int8_t, uint8_t, etc. When deciding on the signedness of a variable, I'd inspect the graph of all like-typed LLVM uses, uses-of-uses, etc. and try to find the most "popular" interpretation based on the observed operations, then use that as the high-level type. Where an oppositely-signed operation is done, I'd inject casts and stuff. Not sure if this helps :-P
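A rough sketch of that voting idea, assuming the opcodes of the collected uses are already available as strings. All names here (SignednessVote, InferSignedness, FixedWidthTypeName) are hypothetical, the opcode lists are a small illustrative subset, and breaking ties in favour of signed is an arbitrary choice.

```cpp
#include <string>
#include <vector>

enum class Signedness { kSigned, kUnsigned };

// Returns +1 for opcodes that imply a signed reading of their operands,
// -1 for opcodes that imply an unsigned reading, and 0 for neutral ones.
inline int SignednessVote(const std::string& opcode) {
  static const char* const kSignedOps[] = {"sdiv", "srem", "ashr", "sext",
                                           "sitofp"};
  static const char* const kUnsignedOps[] = {"udiv", "urem", "lshr", "zext",
                                             "uitofp"};
  for (const char* op : kSignedOps) {
    if (opcode == op) return 1;
  }
  for (const char* op : kUnsignedOps) {
    if (opcode == op) return -1;
  }
  return 0;
}

// Picks the most "popular" interpretation; ties default to signed
// (an assumption of this sketch).
inline Signedness InferSignedness(const std::vector<std::string>& use_opcodes) {
  int score = 0;
  for (const std::string& op : use_opcodes) score += SignednessVote(op);
  return score >= 0 ? Signedness::kSigned : Signedness::kUnsigned;
}

// Builds the name of a pre-defined fixed-width typedef, e.g. "uint8_t".
// The caller is expected to pass 8, 16, 32 or 64.
inline std::string FixedWidthTypeName(unsigned bits, Signedness s) {
  return std::string(s == Signedness::kUnsigned ? "uint" : "int") +
         std::to_string(bits) + "_t";
}
```

Uses whose opcode disagrees with the winning interpretation are the places where a cast would be injected.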

Also, I'd be wary of assuming pointers are always 64 bits in width. Be sure to check the datalayout.
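In LLVM-based code the pointer width would normally come from llvm::DataLayout::getPointerSizeInBits(). To keep the example free of LLVM dependencies, the sketch below instead reads the p spec for address space 0 straight out of a datalayout string. The function name PointerBitsFromDataLayout is hypothetical and the parsing is deliberately simplified.

```cpp
#include <cstddef>
#include <string>

// Returns the pointer width in bits for address space 0, given an LLVM
// datalayout string such as "e-m:e-p:32:32-i64:64". Simplified sketch:
// only "p:<size>..." and "p0:<size>..." specs are recognised.
inline unsigned PointerBitsFromDataLayout(const std::string& dl) {
  unsigned bits = 64;  // LLVM's default when no `p` spec is present.
  std::size_t pos = 0;
  while (pos <= dl.size()) {
    std::size_t end = dl.find('-', pos);
    if (end == std::string::npos) end = dl.size();
    const std::string spec = dl.substr(pos, end - pos);
    const std::size_t colon = spec.find(':');
    if (colon != std::string::npos) {
      const std::string head = spec.substr(0, colon);
      if (head == "p" || head == "p0") {
        // std::stoul stops at the next ':' and so reads just the size field.
        bits = static_cast<unsigned>(std::stoul(spec.substr(colon + 1)));
      }
    }
    pos = end + 1;
  }
  return bits;
}
```

With a result of 32, a bitvector of pointer width would be 32 bits rather than 64, so the pointer-versus-integer ambiguity moves to a different width instead of disappearing.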

@surovic
Contributor Author

surovic commented May 19, 2021

Resolved by #121

surovic closed this as completed May 19, 2021