Type inference and correspondence between representations #39

surovic · 2019-10-06T14:25:21Z

While doing type translation and type casting C expressions I ran into a lot of trouble with different semantics of operations between Z3, LLVM IR and C. For example, C allows numeric costants to have types int, long and long long in their signed and unsigned versions. LLVM IR routinely contains numeric constants of i1 and i8 types, which would naturally map to C types like char. Another example would be the conflation of C pointers and integers into bitvector sorts when Z3 is involved. It's impossible to tell if a 64 bits wide Z3_BV_SORT is a char* or long long.

The issue becomes even more complex when typing of expressions is involved. LLVM IR has every instruction (a value) explicitly typed and this type can differ from the what the result of an equivalent C expression would be.

My proposal would be to only directly translate variable and constant types between representations (Z3, IR, C). Expression types would inferred using the type semantics of the given representation without referring to the expression types of any other representations. However the result types of expressions should correspond between the representations.

For example if an IR and i8 %a, 1, where %a is an i32, yields an i8. The equivalent C expression must yield an i8 equivalent type, namely unsigned char. So the equivalent C expression would be (unsigned char)(a & 1U).

The correspondence check can be implemented using gtest / gflag CHECK() macros.

The text was updated successfully, but these errors were encountered:

pgoodman · 2019-11-22T23:04:29Z

I think that you should pre-define a bunch of "built-in" types / typedefs, e.g. int8_t, uint8_t, etc. When deciding on the signedness of a variable, I'd inspect the graph of all like-typed llvm uses, uses-of-uses, etc. and try to find the most "popular" interpretation, based on observed operations, then use that as the high-level type, and where an oppositely-signed operation is done, I'd inject casts and stuff. Not sure if this helps :-P

Also, I'd be wary of assuming pointers are always 64 bits in with, be sure to check the datalayout.

surovic · 2021-05-19T10:57:04Z

Resolved by #121

surovic self-assigned this Oct 6, 2019

surovic changed the title ~~Type inference and correspondence~~ Type inference and correspondence between representations Oct 6, 2019

surovic added decomp Related to LLVM IR to C decompiler enhancement New feature or request labels Nov 7, 2019

surovic mentioned this issue Mar 17, 2021

Refactor Clang AST building and fix typing in all representations #121

Merged

surovic closed this as completed May 19, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Type inference and correspondence between representations #39

Type inference and correspondence between representations #39

surovic commented Oct 6, 2019 •

edited

Loading

pgoodman commented Nov 22, 2019

surovic commented May 19, 2021

Type inference and correspondence between representations #39

Type inference and correspondence between representations #39

Comments

surovic commented Oct 6, 2019 • edited Loading

pgoodman commented Nov 22, 2019

surovic commented May 19, 2021

surovic commented Oct 6, 2019 •

edited

Loading