Error Importing File When Using PDB/XDB File #27

Open
Zatarita opened this issue Jul 8, 2024 · 10 comments

@Zatarita

Zatarita commented Jul 8, 2024

I am trying to use the plugin to load a game that has a PDB and an XDB file bundled with it; however, when I try to load the file with the PDB, it complains that the file cannot be opened. The PDB works in IDA for my friend, but it won't work in Ghidra. When I decompile the XEX on its own it works with no problem; it's only when I use the PDB that it complains that the file cannot be opened.

This is the trace:
image

I followed the directions, disabled pdata, set load PDB and set use experimental. When using the PDB it complains that it doesn't match the xex, but when I use the XDB it says that it matches correctly. It crashes at the VERY end of parsing the types.

@zeroKilo
Owner

zeroKilo commented Jul 9, 2024

Well, without your files I cannot debug that; all I can say is that it works with my test files. The only thing that changed recently is Ghidra's PDB reader interface, so maybe that causes problems.

@Zatarita
Author

Zatarita commented Jul 9, 2024

I uploaded it to google drive since it was too big for github: https://drive.google.com/file/d/1Kz4hVMz67_0gFM5zERj1VEYNYaIVGabJ/view?usp=sharing

Lemme know when you have them; I want to delete them from Google Drive. Google is a pain about stuff like this.

Also, I'm using Ghidra 11.1.1.

I did decompile the PDB with the DIA tool in Visual Studio and, adjusting for the starting offsets, the functions seem to match the function calls in the PDB. So I'm not quite sure what has angered the importer, but sadly I dunno how Ghidra's PDB handling works. Apparently there are structure definitions in there too, which is new for me; I'm only used to function signatures.

@zeroKilo
Owner

zeroKilo commented Jul 9, 2024

Well, I can't download it directly, but it showed a button to request access from you, so please check your email.

PS: please keep in mind that I'm usually very busy, so fixing this can take a while. I hope you are patient.

@Zatarita
Author

Zatarita commented Jul 9, 2024

Strange, it should have worked for anyone with the link. I updated it again. And you're fine, I appreciate what you have done already. Ultimately I don't need anything immediately; this was more of a curiosity anyway, but it may help others in the future. I completely understand and have no expectations for you to rush it or anything.

@zeroKilo
Owner

alright, it worked this time and I got the files. I will let you know once I know more

@zeroKilo
Owner

Btw, I haven't forgotten about this. A first quick debug session reveals it's a problem in loading/applying types, not in loading the PDB itself. The default PDB loader ignores a lot of stuff, so I've been testing with the experimental one (the one I wrote). Also check out the latest release for the new Ghidra: it now has a PDB loading dialog. In there it will complain that the PDB doesn't match, but you can still load it. I will let you know once I know more.
Greetz

@zeroKilo
Owner

image
So I used these settings after checking "load pdb" and "use experimental pdb loader" in the settings at the start.

image
While debugging I printed out that these types are invalid or cause errors.

TPIStream.txt
So if you replace TPIStream.java with the content of this file (I wasn't allowed to attach a Java file) and make a new build for your current Ghidra, you should be able to load this PDB fine.

I'm sorry, but I'm out of time, so I will have to make a new release next weekend...

Greetz WV

@Zatarita
Author

Zatarita commented Jul 28, 2024

It works! Thank you.
I was manually entering everything from the decompiled PDB, but this works really well too.

I don't want to seem ungrateful, I understand you're busy; however, I do notice that it appears to struggle with anonymous structs and unions. Here is an example from the decomp.

Type: guard_state_data

UserDefinedType: guard_state_data
Data           :   this+0x0, Member, Type: short, wait_ticks
Data           :   this+0x2, Member, Type: short, look_ticks
Data           :   this+0x4, Member, Type: unsigned char, path_begun
Data           :   this+0x5, Member, Type: unsigned char, post_combat
Data           :   this+0x6, Member, Type: unsigned char, post_combat_vocalized
Data           :   this+0x7, Member, Type: unsigned char, post_combat_shooting
Data           :   this+0x8, Member, Type: unsigned char, cower
Data           :   this+0x9, Member, Type: unsigned char, cower_panicked
Data           :   this+0xA, Member, Type: unsigned char, cower_from_retreat
Data           :   this+0xC, Member, Type: short, cower_ticks
Data           :   this+0xE, Member, Type: unsigned char, find_new_guard_position
Data           :   this+0xF, Member, Type: unsigned char, shout_about_dead_friend
Data           :   this+0x10, Member, Type: long, shout_dead_friend_prop_index
Data           :   this+0x14, Member, Type: unsigned char, has_guard_direction
Data           :   this+0x15, Member, Type: unsigned char, aim_in_guard_direction
Data           :   this+0x18, Member, Type: union real_vector3d, guard_direction
UserDefinedType:     real_vector3d

Data           :   this+0x24, Member, Type: short, guard_location_type
Data           :   this+0x28, Member, Type: short, guard_firing_position_index       <-- These two **
Data           :   this+0x28, Member, Type: struct <unnamed-tag>, guard_point  <-- Fields here **
UserDefinedType:     <unnamed-tag>

Data           :   this+0x3C, Member, Type: long, guard_look_prop_index
Data           :   this+0x40, Member, Type: unsigned char, guard_look_until_reached_point

This game REALLY likes to use anonymous structs, especially when wrapped in anonymous unions. I'm pretty sure the field "guard_firing_position_index" is in a union with "guard_point" (both at offset 0x28), but the PDB parser lays them out sequentially instead. I think it might not parse this as a union because it's an inline anonymous union instead of a defined union type (unlike the guard_direction field above at offset 0x18).
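One way a parser could recover such unions is to treat members whose byte ranges overlap as union candidates. The following is only a minimal sketch of that idea, not code from XEXLoaderWV: the `OverlapGrouper` class and its `Member` type are hypothetical, and the size 0x14 for guard_point is inferred from the next member starting at 0x3C.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of XEXLoaderWV): groups struct members whose
// byte ranges overlap, so each multi-member group can be emitted as a union.
final class OverlapGrouper {

    /** Minimal stand-in for one parsed field-list member. */
    static final class Member {
        final int offset;
        final int size;
        final String name;

        Member(int offset, int size, String name) {
            this.offset = offset;
            this.size = size;
            this.name = name;
        }
    }

    /** Expects members sorted by ascending offset, as in the dump above. */
    static List<List<Member>> group(List<Member> members) {
        List<List<Member>> groups = new ArrayList<>();
        List<Member> current = null;
        int currentEnd = 0; // one past the last byte covered by the current group

        for (Member m : members) {
            if (current != null && m.offset < currentEnd) {
                current.add(m); // shares bytes with the group: union candidate
            } else {
                current = new ArrayList<>();
                current.add(m);
                groups.add(current);
                currentEnd = m.offset;
            }
            currentEnd = Math.max(currentEnd, m.offset + m.size);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Member> members = List.of(
            new Member(0x24, 2, "guard_location_type"),
            new Member(0x28, 2, "guard_firing_position_index"),
            new Member(0x28, 0x14, "guard_point"), // size inferred, see lead-in
            new Member(0x3C, 4, "guard_look_prop_index"));
        for (List<Member> g : group(members)) {
            String kind = g.size() > 1 ? "union" : "field";
            System.out.println(kind + " at 0x" + Integer.toHexString(g.get(0).offset)
                + " with " + g.size() + " member(s)");
        }
    }
}
```

This only covers the simple case where the overlapping members appear in ascending offset order. A union that wraps an anonymous struct with several fields can produce offsets that jump backwards, which would need extra handling.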

image

This is what we got from IDA (I don't have IDA, but my friend who disassembled it with IDA sent me a header file with all the structs in it):

/* 41218 */
struct $05D1CFCD53EFAA2D9FB727948AC8653D  // the anonymous struct named "<unnamed-tag>" as uuid
{
  real_point3d position;
  int surface_index;
  float radius;
};

union $6BDB108FFD1AA31785466BB75499A4FE  // the anonymous union at 0x28
{
  __int16 guard_firing_position_index;  // the firing index
  $05D1CFCD53EFAA2D9FB727948AC8653D guard_point;  // the guard point anonymous struct as uuid
};

/* 41220 */
struct __declspec(align(4)) guard_state_data
{
  __int16 wait_ticks;
  __int16 look_ticks;
  unsigned __int8 path_begun;
  unsigned __int8 post_combat;
  unsigned __int8 post_combat_vocalized;
  unsigned __int8 post_combat_shooting;
  unsigned __int8 cower;
  unsigned __int8 cower_panicked;
  unsigned __int8 cower_from_retreat;
  __int16 cower_ticks;
  unsigned __int8 find_new_guard_position;
  unsigned __int8 shout_about_dead_friend;
  int shout_dead_friend_prop_index;
  unsigned __int8 has_guard_direction;
  unsigned __int8 aim_in_guard_direction;
  real_vector3d guard_direction;
  __int16 guard_location_type;
  $6BDB108FFD1AA31785466BB75499A4FE ___u17;  // the anonymous union as some type of uuid
  int guard_look_prop_index;
  unsigned __int8 guard_look_until_reached_point;
};

Another issue is that every unnamed tag is literally named <unnamed-tag>, so their definitions conflict with each other. I dunno if it would be possible to append an index to the end of the unnamed tags so they don't all conflict, or maybe replace the unnamed tag with a UUID?
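Both ideas can be sketched in a few lines. This is only an illustration under assumptions: `UnnamedTagNamer`, its method names, and the `fieldSignature` string are hypothetical, and the UUID-style variant merely imitates the look of the IDA names above; it does not reproduce however IDA actually derives them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.UUID;

// Hypothetical helper (not part of XEXLoaderWV): gives each "<unnamed-tag>"
// type a distinct name so the definitions no longer collide.
final class UnnamedTagNamer {
    static final String UNNAMED = "<unnamed-tag>";

    /** Index variant: append the type index of the record, which is unique per PDB. */
    static String indexedName(String pdbName, int typeIndex) {
        if (!UNNAMED.equals(pdbName)) {
            return pdbName; // named types are left untouched
        }
        return String.format("unnamed_tag_%08X", typeIndex);
    }

    /**
     * UUID variant: derive a name-based UUID from some stable description of
     * the type (for example its serialized member list), so that the same
     * layout always maps to the same name.
     */
    static String uuidName(String pdbName, String fieldSignature) {
        if (!UNNAMED.equals(pdbName)) {
            return pdbName;
        }
        UUID id = UUID.nameUUIDFromBytes(fieldSignature.getBytes(StandardCharsets.UTF_8));
        return "$" + id.toString().replace("-", "").toUpperCase(Locale.ROOT);
    }
}
```

The index variant is simpler and cannot collide within one PDB. The UUID variant has the side effect that two unnamed types with an identical signature share one name, which may or may not be what you want.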

I can just delete all the unnamed tags and enter them manually. Having a lot of it defined already saves me a lot of time compared to what I was doing, so I don't want to seem ungrateful. If I knew how Java worked I would consider just trying to do it myself.

Also, I dunno if it would be possible, but it seems like the function parameters are available in the PDB as well. For example:

the parser parsed a function as:
void ai_communication_get_player_rating(undefined8 param_1,ulonglong param_2,ulonglong param_3,ulonglong param_4)

But the decompiled PDB shows the function parameters (strangely enough, sometimes local variables too, but I don't think that would be doable):

Function       : static, [017CC9E0][0003:0141C9E0], len = 00000318, ai_communication_get_player_rating
                 Function attribute:
                 Function info: asyncheh
FuncDebugStart :   static, [017CC9F4][0003:0141C9F4]
FuncDebugEnd   :   static, [017CCCF8][0003:0141CCF8]
Label          :   static, [017CC9F4][0003:0141C9F4], $M33737
Data           :   enregistered bl, Param, Type: long, unit_index
Data           :   enregistered ah, Param, Type: unsigned char, test_line_of_sight
Data           :   enregistered ch, Param, Type: long *, unit_index_reference
Data           :   enregistered dh, Param, Type: float *, distance_reference
Data           :   cl Relative, [00000090], Local, Type: struct data_iterator, iterator
Data           :   cl Relative, [00000060], Local, Type: union real_point3d, unit_head_position
Data           :   cl Relative, [00000050], Local, Type: union real_point3d, player_head_position
Data           :   cl Relative, [000000A0], Local, Type: struct collision_result, collision
Data           :   cl Relative, [00000080], Local, Type: union real_vector3d, player_aiming_vector

Which when manually entered would supply me with the following function signature:
void ai_communication_get_player_rating(long unit_index, unsigned char test_line_of_sight, long* unit_index_reference, float* distance_reference)

I would 100% be willing to try and contribute to the project if I knew how. I've never built a Ghidra plugin, and I don't have much experience with Java. I do have a bunch of experience with reverse engineering binary data formats though, and if there is a way I can help locate where the data is in the PDB so it can be parsed, I would be quite happy to try. I hate just asking you to do something for me and not offering anything in return :c I am extremely appreciative of what you have done already. These are things I can 100% work around on my own with some work, and if this doesn't seem manageable I understand completely. Though, it may be beneficial to other people in the future who use the plugin and have PDBs with this much info in them.

Again, thank you for what you have done to help me. I really appreciate it c:

@zeroKilo
Owner

zeroKilo commented Aug 11, 2024

Hi,
Sorry for the late reply, and thanks a lot for the detailed information, but I really have a lot to do recently. So this bug seems to be a lot more complicated than expected. First off, you can dump the contents of the PDB with cvdump from Microsoft themselves:

https://github.com/Microsoft/microsoft-pdb/blob/master/cvdump/cvdump.exe

then run cvdump.exe HCEX_Release.pdb > dump.txt

which will create a text file of about 500 MB that shows in detail how the types are structured. Now I have to compare the code of mine that reads those records against that output:

https://github.com/zeroKilo/XEXLoaderWV/blob/master/XEXLoaderWV/src/main/java/xexloaderwv/TypeRecord.java#L1156

If you can help me figure out HOW that structure is read wrong, we can maybe solve this together :) But currently I really don't know, and I don't have enough time to spend a day comparing my output with Microsoft's. Btw, also disable "process .pdata section" if you load a PDB with my tool, because that would create these func_xxxxxxxx names for functions that should already exist in the PDB under their real names, so we don't need them.

greetz WV

PS: here is your function in the cvdump output:

(001D84) S_GPROC32: [0003:0141C9E0], Cb: 00000318, Type:         0x0006E327, ai_communication_get_player_rating
         Parent: 00000000, End: 00001F18, Next: 00000000
         Debug start: 00000014, Debug end: 00000318

(001DD0)  S_FRAMEPROC:
          Frame size = 0x000001F0 bytes
          Pad size = 0x00000000 bytes
          Offset of pad in frame = 0x00000000
          Size of callee save registers = 0x00000000
          Address of exception handler = 0000:00000000
          Function info: asynceh invalid_pgo_counts Local=default Param=default (0x00000200)
(001DF0)  S_LABEL32: [0003:0141C9F4], $M33737
(001E04)  S_REGISTER: r3, Type:       T_LONG(0012), unit_index
(001E1C)  S_REGISTER: r4, Type:      T_UCHAR(0020), test_line_of_sight
(001E3C)  S_REGISTER: r5, Type:    T_32PLONG(0412), unit_index_reference
(001E5C)  S_REGISTER: r6, Type:  T_32PREAL32(0440), distance_reference
(001E7C)  S_REGREL32: r1+00000090, Type:         0x000223AF, iterator
(001E94)  S_REGREL32: r1+00000060, Type:         0x000222A8, unit_head_position
(001EB8)  S_REGREL32: r1+00000050, Type:         0x000222A8, player_head_position
(001EDC)  S_REGREL32: r1+000000A0, Type:         0x0006DF09, collision
(001EF4)  S_REGREL32: r1+00000080, Type:         0x000223A1, player_aiming_vector

(001F18) S_END
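The S_REGISTER records in this dump carry everything needed for the parameter list that was entered by hand earlier in the thread. Below is a minimal sketch of extracting it from cvdump text. `RegisterParamSketch` is a hypothetical class, not part of XEXLoaderWV, and it rests on two assumptions: that every S_REGISTER record in r3 to r10 (the PowerPC integer argument registers) is a parameter rather than an enregistered local, and that only the four primitive type names seen above need mapping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of XEXLoaderWV): turns the S_REGISTER lines
// of one function in a cvdump text dump into C-style parameter declarations.
final class RegisterParamSketch {
    // Matches e.g. "(001E04)  S_REGISTER: r3, Type:       T_LONG(0012), unit_index"
    private static final Pattern REGISTER_LINE = Pattern.compile(
        "S_REGISTER:\\s*r(\\d+),\\s*Type:\\s*(\\w+)\\(\\p{XDigit}+\\),\\s*(\\w+)\\s*$");

    // Only the primitive types that occur in the dump above.
    private static final Map<String, String> PRIMITIVES = Map.of(
        "T_LONG", "long",
        "T_UCHAR", "unsigned char",
        "T_32PLONG", "long *",
        "T_32PREAL32", "float *");

    static List<String> parameters(List<String> dumpLines) {
        List<String> params = new ArrayList<>();
        for (String line : dumpLines) {
            Matcher m = REGISTER_LINE.matcher(line);
            if (!m.find()) {
                continue; // S_REGREL32 locals, labels, etc. are ignored
            }
            int register = Integer.parseInt(m.group(1));
            if (register < 3 || register > 10) {
                continue; // assumption: only r3..r10 hold parameters
            }
            // Unknown primitive names fall through unchanged.
            String cType = PRIMITIVES.getOrDefault(m.group(2), m.group(2));
            params.add(cType + " " + m.group(3));
        }
        return params;
    }
}
```

Records whose type is a raw index such as 0x000223AF do not match this pattern and would need a lookup in the type stream. The return type is also not part of these records; it comes from the procedure type referenced by S_GPROC32.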

@Zatarita
Author

I appreciate the reply c: I may have actually...rewritten pretty much all the pdb parser already. I have most of the TPI types (except for the 16 bit versions) parsed. I created a common interface and I'm working on converting the parsed type strings into ghidra data types.

I have the same problem of limited time, though, so I work on it when I can. I have been using the DIA dump application that comes with Visual Studio for cross-referencing.

My idea for the naming conflicts is to just use a generated UUID. After that I want to get symbols parsed too, such as globals, and I want to construct classes from the class methods and data. It was kinda hard to find documentation on this; I had to use Microsoft's PDB GitHub page (and my god does Microsoft have terrible naming conventions) and a PDF from like 2001.

So, if I am able to finish this, and you're open to contribution I can share it at some point. No guarantee 😔 but I'll try. If not, I'll just use it for my own project needs. It'll just need some testing on more PDBs than the one I have
