Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing failure due to #365 #421

Open
Javagedes opened this issue Aug 29, 2024 · 1 comment
Open

Parsing failure due to #365 #421

Javagedes opened this issue Aug 29, 2024 · 1 comment

Comments

@Javagedes
Copy link

Javagedes commented Aug 29, 2024

Hello, #365 with the latest release (2024.8.26) is resulting in a runtime parsing error for some UEFI binaries. I've attached a dump that contains one such binary, it's PDB, and it's MAP file. This binary is built using clang 18.1.5 on Windows. I've also provided the build command, so that you can see all flags used to generate the binary.

File "C:\Users\joeyvagedes\.virtualenvs\mu_tiano_platforms-hKhfHhJg\Lib\site-packages\pefile.py", line 4601, in parse_debug_directory
    dbg_type.ExDllCharacteristics,
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'ExDllCharacteristics'

dump.zip

Build command

"C:\LLVM\bin\\clang" -MMD -MF c:\src\mu_tiano_platforms\Build\QemuSbsaPkg\DEBUG_CLANGPDB\AARCH64\MdeModulePkg\Bus\Pci\XhciDxe\XhciDxe\OUTPUT\AutoGen.obj.deps -g -Os -fshort-wchar -fno-builtin -fno-strict-aliasing -Wall -Werror -Wno-array-bounds -include AutoGen.h -fno-common -fstack-protector -mstack-protector-guard=global -ffunction-sections -fdata-sections -DSTRING_ARRAY_NAME=XhciDxeStrings -Wno-parentheses-equality -Wno-tautological-compare -Wno-tautological-constant-out-of-range-compare -Wno-empty-body -Wno-unused-const-variable -Wno-varargs -Wno-unknown-warning-option -Wno-unused-but-set-variable -Wno-unused-const-variable -Wno-unaligned-access -Wno-microsoft-enum-forward-reference -fno-stack-protector -funsigned-char -ftrap-function=undefined_behavior_has_been_optimized_away_by_clang -Wno-address -Wno-shift-negative-value -Wno-unknown-pragmas -Wno-incompatible-library-redeclaration -Wno-null-dereference -mno-implicit-float -mms-bitfields -mno-stack-arg-probe -nostdlib -nostdlibinc -fseh-exceptions -m64 -mno-red-zone -mcmodel=small -Oz -flto -target aarch64-unknown-windows-gnu -gcodeview -funwind-tables -Wno-unused-but-set-variable -Wno-deprecated-non-prototype -Wno-constant-conversion -DDISABLE_NEW_DEPRECATED_INTERFACES -c -o c:\src\mu_tiano_platforms\Build\QemuSbsaPkg\DEBUG_CLANGPDB\AARCH64\MdeModulePkg\Bus\Pci\XhciDxe\XhciDxe\OUTPUT\.\AutoGen.obj -Ic:\src\mu_tiano_platforms\MU_BASECORE\MdeModulePkg\Bus\Pci\XhciDxe -Ic:\src\mu_tiano_platforms\Build\QemuSbsaPkg\DEBUG_CLANGPDB\AARCH64\MdeModulePkg\Bus\Pci\XhciDxe\XhciDxe\DEBUG -Ic:\src\mu_tiano_platforms\MU_BASECORE\MdePkg -Ic:\src\mu_tiano_platforms\MU_BASECORE\MdePkg\Include -Ic:\src\mu_tiano_platforms\MU_BASECORE\MdePkg\Test\UnitTest\Include -Ic:\src\mu_tiano_platforms\MU_BASECORE\MdePkg\Test\Mock\Include -Ic:\src\mu_tiano_platforms\MU_BASECORE\MdePkg\Library\MipiSysTLib\mipisyst\library\include -Ic:\src\mu_tiano_platforms\MU_BASECORE\MdePkg\Include\AArch64 -Ic:\src\mu_tiano_platforms\MU_BASECORE\MdeModulePkg -Ic:\src\mu_tiano_platforms\MU_BASECORE\MdeModulePkg\Include -Ic:\src\mu_tiano_platforms\MU_BASECORE\MdeModulePkg\Test\Mock\Include -Ic:\src\mu_tiano_platforms\MU_BASECORE\MdeModulePkg\Library\BrotliCustomDecompressLib\brotli\c\include c:\src\mu_tiano_platforms\Build\QemuSbsaPkg\DEBUG_CLANGPDB\AARCH64\MdeModulePkg\Bus\Pci\XhciDxe\XhciDxe\DEBUG\AutoGen.c

Javagedes added a commit to microsoft/mu_basecore that referenced this issue Aug 29, 2024
## Description

This commit modifies the PE parsing functionality to only parse the
headers of the image, rather than the entire image. This change is made
to improve performance and also the probability of failing to parse the
entire image. This comes after this commit
(erocarrera/pefile#365) in pefile resulted in
efi image parsing failures, breaking the build.

This commit also wraps the parsing of the image in a try-except block to
catch any exceptions that may be raised during parsing, to cleanly exit.

See: microsoft/mu_tiano_platforms#1025 and
erocarrera/pefile#421

- [ ] Impacts functionality?
- [ ] Impacts security?
- [ ] Breaking change?
- [ ] Includes tests?
- [ ] Includes documentation?

## How This Was Tested

Validated pipelines build on mu_tiano_platforms

## Integration Instructions

N/A
@aursulis
Copy link
Contributor

aursulis commented Oct 23, 2024

While the inclusion of #365 (and follow ups) triggers the problem, it looks like what is happening is a more severe misparsing of the file. This is a little interactive pdb sneak-peek:

Python 3.11.9 (tags/v3.11.9:de54cf5, Apr  2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pefile
>>> import pdb
>>> pdb.run('pefile.PE(r"D:\\dev\\thirdparty\\pefile_dump\\dump\\XhciDxe.exe")')
> <string>(1)<module>()
(Pdb) n
AttributeError: 'NoneType' object has no attribute 'ExDllCharacteristics'
> <string>(1)<module>()
(Pdb) w
  c:\python311\lib\bdb.py(600)run()
-> exec(cmd, globals, locals)
> <string>(1)<module>()
  c:\python311\lib\site-packages\pefile.py(2941)__init__()
-> self.__parse__(name, data, fast_load)
  c:\python311\lib\site-packages\pefile.py(3362)__parse__()
-> self.full_load()
  c:\python311\lib\site-packages\pefile.py(3479)full_load()
-> self.parse_data_directories()
  c:\python311\lib\site-packages\pefile.py(3768)parse_data_directories()
-> value = entry[1](dir_entry.VirtualAddress, dir_entry.Size)
  c:\python311\lib\site-packages\pefile.py(4601)parse_debug_directory()
-> dbg_type.ExDllCharacteristics,
(Pdb) d
> c:\python311\lib\site-packages\pefile.py(2941)__init__()
-> self.__parse__(name, data, fast_load)
(Pdb) d
> c:\python311\lib\site-packages\pefile.py(3362)__parse__()
-> self.full_load()
(Pdb) d
> c:\python311\lib\site-packages\pefile.py(3479)full_load()
-> self.parse_data_directories()
(Pdb) d
> c:\python311\lib\site-packages\pefile.py(3768)parse_data_directories()
-> value = entry[1](dir_entry.VirtualAddress, dir_entry.Size)
(Pdb) d
> c:\python311\lib\site-packages\pefile.py(4601)parse_debug_directory()
-> dbg_type.ExDllCharacteristics,
(Pdb) a
self = <pefile.PE object at 0x000001E7A34CF850>
rva = 63836
size = 28
(Pdb) pp dbg.dump()
['[IMAGE_DEBUG_DIRECTORY]',
 '0xDD5C     0x0   Characteristics:               0x80000   ',
 '0xDD60     0x4   TimeDateStamp:                 0x13       [Thu Jan  1 '
 '00:00:19 1970 UTC]',
 '0xDD64     0x8   MajorVersion:                  0x0       ',
 '0xDD66     0xA   MinorVersion:                  0x10      ',
 '0xDD68     0xC   Type:                          0x14      ',
 '0xDD6C     0x10  SizeOfData:                    0x200000  ',
 '0xDD70     0x14  AddressOfRawData:              0x1D      ',
 '0xDD74     0x18  PointerToRawData:              0x10000   ']
(Pdb) pp hex(63836)
'0xf95c'

0xf95c is the RVA of the debug directory. If we use fast_load and try to get data from that RVA just to confirm:

Python 3.11.9 (tags/v3.11.9:de54cf5, Apr  2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pefile
>>> import binascii
>>> p = pefile.PE(r"D:\dev\thirdparty\pefile_dump\dump\XhciDxe.exe", fast_load=True)
>>> binascii.hexlify(p.get_data(0xf95c, 0x1c))
b'00000800130000000000100014000000000020001d00000000000100'

However if you load the binary into, say, Ghidra, then you'll see that the actual contents at that RVA should be:

                             IMAGE_DEBUG_DIRECTORY_0000f95c                  XREF[1]:     00000130 (*)   
        0000f95c 00  00  00  00  00  00  00  00  00  00     IMAGE_DEBUG_DIRECTORY
                 00  00  02  00  00  00  92  00  00  00 
                 78  f9  00  00  b8  de  00  00
           0000f95c 00  00  00  00    ddw       0h                      Characteristics                      XREF[1]:     00000130 (*)   
           0000f960 00  00  00  00    ddw       0h                      TimeDateStamp
           0000f964 00  00           dw        0h                      MajorVersion
           0000f966 00  00           dw        0h                      MinorVersion
           0000f968 02  00  00  00    ddw       2h                      Type
           0000f96c 92  00  00  00    ddw       92h                     SizeOfData
           0000f970 78  f9  00  00    ddw       F978h                   AddressOfRawData
           0000f974 b8  de  00  00    ddw       DEB8h                   PointerToRawData

So it looks like pefile miscalculated some addresses and went off to read some unrelated part of the binary. Hopefully that's a good starting point for whoever has a better clue of how the address calculations in pefile work :)

The reason the inclusion of #365 triggers the problem is because the random piece of the binary that is misparsed as a debug directory entry just happens to have the value 0x14 in the correct spot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants