From my testing, the %PDF- signature does not necessarily have to be at offset 0.
It can be located anywhere in the file. For example, I can type some junk at the beginning of the file and it still opens.
I have received multiple files like this from people, so something or someone out in the wild is adding extra characters in front of the magic sequence.
A detector could look something like this, searching for the signature as a substring inside a search window:
def is_pdf(file_path):
    with open(file_path, "rb") as file:
        # may throw IOError
        header = file.read(1024)
        return b"%PDF-" in header
From what I can see, the library is currently not built to handle this kind of situation.
So I'm leaving this ticket here with the code snippet in case more advanced detection is implemented.
Just to make sure, I did check out the PDF specifications themselves:
The PDF file begins with the 5 characters "%PDF-" and byte offsets shall be calculated from the PERCENT SIGN (25h).
NOTE 1 This provision allows for arbitrary bytes preceding the %PDF- without impacting the viability of the PDF file and its byte offsets.
So a valid PDF does not have to start strictly with %PDF-, but it must contain that sequence in its header. I will work on a better way to detect this.
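Since the spec says byte offsets are calculated from the PERCENT SIGN, a detector may want to report where the signature starts, not only whether it is present. Below is a minimal sketch of that idea, not the library's implementation. The names find_pdf_header_offset and PDF_MAGIC are hypothetical, and the 1024-byte default window is an assumption carried over from the snippet in the original report, not a value taken from the quoted spec text.

```python
from typing import Optional

# The magic sequence discussed in this issue.
PDF_MAGIC = b"%PDF-"


def find_pdf_header_offset(data: bytes, window: int = 1024) -> Optional[int]:
    """Return the offset of the first %PDF- marker, or None if absent.

    The marker must lie entirely within the first `window` bytes,
    which matches the behaviour of `b"%PDF-" in file.read(1024)`.
    The window size is an assumption, not a value from the spec quote.
    """
    index = data.find(PDF_MAGIC, 0, window)
    return index if index != -1 else None


# Example: ten junk bytes in front of the signature.
junk = b"\x00garbage\r\n"
sample = junk + b"%PDF-1.7\n"
offset = find_pdf_header_offset(sample)
if offset is not None:
    # Offsets inside the PDF would be counted from this position.
    pdf_bytes = sample[offset:]
```

Returning the offset instead of a boolean lets the caller strip or skip the leading junk before handing the data to a parser, while a None result still works as a plain "not a PDF" answer.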