.docx file is being detected as a zip file #21

mjdavidson · 2019-05-21T01:03:42Z

No description provided.

danielgindi · 2019-05-21T04:01:55Z

It IS a zip file. To detect it properly as a .docx, you need to parse the zip structure and look at the entry filenames.
Parsing zip entries means reading the end of the file, scanning backward until a certain "magic" signature (the end-of-central-directory record) is found, reading the offsets from it, then jumping to the entry directory and parsing that.
Since this requires random access to the file, it's not really suitable for this library in my opinion.
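To make the steps above concrete, here is a minimal sketch in Python of the backward scan and directory walk. It is not code from this library; the function name `list_zip_entry_names` is hypothetical, and the sketch assumes a seekable binary file object, no ZIP64, and no data prepended to the archive.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"  # end-of-central-directory record
CDFH_SIGNATURE = b"PK\x01\x02"  # central directory file header
EOCD_SIZE = 22                  # fixed part of the EOCD record
MAX_COMMENT = 0xFFFF            # the trailing archive comment can be this long
CDFH_SIZE = 46                  # fixed part of a central directory header


def list_zip_entry_names(f):
    """Return entry names read from the central directory (hypothetical helper).

    Assumes f is a seekable binary file object. Returns [] if no
    end-of-central-directory record is found.
    """
    # Step 1: read the tail of the file, where the EOCD record must live.
    f.seek(0, 2)
    file_size = f.tell()
    tail_size = min(file_size, EOCD_SIZE + MAX_COMMENT)
    f.seek(file_size - tail_size)
    tail = f.read(tail_size)

    # Step 2: scan backward for the "magic" signature.
    pos = tail.rfind(EOCD_SIGNATURE)
    if pos < 0 or pos + EOCD_SIZE > len(tail):
        return []

    # Step 3: pull the entry count, directory size and directory offset.
    _, _, _, _, total, cd_size, cd_offset, _ = struct.unpack(
        "<4sHHHHIIH", tail[pos:pos + EOCD_SIZE]
    )

    # Step 4: jump to the central directory and walk its headers.
    f.seek(cd_offset)
    cd = f.read(cd_size)
    names = []
    offset = 0
    for _ in range(total):
        if offset + CDFH_SIZE > len(cd):
            break
        if cd[offset:offset + 4] != CDFH_SIGNATURE:
            break
        (flags,) = struct.unpack_from("<H", cd, offset + 8)
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", cd, offset + 28)
        raw = cd[offset + CDFH_SIZE:offset + CDFH_SIZE + name_len]
        # Bit 11 of the flags marks UTF-8 names; otherwise the legacy code page.
        encoding = "utf-8" if flags & 0x800 else "cp437"
        names.append(raw.decode(encoding, errors="replace"))
        offset += CDFH_SIZE + name_len + extra_len + comment_len
    return names
```

Note the two seeks: one to the tail and one back to the directory offset. That is the random access mentioned above, and it is why a detector that only sees the first few bytes of a stream cannot do this.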

It IS possible though to introduce a second, optional phase where you pass a full path to the file (or a file descriptor) and detect zip-based types.
Other libraries do this by parsing only the beginning of the file, hoping that a "backup" entry name will show up in the small header chunk that is read. This is unreliable though, and had more misses than hits when I tested it.
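As a rough illustration of what such a second phase could look like, the sketch below uses Python's standard `zipfile` module rather than anything from this library. The function name `detect_zip_subtype` and the marker table are assumptions made for the example; the rule that an OOXML document contains `[Content_Types].xml` plus a `word/`, `xl/` or `ppt/` folder is a simple heuristic, not a full validation.

```python
import zipfile

# Hypothetical marker table: top-level folder -> MIME type of the OOXML format.
OOXML_MARKERS = {
    "word/": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "xl/": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "ppt/": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def detect_zip_subtype(path_or_file):
    """Second-phase detection for zip-based types (hypothetical helper).

    Accepts a file path or a seekable binary file object.
    Returns None if the input is not a zip archive at all.
    """
    if not zipfile.is_zipfile(path_or_file):
        return None

    # Reading the name list only touches the central directory,
    # no entry is decompressed.
    with zipfile.ZipFile(path_or_file) as zf:
        names = zf.namelist()

    if "[Content_Types].xml" in names:
        for prefix, mime in OOXML_MARKERS.items():
            if any(name.startswith(prefix) for name in names):
                return mime

    # A valid zip that matches no known zip-based type.
    return "application/zip"
```

A caller would run the normal header-based detection first and only invoke this when the first phase reports a plain zip and a path or seekable handle is available.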

TomiTakussaari mentioned this issue Jan 31, 2020

SVG with XML declaration is identified as XML #22

Closed
