.docx file is being detected as a zip file #21

Open
mjdavidson opened this issue May 21, 2019 · 1 comment
Comments

@mjdavidson

No description provided.

@danielgindi
Collaborator

It IS a zip file. To tell a .docx apart from a plain zip, you need to parse the archive's entry list and look at the filenames.
Parsing zip entries means reading the end of the file, scanning backwards until a certain "magic" signature is found, reading the offsets stored there, and then jumping to the entry directory and parsing it.
Since that requires random access to the file, it's not really suitable for this library, in my opinion.
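For illustration, the backwards scan described above can be sketched in Python. This is not this library's code; the function name `list_zip_entry_names` is hypothetical, and the sketch assumes a seekable file object, no ZIP64 extensions, and no data prepended to the archive.

```python
import io
import struct

EOCD_MAGIC = b"PK\x05\x06"            # "end of central directory" signature
CDFH_MAGIC = b"PK\x01\x02"            # central directory file header signature
EOCD = struct.Struct("<4sHHHHIIH")    # fixed 22-byte EOCD record
CDFH = struct.Struct("<4s6H3I5H2I")   # fixed 46-byte central directory header


def list_zip_entry_names(f):
    """Return entry names by parsing the central directory of a seekable file."""
    f.seek(0, io.SEEK_END)
    size = f.tell()

    # The EOCD record sits at the very end, followed only by an optional
    # comment of at most 65535 bytes, so only the tail needs to be read.
    tail_len = min(size, EOCD.size + 0xFFFF)
    f.seek(size - tail_len)
    tail = f.read(tail_len)

    # Go back from the end until the magic is found.
    pos = tail.rfind(EOCD_MAGIC)
    if pos < 0 or pos + EOCD.size > len(tail):
        raise ValueError("no end-of-central-directory record found")
    _, _, _, _, total, cd_size, cd_offset, _ = EOCD.unpack_from(tail, pos)

    # Jump to the entry directory and walk its variable-length records.
    f.seek(cd_offset)
    cd = f.read(cd_size)
    names = []
    offset = 0
    for _ in range(total):
        fields = CDFH.unpack_from(cd, offset)
        if fields[0] != CDFH_MAGIC:
            raise ValueError("corrupt central directory")
        name_len, extra_len, comment_len = fields[10], fields[11], fields[12]
        start = offset + CDFH.size
        names.append(cd[start:start + name_len].decode("utf-8", "replace"))
        offset = start + name_len + extra_len + comment_len
    return names
```

Note that the two `seek` calls are exactly the random access mentioned above: neither the tail nor the directory can be reached from a small chunk read at the start of a stream.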

It IS possible, though, to introduce an optional second phase where you pass a full path to the file (or a file descriptor) and detect zip-based types.
Other libraries do this by parsing only the beginning of the file, hoping that a "backup" entry name shows up in the small header chunk that is read. This is unreliable, though, and had more misses than hits when I tested it.
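As a rough sketch of what such an optional second phase might look like, here is a Python version built on the standard `zipfile` module. The function name `detect_zip_based_type` and the marker table are assumptions made for illustration, not an API of this library. It relies on Office Open XML packages containing a `[Content_Types].xml` entry plus a `word/`, `xl/` or `ppt/` folder.

```python
import zipfile

# Hypothetical marker table: top-level folder -> MIME type of the OOXML format.
OOXML_MARKERS = {
    "word/": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "xl/": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "ppt/": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def detect_zip_based_type(source):
    """Second-phase check: `source` is a path or a seekable binary file object.

    Returns a MIME type string, or None if `source` is not a zip archive.
    """
    if not zipfile.is_zipfile(source):
        return None

    # ZipFile reads the central directory, so this needs random access.
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()

    # OOXML packages carry a content-types manifest at the archive root.
    if "[Content_Types].xml" in names:
        for prefix, mime in OOXML_MARKERS.items():
            if any(name.startswith(prefix) for name in names):
                return mime

    # A valid archive that matches no known marker stays a plain zip.
    return "application/zip"
```

A caller would run the usual header-based detection first and invoke this only when the first phase reports a zip, so the extra file access is paid only for archives.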
