[DETECTION] no encoding found, contrarily to chardet and cchardet #104
Labels: detection (related to the charset detection mechanism, chaos/mess/coherence); help wanted (extra attention is needed)
Notice
I hereby announce that my raw input is not :
Provide the file
An accessible way of retrieving the file concerned. Host it somewhere with untouched encoding.
Verbose output
Expected encoding
chardet and cchardet both agree on windows-1252, but I'm not certain.

Desktop (please complete the following information):
Additional context
Your package looks nice! I'm currently testing it with edge cases, i.e. HTML documents with strange or inconsistent encodings.
The issue is also referenced here: adbar/trafilatura#79
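
Since I'm not certain about the windows-1252 guess, a quick sanity check is to see which candidate codecs can strictly decode the raw bytes at all. This is only a minimal sketch using the Python standard library: the helper name `strict_decodes` and the sample bytes are made up for illustration, and they are not the file from this report. A successful strict decode does not prove the encoding is right (latin-1 accepts any byte sequence), it only rules out candidates that fail.

```python
def strict_decodes(data: bytes, candidates=("utf-8", "cp1252", "latin-1")) -> dict:
    """Map each candidate codec name to True if `data` decodes without error."""
    results = {}
    for name in candidates:
        try:
            data.decode(name, errors="strict")
            results[name] = True
        except UnicodeDecodeError:
            results[name] = False
    return results


# Hypothetical sample: "café" encoded as windows-1252 (cp1252 in Python).
sample = "café".encode("cp1252")
print(strict_decodes(sample))
```

On the real document I would read the file in binary mode and pass the bytes to the same helper, then compare the survivors with what each detector reports.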