encoding for Chinese characters #28

Closed
yilu1015 opened this issue Dec 9, 2021 · 1 comment

Comments

yilu1015 commented Dec 9, 2021

Issue: Chinese characters are not decoded properly.

Test file: test-with-chinese-characters.rtf.zip

Code

# Import assumed; the original snippet omitted it.
from striprtf.striprtf import rtf_to_text

with open('test-with-chinese-characters.rtf') as document:
    content = rtf_to_text(document.read())
    print(content)

Output:

Ó¡Ë¢Çé¿ö·´Ó³£º
201-003-00155 (Multiple)

ÊÐÕþ¸®Çé¿ö·´Ó³£º
022-021-00768 (Multiple)

Expected:

印刷情况反映:
201-003-00155 (Multiple)

市政府情况反映:
022-021-00768 (Multiple)
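
The garbled output above has the shape of GBK (code page 936) bytes decoded with a single-byte Western code page. A minimal sketch that reverses the mapping for the first line, assuming Latin-1 was the wrong decoder (an assumption; the thread does not say which decoder the library actually used). `repair` is a hypothetical helper, not part of any library:

```python
# Assumption: the mojibake came from GBK bytes that were decoded as Latin-1.
garbled = "Ó¡Ë¢Çé¿ö·´Ó³£º"


def repair(text: str) -> str:
    """Turn the mis-decoded text back into raw bytes, then decode them as GBK."""
    return text.encode("latin-1").decode("gbk")


repaired = repair(garbled)
print(repaired)
```

If the assumption holds, the printed text should begin with the expected Chinese characters. This only diagnoses the symptom; it does not fix the RTF file itself.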

joshy added a commit that referenced this issue Dec 17, 2021
joshy closed this as completed Dec 17, 2021

joshy (Owner) commented Jan 5, 2022

Hi, the RTF file declares the wrong encoding; the correct encoding would be ansicpg936. I need to revert the changes made for this fix, otherwise all other encodings will stop working.
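
For readers who want to check which code page an RTF file declares, here is a minimal sketch that looks for the `\ansicpgN` control word in the header. `declared_codepage` and the sample header are hypothetical, made up for illustration; they are not taken from the attached test file or from the library:

```python
import re
from typing import Optional


def declared_codepage(rtf_source: str) -> Optional[int]:
    """Return the number following the first \\ansicpg control word, if any."""
    match = re.search(r"\\ansicpg(\d+)", rtf_source)
    return int(match.group(1)) if match else None


# Hypothetical header, for illustration only.
sample = r"{\rtf1\ansi\ansicpg1252\deff0 Hello}"
print(declared_codepage(sample))  # 1252
```

A file containing GBK-encoded Chinese text would be expected to report 936 here; any other value suggests the header and the content disagree.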

joshy reopened this Jan 5, 2022
joshy closed this as completed Jan 5, 2022