cyrillic turned into chinese #29

bobert13 · 2022-01-05T12:00:53Z

Hi

I have 2 files in Cyrillic. I can read both without issue in MS Word.
The first seems to work fine with:

from striprtf.striprtf import rtf_to_text

with open(fullpath) as infile:
    content = infile.read()
    text = rtf_to_text(content, 'ignore')

The second (bad.zip) gets turned into Chinese characters.

good.zip
bad.zip

sample output from the good one:

>>> tabtext =text.split("|||")
>>> print(tabtext[0])
Таблиця розподілу номерного ресурсу
Кіровоградська область|
Код зони - 52

sample output from the bad one:

>>> tabtext =text.split("|")
>>> print(tabtext[0])
亦犭桷 痤顼钿畴 眍戾痦钽 疱耋瘃
它獬怦赅 钺豚耱鼃
暑 珙龛 - 32

If I leave out the "ignore", I get:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xff in position 6: illegal multibyte sequence

Any idea how I can work around this?
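
The output above looks like what happens when bytes from a Cyrillic code page are decoded as GBK. The following is a minimal sketch of that effect, assuming the text bytes are Windows-1251 (cp1251); the actual code page of bad.zip is not stated here, and the sketch uses only Python's built-in codecs, not striprtf.

```python
# Illustration only: assumes the text bytes are Windows-1251 (cp1251).
sample = "Код"
raw = sample.encode("cp1251")

# Decoding the same bytes as GBK pairs them up into Chinese characters;
# errors="ignore" silently drops bytes that do not form a valid pair.
garbled = raw.decode("gbk", errors="ignore")
print(garbled)

# Without errors="ignore", a byte GBK cannot use raises an error.
# In cp1251 the letter "я" is the single byte 0xff.
try:
    "я".encode("cp1251").decode("gbk")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print(exc)

# Decoding with the matching code page returns the original text.
restored = raw.decode("cp1251")
print(restored)
```

Because errors="ignore" discards bytes, the garbled string generally cannot be converted back; the text has to be decoded again from the original bytes with the right code page.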


bobert13 · 2022-01-06T07:08:02Z

Hi, I apologize in advance for my ignorance here. I'm pretty new to Python. Based on this email, I'm assuming you put in a commit to fix whatever caused this issue. Can I upgrade my current version of striprtf using pip to get the fix? Thanks


joshy · 2022-01-06T07:26:45Z

Hi,

Yes, the issue is fixed, but until now no new version had been released. You can now upgrade striprtf to version 0.0.19 and it should work.

BR Joshy
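
The upgrade itself is the usual `pip install --upgrade striprtf`. To confirm afterwards that the installed version is at least 0.0.19, something like the sketch below may help. The helper names `version_tuple` and `has_fix` are hypothetical, and the comparison assumes plain dotted numeric version strings.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn a dotted numeric version such as '0.0.19' into (0, 0, 19)."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def has_fix(package="striprtf", minimum="0.0.19"):
    """Return True if the package is installed at the minimum version or newer."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        # Package is not installed in this environment.
        return False
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    print("striprtf has the fix:", has_fix())
```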

bobert13 · 2022-01-06T07:33:14Z

Awesome, thanks!


joshy closed this as completed in b2e88aa Jan 5, 2022
