Unable to convert string to the requested encoding when reading sav files with long strings #241

Open
ofajardo opened this issue Apr 23, 2021 · 5 comments

ofajardo commented Apr 23, 2021

Hi,

When reading a sav file that contains a long string with an international character, ReadStat gives the error below. The string is 756 characters long, to be precise; with 755 characters the error does not show up.

Unable to convert string to the requested encoding (invalid byte sequence)

An example sav file is attached. The sav file was produced with pyreadstat.

thanks in advance!

original report: Roche/pyreadstat#128

Note: I initially reported that the error occurred on writing; it actually occurs on reading.

eg.sav.zip

also attached a csv version of the file

eg.csv
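
The 755/756 threshold above is stated in characters, while the sav format stores bytes. A minimal sketch of the difference, assuming the international character is a 2-byte UTF-8 character such as "é" (the actual character in eg.sav is not stated here). The 252-byte segment size is also an assumption, taken from the PSPP description of how very long strings are stored in sav files; this thread does not establish that segmentation is the cause.

```python
import math

SEGMENT_BYTES = 252  # assumed payload per very-long-string segment (PSPP format notes)


def utf8_len(text: str) -> int:
    """Number of bytes the string occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def segment_count(n_bytes: int, segment_bytes: int = SEGMENT_BYTES) -> int:
    """Segments needed to hold n_bytes under the assumed segment size."""
    return math.ceil(n_bytes / segment_bytes)


# Hypothetical stand-ins for the reported strings: 755 and 756 characters,
# each containing exactly one 2-byte character.
passing = "a" * 754 + "é"
failing = "a" * 755 + "é"

for label, text in (("755 chars", passing), ("756 chars", failing)):
    n_bytes = utf8_len(text)
    print(label, len(text), "characters,", n_bytes, "bytes,",
          segment_count(n_bytes), "segments")
```

Under these assumptions the 756-character string is the first one whose byte length exceeds 756, which is where an extra segment would be needed.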

ofajardo changed the title from "Unable to convert string to the requested encoding when writing sav files with long strings with international characters" to "Unable to convert string to the requested encoding when reading sav files with long strings with international characters" on Apr 23, 2021

ofajardo commented Apr 23, 2021

Another observation: a very similar file that differs by only one character (the first variable is named "aaaaa3" instead of "aaaaa2") does not raise the error. Example file attached.
eg3.sav.zip

@evanmiller
Contributor

Are UTF-8 strings being provided to the writer?

@ofajardo
Author

Yes

ofajardo changed the title from "Unable to convert string to the requested encoding when reading sav files with long strings with international characters" to "Unable to convert string to the requested encoding when reading sav files with long strings" on Dec 15, 2021

ofajardo commented Dec 15, 2021

As mentioned in #260, this error can be reproduced without any international characters (using only 'a's in this example) if the length of the string is at least 757. Another important condition for reproducing it is that the numerical values must be NaNs. See #260 for C code that reproduces the issue.
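
For readers coming from pyreadstat, the conditions described above could be sketched in Python roughly as follows. This is not the C code from #260. The column names, row count, and output path are hypothetical, and it is an assumption that these two conditions alone are enough to trigger the error on an affected version.

```python
import math


def build_columns(n_rows: int = 2, str_len: int = 757) -> dict:
    """Build the data described above: an all-'a' string column and NaN numerics.

    The column names "text" and "num" and the row count are illustrative only.
    """
    return {
        "text": ["a" * str_len] * n_rows,
        "num": [math.nan] * n_rows,
    }


def roundtrip(path: str = "repro_241.sav", str_len: int = 757):
    """Write the columns to a sav file and read them back with pyreadstat."""
    # Imported lazily so build_columns works without these packages installed.
    import pandas as pd
    import pyreadstat

    df = pd.DataFrame(build_columns(str_len=str_len))
    pyreadstat.write_sav(df, path)
    # The read step is where the error is reported to occur.
    df_back, meta = pyreadstat.read_sav(path)
    return df_back
```

Calling `roundtrip(str_len=756)` and `roundtrip(str_len=757)` side by side would be one way to check whether the length threshold described above shows up in a given installation.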

@evanmiller evanmiller added the bug label Jan 16, 2023
@jacob-lee

Is this likely to be fixed?
