read_char() chokes on input with an invalid encoding #152

Closed
cpsievert opened this issue Aug 13, 2021 · 4 comments

@cpsievert

I came across this by running revdepcheck::cloud_report() on shiny. It seems @hadley was seeing the same thing in r-lib/revdepcheck#288 and attempted to fix it in #133, but that was never merged. In my case, I was seeing:

Processing package results  42% (ipc)
Error in readChar(path, nchars = file.info(path)$size, ...) : 
  invalid UTF-8 input in readChar()

It turns out the error was coming from this call to rcmdcheck:::get_test_fail(), which in turn calls rcmdcheck:::read_char().

I can reproduce the error locally by trying to read in {ipc}'s testthat.Rout.fail file the same way as rcmdcheck:::read_char():

path <- "ipc-testthat.Rout.fail"
readChar(path, nchars = file.info(path)$size)
#> Error in readChar(path, nchars = file.info(path)$size) : 
#>  invalid UTF-8 input in readChar()

With useBytes = TRUE, I can successfully read the file, but then a downstream failure happens in rcmdcheck:::get_test_fail()'s call to nchar() (I'm guessing this is why @hadley said #133 (comment)):

txt <- readChar(path, nchars = file.info(path)$size, useBytes = TRUE)
nchar(txt)
#> Error in nchar(txt) : invalid multibyte string, element 1

However, if I change the encoding to UTF-8, then it works:

Encoding(txt)
#> "unknown"
nchar(enc2utf8(txt))
#> [1] 12252
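
The failure can also be probed without the original file. The following is a minimal sketch, assuming a hypothetical temporary file that contains a single byte (0xff) which is never valid UTF-8; it is not the actual {ipc} output. It shows that validUTF8() and nchar(type = "bytes") work on such input without erroring:

```r
# Hypothetical reproduction: a small file containing one byte (0xff)
# that can never appear in valid UTF-8. This is NOT the real {ipc} file.
path <- tempfile(fileext = ".Rout.fail")
writeBin(c(charToRaw("ok "), as.raw(0xff), charToRaw(" done\n")), path)

# Reading as bytes avoids the "invalid UTF-8 input" error from readChar().
txt <- readChar(path, nchars = file.info(path)$size, useBytes = TRUE)

# validUTF8() checks the bytes without throwing an error.
is_valid <- validUTF8(txt)
is_valid
#> [1] FALSE

# Counting bytes (rather than characters) does not depend on the encoding.
n_bytes <- nchar(txt, type = "bytes")
n_bytes
#> [1] 10
```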

gaborcsardi transferred this issue from r-lib/rcmdcheck on Aug 17, 2021

@gaborcsardi
Member

gaborcsardi commented Aug 17, 2021

@jimhester Can you please point me to the R code that runs on the cloud check container? If that uses rcmdcheck as well then maybe this is a processx bug.

Some more discussion: #151

@jimhester
Member

The cloud code does not use rcmdcheck. The code is in a private repo; I can send you a link on Slack.

@gaborcsardi
Member

OK, in that case it is hard to say how the output got corrupted. Maybe it is a base R bug, maybe it is something else. We can change rcmdcheck to give a warning, instead of erroring, if the input is not in the native encoding.

revdepcheck should still convert the cloud check output from UTF-8 to the native encoding, if it is not doing that already (we can assume UTF-8 for the cloud check output, right?). This would make cloud checks work on Windows, for example. Alternatively, rcmdcheck could automatically try UTF-8 if the native encoding fails.

So in the end this can be worked around in rcmdcheck, I'll transfer this issue back there. :(
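
The fallback idea above could be sketched roughly as follows. This is only an illustration: read_text_fallback() is a hypothetical helper, not an rcmdcheck function, and it assumes the file contains no embedded nul bytes (rawToChar() would error on those).

```r
# Hypothetical helper, not part of rcmdcheck: read a file and return its
# text, trying the native encoding first and UTF-8 second.
read_text_fallback <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  txt <- rawToChar(bytes)  # assumes no embedded nul bytes

  # 1. Try the native encoding; iconv() returns NA if conversion fails.
  out <- iconv(txt, from = "", to = "UTF-8")
  if (!is.na(out)) return(out)

  # 2. Otherwise, check whether the bytes are already valid UTF-8.
  if (validUTF8(txt)) {
    Encoding(txt) <- "UTF-8"
    return(txt)
  }

  # 3. Warn instead of erroring, and mark the string as raw bytes.
  warning("Input is valid in neither the native encoding nor UTF-8: ", path)
  Encoding(txt) <- "bytes"
  txt
}

# Usage with a plain ASCII file, which is valid in any common encoding.
p <- tempfile()
writeBin(charToRaw("plain ascii\n"), p)
res <- read_text_fallback(p)
```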

gaborcsardi transferred this issue from r-lib/revdepcheck on Aug 18, 2021

@jimhester
Member

Yeah I think we can assume the tests were run in UTF-8.

gaborcsardi added the "feature" label (a feature request or enhancement) on Sep 7, 2021
gaborcsardi added this to the 2.0.0 milestone on Sep 9, 2021
gaborcsardi changed the title from "read_char() chokes on UTF-8 characters" to "read_char() chokes on input with an invalid encoding" on Sep 18, 2021
gaborcsardi added a commit that referenced this issue Sep 18, 2021
- Always convert output to UTF-8.
- Use the <xx> notation for bytes that are invalid in the native encoding.

Closes #152.
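
For readers unfamiliar with the <xx> notation mentioned in the commit message: base R's iconv() can produce it via sub = "byte". The sketch below is an assumption about how such a conversion might look, not the code from the referenced commit; to_utf8_with_hex() is a hypothetical name. The example passes from = "UTF-8" explicitly so that the result does not depend on the session's locale.

```r
# Hypothetical helper, not taken from the rcmdcheck commit: convert text to
# UTF-8 and replace any non-convertible byte with its hex code as <xx>.
to_utf8_with_hex <- function(x, from = "") {
  # from = "" means the native encoding of the current session.
  iconv(x, from = from, to = "UTF-8", sub = "byte")
}

# "a", then an invalid byte (0xff), then "b".
x <- rawToChar(as.raw(c(0x61, 0xff, 0x62)))

# Treat the input as UTF-8 so the example is locale-independent.
clean <- to_utf8_with_hex(x, from = "UTF-8")
clean
#> [1] "a<ff>b"

# The cleaned string no longer trips up nchar().
nchar(clean)
#> [1] 6
```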