Unicode not working in inf_mr() #62

Closed
eternal-flame-AD opened this issue Mar 25, 2023 · 3 comments

eternal-flame-AD commented Mar 25, 2023

---
title: "Xaringan inf_mr"
output: xaringan::moon_reader
---

無限 `r system2("python", c("-c", shQuote('print("月読")')), stdout = TRUE)`

If I render with rmarkdown::render(), I get the expected "無限 月読", but if I use inf_mr(), I get only this warning and a blank output:

Warning message:
In grep("<!-- DISABLE-SERVR-WEBSOCKET -->", body, fixed = TRUE) :
  input string 1 is invalid in this locale

It seems this comes from commit 69f1279.

I tried adjusting the locale settings. Running Sys.setlocale("LC_ALL", "Ja_JP.UTF-8") fixes the warning above, but then the stdout is not decoded correctly, and I get: 無限 <8c><8e><93>

More locale gymnastics within the document could probably fix that, but I think dynamic_site shouldn't assume the body is in the system locale.

My OS locale uses English as the display language and the Shift-JIS code page (932).

[ins] r$> sessionInfo()
R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.932  LC_CTYPE=English_United States.932    LC_MONETARY=English_United States.932 LC_NUMERIC=C                          LC_TIME=English_United States.932    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] xaringan_0.28.1

loaded via a namespace (and not attached):
[1] compiler_4.2.3  fastmap_1.1.1   cli_3.6.0       htmltools_0.5.4 xfun_0.37       digest_0.6.31   rlang_1.1.0
Owner

yihui commented Apr 7, 2023

Sys.setlocale("LC_ALL", "Ja_JP.UTF-8") may not be enough, since it only changes the locale for R, not for your operating system. Have you tried setting the locale to UTF-8 system-wide? (I don't use Windows, but I assume you can do it in the Control Panel.) Ideally, after you restart your system and R, sessionInfo() should show a UTF-8 locale.

For useBytes = TRUE, I was following an R core member's suggestion: https://blog.r-project.org/2022/10/10/improvements-in-handling-bytes-encoding/index.html

I can certainly revert 69f1279 if necessary. Thanks!

Author

I read the article you mentioned, and I think I know where the discrepancy comes from. It is this line:

if (is.raw(body)) body = rawToChar(body)

This assumes body is in the system locale, but the HTML should be UTF-8, as declared in its meta tag. I think we should change it to something like this:

      if (is.raw(body)) {
        body = rawToChar(body)
        Encoding(body) = "UTF-8"
      }

I tested this and it fixes the issue.
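To illustrate why declaring the encoding matters, here is a minimal sketch. It assumes a hypothetical body made of the UTF-8 bytes of "月読" instead of a real rendered slide; rawToChar() and Encoding() are base R functions.

```r
# Hypothetical stand-in for the rendered HTML body: the UTF-8 bytes of "月読"
body = as.raw(c(0xe6, 0x9c, 0x88, 0xe8, 0xaa, 0xad))

# rawToChar() alone attaches no encoding mark, so R treats the bytes as
# text in the native locale (code page 932 in the session above)
x = rawToChar(body)
print(Encoding(x))

# Declaring the encoding tells R these bytes are UTF-8, whatever the locale
Encoding(x) = "UTF-8"
print(Encoding(x))

# Compare against \u escapes so this script does not depend on its own
# source-file encoding
print(x == "\u6708\u8aad")
```

Marking the string does not convert any bytes; it only changes how R interprets them, which is why it is safe when the bytes really are UTF-8, as the meta tag declares.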

Owner

yihui commented Apr 8, 2023

Great! That is also what I guessed (I should have declared the encoding explicitly). I'll commit the fix in a minute. Thanks!

yihui closed this as completed in b6b20a1 on Apr 8, 2023