Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

babashka (GraalVM) have broken UTF-8 support #260161

Closed
diegobfernandez opened this issue Oct 10, 2023 · 10 comments · Fixed by #267861
Closed

babashka (GraalVM) have broken UTF-8 support #260161

diegobfernandez opened this issue Oct 10, 2023 · 10 comments · Fixed by #267861

Comments

@diegobfernandez
Copy link
Contributor

Describe the bug

After #257292 babashka can't display Unicode characters

I have a babashka script that outputs the following JSON:

{"text":"󰖨","alt":"light","tooltip":"light","class":["light"]}

The text is a Nerd Font Symbol code point.

After 88c5afe the output is:

{"text":"?","alt":"light","tooltip":"light","class":["light"]}

Steps To Reproduce

Steps to reproduce the behavior:

[user@system:~]$ bb -e '(prn "bépo àê")'
"b??po ????"

Expected behavior

[user@system:~]$ bb -e '(prn "bépo àê")'
"bépo àê"

Additional context

This has been fixed in the past by #153457

I did not test other packages that could be affected by the PR.

Notify maintainers

@thiagokokada

Metadata

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.5.5, NixOS, 23.05 (Stoat), 23.05.20231007.5a237ae`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.13.5`
 - channels(zealouscupuacu): `"nixos-22.11, nixos-unstable"`
 - channels(root): `"nixos-23.05, nixos-unstable"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`
@thiagokokada
Copy link
Contributor

Pretty sure this is related to this bug in GraalVM upstream: oracle/graal#7502

I am curious if this is affecting projects using GraalVM 21.0.0 vanilla or if not, how is it working and why it is broken in nixpkgs.

@thiagokokada
Copy link
Contributor

It does seem that even setting up LC_ALL inside the C compiler environment is not sufficient to make it work.

CC @heyarne to see if they have more context on what the possibly issue is.

@sg-qwt
Copy link
Contributor

sg-qwt commented Nov 15, 2023

I also hit this today. I tested the same version(v1.3.186) from upstream binary release, which does not have this issue. https://github.com/babashka/babashka/releases/download/v1.3.186/babashka-1.3.186-linux-amd64-static.tar.gz

Shall we always sync the same version of graavm as upstream build ci? babashka/babashka#1640

@thiagokokada
Copy link
Contributor

Shall we always sync the same version of graavm as upstream build ci? babashka/babashka#1640

I don't think this is the issue and even if it was, it is not like babashka is the only GraalVM package we have.

sg-qwt added a commit to sg-qwt/nixos that referenced this issue Nov 15, 2023
floscr added a commit to floscr/dotfiles that referenced this issue Nov 15, 2023
@thiagokokada
Copy link
Contributor

I know that folks can't reproduce this issue with babashka upstream, but I also can't reproduce it with GraalVM using Java, e.g.:

// HelloWorld.java
public class HelloWorld {
  public static void main(String[] args) {
    System.out.println("Hello World");
    System.out.println("bépo àê");
  }
}
$ javac HelloWorld.java
$ native-image -H:+UnlockExperimentalVMOptions -H:-CheckToolchain -H:+ReportExceptionStackTraces HelloWorld --verbose -march=compatibility
$ ./helloworld | fgrep 'Hello World'
$ ./helloworld | fgrep 'bépo àê' # works fine

A patch if anyone wants to test to be applied to graalvm-ce:

diff --git a/pkgs/development/compilers/graalvm/community-edition/buildGraalvm.nix b/pkgs/development/compilers/graalvm/community-edition/buildGraalvm.nix
index d7396d5f2d5c..a2e5aa427322 100644
--- a/pkgs/development/compilers/graalvm/community-edition/buildGraalvm.nix
+++ b/pkgs/development/compilers/graalvm/community-edition/buildGraalvm.nix
@@ -171,6 +171,7 @@ let
         public class HelloWorld {
           public static void main(String[] args) {
             System.out.println("Hello World");
+            System.out.println("bépo àê");
           }
         }
       ''} > HelloWorld.java
@@ -183,16 +184,19 @@ let
       echo "Ahead-Of-Time compilation"
       $out/bin/native-image -H:+UnlockExperimentalVMOptions -H:-CheckToolchain -H:+ReportExceptionStackTraces HelloWorld
       ./helloworld | fgrep 'Hello World'
+      ./helloworld | fgrep 'bépo àê'
 
       ${# --static is only available in Linux
       lib.optionalString (stdenv.isLinux && !useMusl) ''
         echo "Ahead-Of-Time compilation with -H:+StaticExecutableWithDynamicLibC"
         $out/bin/native-image -H:+UnlockExperimentalVMOptions -H:+StaticExecutableWithDynamicLibC HelloWorld
         ./helloworld | fgrep 'Hello World'
+        ./helloworld | fgrep 'bépo àê'
 
         echo "Ahead-Of-Time compilation with --static"
         $out/bin/native-image --static HelloWorld
         ./helloworld | fgrep 'Hello World'
+        ./helloworld | fgrep 'bépo àê'
       ''}
 
       ${# --static is only available in Linux
@@ -200,6 +204,7 @@ let
         echo "Ahead-Of-Time compilation with --static and --libc=musl"
         $out/bin/native-image --static HelloWorld --libc=musl
         ./helloworld | fgrep 'Hello World'
+        ./helloworld | fgrep 'bépo àê'
       ''}
 
       runHook postInstallCheck

So, can anyone reproduce this issue with GraalVM Native Image itself? If not, the only thing I can think is that the issue here is the babashka GCC (i.e.: non-static) version (upstream compiles with musl by default). Or something else.

@thiagokokada
Copy link
Contributor

So I got this warning:

/nix/store/lf0wpjrj8yx4gsmw2s3xfl58ixmqk8qa-bash-5.2-p15/bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
Warning: Option 'EnableAllSecurityServices' is deprecated and might be removed in a future release. Please refer to the GraalVM release notes.

But this was after native-image already started to build, so I assume this is the same issue as oracle/graal#7502 and need to be fixed upstream.

@thiagokokada
Copy link
Contributor

thiagokokada commented Nov 15, 2023

So I got this warning:

/nix/store/lf0wpjrj8yx4gsmw2s3xfl58ixmqk8qa-bash-5.2-p15/bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
Warning: Option 'EnableAllSecurityServices' is deprecated and might be removed in a future release. Please refer to the GraalVM release notes.

But this was after native-image already started to build, so I assume this is the same issue as oracle/graal#7502 and need to be fixed upstream.

To explain in more details why I think this is an upstream issue: I don't get the same warning: setlocal in the initial part of the build, only when the build starts that I get this warning.

So what I assume native-image is doing:

  • Running the CC (C Compiler) inside a shell with clean environment
  • We see the bash: warning: setlocale, because this shell couldn't set the proper locale thanks to missing environment variables (e.g.: LOCALE_ARCHIVE = "${pkgs.glibcLocales}/lib/locale/locale-archive")
  • Since this shell environment has no environment variables (at least, not the environment variables that we need), this ends up breaking macOS compilation and locale (Unicode) support in Linux

Not sure why it works in GraalVM itself though yet. This is all really strange.

@sg-qwt
Copy link
Contributor

sg-qwt commented Nov 16, 2023

While I'm not familiar with graalvm, I do find a hack for this issue, maybe @thiagokokada can weigh in to test whether this hack is valid:

Some inspiration from a random reddit post and those issues:
https://www.reddit.com/r/javahelp/comments/17vtjbk/printing_unicode_characters/
oracle/graal#2492
oracle/graal#2398

And I do find https://github.com/oracle/graal/blob/master/substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jdk/SystemInOutErrSupport.java it's using System.getProperty("sun.stdout.encoding"), so I thought if I added those two into build args, maybe it'd work. And indeed it seems to work..

    extraNativeImageBuildArgs = [
      "-H:+ReportExceptionStackTraces"
      "-J-Dsun.stdout.encoding=UTF-8"
      "-J-Dsun.stderr.encoding=UTF-8"
      "--no-fallback"
      "--native-image-info"
      "--enable-preview"
    ];

After I built bb with above args,

> ./result/bin/bb -e '(System/getProperty "stdout.encoding")'
"UTF-8"

# unicode displayed correctly 
> ./result/bin/bb -e "(println \"你们\")" 
你们

# before it reported this
> nix run github:NixOS/nixpkgs/nixos-unstable#babashka-unwrapped -- -e '(System/getProperty "stdout.encoding")'
"ANSI_X3.4-1968"

# unicode not showing ??
> nix run github:NixOS/nixpkgs/nixos-unstable#babashka-unwrapped -- -e "(println \"你们\")"
??

@sg-qwt
Copy link
Contributor

sg-qwt commented Nov 16, 2023

I know that folks can't reproduce this issue with babashka upstream, but I also can't reproduce it with GraalVM using Java, e.g.:

// HelloWorld.java
public class HelloWorld {
  public static void main(String[] args) {
    System.out.println("Hello World");
    System.out.println("bépo àê");
  }
}
$ javac HelloWorld.java
$ native-image -H:+UnlockExperimentalVMOptions -H:-CheckToolchain -H:+ReportExceptionStackTraces HelloWorld --verbose -march=compatibility
$ ./helloworld | fgrep 'Hello World'
$ ./helloworld | fgrep 'bépo àê' # works fine

A patch if anyone wants to test to be applied to graalvm-ce:

diff --git a/pkgs/development/compilers/graalvm/community-edition/buildGraalvm.nix b/pkgs/development/compilers/graalvm/community-edition/buildGraalvm.nix
index d7396d5f2d5c..a2e5aa427322 100644
--- a/pkgs/development/compilers/graalvm/community-edition/buildGraalvm.nix
+++ b/pkgs/development/compilers/graalvm/community-edition/buildGraalvm.nix
@@ -171,6 +171,7 @@ let
         public class HelloWorld {
           public static void main(String[] args) {
             System.out.println("Hello World");
+            System.out.println("bépo àê");
           }
         }
       ''} > HelloWorld.java
@@ -183,16 +184,19 @@ let
       echo "Ahead-Of-Time compilation"
       $out/bin/native-image -H:+UnlockExperimentalVMOptions -H:-CheckToolchain -H:+ReportExceptionStackTraces HelloWorld
       ./helloworld | fgrep 'Hello World'
+      ./helloworld | fgrep 'bépo àê'
 
       ${# --static is only available in Linux
       lib.optionalString (stdenv.isLinux && !useMusl) ''
         echo "Ahead-Of-Time compilation with -H:+StaticExecutableWithDynamicLibC"
         $out/bin/native-image -H:+UnlockExperimentalVMOptions -H:+StaticExecutableWithDynamicLibC HelloWorld
         ./helloworld | fgrep 'Hello World'
+        ./helloworld | fgrep 'bépo àê'
 
         echo "Ahead-Of-Time compilation with --static"
         $out/bin/native-image --static HelloWorld
         ./helloworld | fgrep 'Hello World'
+        ./helloworld | fgrep 'bépo àê'
       ''}
 
       ${# --static is only available in Linux
@@ -200,6 +204,7 @@ let
         echo "Ahead-Of-Time compilation with --static and --libc=musl"
         $out/bin/native-image --static HelloWorld --libc=musl
         ./helloworld | fgrep 'Hello World'
+        ./helloworld | fgrep 'bépo àê'
       ''}
 
       runHook postInstallCheck

So, can anyone reproduce this issue with GraalVM Native Image itself? If not, the only thing I can think is that the issue here is the babashka GCC (i.e.: non-static) version (upstream compiles with musl by default). Or something else.

BTW, I cannot reproduce with GraalVM Native Image on helloworld java as well. Helloworld produced by GraalVM Native Image has correct unicode output.

@thiagokokada
Copy link
Contributor

And I do find https://github.com/oracle/graal/blob/master/substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jdk/SystemInOutErrSupport.java it's using System.getProperty("sun.stdout.encoding"), so I thought if I added those two into build args, maybe it'd work. And indeed it seems to work..

Thanks @sg-qwt for investigating this issue. Opened #267861 so we can have this fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants