Crashes unless stdout/err encoding is UTF-8 #110

Closed
ocharles opened this issue Jul 21, 2017 · 20 comments · Fixed by #218

@ocharles
Contributor

I've seen this on two projects now. I'm not sure if it's a problem with stack, my terminal, or hedgehog - but let's start the discussion here:

→ stack test                                                                                      ~/work/circuithub/api
circuithub-api-0.0.4: build (lib + exe + test)
circuithub-api-0.0.4: copy/register
circuithub-api-0.0.4: test (suite: circuithub-api-tests)
Completed 2 action(s).
Log files have been written to: /home/ollie/work/circuithub/api/.stack-work/logs/
Test suite failure for package circuithub-api-0.0.4
    circuithub-api-tests:  exited with: ExitFailure 1
Full log available at /home/ollie/work/circuithub/api/.stack-work/logs/circuithub-api-0.0.4-test.log


    circuithub-api-tests: setNumCapabilities: not supported in the non-threaded RTS
    circuithub-api-tests: <stdout>: commitBuffer: invalid argument (invalid character)

Yet running the binary directly works fine.

@jacobstanley
Member

    circuithub-api-tests: setNumCapabilities: not supported in the non-threaded RTS

^ This is likely because the test executable is not being compiled with -threaded, but that probably isn't why it's failing; it's just a warning.

    circuithub-api-tests: <stdout>: commitBuffer: invalid argument (invalid character)

^ This looks very similar to the problem in #32.

@ocharles
Contributor Author

If anyone else lands here, adding this to main fixes things:

  hSetEncoding stdout utf8
  hSetEncoding stderr utf8

@moodmosaic
Member

moodmosaic commented Jul 28, 2017

    If anyone else lands here, adding this to main fixes things:

      hSetEncoding stdout utf8
      hSetEncoding stderr utf8

This seems to improve the situation (no <stdout>: commitBuffer: ... error) but the output looks weird:
[image: screenshot of the output]

@moodmosaic
Member

Here's also a screenshot from a failing property:

[image: screenshot of a failing property]

@ocharles
Contributor Author

ocharles commented Jul 29, 2017 via email

@moodmosaic
Member

According to systeminfo it is en-us;English (United States).

@moodmosaic
Member

In bash, locale returns these:

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=

@chris-martin
Contributor

I'm also seeing this problem. Runs okay on my machine, but fails on Travis CI.

@moodmosaic
Member

@chris-martin, you mean the output looks like the images above?

@chris-martin
Contributor

chris-martin commented Jul 31, 2017

@moodmosaic Output looks like this:

Test suite failure for package loc-0.1.3.0
    hedgehog:  exited with: ExitFailure 1
Full log available at /home/travis/build/chris-martin/haskell-libraries/.stack-work/logs/loc-0.1.3.0-test.log
    Examples: 84  Tried: 84  Errors: 0  Failures: 0
    hedgehog: <stdout>: commitBuffer: invalid argument (invalid character)
The command "stack --nix --no-terminal test" exited with 1.

https://travis-ci.org/chris-martin/haskell-libraries/builds/259263874

@moodmosaic
Member

@chris-martin, even after trying this workaround?

chris-martin added a commit to typeclasses/loc that referenced this issue Aug 20, 2017
@chris-martin
Contributor

@moodmosaic Ah, no, that workaround does seem to fix it.

@thumphries
Member

Wondering if this could be avoided by adding a main runner (as suggested in #97) that sets encoding accordingly?
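
For anyone who wants to try the runner idea in their own test suite, here is a minimal sketch using only base. The names withUtf8 and runWithUtf8Output are hypothetical and not part of hedgehog; the sketch assumes the real test entry point is passed in as an IO action, and it restores the previous encoding afterwards:

```haskell
import Control.Exception (bracket)
import System.IO
  ( Handle, TextEncoding, hGetEncoding, hSetBinaryMode, hSetEncoding
  , stderr, stdout, utf8 )

-- | Run an action with the handle temporarily switched to UTF-8 and
--   restore whatever it used before (hypothetical helper, not hedgehog API).
withUtf8 :: Handle -> IO a -> IO a
withUtf8 h action = bracket acquire restore (const action)
  where
    -- Remember the current encoding, then switch the handle to UTF-8.
    acquire :: IO (Maybe TextEncoding)
    acquire = do
      old <- hGetEncoding h
      hSetEncoding h utf8
      pure old

    -- hGetEncoding returns Nothing for a handle in binary mode.
    restore :: Maybe TextEncoding -> IO ()
    restore = maybe (hSetBinaryMode h True) (hSetEncoding h)

-- | Hypothetical runner: wrap the real test entry point with this.
runWithUtf8Output :: IO a -> IO a
runWithUtf8Output = withUtf8 stdout . withUtf8 stderr

main :: IO ()
main = runWithUtf8Output (putStrLn "\x2713 1 succeeded.")
```

In a real suite the putStrLn call would be replaced by whatever runs the properties. Whether the terminal then renders the bytes correctly is a separate question, as the Windows screenshots above suggest.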

@thumphries added the bug label Dec 11, 2017
@thumphries changed the title from "Crashes with stack test" to "Crashes unless stdout/err encoding is UTF-8" Dec 11, 2017
@kirelagin

Actually, if I understand the issue right, a proper fix would be to do something like this: https://phabricator.haskell.org/D1153. It would be especially nice not only to transliterate Haskell identifiers, but also refrain from using all the utf-8 goodness when the terminal does not support it.
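
The second half of that suggestion (falling back to plain ASCII when the terminal is not Unicode-capable) could look roughly like the sketch below. It is an illustration only: the function names, the Marks record, and the chosen glyphs are assumptions, and the check is a heuristic on the encoding name rather than anything hedgehog does today.

```haskell
import Data.Char (toUpper)
import System.IO (Handle, hGetEncoding, stdout)

-- | Heuristic: treat names such as "UTF-8" or "UTF-16LE" as Unicode-capable.
--   This inspects only the name; it is not an exhaustive check.
isUnicodeEncodingName :: String -> Bool
isUnicodeEncodingName name = take 3 (map toUpper name) == "UTF"

-- | Ask the handle for its encoding; a binary-mode handle (Nothing)
--   is treated as not Unicode-capable.
handleSupportsUnicode :: Handle -> IO Bool
handleSupportsUnicode h =
  maybe False (isUnicodeEncodingName . show) <$> hGetEncoding h

-- | Illustrative pass/fail markers (hypothetical record).
data Marks = Marks { passMark :: String, failMark :: String }

-- | Unicode marks when supported, ASCII stand-ins otherwise.
marksFor :: Bool -> Marks
marksFor True  = Marks "\x2713" "\x2717"
marksFor False = Marks "[ok]" "[fail]"

main :: IO ()
main = do
  unicode <- handleSupportsUnicode stdout
  let marks = marksFor unicode
  putStrLn (passMark marks ++ " example property passed")
  putStrLn (failMark marks ++ " example property failed")
```

Because only characters the handle claims to support are written, this avoids the commitBuffer error without changing the handle's encoding at all.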

@moodmosaic
Member

@kirelagin I tried that approach (phabricator.haskell.org/D1153) on Win10, and it does seem to do a better job:

{-# LANGUAGE TemplateHaskell #-} -- needed for the $$discover splice below

import           Hedgehog
import qualified Hedgehog.Gen       as Gen
import qualified Hedgehog.Range     as Range

import           GHC.IO.Encoding    (mkTextEncoding, textEncodingName)
import           System.IO          (Handle, hGetEncoding, hSetEncoding, stderr,
                                     stdout)

tests :: IO Bool
tests = do
  hSetTranslit stdout
  hSetTranslit stderr
  checkParallel $$discover

-- | Change the character encoding of the given Handle to transliterate on
--   unsupported characters instead of throwing an exception.
--   https://phabricator.haskell.org/D1153
hSetTranslit :: Handle -> IO ()
hSetTranslit h = do
  menc <- hGetEncoding h
  case fmap textEncodingName menc of
    Just name | '/' `notElem` name -> do
      enc <- mkTextEncoding $ name ++ "//TRANSLIT"
      hSetEncoding h enc
    _ -> pure ()

@moodmosaic
Member

So, the bare minimum to keep Windows happy is:

import           System.IO (hSetEncoding, stdout, stderr, utf8)

main :: IO ()
main = do -- or put these in your test runner's main
  hSetEncoding stdout utf8
  hSetEncoding stderr utf8
  ...

You might also have to change the active console code page to UTF-8:

chcp 65001

angerman added a commit to input-output-hk/cardano-sl that referenced this issue Aug 17, 2018
hedgehog uses fancy Unicode code points in its output; this is at odds with non-UTF-8 charsets and leads to GHC crashing with

<stdout>: commitBuffer: invalid argument (invalid character)

This change sets the stdout/stderr encoding to utf8 prior to running hedgehog tests via the `runTests` function.

The relevant hedgehog issue is hedgehogqa/haskell-hedgehog#110
moodmosaic added a commit to moodmosaic/haskell-hedgehog that referenced this issue Aug 20, 2018
On Windows, unlike Unix, the console itself is not a stream of `bytes` but
a spreadsheet of cells, each of which contains a UTF-16 character and a
color attribute.

That means if your application produces output using single-byte or
multi-byte character sets (ANSI, OEM, UTF-8 and many others), Windows
converts that output to UTF-16 automatically, according to the active
code page selected in your console (run chcp from the console command
prompt to check your active code page).

If you want to work with UTF-8 encoding you have to select UTF-8 as the
active console code page. Just run the chcp 65001 command to do that.

Taken from:
- https://conemu.github.io/en/UnicodeSupport.html
- hedgehogqa#110 (comment)
moodmosaic added a commit to moodmosaic/haskell-hedgehog that referenced this issue Aug 20, 2018
@moodmosaic
Member

/cc @angerman

KtorZ pushed a commit to input-output-hk/cardano-sl that referenced this issue Nov 9, 2018
erikd pushed a commit to erikd/haskell-hedgehog that referenced this issue Mar 2, 2020
joneshf added a commit to joneshf/purs-tools that referenced this issue Feb 14, 2021
We have to work around an encoding issue in certain environments.

There's an issue with `hedgehog` tests and the encoding of
STDOUT/STDERR: hedgehogqa/haskell-hedgehog#110. It's not
really clear why this is always a problem with Haskell, nor is it really
important. The only thing that matters is that we have to mess around
with encodings in order to get things to work.

The `Hedgehog.Main.defaultMain` seems like it should take care of this
stuff, but it only messes with the buffering, not the encoding. In any
case, we set the encoding so we can get these tests passing in other
environments.
@dmjio

dmjio commented Jun 10, 2021

export LC_ALL=C.UTF-8 worked for me
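
To check whether a locale setting like this actually took effect, a small diagnostic can print the encodings the GHC runtime picked up. This is a sketch using base only; describeEncoding is a hypothetical helper, and the output is deliberately plain ASCII so the diagnostic itself cannot trigger the crash:

```haskell
import GHC.IO.Encoding (getLocaleEncoding)
import System.IO (Handle, hGetEncoding, hPutStrLn, stderr, stdout)

-- | Name of the encoding currently attached to a handle (hypothetical helper).
--   hGetEncoding returns Nothing for a handle in binary mode.
describeEncoding :: Handle -> IO String
describeEncoding h = maybe "binary (no encoding)" show <$> hGetEncoding h

main :: IO ()
main = do
  locale <- getLocaleEncoding
  out    <- describeEncoding stdout
  err    <- describeEncoding stderr
  -- ASCII-only messages, safe under any locale encoding.
  hPutStrLn stderr ("locale encoding: " ++ show locale)
  hPutStrLn stderr ("stdout encoding: " ++ out)
  hPutStrLn stderr ("stderr encoding: " ++ err)
```

Run it the same way the failing suite runs (for example under stack test or in CI). If it reports something other than UTF-8 there, the locale variables are probably not reaching the test process.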

@moodmosaic
Member

Thanks! Good to know that.

swtwsk added a commit to ArdanaLabs/ardana-dollar that referenced this issue Sep 10, 2021
Hedgehog uses Unicode's _check mark_ and _cross mark_ characters when
presenting test results. Haskell looks up the system locale settings
when writing text to a console, and it crashes if the console encoding
cannot represent the text being written [1]. As Hedgehog uses Unicode
characters, it triggers exactly this issue [2]. The default locale
encoding under Nix is, apparently, not UTF-8. This fix sets the UTF-8
encoding explicitly, allowing Nix to run the tests and build the whole
derivation.

[1] https://serokell.io/blog/haskell-with-utf8#haskell-defaults
[2] hedgehogqa/haskell-hedgehog#110
@zmrocze

zmrocze commented Apr 22, 2022

You can run into this issue when running from nix-shell. The solution to a similar problem worked for me, so I guess it does no harm to repeat it here:

Set LOCALE_ARCHIVE=/usr/lib/locale/locale-archive

or to a different appropriate path [see links ^].

8 participants