unicode and locales #7951

Closed
p5pRT opened this issue Jun 3, 2005 · 7 comments

p5pRT commented Jun 3, 2005

Migrated from rt.perl.org#36118 (status was 'resolved')

Searchable as RT36118$


p5pRT commented Jun 3, 2005

From [email protected]

Created by [email protected]

Running this:

#!/usr/bin/perl -l
use locale;
my $string = chr(0x10c);
print $string =~ /$string/i ? "yes" : "no";

for me prints:
no

With the "use locale" commented out, it prints "yes" as expected.
Dropping the /i modifier also fixes it.

(I actually have no locale (LC...) environment variables set on this machine.)

Carefully reading the locale comments in perldoc perlunicode doesn't make me
expect this. I realize Unicode and locales are documented to sometimes
interact oddly, but a string without metachars not matching itself seems
*too* odd to me.

Perl Info

Flags:
    category=core
    severity=medium

This perlbug was built using Perl v5.8.6 - Fri Dec 24 19:25:13 CET 2004
It is being executed now by  Perl v5.8.4 - Thu Jun  3 13:28:19 CEST 2004.

Site configuration information for perl v5.8.4:

Configured by ton at Thu Jun  3 13:28:19 CEST 2004.

Summary of my perl5 (revision 5 version 8 subversion 4) configuration:
  Platform:
    osname=linux, osvers=2.6.5, archname=i686-linux-64int-ld
    uname='linux quasar 2.6.5 #8 mon apr 5 05:41:20 cest 2004 i686 gnulinux '
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=define use64bitall=undef uselongdouble=define
    usemymalloc=y, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -fomit-frame-pointer',
    cppflags='-fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='3.4.0 20031231 (experimental)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long long', ivsize=8, nvtype='long double', nvsize=12, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.8.4:
    /usr/lib/perl5/5.8.4/i686-linux-64int-ld
    /usr/lib/perl5/5.8.4
    /usr/lib/perl5/site_perl/5.8.4/i686-linux-64int-ld
    /usr/lib/perl5/site_perl/5.8.4
    /usr/lib/perl5/site_perl
    .


Environment for perl v5.8.4:
    HOME=/home/ton
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/ton/bin.Linux:/home/ton/bin:/home/ton/bin.SampleSetup:/usr/local/bin:/usr/local/sbin:/home/oracle/product/9.2/bin:/usr/local/ar/bin:/usr/games/bin:/usr/X11R6/bin:/usr/share/bin:/usr/bin:/usr/sbin:/bin:/sbin:.
    PERL_BADLANG (unset)
    SHELL=/bin/bash


p5pRT commented Jun 6, 2005

From [email protected]

Perl-5 . 8 . 0 @ Ton . Iguana . Be <perl5-porters@perl.org> writes:

> Carefully reading the locale comments in perldoc perlunicode doesn't make me
> expect this. I realize Unicode and locales are documented to sometimes
> interact oddly, but a string without metachars not matching itself seems
> *too* odd to me.

Well, no LC* or LANG set implies the C locale, i.e. 7-bit ASCII,
so your Unicode 'Č' has no mapping, and in particular no lowercase mapping.



p5pRT commented Jun 6, 2005

The RT System itself - Status changed from 'new' to 'open'


p5pRT commented Jun 19, 2005

From [email protected]

>> Carefully reading the locale comments in perldoc perlunicode doesn't make me
>> expect this. I realize Unicode and locales are documented to sometimes
>> interact oddly, but a string without metachars not matching itself seems
>> *too* odd to me.

> Well, no LC* or LANG set implies the C locale, i.e. 7-bit ASCII,
> so your Unicode 'Č' has no mapping, and in particular no lowercase mapping.

Well, it behaves exactly the same even if the locale (LC_ALL) is properly set,
for example to sl_SI.UTF-8 (on Debian).


p5pRT commented Jun 24, 2005

From [email protected]

In article <20050606180637.8966.6@llama.ing-simmons.net>,
  Nick Ing-Simmons <nick@ing-simmons.net> writes:


> Well, no LC* or LANG set implies the C locale, i.e. 7-bit ASCII,
> so your Unicode 'Č' has no mapping, and in particular no lowercase mapping.

No, I don't think it's that simple. Just because you have the C locale
doesn't mean that everything is 7-bit and that I have no lowercase
mappings:

perl -wle 'use locale; $string = chr(0x10c); print unpack("H*", $string); print unpack("H*", lc($string))'
c48c
c48d
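The two hex strings above are the UTF-8 byte sequences of U+010C (Č, LATIN CAPITAL LETTER C WITH CARON) and its lowercase U+010D (č), so lc() clearly does have a mapping for this character. A minimal sketch to check this explicitly, assuming a Perl with the core Encode module (5.8 or later) and with no "use locale" in scope; it only confirms the case mapping and the expected /i self-match, it does not reproduce or fix the locale bug:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# U+010C LATIN CAPITAL LETTER C WITH CARON; lc gives U+010D.
# No "use locale" here, so Unicode case rules apply.
my $upper = chr(0x10C);
my $lower = lc $upper;

# Encode to UTF-8 explicitly so unpack sees bytes rather than
# relying on perl's internal string representation.
my $hex_upper = unpack("H*", encode("UTF-8", $upper));
my $hex_lower = unpack("H*", encode("UTF-8", $lower));
print "$hex_upper -> $hex_lower\n";    # prints "c48c -> c48d"

# Outside "use locale", the case-insensitive self-match succeeds.
print $upper =~ /\Q$upper\E/i ? "yes\n" : "no\n";    # prints "yes"
```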

Anyway, the bug also appears if you *DO* set a locale.

Also notice that the bug disappears when I add a start-of-string anchor:

#!/usr/bin/perl -l
use locale;
my $string = chr(0x10c);
print $string =~ /^$string/i ? "yes" : "no";

yes

Certainly having ^ here shouldn't matter. But it does.


p5pRT commented Mar 25, 2011

From @khwilliamson

This should be fixed in 5.13.10
--Karl Williamson


p5pRT commented Mar 25, 2011

@khwilliamson - Status changed from 'open' to 'resolved'
