Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

commit 027471cf breaks t/op/sprintf2.t on FreeBSD-11 #17034

Closed
p5pRT opened this issue Jun 4, 2019 · 59 comments
Closed

commit 027471cf breaks t/op/sprintf2.t on FreeBSD-11 #17034

p5pRT opened this issue Jun 4, 2019 · 59 comments

Comments

@p5pRT
Copy link

p5pRT commented Jun 4, 2019

Migrated from rt.perl.org#134172 (status was 'pending release')

Searchable as RT134172$

@p5pRT
Copy link
Author

p5pRT commented Jun 4, 2019

From @jkeenan

The following smoke test indicates that a recent change to blead is
causing failures in t/op/sprintf2.t on FreeBSD-11.

#####
http​://perl5.test-smoke.org/report/88863
#####

In each of 36 different configurations in the smoke test run, all tests
pass individually, but the file has a whole is graded FAIL.

#####
$ cd t; ./perl harness -v op/sprintf2.t;cd -

ok 1 - the sprintf "%.<number>g" optimization
ok 2 - the sprintf "%.<number>f" optimization
ok 3 - width calculation under utf8 upgrade, length=1
...
ok 1698 - special-case %.0f on -4503599627370503
ok 1699 - croak for very large numeric format results
Dubious, test returned 9 (wstat 2304, 0x900)
All 1699 subtests passed
  (less 30 skipped subtests​: 1669 okay)

Test Summary Report


op/sprintf2.t (Wstat​: 2304 Tests​: 1699 Failed​: 0)
  Non-zero exit status​: 9
Files=1, Tests=1699, 0 wallclock secs ( 0.20 usr 0.01 sys + 0.16 cusr
  0.11 csys = 0.48 CPU)
Result​: FAIL
#####

Bisecting with the following command ...

#####
perl Porting/bisect.pl --start=e3c1dc81bd --end=9f58603ced --target
t/op/sprintf2.t
#####

... gave this result​:

#####
...
good - zero exit from ./perl -Ilib t/op/sprintf2.t
Runner returned 0 for end revision at Porting/bisect.pl line 233.
That took 114 seconds.
#####

... which is not useful. I hypothesized that something is tickling
.t/TEST or ./t perl harness with respect to the failing test (or vice
versa).

To test this hypothesis, I reviewed 'git log' and noticed two commits
which might have touched sprintf functionality. I built perl at each of
them, saw that one PASS and one FAIL; confirmed the FAIL.

#####
commit 027471c
Author​: Tony Cook <tony@​develop-help.com>
AuthorDate​: Wed Mar 20 16​:47​:49 2019 +1100
Commit​: Tony Cook <tony@​develop-help.com>
CommitDate​: Mon Jun 3 15​:48​:34 2019 +1000

(perl #133913) limit numeric format results to INT_MAX

The return value of v?snprintf() is int, and we pay attention to that
return value, so limit the expected size of numeric formats to INT_MAX.
#####

Thank you very much.
Jim Keenan

#####
Summary of my perl5 (revision 5 version 31 subversion 1) configuration​:
  Commit id​: 027471c
  Platform​:
  osname=freebsd
  osvers=11.2-stable
  archname=amd64-freebsd-thread-multi
  uname='freebsd perlmonger.nycbug.org 11.2-stable freebsd
11.2-stable #0 r339445​: sat oct 20 00​:08​:11 utc 2018
root@​perlmonger.nycbug.org​:usrobjusrsrcsysgeneric amd64 '
  config_args='-des -Dusedevel -Duseithreads -Doptimize=-O2 -pipe
-fstack-protector -fno-strict-aliasing'
  hint=recommended
  useposix=true
  d_sigaction=define
  useithreads=define
  usemultiplicity=define
  use64bitint=define
  use64bitall=define
  uselongdouble=undef
  usemymalloc=n
  default_inc_excludes_dot=define
  bincompat5005=undef
  Compiler​:
  cc='cc'
  ccflags ='-DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H
-fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include
-D_FORTIFY_SOURCE=2'
  optimize='-O2 -pipe -fstack-protector -fno-strict-aliasing'
  cppflags='-DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H
-fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
  ccversion=''
  gccversion='4.2.1 Compatible FreeBSD Clang 6.0.1
(tags/RELEASE_601/final 335540)'
  gccosandvers=''
  intsize=4
  longsize=8
  ptrsize=8
  doublesize=8
  byteorder=12345678
  doublekind=3
  d_longlong=define
  longlongsize=8
  d_longdbl=define
  longdblsize=16
  longdblkind=3
  ivtype='long'
  ivsize=8
  nvtype='double'
  nvsize=8
  Off_t='off_t'
  lseeksize=8
  alignbytes=8
  prototype=define
  Linker and Libraries​:
  ld='cc'
  ldflags ='-pthread -Wl,-E -fstack-protector-strong -L/usr/local/lib'
  libpth=/usr/lib /usr/local/lib /usr/lib/clang/6.0.1/lib /usr/lib
  libs=-lpthread -lgdbm -ldl -lm -lcrypt -lutil
  perllibs=-lpthread -ldl -lm -lcrypt -lutil
  libc=
  so=so
  useshrplib=false
  libperl=libperl.a
  gnulibc_version=''
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs
  dlext=so
  d_dlsymun=undef
  ccdlflags=' '
  cccdlflags='-DPIC -fPIC'
  lddlflags='-shared -L/usr/local/lib -fstack-protector-strong'

Characteristics of this binary (from libperl)​:
  Compile-time options​:
  HAS_TIMES
  MULTIPLICITY
  PERLIO_LAYERS
  PERL_COPY_ON_WRITE
  PERL_DONT_CREATE_GVSV
  PERL_IMPLICIT_CONTEXT
  PERL_MALLOC_WRAP
  PERL_OP_PARENT
  PERL_PRESERVE_IVUV
  PERL_USE_DEVEL
  USE_64_BIT_ALL
  USE_64_BIT_INT
  USE_ITHREADS
  USE_LARGE_FILES
  USE_LOCALE
  USE_LOCALE_COLLATE
  USE_LOCALE_CTYPE
  USE_LOCALE_NUMERIC
  USE_LOCALE_TIME
  USE_PERLIO
  USE_PERL_ATOF
  USE_REENTRANT_API
  Built under freebsd
  Compiled at Jun 4 2019 00​:08​:26
  %ENV​:
  PERL2DIR="/home/jkeenan/gitwork/perl2"
  PERL_WORKDIR="/home/jkeenan/gitwork/perl"
  @​INC​:
  lib
  /usr/local/lib/perl5/site_perl/5.31.1/amd64-freebsd-thread-multi
  /usr/local/lib/perl5/site_perl/5.31.1
  /usr/local/lib/perl5/5.31.1/amd64-freebsd-thread-multi
  /usr/local/lib/perl5/5.31.1

@p5pRT
Copy link
Author

p5pRT commented Jun 4, 2019

From @jkeenan

On Tue, 04 Jun 2019 00​:14​:02 GMT, jkeenan@​pobox.com wrote​:

The following smoke test indicates that a recent change to blead is
causing failures in t/op/sprintf2.t on FreeBSD-11.

#####
http​://perl5.test-smoke.org/report/88863
#####

In each of 36 different configurations in the smoke test run, all tests
pass individually, but the file has a whole is graded FAIL.

#####
$ cd t; ./perl harness -v op/sprintf2.t;cd -

ok 1 - the sprintf "%.<number>g" optimization
ok 2 - the sprintf "%.<number>f" optimization
ok 3 - width calculation under utf8 upgrade, length=1
...
ok 1698 - special-case %.0f on -4503599627370503
ok 1699 - croak for very large numeric format results
Dubious, test returned 9 (wstat 2304, 0x900)
All 1699 subtests passed
(less 30 skipped subtests​: 1669 okay)

Test Summary Report
-------------------
op/sprintf2.t (Wstat​: 2304 Tests​: 1699 Failed​: 0)
Non-zero exit status​: 9
Files=1, Tests=1699, 0 wallclock secs ( 0.20 usr 0.01 sys + 0.16 cusr
0.11 csys = 0.48 CPU)
Result​: FAIL
#####

Bisecting with the following command ...

#####
perl Porting/bisect.pl --start=e3c1dc81bd --end=9f58603ced --target
t/op/sprintf2.t
#####

... gave this result​:

#####
...
good - zero exit from ./perl -Ilib t/op/sprintf2.t
Runner returned 0 for end revision at Porting/bisect.pl line 233.
That took 114 seconds.
#####

... which is not useful. I hypothesized that something is tickling
.t/TEST or ./t perl harness with respect to the failing test (or vice
versa).

To test this hypothesis, I reviewed 'git log' and noticed two commits
which might have touched sprintf functionality. I built perl at each of
them, saw that one PASS and one FAIL; confirmed the FAIL.

#####
commit 027471c
Author​: Tony Cook <tony@​develop-help.com>
AuthorDate​: Wed Mar 20 16​:47​:49 2019 +1100
Commit​: Tony Cook <tony@​develop-help.com>
CommitDate​: Mon Jun 3 15​:48​:34 2019 +1000

(perl #133913) limit numeric format results to INT_MAX

The return value of v?snprintf() is int, and we pay attention to that
return value, so limit the expected size of numeric formats to INT_MAX.
#####

Thank you very much.
Jim Keenan

The problem lies in this unit test which was added to t/op/sprintf2.t in the commit in question​:

#####
# large uvsize needed so the large width is parsed properly
# large sizesize needed so the STRLEN check doesn't
if ($Config{intsize} == 4 && $Config{uvsize} > 4 && $Config{sizesize} > 4) {
  eval { my $x = sprintf("%7000000000E", 0) };
  like($@​, qr/^Numeric format result too large at /,
  "croak for very large numeric format results");
}
#####

If I comment out this test, I get a PASS.

So something's amiss with that test in relation to the harness.

Thank you very much.

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Jun 4, 2019

From [Unknown Contact. See original ticket]

On Tue, 04 Jun 2019 00​:14​:02 GMT, jkeenan@​pobox.com wrote​:

The following smoke test indicates that a recent change to blead is
causing failures in t/op/sprintf2.t on FreeBSD-11.

#####
http​://perl5.test-smoke.org/report/88863
#####

In each of 36 different configurations in the smoke test run, all tests
pass individually, but the file has a whole is graded FAIL.

#####
$ cd t; ./perl harness -v op/sprintf2.t;cd -

ok 1 - the sprintf "%.<number>g" optimization
ok 2 - the sprintf "%.<number>f" optimization
ok 3 - width calculation under utf8 upgrade, length=1
...
ok 1698 - special-case %.0f on -4503599627370503
ok 1699 - croak for very large numeric format results
Dubious, test returned 9 (wstat 2304, 0x900)
All 1699 subtests passed
(less 30 skipped subtests​: 1669 okay)

Test Summary Report
-------------------
op/sprintf2.t (Wstat​: 2304 Tests​: 1699 Failed​: 0)
Non-zero exit status​: 9
Files=1, Tests=1699, 0 wallclock secs ( 0.20 usr 0.01 sys + 0.16 cusr
0.11 csys = 0.48 CPU)
Result​: FAIL
#####

Bisecting with the following command ...

#####
perl Porting/bisect.pl --start=e3c1dc81bd --end=9f58603ced --target
t/op/sprintf2.t
#####

... gave this result​:

#####
...
good - zero exit from ./perl -Ilib t/op/sprintf2.t
Runner returned 0 for end revision at Porting/bisect.pl line 233.
That took 114 seconds.
#####

... which is not useful. I hypothesized that something is tickling
.t/TEST or ./t perl harness with respect to the failing test (or vice
versa).

To test this hypothesis, I reviewed 'git log' and noticed two commits
which might have touched sprintf functionality. I built perl at each of
them, saw that one PASS and one FAIL; confirmed the FAIL.

#####
commit 027471c
Author​: Tony Cook <tony@​develop-help.com>
AuthorDate​: Wed Mar 20 16​:47​:49 2019 +1100
Commit​: Tony Cook <tony@​develop-help.com>
CommitDate​: Mon Jun 3 15​:48​:34 2019 +1000

(perl #133913) limit numeric format results to INT_MAX

The return value of v?snprintf() is int, and we pay attention to that
return value, so limit the expected size of numeric formats to INT_MAX.
#####

Thank you very much.
Jim Keenan

The problem lies in this unit test which was added to t/op/sprintf2.t in the commit in question​:

#####
# large uvsize needed so the large width is parsed properly
# large sizesize needed so the STRLEN check doesn't
if ($Config{intsize} == 4 && $Config{uvsize} > 4 && $Config{sizesize} > 4) {
  eval { my $x = sprintf("%7000000000E", 0) };
  like($@​, qr/^Numeric format result too large at /,
  "croak for very large numeric format results");
}
#####

If I comment out this test, I get a PASS.

So something's amiss with that test in relation to the harness.

Thank you very much.

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Jun 4, 2019

From @hvds

On Mon, 03 Jun 2019 17​:14​:02 -0700, jkeenan@​pobox.com wrote​:

The following smoke test indicates that a recent change to blead is
causing failures in t/op/sprintf2.t on FreeBSD-11.
[...]
op/sprintf2.t (Wstat​: 2304 Tests​: 1699 Failed​: 0)
Non-zero exit status​: 9

It's this non-zero exit status that's making it a fail. That's reporting that it terminated with a SEGV. It's a bit strange that a change is causing a segv only after that test has run successfully, but definitely a good reason for calling it failure.

I can't yet reproduce it here, but you should be getting a core dump. If you can generate one (ideally on a debugging perl) and get a stack trace out of gdb it should be possible to diagnose the problem.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jun 4, 2019

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Jun 4, 2019

From @jkeenan

On Tue, 04 Jun 2019 01​:15​:35 GMT, hv wrote​:

On Mon, 03 Jun 2019 17​:14​:02 -0700, jkeenan@​pobox.com wrote​:

The following smoke test indicates that a recent change to blead is
causing failures in t/op/sprintf2.t on FreeBSD-11.
[...]
op/sprintf2.t (Wstat​: 2304 Tests​: 1699 Failed​: 0)
Non-zero exit status​: 9

It's this non-zero exit status that's making it a fail. That's
reporting that it terminated with a SEGV. It's a bit strange that a
change is causing a segv only after that test has run successfully,
but definitely a good reason for calling it failure.

I can't yet reproduce it here, but you should be getting a core dump.
If you can generate one (ideally on a debugging perl) and get a stack
trace out of gdb it should be possible to diagnose the problem.

Hugo

I haven't used gdb in years, so I'm fumbling around.

Does this help?

#####
[t] $ gdb ./perl perl.core
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Core was generated by `./perl -I.. -MTestInit op/sprintf2.t'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libthr.so.3...Reading symbols from /usr/lib/debug//lib/libthr.so.3.debug...done.
done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /usr/lib/libdl.so.1...Reading symbols from /usr/lib/debug//usr/lib/libdl.so.1.debug...done.
done.
Loaded symbols for /usr/lib/libdl.so.1
Reading symbols from /lib/libm.so.5...Reading symbols from /usr/lib/debug//lib/libm.so.5.debug...done.
done.
Loaded symbols for /lib/libm.so.5
Reading symbols from /lib/libcrypt.so.5...Reading symbols from /usr/lib/debug//lib/libcrypt.so.5.debug...done.
done.
Loaded symbols for /lib/libcrypt.so.5
Reading symbols from /lib/libutil.so.9...Reading symbols from /usr/lib/debug//lib/libutil.so.9.debug...done.
done.
Loaded symbols for /lib/libutil.so.9
Reading symbols from /lib/libc.so.7...Reading symbols from /usr/lib/debug//lib/libc.so.7.debug...done.
done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...Reading symbols from /usr/lib/debug//libexec/ld-elf.so.1.debug...done.
done.
Loaded symbols for /libexec/ld-elf.so.1
#0 S_my_exit_jump (my_perl=0x801e22000) at perl.c​:5261
5261 if (cxstack_ix >= 0) {
(gdb) bt
#0 S_my_exit_jump (my_perl=0x801e22000) at perl.c​:5261
#1 0x0000000000458107 in Perl_my_failure_exit (my_perl=0x801e22000) at perl.c​:5249
#2 0x00000000005987e2 in Perl_die_unwind (my_perl=0x801e22000, msv=0x801f56838) at pp_ctl.c​:1797
#3 0x00000000004f4788 in Perl_vcroak (my_perl=0x801e22000, pat=0x801f56838 "\030 �\001\b", args=0x7fffffffe2a0)
  at util.c​:1711
#4 0x00000000004f1b27 in Perl_croak_nocontext (pat=<value optimized out>) at util.c​:1745
#5 0x00000000006128bb in PerlIOUnix_refcnt_inc (fd=0) at perlio.c​:2309
#6 0x0000000000613298 in PerlIOUnix_open (my_perl=0x801e22000, self=0x81dd30, layers=0x801e1b8d8, n=0,
  mode=0x81e4c0 "r", fd=0, imode=0, perm=0, f=0x801eac038, narg=0, args=0x0) at perlio.c​:2607
#7 0x000000000061519c in PerlIOBuf_open (my_perl=0x801e22000, self=0x81da90, layers=0x801e1b8d8, n=1,
  mode=<value optimized out>, fd=1, imode=0, perm=0, f=0x0, narg=0, args=0x0) at perlio.c​:3897
#8 0x0000000000611238 in PerlIO_openn (my_perl=0x801e22000, layers=<value optimized out>,
  mode=<value optimized out>, fd=0, imode=<value optimized out>, perm=<value optimized out>, f=0x0, narg=0,
  args=0x0) at perlio.c​:1550
#9 0x000000000061010e in PerlIO_stdstreams (my_perl=0x801e22000) at perlio.c​:4928
#10 0x0000000000616ecd in Perl_PerlIO_stderr (my_perl=0x801e22000) at perlio.c​:4884
#11 0x00000000004f4463 in Perl_write_to_stderr (my_perl=0x801e22000, msv=0x801f56838) at util.c​:1516
#12 0x00000000005987da in Perl_die_unwind (my_perl=0x801e22000, msv=0x801f56838) at pp_ctl.c​:1796
#13 0x00000000004f4788 in Perl_vcroak (my_perl=0x801e22000, pat=0x801f56838 "\030 �\001\b", args=0x7fffffffe680)
  at util.c​:1711
#14 0x00000000004f1b27 in Perl_croak_nocontext (pat=<value optimized out>) at util.c​:1745
#15 0x000000000044c57e in Perl_sys_term () at perl.c​:146
#16 0x0000000000421472 in main (argc=<value optimized out>, argv=<value optimized out>, env=0x7fffffffe770)
---Type <return> to continue, or q <return> to quit---
  at perlmain.c​:155
#####
--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Jun 4, 2019

From @jkeenan

On Tue, 04 Jun 2019 02​:22​:56 GMT, jkeenan wrote​:

#3 0x00000000004f4788 in Perl_vcroak (my_perl=0x801e22000,
pat=0x801f56838 "\030 �\001\b", args=0x7fffffffe2a0)
at util.c​:1711

That non-printing character is octal​:

357 277 275

#4 0x00000000004f1b27 in Perl_croak_nocontext (pat=<value optimized
out>) at util.c​:1745
...
#12 0x00000000005987da in Perl_die_unwind (my_perl=0x801e22000,
msv=0x801f56838) at pp_ctl.c​:1796
#13 0x00000000004f4788 in Perl_vcroak (my_perl=0x801e22000,
pat=0x801f56838 "\030 �\001\b", args=0x7fffffffe680)
at util.c​:1711

Same here.

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Jun 4, 2019

From @khwilliamson

On 6/3/19 8​:35 PM, James E Keenan via RT wrote​:

On Tue, 04 Jun 2019 02​:22​:56 GMT, jkeenan wrote​:

#3 0x00000000004f4788 in Perl_vcroak (my_perl=0x801e22000,
pat=0x801f56838 "\030 �\001\b", args=0x7fffffffe2a0)
at util.c​:1711

That non-printing character is octal​:

357 277 275

which is REPLACEMENT CHARACTER

#4 0x00000000004f1b27 in Perl_croak_nocontext (pat=<value optimized
out>) at util.c​:1745
...
#12 0x00000000005987da in Perl_die_unwind (my_perl=0x801e22000,
msv=0x801f56838) at pp_ctl.c​:1796
#13 0x00000000004f4788 in Perl_vcroak (my_perl=0x801e22000,
pat=0x801f56838 "\030 �\001\b", args=0x7fffffffe680)
at util.c​:1711

Same here.

@p5pRT
Copy link
Author

p5pRT commented Jun 4, 2019

From @hvds

On Mon, 03 Jun 2019 19​:22​:56 -0700, jkeenan wrote​:

On Tue, 04 Jun 2019 01​:15​:35 GMT, hv wrote​:

On Mon, 03 Jun 2019 17​:14​:02 -0700, jkeenan@​pobox.com wrote​:

The following smoke test indicates that a recent change to blead is
causing failures in t/op/sprintf2.t on FreeBSD-11.
[...]
op/sprintf2.t (Wstat​: 2304 Tests​: 1699 Failed​: 0)
Non-zero exit status​: 9

It's this non-zero exit status that's making it a fail. That's
reporting that it terminated with a SEGV.

I haven't used gdb in years, so I'm fumbling around.

Does this help?

Well, it is exactly what I asked for.

#13 0x00000000004f4788 in Perl_vcroak (my_perl=0x801e22000, pat=0x801f56838 "\030 �\001\b", args=0x7fffffffe680) at util.c​:1711
#14 0x00000000004f1b27 in Perl_croak_nocontext (pat=<value optimized
out>) at util.c​:1745
#15 0x000000000044c57e in Perl_sys_term () at perl.c​:146
#16 0x0000000000421472 in main (argc=<value optimized out>,
argv=<value optimized out>, env=0x7fffffffe770)
at perlmain.c​:155

We're in sys_term, so most of the interpreter has already been wiped out, but we're then calling croak(), which relies on the interpreter's infrastructure.

My best guess is that we're hitting 'Perl_croak_nocontext("panic​: MUTEX_UNLOCK (%d) [%s​:%d]" ...)' in one of the MUTEX_UNLOCK calls, and some wild writes have corrupted both one of our mutexes and perhaps this panic string.

It seems like it'd be hard for it to get this far with that level of corruption though.

I'll try harder to reproduce it​: I'd need some interactive gdb work to diagnose it.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jun 4, 2019

From @hvds

On Mon, 03 Jun 2019 21​:15​:59 -0700, hv wrote​:

On Mon, 03 Jun 2019 19​:22​:56 -0700, jkeenan wrote​:

On Tue, 04 Jun 2019 01​:15​:35 GMT, hv wrote​:

On Mon, 03 Jun 2019 17​:14​:02 -0700, jkeenan@​pobox.com wrote​:

The following smoke test indicates that a recent change to blead
is
causing failures in t/op/sprintf2.t on FreeBSD-11.
[...]
op/sprintf2.t (Wstat​: 2304 Tests​: 1699 Failed​: 0)
Non-zero exit status​: 9

It's this non-zero exit status that's making it a fail. That's
reporting that it terminated with a SEGV.

I haven't used gdb in years, so I'm fumbling around.

Does this help?

Well, it is exactly what I asked for.

#13 0x00000000004f4788 in Perl_vcroak (my_perl=0x801e22000,
pat=0x801f56838 "\030 �\001\b", args=0x7fffffffe680) at util.c​:1711
#14 0x00000000004f1b27 in Perl_croak_nocontext (pat=<value optimized
out> ) at util.c​:1745
#15 0x000000000044c57e in Perl_sys_term () at perl.c​:146
#16 0x0000000000421472 in main (argc=<value optimized out>,
argv=<value optimized out>, env=0x7fffffffe770)
at perlmain.c​:155

We're in sys_term, so most of the interpreter has already been wiped
out, but we're then calling croak(), which relies on the interpreter's
infrastructure.

My best guess is that we're hitting 'Perl_croak_nocontext("panic​:
MUTEX_UNLOCK (%d) [%s​:%d]" ...)' in one of the MUTEX_UNLOCK calls, and
some wild writes have corrupted both one of our mutexes and perhaps
this panic string.

Oops, s/MUTEX_UNLOCK/MUTEX_DESTROY/g

It seems like it'd be hard for it to get this far with that level of
corruption though.

I'll try harder to reproduce it​: I'd need some interactive gdb work to
diagnose it.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2019

From @jkeenan

On Tue, 04 Jun 2019 04​:15​:59 GMT, hv wrote​:

On Mon, 03 Jun 2019 19​:22​:56 -0700, jkeenan wrote​:

On Tue, 04 Jun 2019 01​:15​:35 GMT, hv wrote​:

On Mon, 03 Jun 2019 17​:14​:02 -0700, jkeenan@​pobox.com wrote​:

The following smoke test indicates that a recent change to blead
is
causing failures in t/op/sprintf2.t on FreeBSD-11.
[...]
op/sprintf2.t (Wstat​: 2304 Tests​: 1699 Failed​: 0)
Non-zero exit status​: 9

It's this non-zero exit status that's making it a fail. That's
reporting that it terminated with a SEGV.

I haven't used gdb in years, so I'm fumbling around.

Does this help?

Well, it is exactly what I asked for.

#13 0x00000000004f4788 in Perl_vcroak (my_perl=0x801e22000,
pat=0x801f56838 "\030 �\001\b", args=0x7fffffffe680) at util.c​:1711
#14 0x00000000004f1b27 in Perl_croak_nocontext (pat=<value optimized
out> ) at util.c​:1745
#15 0x000000000044c57e in Perl_sys_term () at perl.c​:146
#16 0x0000000000421472 in main (argc=<value optimized out>,
argv=<value optimized out>, env=0x7fffffffe770)
at perlmain.c​:155

We're in sys_term, so most of the interpreter has already been wiped
out, but we're then calling croak(), which relies on the interpreter's
infrastructure.

My best guess is that we're hitting 'Perl_croak_nocontext("panic​:
MUTEX_UNLOCK (%d) [%s​:%d]" ...)' in one of the MUTEX_UNLOCK calls, and
some wild writes have corrupted both one of our mutexes and perhaps
this panic string.

It seems like it'd be hard for it to get this far with that level of
corruption though.

I'll try harder to reproduce it​: I'd need some interactive gdb work to
diagnose it.

Hugo

This is now showing up on other OSes. See​:
http​://perl5.test-smoke.org/submatrix?test=../t/op/sprintf2.t&pversion=5.31.1

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2019

From @hvds

On Mon, 03 Jun 2019 21​:15​:59 -0700, hv wrote​:

On Mon, 03 Jun 2019 19​:22​:56 -0700, jkeenan wrote​:

#13 0x00000000004f4788 in Perl_vcroak (my_perl=0x801e22000,
pat=0x801f56838 "\030 �\001\b", args=0x7fffffffe680) at util.c​:1711
#14 0x00000000004f1b27 in Perl_croak_nocontext (pat=<value optimized
out> ) at util.c​:1745
#15 0x000000000044c57e in Perl_sys_term () at perl.c​:146
#16 0x0000000000421472 in main (argc=<value optimized out>,
argv=<value optimized out>, env=0x7fffffffe770)
at perlmain.c​:155

We're in sys_term, so most of the interpreter has already been wiped
out, but we're then calling croak(), which relies on the interpreter's
infrastructure.
[...]
I'll try harder to reproduce it​: I'd need some interactive gdb work to
diagnose it.

I still cannot reproduce here. Please could you try getting another stacktrace with the attached patch applied, to narrow down which bit of Perl_sys_term() we're in?

To help with subsequent tests, it would also be useful to narrow down the testcase. Please check if any of these reproduce the SEGV​:
  ./perl -e 'my $x = sprintf("%7000000000E", 0)'
  ./perl -e 'eval { my $x = sprintf("%7000000000E", 0) }; print "ok\n"'
  the attached sprintf2-short.t, with all but the new test removed

Cheers,

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2019

From @hvds

0001-DO-NOT-COMMIT-expand-PERL_SYS_TERM_BODY.patch
From 03ad1fae97d9b34922e0e00f387cebe466ca13ef Mon Sep 17 00:00:00 2001
From: Hugo van der Sanden <[email protected]>
Date: Wed, 5 Jun 2019 11:29:12 +0100
Subject: [PATCH] DO NOT COMMIT: expand PERL_SYS_TERM_BODY

Expand the unixish body for Perl_sys_term() for more useful stacktrace.
---
 perl.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/perl.c b/perl.c
index 69e3efcd70..acb7c4f2a2 100644
--- a/perl.c
+++ b/perl.c
@@ -143,7 +143,14 @@ Perl_sys_term(void)
 {
     dVAR;
     if (!PL_veto_cleanup) {
-	PERL_SYS_TERM_BODY();
+	HINTS_REFCNT_TERM;          /* PL_hints_mutex */
+        KEYWORD_PLUGIN_MUTEX_TERM;  /* PL_keyword_plugin_mutex */
+        OP_CHECK_MUTEX_TERM;        /* PL_check_mutex */
+        OP_REFCNT_TERM;             /* PL_op_mutex */
+        PERLIO_TERM;                /* PerlIO_teardown() */
+        MALLOC_TERM;                /* PL_malloc_mutex */
+        LOCALE_TERM;                /* PL_locale_mutex */
+        USER_PROP_MUTEX_TERM;       /* PL_user_prop_mutex */
     }
 }
 
-- 
2.17.1

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2019

From @hvds

#!./perl -w

# Tests for sprintf that do not fit the format of sprintf.t.

BEGIN {
  chdir 't' if -d 't';
  require './test.pl';
  require './charset_tools.pl';
  set_up_inc('../lib');
}

use strict;
use Config;

# large uvsize needed so the large width is parsed properly
# large sizesize needed so the STRLEN check doesn't
if ($Config{intsize} == 4 && $Config{uvsize} > 4 && $Config{sizesize} > 4) {
  eval { my $x = sprintf("%7000000000E", 0) };
  like($@​, qr/^Numeric format result too large at /,
  "croak for very large numeric format results");
}
done_testing();

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2019

From @jkeenan

On Wed, 05 Jun 2019 10​:39​:48 GMT, hv wrote​:

On Mon, 03 Jun 2019 21​:15​:59 -0700, hv wrote​:

On Mon, 03 Jun 2019 19​:22​:56 -0700, jkeenan wrote​:

#13 0x00000000004f4788 in Perl_vcroak (my_perl=0x801e22000,
pat=0x801f56838 "\030 �\001\b", args=0x7fffffffe680) at util.c​:1711
#14 0x00000000004f1b27 in Perl_croak_nocontext (pat=<value
optimized
out> ) at util.c​:1745
#15 0x000000000044c57e in Perl_sys_term () at perl.c​:146
#16 0x0000000000421472 in main (argc=<value optimized out>,
argv=<value optimized out>, env=0x7fffffffe770)
at perlmain.c​:155

We're in sys_term, so most of the interpreter has already been wiped
out, but we're then calling croak(), which relies on the
interpreter's
infrastructure.
[...]
I'll try harder to reproduce it​: I'd need some interactive gdb work
to
diagnose it.

I still cannot reproduce here. Please could you try getting another
stacktrace with the attached patch applied, to narrow down which bit
of Perl_sys_term() we're in?

To help with subsequent tests, it would also be useful to narrow down
the testcase. Please check if any of these reproduce the SEGV​:
./perl -e 'my $x = sprintf("%7000000000E", 0)'
./perl -e 'eval { my $x = sprintf("%7000000000E", 0) }; print
"ok\n"'
the attached sprintf2-short.t, with all but the new test removed

(The previously reported backtrace was from a version of t/op/sprintf2.t with all tests but the final one commented out.)

See attachment, hugo-results-20190605.txt.

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2019

From @jkeenan

Work on RT 134172 sprintf2.t on FreeBSD

06/05 07​:53 am

1. Confirmed that I still have intact the DEBUGGING perl I built on nycbug host. Noted that at this point, while I'm in local branch rt-134172-sprintf2, that branch is identical to blead. Confirmed that t/op/sprintf2.t FAIL with segfault.

2. Copied Hugo's sprintf2-short.t into t/op. Ran it thru harness. Got FAIL.

3. Tried Hugo's two one-liners with my DEBUGGING build.

a. Hugo's first one-liner. (Note that this generates a warning.)

$ rm ./perl.core; ./perl -e 'my $x = sprintf("%7000000000E", 0)'; ls -ltr . | tail -1
Numeric format result too large at -e line 1.
Segmentation fault (core dumped)
-rw------- 1 jkeenan jkeenan 4816896 Jun 5 12​:08 perl.core

b. Hugo's second one-liner. (Note that this does not generate a warning.)

$ rm ./perl.core; ./perl -e 'eval { my $x = sprintf("%7000000000E", 0) }; print "ok\n"'; ls -ltr . | tail -1
Segmentation fault (core dumped)
-rw------- 1 jkeenan jkeenan 4816896 Jun 5 12​:09 perl.core

4. Applied Hugo's patch to branch; rebuilt perl with customary FreeBSD configuration switches plus '-DDEBUGGING'.

$ git clean -dfx
$ git am < ~/learn/perl/p5p/rt-134172-sprintf2/0001-DO-NOT-COMMIT-expand-PERL_SYS_TERM_BODY.patch
$ debugging_configure && make test_prep
$ $ ls ./perl.core t/perl.core
ls​: ./perl.core​: No such file or directory
ls​: t/perl.core​: No such file or directory

a. Re-run the test file and inspect the core dump.

#####
$ cd t;./perl harness -v op/sprintf2-short.t; cd -

ok 1 - croak for very large numeric format results
All 1 subtests passed

Test Summary Report


op/sprintf2-short.t (Wstat​: 139 Tests​: 1 Failed​: 0)
  Non-zero wait status​: 139
Files=1, Tests=1, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.02 cusr 0.02 csys = 0.06 CPU)
Result​: FAIL
/usr/home/jkeenan/gitwork/perl

$ gdb perl t/perl.core
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Core was generated by `./perl -I.. -MTestInit op/sprintf2-short.t'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libthr.so.3...Reading symbols from /usr/lib/debug//lib/libthr.so.3.debug...done.
done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /usr/lib/libdl.so.1...Reading symbols from /usr/lib/debug//usr/lib/libdl.so.1.debug...done.
done.
Loaded symbols for /usr/lib/libdl.so.1
Reading symbols from /lib/libm.so.5...Reading symbols from /usr/lib/debug//lib/libm.so.5.debug...done.
done.
Loaded symbols for /lib/libm.so.5
Reading symbols from /lib/libcrypt.so.5...Reading symbols from /usr/lib/debug//lib/libcrypt.so.5.debug...done.
done.
Loaded symbols for /lib/libcrypt.so.5
Reading symbols from /lib/libutil.so.9...Reading symbols from /usr/lib/debug//lib/libutil.so.9.debug...done.
done.
Loaded symbols for /lib/libutil.so.9
Reading symbols from /lib/libc.so.7...Reading symbols from /usr/lib/debug//lib/libc.so.7.debug...done.
done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...Reading symbols from /usr/lib/debug//libexec/ld-elf.so.1.debug...done.
done.
Loaded symbols for /libexec/ld-elf.so.1
#0 0x0000000000597ad9 in Perl_dounwind (my_perl=0x801e22000, cxix=-1) at pp_ctl.c​:1550
1550 CX_LEAVE_SCOPE(cx);
(gdb) bt
#0 0x0000000000597ad9 in Perl_dounwind (my_perl=0x801e22000, cxix=-1) at pp_ctl.c​:1550
#1 0x0000000000454979 in S_my_exit_jump (my_perl=0x801e22000) at perl.c​:5269
#2 0x00000000004581a7 in Perl_my_failure_exit (my_perl=0x801e22000) at perl.c​:5256
#3 0x0000000000598882 in Perl_die_unwind (my_perl=0x801e22000, msv=0x801e1f048) at pp_ctl.c​:1797
#4 0x00000000004f4828 in Perl_vcroak (my_perl=0x801e22000, pat=0x801e1f048 "\030 �\001\b", args=0x7fffffffe290)
  at util.c​:1711
#5 0x00000000004f1bc7 in Perl_croak_nocontext (pat=<value optimized out>) at util.c​:1745
#6 0x000000000061295b in PerlIOUnix_refcnt_inc (fd=0) at perlio.c​:2309
#7 0x0000000000613338 in PerlIOUnix_open (my_perl=0x801e22000, self=0x81ddd0, layers=0x801e1b8d8, n=0,
  mode=0x81e560 "r", fd=0, imode=0, perm=0, f=0x801eaba38, narg=0, args=0x0) at perlio.c​:2607
#8 0x000000000061523c in PerlIOBuf_open (my_perl=0x801e22000, self=0x81db30, layers=0x801e1b8d8, n=1,
  mode=<value optimized out>, fd=1, imode=0, perm=0, f=0x0, narg=0, args=0x0) at perlio.c​:3897
#9 0x00000000006112d8 in PerlIO_openn (my_perl=0x801e22000, layers=<value optimized out>,
  mode=<value optimized out>, fd=0, imode=<value optimized out>, perm=<value optimized out>, f=0x0, narg=0,
  args=0x0) at perlio.c​:1550
#10 0x00000000006101ae in PerlIO_stdstreams (my_perl=0x801e22000) at perlio.c​:4928
#11 0x0000000000616f6d in Perl_PerlIO_stderr (my_perl=0x801e22000) at perlio.c​:4884
#12 0x00000000004f4503 in Perl_write_to_stderr (my_perl=0x801e22000, msv=0x801e1f048) at util.c​:1516
#13 0x000000000059887a in Perl_die_unwind (my_perl=0x801e22000, msv=0x801e1f048) at pp_ctl.c​:1796
#14 0x00000000004f4828 in Perl_vcroak (my_perl=0x801e22000, pat=0x801e1f048 "\030 �\001\b", args=0x7fffffffe670)
  at util.c​:1711
#15 0x00000000004f1bc7 in Perl_croak_nocontext (pat=<value optimized out>) at util.c​:1745
#16 0x000000000044c596 in Perl_sys_term () at perl.c​:152
#17 0x0000000000421472 in main (argc=<value optimized out>, argv=<value optimized out>, env=0x7fffffffe758)
  at perlmain.c​:155

$ rm t/perl.core
#####

b. Run Hugo's first one-liner and inspect the core dump.

#####
$ rm ./perl.core; ./perl -e 'my $x = sprintf("%7000000000E", 0)'; ls -ltr . | tail -1
rm​: ./perl.core​: No such file or directory
Numeric format result too large at -e line 1.
Segmentation fault (core dumped)
-rw------- 1 jkeenan jkeenan 4816896 Jun 5 12​:26 perl.core

[perl] $ gdb perl ./perl.core
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Core was generated by `./perl -e my $x = sprintf("%7000000000E", 0)'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libthr.so.3...Reading symbols from /usr/lib/debug//lib/libthr.so.3.debug...done.
done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /usr/lib/libdl.so.1...Reading symbols from /usr/lib/debug//usr/lib/libdl.so.1.debug...done.
done.
Loaded symbols for /usr/lib/libdl.so.1
Reading symbols from /lib/libm.so.5...Reading symbols from /usr/lib/debug//lib/libm.so.5.debug...done.
done.
Loaded symbols for /lib/libm.so.5
Reading symbols from /lib/libcrypt.so.5...Reading symbols from /usr/lib/debug//lib/libcrypt.so.5.debug...done.
done.
Loaded symbols for /lib/libcrypt.so.5
Reading symbols from /lib/libutil.so.9...Reading symbols from /usr/lib/debug//lib/libutil.so.9.debug...done.
done.
Loaded symbols for /lib/libutil.so.9
Reading symbols from /lib/libc.so.7...Reading symbols from /usr/lib/debug//lib/libc.so.7.debug...done.
done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...Reading symbols from /usr/lib/debug//libexec/ld-elf.so.1.debug...done.
done.
Loaded symbols for /libexec/ld-elf.so.1
#0 0x0000000000597ad9 in Perl_dounwind (my_perl=0x801e22000, cxix=-1) at pp_ctl.c​:1550
1550 CX_LEAVE_SCOPE(cx);
(gdb) bt
#0 0x0000000000597ad9 in Perl_dounwind (my_perl=0x801e22000, cxix=-1) at pp_ctl.c​:1550
#1 0x0000000000454979 in S_my_exit_jump (my_perl=0x801e22000) at perl.c​:5269
#2 0x00000000004581a7 in Perl_my_failure_exit (my_perl=0x801e22000) at perl.c​:5256
#3 0x0000000000598882 in Perl_die_unwind (my_perl=0x801e22000, msv=0x801e1f048) at pp_ctl.c​:1797
#4 0x00000000004f4828 in Perl_vcroak (my_perl=0x801e22000, pat=0x801e1f048 "\030 �\001\b", args=0x7fffffffe310)
  at util.c​:1711
#5 0x00000000004f1bc7 in Perl_croak_nocontext (pat=<value optimized out>) at util.c​:1745
#6 0x000000000061295b in PerlIOUnix_refcnt_inc (fd=0) at perlio.c​:2309
#7 0x0000000000613338 in PerlIOUnix_open (my_perl=0x801e22000, self=0x81ddd0, layers=0x801e1b8d8, n=0,
  mode=0x81e560 "r", fd=0, imode=0, perm=0, f=0x801ea9038, narg=0, args=0x0) at perlio.c​:2607
#8 0x000000000061523c in PerlIOBuf_open (my_perl=0x801e22000, self=0x81db30, layers=0x801e1b8d8, n=1,
  mode=<value optimized out>, fd=1, imode=0, perm=0, f=0x0, narg=0, args=0x0) at perlio.c​:3897
#9 0x00000000006112d8 in PerlIO_openn (my_perl=0x801e22000, layers=<value optimized out>,
  mode=<value optimized out>, fd=0, imode=<value optimized out>, perm=<value optimized out>, f=0x0, narg=0,
  args=0x0) at perlio.c​:1550
#10 0x00000000006101ae in PerlIO_stdstreams (my_perl=0x801e22000) at perlio.c​:4928
#11 0x0000000000616f6d in Perl_PerlIO_stderr (my_perl=0x801e22000) at perlio.c​:4884
#12 0x00000000004f4503 in Perl_write_to_stderr (my_perl=0x801e22000, msv=0x801e1f048) at util.c​:1516
#13 0x000000000059887a in Perl_die_unwind (my_perl=0x801e22000, msv=0x801e1f048) at pp_ctl.c​:1796
#14 0x00000000004f4828 in Perl_vcroak (my_perl=0x801e22000, pat=0x801e1f048 "\030 �\001\b", args=0x7fffffffe6f0)
  at util.c​:1711
#15 0x00000000004f1bc7 in Perl_croak_nocontext (pat=<value optimized out>) at util.c​:1745
#16 0x000000000044c596 in Perl_sys_term () at perl.c​:152
---Type <return> to continue, or q <return> to quit---
#17 0x0000000000421472 in main (argc=<value optimized out>, argv=<value optimized out>, env=0x7fffffffe7d8)
  at perlmain.c​:155
#####

c. Run Hugo's second one-liner and inspect the core-dump.

#####
$ rm ./perl.core; ./perl -e 'eval { my $x = sprintf("%7000000000E", 0) }; print "ok\n"'; ls -ltr . | tail -1
ok
Segmentation fault (core dumped)
-rw------- 1 jkeenan jkeenan 4816896 Jun 5 12​:27 perl.core

$ gdb perl ./perl.core
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Core was generated by `./perl -e eval { my $x = sprintf("%7000000000E", 0) }; print "ok\n"'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libthr.so.3...Reading symbols from /usr/lib/debug//lib/libthr.so.3.debug...done.
done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /usr/lib/libdl.so.1...Reading symbols from /usr/lib/debug//usr/lib/libdl.so.1.debug...done.
done.
Loaded symbols for /usr/lib/libdl.so.1
Reading symbols from /lib/libm.so.5...Reading symbols from /usr/lib/debug//lib/libm.so.5.debug...done.
done.
Loaded symbols for /lib/libm.so.5
Reading symbols from /lib/libcrypt.so.5...Reading symbols from /usr/lib/debug//lib/libcrypt.so.5.debug...done.
done.
Loaded symbols for /lib/libcrypt.so.5
Reading symbols from /lib/libutil.so.9...Reading symbols from /usr/lib/debug//lib/libutil.so.9.debug...done.
done.
Loaded symbols for /lib/libutil.so.9
Reading symbols from /lib/libc.so.7...Reading symbols from /usr/lib/debug//lib/libc.so.7.debug...done.
done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...Reading symbols from /usr/lib/debug//libexec/ld-elf.so.1.debug...done.
done.
Loaded symbols for /libexec/ld-elf.so.1
#0 0x0000000000597ad9 in Perl_dounwind (my_perl=0x801e22000, cxix=-1) at pp_ctl.c​:1550
1550 CX_LEAVE_SCOPE(cx);
(gdb) bt
#0 0x0000000000597ad9 in Perl_dounwind (my_perl=0x801e22000, cxix=-1) at pp_ctl.c​:1550
#1 0x0000000000454979 in S_my_exit_jump (my_perl=0x801e22000) at perl.c​:5269
#2 0x00000000004581a7 in Perl_my_failure_exit (my_perl=0x801e22000) at perl.c​:5256
#3 0x0000000000598882 in Perl_die_unwind (my_perl=0x801e22000, msv=0x801e1f048) at pp_ctl.c​:1797
#4 0x00000000004f4828 in Perl_vcroak (my_perl=0x801e22000, pat=0x801e1f048 "\030 �\001\b", args=0x7fffffffe300)
  at util.c​:1711
#5 0x00000000004f1bc7 in Perl_croak_nocontext (pat=<value optimized out>) at util.c​:1745
#6 0x000000000061295b in PerlIOUnix_refcnt_inc (fd=0) at perlio.c​:2309
#7 0x0000000000613338 in PerlIOUnix_open (my_perl=0x801e22000, self=0x81ddd0, layers=0x801e1b898, n=0,
  mode=0x81e560 "r", fd=0, imode=0, perm=0, f=0x801ea9038, narg=0, args=0x0) at perlio.c​:2607
#8 0x000000000061523c in PerlIOBuf_open (my_perl=0x801e22000, self=0x81db30, layers=0x801e1b898, n=1,
  mode=<value optimized out>, fd=1, imode=0, perm=0, f=0x0, narg=0, args=0x0) at perlio.c​:3897
#9 0x00000000006112d8 in PerlIO_openn (my_perl=0x801e22000, layers=<value optimized out>,
  mode=<value optimized out>, fd=0, imode=<value optimized out>, perm=<value optimized out>, f=0x0, narg=0,
  args=0x0) at perlio.c​:1550
#10 0x00000000006101ae in PerlIO_stdstreams (my_perl=0x801e22000) at perlio.c​:4928
#11 0x0000000000616f6d in Perl_PerlIO_stderr (my_perl=0x801e22000) at perlio.c​:4884
#12 0x00000000004f4503 in Perl_write_to_stderr (my_perl=0x801e22000, msv=0x801e1f048) at util.c​:1516
#13 0x000000000059887a in Perl_die_unwind (my_perl=0x801e22000, msv=0x801e1f048) at pp_ctl.c​:1796
#14 0x00000000004f4828 in Perl_vcroak (my_perl=0x801e22000, pat=0x801e1f048 "\030 �\001\b", args=0x7fffffffe6e0)
  at util.c​:1711
#15 0x00000000004f1bc7 in Perl_croak_nocontext (pat=<value optimized out>) at util.c​:1745
#16 0x000000000044c596 in Perl_sys_term () at perl.c​:152
#17 0x0000000000421472 in main (argc=<value optimized out>, argv=<value optimized out>, env=0x7fffffffe7c0)
  at perlmain.c​:155
#####

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2019

From @jkeenan

On Wed, 05 Jun 2019 00​:37​:31 GMT, jkeenan wrote​:

On Tue, 04 Jun 2019 04​:15​:59 GMT, hv wrote​:

On Mon, 03 Jun 2019 19​:22​:56 -0700, jkeenan wrote​:

On Tue, 04 Jun 2019 01​:15​:35 GMT, hv wrote​:

On Mon, 03 Jun 2019 17​:14​:02 -0700, jkeenan@​pobox.com wrote​:

The following smoke test indicates that a recent change to
blead
is
causing failures in t/op/sprintf2.t on FreeBSD-11.
[...]
op/sprintf2.t (Wstat​: 2304 Tests​: 1699 Failed​: 0)
Non-zero exit status​: 9

It's this non-zero exit status that's making it a fail. That's
reporting that it terminated with a SEGV.

I haven't used gdb in years, so I'm fumbling around.

Does this help?

Well, it is exactly what I asked for.

#13 0x00000000004f4788 in Perl_vcroak (my_perl=0x801e22000,
pat=0x801f56838 "\030 �\001\b", args=0x7fffffffe680) at util.c​:1711
#14 0x00000000004f1b27 in Perl_croak_nocontext (pat=<value
optimized
out> ) at util.c​:1745
#15 0x000000000044c57e in Perl_sys_term () at perl.c​:146
#16 0x0000000000421472 in main (argc=<value optimized out>,
argv=<value optimized out>, env=0x7fffffffe770)
at perlmain.c​:155

We're in sys_term, so most of the interpreter has already been wiped
out, but we're then calling croak(), which relies on the
interpreter's
infrastructure.

My best guess is that we're hitting 'Perl_croak_nocontext("panic​:
MUTEX_UNLOCK (%d) [%s​:%d]" ...)' in one of the MUTEX_UNLOCK calls,
and
some wild writes have corrupted both one of our mutexes and perhaps
this panic string.

It seems like it'd be hard for it to get this far with that level of
corruption though.

I'll try harder to reproduce it​: I'd need some interactive gdb work
to
diagnose it.

Hugo

This is now showing up on other OSes. See​:
http​://perl5.test-
smoke.org/submatrix?test=../t/op/sprintf2.t&pversion=5.31.1

But, to further aggravate us, the FAIL is showing up on our FreeBSD-11 smokers but not our FreeBSD-12 or -13 smoker.

FreeBSD-12.0-RELEASE-p4​: http​://perl5.test-smoke.org/report/89000 PASS
FreeBSD-13.0-CURRENT : http​://perl5.test-smoke.org/report/88939 PASS
--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2019

From @hvds

On Wed, 05 Jun 2019 05​:39​:11 -0700, jkeenan wrote​:

(The previously reported backtrace was from a version of
t/op/sprintf2.t with all tests but the final one commented out.)

Ah ok; since the first one-liner is enough to reproduce the crash, I suggest sticking with that for further testing​:
  ./perl -e 'my $x = sprintf("%7000000000E", 0)'

[...]

#16 0x000000000044c596 in Perl_sys_term () at perl.c​:152
#17 0x0000000000421472 in main (argc=<value optimized out>, argv=<value optimized out>, env=0x7fffffffe758)
at perlmain.c​:155

That looks like it is in the line​:
  LOCALE_TERM; /* PL_locale_mutex */
.. and since this is one of the mutex destroy calls that happens _after_ we've torn down the perlio system it's no surprise that attempting to croak() goes badly.

From here I'll need some help​: I'm not sure how to refer to the location of this mutex for a threaded build in gdb - I was expecting it to be something like 'my_vars->Glocale_mutex', but at least on my local build that's not recognised.

If someone can tell us that, the next step will be to run the test in gdb with a hardware watchpoint on the mutex. That'll look something like this​:

shell% gdb --args ./perl -e 'my $x = sprintf("%7000000000E", 0)'
(gdb) # first let the interpreter initialise
(gdb) tbreak perlmain.c​:126
(gdb) run
Temporary breakpoint 1, main (argc=<optimized out>, argv=<optimized out>,
  env=<optimized out>) at perlmain.c​:126
126 if (!perl_parse(my_perl, xs_init, argc, argv, (char **)NULL))
(gdb) watch PL_locale_mutex # replace with appropriate expression to refer to this mutex
(gdb) run

Hopefully that would then stop somewhere other than in the MUTEX_DESTROY, and a stack trace will show us where and how the corruption is occurring.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2019

From @jkeenan

On Wed, 05 Jun 2019 13​:22​:36 GMT, hv wrote​:

On Wed, 05 Jun 2019 05​:39​:11 -0700, jkeenan wrote​:

(The previously reported backtrace was from a version of
t/op/sprintf2.t with all tests but the final one commented out.)

Ah ok; since the first one-liner is enough to reproduce the crash, I
suggest sticking with that for further testing​:
./perl -e 'my $x = sprintf("%7000000000E", 0)'

[...]

#16 0x000000000044c596 in Perl_sys_term () at perl.c​:152
#17 0x0000000000421472 in main (argc=<value optimized out>,
argv=<value optimized out>, env=0x7fffffffe758)
at perlmain.c​:155

That looks like it is in the line​:
LOCALE_TERM; /* PL_locale_mutex */
.. and since this is one of the mutex destroy calls that happens
_after_ we've torn down the perlio system it's no surprise that
attempting to croak() goes badly.

From here I'll need some help​: I'm not sure how to refer to the
location of this mutex for a threaded build in gdb - I was expecting
it to be something like 'my_vars->Glocale_mutex', but at least on my
local build that's not recognised.

If someone can tell us that, the next step will be to run the test in
gdb with a hardware watchpoint on the mutex. That'll look something
like this​:

shell% gdb --args ./perl -e 'my $x = sprintf("%7000000000E", 0)'
(gdb) # first let the interpreter initialise
(gdb) tbreak perlmain.c​:126
(gdb) run
Temporary breakpoint 1, main (argc=<optimized out>, argv=<optimized
out>,
env=<optimized out>) at perlmain.c​:126
126 if (!perl_parse(my_perl, xs_init, argc, argv, (char
**)NULL))
(gdb) watch PL_locale_mutex # replace with appropriate expression to
refer to this mutex
(gdb) run

Hopefully that would then stop somewhere other than in the
MUTEX_DESTROY, and a stack trace will show us where and how the
corruption is occurring.

Hugo

I'm not sure whether you wanted *me* to run the above commands prior to hearing from someone about mutexes, but, for what it's worth, here's what I got​:

#####
[perl] $ gdb --args ./perl -e 'my $x = sprintf("%7000000000E", 0)'
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
(gdb)
(gdb) tbreak perlmain.c​:126
Breakpoint 1 at 0x4213d6​: file perlmain.c, line 126.
(gdb) run
Starting program​: /usr/home/jkeenan/gitwork/perl/perl -e my\ \$x\ =\ sprintf\(\"%7000000000E\",\ 0\)
main (argc=3, argv=0x7fffffffe770, env=0x7fffffffe790) at perlmain.c​:126
126 if (!perl_parse(my_perl, xs_init, argc, argv, (char **)NULL))
(gdb) watch PL_locale_mutex
Watchpoint 2​: PL_locale_mutex
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program​: /usr/home/jkeenan/gitwork/perl/perl -e my\ \$x\ =\ sprintf\(\"%7000000000E\",\ 0\)
Watchpoint 2​: PL_locale_mutex
Watchpoint 2​: PL_locale_mutex
Watchpoint 2​: PL_locale_mutex
Watchpoint 2​: PL_locale_mutex
Watchpoint 2​: PL_locale_mutex
Watchpoint 2​: PL_locale_mutex
Watchpoint 2​: PL_locale_mutex
Watchpoint 2​: PL_locale_mutex
Watchpoint 2​: PL_locale_mutex
Watchpoint 2​: PL_locale_mutex
Watchpoint 2​: PL_locale_mutex
Watchpoint 2​: PL_locale_mutex
Watchpoint 2​: PL_locale_mutex
Watchpoint 2​: PL_locale_mutex
Watchpoint 2​: PL_locale_mutex

Old value = 0x0
New value = 0x801e1c1e0
0x0000000800c6caec in __pthread_mutex_init (mutex=0xa45b28, mutex_attr=<value optimized out>) at pthread_md.h​:94
94 {
Current language​: auto; currently minimal
(gdb) bt
#0 0x0000000800c6caec in __pthread_mutex_init (mutex=0xa45b28, mutex_attr=<value optimized out>)
  at pthread_md.h​:94
#1 0x000000000044c702 in perl_alloc () at perl.c​:97
#2 0x00000000004213a9 in main (argc=3, argv=0x7fffffffe770, env=0x7fffffffe790) at perlmain.c​:119
(gdb)
#####

Thank you very much.

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2019

From @hvds

On Wed, 05 Jun 2019 06​:32​:12 -0700, jkeenan wrote​:

I'm not sure whether you wanted *me* to run the above commands prior
to hearing from someone about mutexes

I had not expected it would work, but since it did let's go with it. :) (For me that just gives 'No symbol "PL_locale_mutex" in current context'.)

here's what I got​:

#####
[perl] $ gdb --args ./perl -e 'my $x = sprintf("%7000000000E", 0)'
[...]
Watchpoint 2​: PL_locale_mutex

Old value = 0x0
New value = 0x801e1c1e0
0x0000000800c6caec in __pthread_mutex_init (mutex=0xa45b28,
mutex_attr=<value optimized out>) at pthread_md.h​:94

That's slightly odd, the intent of the tbreak was to get past the initialize (which the stack trace shows is invoked from perlmain.c​:119). Never mind, please get to this point again, then continue with​:

(gdb) cont

.. and let's see where it stops next, and get another stack trace.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2019

From @jkeenan

On Wed, 05 Jun 2019 14​:22​:10 GMT, hv wrote​:

On Wed, 05 Jun 2019 06​:32​:12 -0700, jkeenan wrote​:

I'm not sure whether you wanted *me* to run the above commands prior
to hearing from someone about mutexes

I had not expected it would work, but since it did let's go with it.
:) (For me that just gives 'No symbol "PL_locale_mutex" in current
context'.)

here's what I got​:

#####
[perl] $ gdb --args ./perl -e 'my $x = sprintf("%7000000000E", 0)'
[...]
Watchpoint 2​: PL_locale_mutex

Old value = 0x0
New value = 0x801e1c1e0
0x0000000800c6caec in __pthread_mutex_init (mutex=0xa45b28,
mutex_attr=<value optimized out>) at pthread_md.h​:94

That's slightly odd, the intent of the tbreak was to get past the
initialize (which the stack trace shows is invoked from
perlmain.c​:119). Never mind, please get to this point again, then
continue with​:

(gdb) cont

.. and let's see where it stops next, and get another stack trace.

Hugo

After 'cont', the program hung for > 1 minute, so I Ctrl-C-ed it.

#####
...
Watchpoint 2​: PL_locale_mutex
Watchpoint 2​: PL_locale_mutex

Old value = 0x0
New value = 0x801e1c1e0
0x0000000800c6caec in __pthread_mutex_init (mutex=0xa45b28,
---Type <return> to continue, or q <return> to quit---
  mutex_attr=<value optimized out>) at pthread_md.h​:94
94 {
Current language​: auto; currently minimal
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
__vdso_gettc (th=0x7ffffffff270, tc=0x7fffffffe53c) at cpufunc.h​:326
326 __asm __volatile("lfence" : : : "memory");
(gdb) bt
#0 __vdso_gettc (th=0x7ffffffff270, tc=0x7fffffffe53c) at cpufunc.h​:326
#1 0x000000080183e729 in binuptime (bt=0x7fffffffe580, tk=0x7ffffffff1b0,
  abs=1) at /usr/src/lib/libc/sys/__vdso_gettimeofday.c​:43
#2 0x000000080183e818 in __vdso_clock_gettime (clock_id=13, ts=0x7fffffffe5e8)
  at /usr/src/lib/libc/sys/__vdso_gettimeofday.c​:149
#3 0x0000000801827a31 in __clock_gettime (clock_id=13, ts=0x7fffffffe5e8)
  at /usr/src/lib/libc/sys/clock_gettime.c​:46
#4 0x00000008018271da in time (t=0x801e225a8)
  at /usr/src/lib/libc/gen/time.c​:45
#5 0x000000000044fd6f in perl_parse (my_perl=0x801e22000,
  xsinit=0x421490 <xs_init>, argc=<value optimized out>, argv=0x306,
  env=<value optimized out>) at perl.c​:1822
#6 0x00000000004213ea in main (argc=<value optimized out>,
  argv=<value optimized out>, env=0x7fffffffe7a0) at perlmain.c​:126
#####

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2019

From @hvds

On Wed, 05 Jun 2019 07​:54​:26 -0700, jkeenan wrote​:

After 'cont', the program hung for > 1 minute, so I Ctrl-C-ed it.

Watchpoints can make things rather slow; you'll see from the stack trace that we're still in perl_parse(), so we haven't got as far as running the program yet. My guess is that it should complete in 10-15 minutes, but if possible please allow at least up to an hour before giving up on it.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2019

From @jkeenan

On Wed, 05 Jun 2019 15​:26​:36 GMT, hv wrote​:

On Wed, 05 Jun 2019 07​:54​:26 -0700, jkeenan wrote​:

After 'cont', the program hung for > 1 minute, so I Ctrl-C-ed it.

Watchpoints can make things rather slow; you'll see from the stack
trace that we're still in perl_parse(), so we haven't got as far as
running the program yet. My guess is that it should complete in 10-15
minutes, but if possible please allow at least up to an hour before
giving up on it.

Did not know that. Have to go AFK. Will resume tonight. Thanks.

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2019

From @khwilliamson

On 6/5/19 9​:40 AM, James E Keenan via RT wrote​:

On Wed, 05 Jun 2019 15​:26​:36 GMT, hv wrote​:

On Wed, 05 Jun 2019 07​:54​:26 -0700, jkeenan wrote​:

After 'cont', the program hung for > 1 minute, so I Ctrl-C-ed it.

Watchpoints can make things rather slow; you'll see from the stack
trace that we're still in perl_parse(), so we haven't got as far as
running the program yet. My guess is that it should complete in 10-15
minutes, but if possible please allow at least up to an hour before
giving up on it.

Did not know that. Have to go AFK. Will resume tonight. Thanks.

I have access to a 11.1-RELEASE FreeBSD 11.1-RELEASE #0 r321309​: Fri Jul
21 14​:27​:25 UTC 2017

When I run ./perl -e 'my $x = sprintf("%7000000000E", 0)'
I get
Integer overflow in format string for sprintf at -e line 1.

@p5pRT
Copy link
Author

p5pRT commented Jun 6, 2019

From @jkeenan

On Wed, 05 Jun 2019 15​:26​:36 GMT, hv wrote​:

On Wed, 05 Jun 2019 07​:54​:26 -0700, jkeenan wrote​:

After 'cont', the program hung for > 1 minute, so I Ctrl-C-ed it.

Watchpoints can make things rather slow; you'll see from the stack
trace that we're still in perl_parse(), so we haven't got as far as
running the program yet. My guess is that it should complete in 10-15
minutes, but if possible please allow at least up to an hour before
giving up on it.

Hugo

I let it run overnight. No difference in results.

#####
Continuing.
^C [killed at 07​:35 next day]
Program received signal SIGINT, Interrupt.
__vdso_gettc (th=<value optimized out>, tc=0x7fffffffe10c)
  at /usr/src/lib/libc/x86/sys/__vdso_gettc.c​:325
325 }
bt
#0 __vdso_gettc (th=<value optimized out>, tc=0x7fffffffe10c)
  at /usr/src/lib/libc/x86/sys/__vdso_gettc.c​:325
#1 0x000000080183e729 in binuptime (bt=0x7fffffffe150, tk=0x7ffffffff1b0, abs=1)
  at /usr/src/lib/libc/sys/__vdso_gettimeofday.c​:43
#2 0x000000080183e818 in __vdso_clock_gettime (clock_id=13, ts=0x7fffffffe1b8)
  at /usr/src/lib/libc/sys/__vdso_gettimeofday.c​:149
#3 0x0000000801827a31 in __clock_gettime (clock_id=13, ts=0x7fffffffe1b8)
  at /usr/src/lib/libc/sys/clock_gettime.c​:46
#4 0x00000008018271da in time (t=0x801e225a8) at /usr/src/lib/libc/gen/time.c​:45
#5 0x000000000044fd6f in perl_parse (my_perl=0x801e22000, xsinit=0x421490 <xs_init>,
  argc=<value optimized out>, argv=0x306, env=<value optimized out>) at perl.c​:1822
#6 0x00000000004213ea in main (argc=<value optimized out>, argv=<value optimized out>,
  env=0x7fffffffe370) at perlmain.c​:126
#####

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Jun 6, 2019

From @hvds

On Thu, 06 Jun 2019 04​:37​:28 -0700, jkeenan wrote​:

I let it run overnight. No difference in results.

Oh, that's sad, it's clearly hanging in that time() call for some reason.

Let's try a different approach, closer to the likely source of the problem. Line 12950 of sv.c should be "float_need = 1 /* possible unary minus */", the point we start handling the large number in the sprintf pattern. Check that this is the right line, and thn try the following​:

shell% gdb --args ./perl -we 'my $x = sprintf("%7000000000E", 0)'
(gdb) break sv.c​:12950
(gdb) run
Breakpoint 1, Perl_sv_vcatpvfn_flags (my_perl=0xd43260, sv=0xd72e78,
  pat=0xd76368 "%7000000000E", patlen=12, args=0x0, svargs=0xd48b18,
  sv_count=1, maybe_tainted=0x7fffffffd2ff, flags=0) at sv.c​:12950
12950 float_need = 1 /* possible unary minus */

and then​:
(gdb) disp PL_locale_mutex
(gdb) disp *PL_locale_mutex
(gdb) next
.. and repeat 'next' about 15 times until the Perl_croak() call. If the displayed values haven't changed by this point, try 'cont' to see if they do change by the time we hit the SEGV.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jun 6, 2019

From @jkeenan

On Thu, 06 Jun 2019 14​:17​:46 GMT, hv wrote​:

On Thu, 06 Jun 2019 04​:37​:28 -0700, jkeenan wrote​:

I let it run overnight. No difference in results.

Oh, that's sad, it's clearly hanging in that time() call for some
reason.

Let's try a different approach, closer to the likely source of the
problem. Line 12950 of sv.c should be "float_need = 1 /* possible
unary minus */", the point we start handling the large number in the
sprintf pattern. Check that this is the right line, and thn try the
following​:

shell% gdb --args ./perl -we 'my $x = sprintf("%7000000000E", 0)'
(gdb) break sv.c​:12950
(gdb) run
Breakpoint 1, Perl_sv_vcatpvfn_flags (my_perl=0xd43260, sv=0xd72e78,
pat=0xd76368 "%7000000000E", patlen=12, args=0x0,
svargs=0xd48b18,
sv_count=1, maybe_tainted=0x7fffffffd2ff, flags=0) at sv.c​:12950
12950 float_need = 1 /* possible unary minus */

and then​:
(gdb) disp PL_locale_mutex
(gdb) disp *PL_locale_mutex
(gdb) next
.. and repeat 'next' about 15 times until the Perl_croak() call. If
the displayed values haven't changed by this point, try 'cont' to see
if they do change by the time we hit the SEGV.

Hugo

#####
$ ./perl -Ilib -V​:config_args
config_args='-des -Dusedevel -Duseithreads -Doptimize=-O2 -pipe -fstack-protector -fno-strict-aliasing -DDEBUGGING';
[perl] $ gdb --args ./perl -we 'my $x = sprintf("%7000000000E", 0)'
GNU gdb 6.1.1 [FreeBSD]
[snip]
(gdb) break sv.c​:12950
Breakpoint 1 at 0x557e4e​: file sv.c, line 12950.
(gdb) run
Starting program​: /usr/home/jkeenan/gitwork/perl/perl -we my\ \$x\ =\ sprintf\(\"%7000000000E\",\ 0\)

Breakpoint 1, Perl_sv_vcatpvfn_flags (my_perl=<value optimized out>, sv=<value optimized out>,
  pat=<value optimized out>, patlen=<value optimized out>, args=0x0, svargs=0x801e16528,
  sv_count=1, maybe_tainted=0x801c02200, flags=0) at sv.c​:12967
12967 if (!lc_numeric_set) {
(gdb) disp PL_locale_mutex
1​: PL_locale_mutex = 0x801e1c1e0
(gdb) disp *PL_locale_mutex
2​: *PL_locale_mutex = {m_lock = {m_owner = 0, m_flags = 0, m_ceilings = 0x801e1c1e8, m_rb_lnk = 0,
  m_spare = 0x801e1c1f8}, m_flags = 1, m_count = 0, m_spinloops = 0, m_yieldloops = 0, m_ps = 0,
  m_qe = {tqe_next = 0x0, tqe_prev = 0x0}, m_pqe = {tqe_next = 0x0, tqe_prev = 0x0}, m_rb_prev = 0x0}
(gdb) next
13092 Perl_croak(aTHX_ "Numeric format result too large");
2​: *PL_locale_mutex = {m_lock = {m_owner = 0, m_flags = 0, m_ceilings = 0x801e1c1e8, m_rb_lnk = 0,
  m_spare = 0x801e1c1f8}, m_flags = 1, m_count = 0, m_spinloops = 0, m_yieldloops = 0, m_ps = 0,
  m_qe = {tqe_next = 0x0, tqe_prev = 0x0}, m_pqe = {tqe_next = 0x0, tqe_prev = 0x0}, m_rb_prev = 0x0}
1​: PL_locale_mutex = 0x801e1c1e0
(gdb) next

Breakpoint 1, Perl_sv_vcatpvfn_flags (my_perl=<value optimized out>, sv=<value optimized out>,
  pat=<value optimized out>, patlen=<value optimized out>, args=0x0, svargs=0x801e16530,
  sv_count=1, maybe_tainted=0x452591, flags=0) at sv.c​:12967
12967 if (!lc_numeric_set) {
2​: *PL_locale_mutex = {m_lock = {m_owner = 0, m_flags = 0, m_ceilings = 0x801e1c1e8, m_rb_lnk = 0,
  m_spare = 0x801e1c1f8}, m_flags = 1, m_count = 0, m_spinloops = 0, m_yieldloops = 0, m_ps = 0,
  m_qe = {tqe_next = 0x0, tqe_prev = 0x0}, m_pqe = {tqe_next = 0x0, tqe_prev = 0x0}, m_rb_prev = 0x0}
1​: PL_locale_mutex = 0x801e1c1e0
(gdb) next
12972 STORE_LC_NUMERIC_SET_TO_NEEDED();
2​: *PL_locale_mutex = {m_lock = {m_owner = 0, m_flags = 0, m_ceilings = 0x801e1c1e8, m_rb_lnk = 0,
  m_spare = 0x801e1c1f8}, m_flags = 1, m_count = 0, m_spinloops = 0, m_yieldloops = 0, m_ps = 0,
  m_qe = {tqe_next = 0x0, tqe_prev = 0x0}, m_pqe = {tqe_next = 0x0, tqe_prev = 0x0}, m_rb_prev = 0x0}
1​: PL_locale_mutex = 0x801e1c1e0
(gdb) next
13092 Perl_croak(aTHX_ "Numeric format result too large");
2​: *PL_locale_mutex = {m_lock = {m_owner = 0, m_flags = 0, m_ceilings = 0x801e1c1e8, m_rb_lnk = 0,
  m_spare = 0x801e1c1f8}, m_flags = 1, m_count = 0, m_spinloops = 0, m_yieldloops = 0, m_ps = 0,
  m_qe = {tqe_next = 0x0, tqe_prev = 0x0}, m_pqe = {tqe_next = 0x0, tqe_prev = 0x0}, m_rb_prev = 0x0}
1​: PL_locale_mutex = 0x801e1c1e0
(gdb) next
Numeric format result too large at -e line 1.

Program received signal SIGSEGV, Segmentation fault.
0x0000000000597ad9 in Perl_dounwind (my_perl=0x801e22000, cxix=-1) at pp_ctl.c​:1550
1550 CX_LEAVE_SCOPE(cx);
2​: *PL_locale_mutex = Error accessing memory address 0x2​: Bad address.
Disabling display 2 to avoid infinite recursion.
#####

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Jun 6, 2019

From @jkeenan

On Thu, 06 Jun 2019 15​:12​:34 GMT, jkeenan wrote​:

On Thu, 06 Jun 2019 14​:17​:46 GMT, hv wrote​:

On Thu, 06 Jun 2019 04​:37​:28 -0700, jkeenan wrote​:

I let it run overnight. No difference in results.

Oh, that's sad, it's clearly hanging in that time() call for some
reason.

Let's try a different approach, closer to the likely source of the
problem. Line 12950 of sv.c should be "float_need = 1 /* possible
unary minus */", the point we start handling the large number in the
sprintf pattern. Check that this is the right line, and thn try the
following​:

shell% gdb --args ./perl -we 'my $x = sprintf("%7000000000E", 0)'
(gdb) break sv.c​:12950
(gdb) run
Breakpoint 1, Perl_sv_vcatpvfn_flags (my_perl=0xd43260, sv=0xd72e78,
pat=0xd76368 "%7000000000E", patlen=12, args=0x0,
svargs=0xd48b18,
sv_count=1, maybe_tainted=0x7fffffffd2ff, flags=0) at sv.c​:12950
12950 float_need = 1 /* possible unary minus */

and then​:
(gdb) disp PL_locale_mutex
(gdb) disp *PL_locale_mutex
(gdb) next
.. and repeat 'next' about 15 times until the Perl_croak() call. If
the displayed values haven't changed by this point, try 'cont' to see
if they do change by the time we hit the SEGV.

Hugo

#####
$ ./perl -Ilib -V​:config_args
config_args='-des -Dusedevel -Duseithreads -Doptimize=-O2 -pipe
-fstack-protector -fno-strict-aliasing -DDEBUGGING';
[perl] $ gdb --args ./perl -we 'my $x = sprintf("%7000000000E", 0)'
GNU gdb 6.1.1 [FreeBSD]
[snip]
(gdb) break sv.c​:12950
Breakpoint 1 at 0x557e4e​: file sv.c, line 12950.
(gdb) run
Starting program​: /usr/home/jkeenan/gitwork/perl/perl -we my\ \$x\ =\
sprintf\(\"%7000000000E\",\ 0\)

Breakpoint 1, Perl_sv_vcatpvfn_flags (my_perl=<value optimized out>,
sv=<value optimized out>,
pat=<value optimized out>, patlen=<value optimized out>, args=0x0,
svargs=0x801e16528,
sv_count=1, maybe_tainted=0x801c02200, flags=0) at sv.c​:12967
12967 if (!lc_numeric_set) {
(gdb) disp PL_locale_mutex
1​: PL_locale_mutex = 0x801e1c1e0
(gdb) disp *PL_locale_mutex
2​: *PL_locale_mutex = {m_lock = {m_owner = 0, m_flags = 0, m_ceilings
= 0x801e1c1e8, m_rb_lnk = 0,
m_spare = 0x801e1c1f8}, m_flags = 1, m_count = 0, m_spinloops = 0,
m_yieldloops = 0, m_ps = 0,
m_qe = {tqe_next = 0x0, tqe_prev = 0x0}, m_pqe = {tqe_next = 0x0,
tqe_prev = 0x0}, m_rb_prev = 0x0}
(gdb) next
13092 Perl_croak(aTHX_ "Numeric format result too
large");
2​: *PL_locale_mutex = {m_lock = {m_owner = 0, m_flags = 0, m_ceilings
= 0x801e1c1e8, m_rb_lnk = 0,
m_spare = 0x801e1c1f8}, m_flags = 1, m_count = 0, m_spinloops = 0,
m_yieldloops = 0, m_ps = 0,
m_qe = {tqe_next = 0x0, tqe_prev = 0x0}, m_pqe = {tqe_next = 0x0,
tqe_prev = 0x0}, m_rb_prev = 0x0}
1​: PL_locale_mutex = 0x801e1c1e0
(gdb) next

Breakpoint 1, Perl_sv_vcatpvfn_flags (my_perl=<value optimized out>,
sv=<value optimized out>,
pat=<value optimized out>, patlen=<value optimized out>, args=0x0,
svargs=0x801e16530,
sv_count=1, maybe_tainted=0x452591, flags=0) at sv.c​:12967
12967 if (!lc_numeric_set) {
2​: *PL_locale_mutex = {m_lock = {m_owner = 0, m_flags = 0, m_ceilings
= 0x801e1c1e8, m_rb_lnk = 0,
m_spare = 0x801e1c1f8}, m_flags = 1, m_count = 0, m_spinloops = 0,
m_yieldloops = 0, m_ps = 0,
m_qe = {tqe_next = 0x0, tqe_prev = 0x0}, m_pqe = {tqe_next = 0x0,
tqe_prev = 0x0}, m_rb_prev = 0x0}
1​: PL_locale_mutex = 0x801e1c1e0
(gdb) next
12972 STORE_LC_NUMERIC_SET_TO_NEEDED();
2​: *PL_locale_mutex = {m_lock = {m_owner = 0, m_flags = 0, m_ceilings
= 0x801e1c1e8, m_rb_lnk = 0,
m_spare = 0x801e1c1f8}, m_flags = 1, m_count = 0, m_spinloops = 0,
m_yieldloops = 0, m_ps = 0,
m_qe = {tqe_next = 0x0, tqe_prev = 0x0}, m_pqe = {tqe_next = 0x0,
tqe_prev = 0x0}, m_rb_prev = 0x0}
1​: PL_locale_mutex = 0x801e1c1e0
(gdb) next
13092 Perl_croak(aTHX_ "Numeric format result too
large");
2​: *PL_locale_mutex = {m_lock = {m_owner = 0, m_flags = 0, m_ceilings
= 0x801e1c1e8, m_rb_lnk = 0,
m_spare = 0x801e1c1f8}, m_flags = 1, m_count = 0, m_spinloops = 0,
m_yieldloops = 0, m_ps = 0,
m_qe = {tqe_next = 0x0, tqe_prev = 0x0}, m_pqe = {tqe_next = 0x0,
tqe_prev = 0x0}, m_rb_prev = 0x0}
1​: PL_locale_mutex = 0x801e1c1e0
(gdb) next
Numeric format result too large at -e line 1.

Program received signal SIGSEGV, Segmentation fault.
0x0000000000597ad9 in Perl_dounwind (my_perl=0x801e22000, cxix=-1) at
pp_ctl.c​:1550
1550 CX_LEAVE_SCOPE(cx);
2​: *PL_locale_mutex = Error accessing memory address 0x2​: Bad address.
Disabling display 2 to avoid infinite recursion.
#####

Followed by​:

#####
(gdb) bt
#0 0x0000000000597ad9 in Perl_dounwind (my_perl=0x801e22000, cxix=-1) at pp_ctl.c​:1550
#1 0x0000000000454979 in S_my_exit_jump (my_perl=0x801e22000) at perl.c​:5269
#2 0x00000000004581a7 in Perl_my_failure_exit (my_perl=0x801e22000) at perl.c​:5256
#3 0x0000000000598882 in Perl_die_unwind (my_perl=0x801e22000, msv=0x801e1f048) at pp_ctl.c​:1797
#4 0x00000000004f4828 in Perl_vcroak (my_perl=0x801e22000, pat=0x801e1f048 "\030 �\001\b",
  args=0x7fffffffe2e0) at util.c​:1711
#5 0x00000000004f1bc7 in Perl_croak_nocontext (pat=<value optimized out>) at util.c​:1745
#6 0x000000000061295b in PerlIOUnix_refcnt_inc (fd=0) at perlio.c​:2309
#7 0x0000000000613338 in PerlIOUnix_open (my_perl=0x801e22000, self=0x81ddd0, layers=0x801e1b8d8,
  n=0, mode=0x81e560 "r", fd=0, imode=0, perm=0, f=0x801ea9038, narg=0, args=0x0) at perlio.c​:2607
#8 0x000000000061523c in PerlIOBuf_open (my_perl=0x801e22000, self=0x81db30, layers=0x801e1b8d8,
  n=1, mode=<value optimized out>, fd=1, imode=0, perm=0, f=0x0, narg=0, args=0x0) at perlio.c​:3897
#9 0x00000000006112d8 in PerlIO_openn (my_perl=0x801e22000, layers=<value optimized out>,
  mode=<value optimized out>, fd=0, imode=<value optimized out>, perm=<value optimized out>, f=0x0,
  narg=0, args=0x0) at perlio.c​:1550
#10 0x00000000006101ae in PerlIO_stdstreams (my_perl=0x801e22000) at perlio.c​:4928
#11 0x0000000000616f6d in Perl_PerlIO_stderr (my_perl=0x801e22000) at perlio.c​:4884
#12 0x00000000004f4503 in Perl_write_to_stderr (my_perl=0x801e22000, msv=0x801e1f048) at util.c​:1516
#13 0x000000000059887a in Perl_die_unwind (my_perl=0x801e22000, msv=0x801e1f048) at pp_ctl.c​:1796
#14 0x00000000004f4828 in Perl_vcroak (my_perl=0x801e22000, pat=0x801e1f048 "\030 �\001\b",
  args=0x7fffffffe6c0) at util.c​:1711
#15 0x00000000004f1bc7 in Perl_croak_nocontext (pat=<value optimized out>) at util.c​:1745
#16 0x000000000044c596 in Perl_sys_term () at perl.c​:152
#17 0x0000000000421472 in main (argc=<value optimized out>, argv=<value optimized out>,
  env=0x7fffffffe7a0) at perlmain.c​:155
#####

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Jun 6, 2019

From @hvds

On Thu, 06 Jun 2019 08​:16​:29 -0700, jkeenan wrote​:

(gdb) next
13092 Perl_croak(aTHX_ "Numeric format result too
large");
2​: *PL_locale_mutex = {m_lock = {m_owner = 0, m_flags = 0, m_ceilings
= 0x801e1c1e8, m_rb_lnk = 0,
m_spare = 0x801e1c1f8}, m_flags = 1, m_count = 0, m_spinloops =
0,
m_yieldloops = 0, m_ps = 0,
m_qe = {tqe_next = 0x0, tqe_prev = 0x0}, m_pqe = {tqe_next = 0x0,
tqe_prev = 0x0}, m_rb_prev = 0x0}
1​: PL_locale_mutex = 0x801e1c1e0

Ok, so at this point it still looks valid ...

Program received signal SIGSEGV, Segmentation fault.
0x0000000000597ad9 in Perl_dounwind (my_perl=0x801e22000, cxix=-1) at
pp_ctl.c​:1550
1550 CX_LEAVE_SCOPE(cx);
2​: *PL_locale_mutex = Error accessing memory address 0x2​: Bad
address.
Disabling display 2 to avoid infinite recursion.

.. but by the time we get here it has been corrupted to 0x2.

I think it's worth another go at the 'watch', but if that fails we may have to step through with 'next' and 'step' until the displayed value changes. If we set the watchpoint later, it may work better​:

shell% gdb --args ./perl -we 'my $x = sprintf("%7000000000E", 0)'
(gdb) break sv.c​:13092
(gdb) run
.. break at sv.c​:13092 (the croak call)
(gdb) watch PL_locale_mutex
(gdb) cont

As before, it should not take more than a few minutes after the 'cont'.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jun 6, 2019

From @jkeenan

On Thu, 06 Jun 2019 15​:47​:02 GMT, hv wrote​:

On Thu, 06 Jun 2019 08​:16​:29 -0700, jkeenan wrote​:

(gdb) next
13092 Perl_croak(aTHX_ "Numeric format result too
large");
2​: *PL_locale_mutex = {m_lock = {m_owner = 0, m_flags = 0,
m_ceilings
= 0x801e1c1e8, m_rb_lnk = 0,
m_spare = 0x801e1c1f8}, m_flags = 1, m_count = 0, m_spinloops =
0,
m_yieldloops = 0, m_ps = 0,
m_qe = {tqe_next = 0x0, tqe_prev = 0x0}, m_pqe = {tqe_next = 0x0,
tqe_prev = 0x0}, m_rb_prev = 0x0}
1​: PL_locale_mutex = 0x801e1c1e0

Ok, so at this point it still looks valid ...

Program received signal SIGSEGV, Segmentation fault.
0x0000000000597ad9 in Perl_dounwind (my_perl=0x801e22000, cxix=-1)
at
pp_ctl.c​:1550
1550 CX_LEAVE_SCOPE(cx);
2​: *PL_locale_mutex = Error accessing memory address 0x2​: Bad
address.
Disabling display 2 to avoid infinite recursion.

.. but by the time we get here it has been corrupted to 0x2.

I think it's worth another go at the 'watch', but if that fails we may
have to step through with 'next' and 'step' until the displayed value
changes. If we set the watchpoint later, it may work better​:

shell% gdb --args ./perl -we 'my $x = sprintf("%7000000000E", 0)'
(gdb) break sv.c​:13092
(gdb) run
.. break at sv.c​:13092 (the croak call)
(gdb) watch PL_locale_mutex
(gdb) cont

As before, it should not take more than a few minutes after the
'cont'.

Hugo

'cont' returned within 1 second.

#####
$ gdb --args ./perl -we 'my $x = sprintf("%7000000000E", 0)'
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
(gdb) break sv.c​:13092
Breakpoint 1 at 0x55b2f0​: file sv.c, line 13092.
(gdb) run
Starting program​: /usr/home/jkeenan/gitwork/perl/perl -we my\ \$x\ =\ sprintf\(\"%7000000000E\",\ 0\)

Breakpoint 1, Perl_sv_vcatpvfn_flags (my_perl=<value optimized out>, sv=<value optimized out>,
  pat=<value optimized out>, patlen=<value optimized out>, args=0x0, svargs=0x801e16528,
  sv_count=1, maybe_tainted=0x801c02200, flags=0) at sv.c​:13092
13092 Perl_croak(aTHX_ "Numeric format result too large");
(gdb) watch PL_locale_mutex
Watchpoint 2​: PL_locale_mutex
(gdb) cont
Continuing.

Breakpoint 1, Perl_sv_vcatpvfn_flags (my_perl=<value optimized out>, sv=<value optimized out>,
  pat=<value optimized out>, patlen=<value optimized out>, args=0x0, svargs=0x801e16530,
  sv_count=1, maybe_tainted=0x452591, flags=0) at sv.c​:13092
13092 Perl_croak(aTHX_ "Numeric format result too large");
(gdb) bt
#0 Perl_sv_vcatpvfn_flags (my_perl=<value optimized out>, sv=<value optimized out>,
  pat=<value optimized out>, patlen=<value optimized out>, args=0x0, svargs=0x801e16530,
  sv_count=1, maybe_tainted=0x452591, flags=0) at sv.c​:13092
#1 0x0000000000553045 in Perl_sv_vsetpvfn (my_perl=0x801e22000, sv=0x801e98610,
  pat=0x801e20128 "%7000000000E", patlen=<value optimized out>, args=<value optimized out>,
  svargs=0x801e16530, sv_count=1, maybe_tainted=0x7fffffffe577) at sv.c​:10984
#2 0x00000000005c4f31 in Perl_do_sprintf (my_perl=0x801e22000, sv=0x801e98610,
  len=<value optimized out>, sarg=<value optimized out>) at doop.c​:734
#3 0x0000000000579ac0 in Perl_pp_sprintf (my_perl=0x801e22000) at pp.c​:3559
#4 0x00000000004f0340 in Perl_runops_debug (my_perl=0x801e22000) at dump.c​:2537
#5 0x0000000000453464 in S_run_body (my_perl=0x801e22000, oldscope=1) at inline.h​:65
#6 0x00000000004532d9 in perl_run (my_perl=0x801e22000) at perl.c​:2646
#7 0x00000000004213fa in main (argc=<value optimized out>, argv=<value optimized out>,
  env=0x7fffffffe7a0) at perlmain.c​:127
#####

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Jun 10, 2019

From @khwilliamson

I’m traveling but cursory examining the patch looks good

Sent from my iPhone

On Jun 10, 2019, at 7​:24 AM, James E Keenan via RT <perlbug-followup@​perl.org> wrote​:

On Mon, 10 Jun 2019 11​:53​:37 GMT, hv wrote​:

On Thu, 06 Jun 2019 10​:30​:58 -0700, jkeenan wrote​:
Watchpoint 2​: PL_locale_mutex

Old value = 0x801e1c1e0
New value = 0x2
_pthread_mutex_destroy (mutex=0xa45b28) at pthread_md.h​:95
95 return (TCB_GET64(tcb_thread));

Ok, I can make sense of this bit now​: the pthread_mutex_destroy code
explicitly overwrites the pointer to signal that it has been
destroyed​:

/usr/src/lib/libthr/thread/thr_private.h has​:
#define THR_MUTEX_DESTROYED ((struct pthread_mutex *)2)
/usr/src/lib/libthr/thread/thr_mutex.c has​:
*mutex = THR_MUTEX_DESTROYED;
in multiple cases.

So this isn't corruption but intentional. The core of the problem,
though, is that we're seeing an error returned from the mutex_destroy,
which is what is triggering our attempt to croak. And the error turns
out to be EBUSY, indicating that we're trying to destroy a mutex that
is still locked, and with the debug option -DLv it becomes trivial to
see that​:

$ ./perl -Ilib -DLv -we 'my $x = sprintf("%2E", 0)'
sv.c​: 12972​: locking lc_numeric; depth=1
sv.c​: 13422​: unlocking lc_numeric; depth=0

EXECUTING...

$ ./perl -Ilib -DLv -we 'my $x = sprintf("%7000000000E", 0)'
sv.c​: 12972​: locking lc_numeric; depth=1

EXECUTING...

sv.c​: 12972​: avoided lc_numeric_lock; new depth=2
Numeric format result too large at -e line 1.
Segmentation fault (core dumped)
$

We're invoking STORE_LC_NUMERIC_SET_TO_NEEDED, which is documented
with​:
On threaded perls not operating with thread-safe functionality, this
macro uses
a mutex to force a critical section. Therefore the matching RESTORE
should be
close by, and guaranteed to be called.
.. however this is wrapping precisely the call with the new error
check, so when that croaks the RESTORE_LC_NUMERIC is missed, and that
causes us to attempt global destruction with the mutex still held.

Because there are 2 nested calls to the STORE macro, I'm not sure how
correct the obvious fix is; there also seems to be some cleverness
with pragmas to try and enforce balancing of STORE/RESTORE. I was able
to get it both to compile and run the test case without error with the
attached, but I think it's Karl's baby so I hope he can advise.

Note also​: a) there are other opportunities to hit fatal errors in
sprintf, I assume they'll all need the same handling; b) the
'mutex_destroy or croak' pattern makes no sense after
PerlIO_teardown(), but I'm not sure what we can do instead - the bug
uncovered here certainly suggests we'd have been unwise simply to
ignore errors on mutex_destroy.

Hugo

Thanks. I realize your patch is preliminary, but to get some data from other platforms, I've placed it in this branch​:

smoke-me/jkeenan/hv/134172-hack

Thank you very much.

--
James E Keenan (jkeenan@​cpan.org)

---
via perlbug​: queue​: perl5 status​: open
https://rt-archive.perl.org/perl5/Ticket/Display.html?id=134172

@p5pRT
Copy link
Author

p5pRT commented Jun 11, 2019

From @jkeenan

On Mon, 10 Jun 2019 12​:24​:19 GMT, jkeenan wrote​:

On Mon, 10 Jun 2019 11​:53​:37 GMT, hv wrote​:

On Thu, 06 Jun 2019 10​:30​:58 -0700, jkeenan wrote​:

Watchpoint 2​: PL_locale_mutex

Old value = 0x801e1c1e0
New value = 0x2
_pthread_mutex_destroy (mutex=0xa45b28) at pthread_md.h​:95
95 return (TCB_GET64(tcb_thread));

Ok, I can make sense of this bit now​: the pthread_mutex_destroy code
explicitly overwrites the pointer to signal that it has been
destroyed​:

/usr/src/lib/libthr/thread/thr_private.h has​:
#define THR_MUTEX_DESTROYED ((struct pthread_mutex *)2)
/usr/src/lib/libthr/thread/thr_mutex.c has​:
*mutex = THR_MUTEX_DESTROYED;
in multiple cases.

So this isn't corruption but intentional. The core of the problem,
though, is that we're seeing an error returned from the
mutex_destroy,
which is what is triggering our attempt to croak. And the error turns
out to be EBUSY, indicating that we're trying to destroy a mutex that
is still locked, and with the debug option -DLv it becomes trivial to
see that​:

$ ./perl -Ilib -DLv -we 'my $x = sprintf("%2E", 0)'
sv.c​: 12972​: locking lc_numeric; depth=1
sv.c​: 13422​: unlocking lc_numeric; depth=0

EXECUTING...

$ ./perl -Ilib -DLv -we 'my $x = sprintf("%7000000000E", 0)'
sv.c​: 12972​: locking lc_numeric; depth=1

EXECUTING...

sv.c​: 12972​: avoided lc_numeric_lock; new depth=2
Numeric format result too large at -e line 1.
Segmentation fault (core dumped)
$

We're invoking STORE_LC_NUMERIC_SET_TO_NEEDED, which is documented
with​:
On threaded perls not operating with thread-safe functionality,
this
macro uses
a mutex to force a critical section. Therefore the matching
RESTORE
should be
close by, and guaranteed to be called.
.. however this is wrapping precisely the call with the new error
check, so when that croaks the RESTORE_LC_NUMERIC is missed, and that
causes us to attempt global destruction with the mutex still held.

Because there are 2 nested calls to the STORE macro, I'm not sure how
correct the obvious fix is; there also seems to be some cleverness
with pragmas to try and enforce balancing of STORE/RESTORE. I was
able
to get it both to compile and run the test case without error with
the
attached, but I think it's Karl's baby so I hope he can advise.

Note also​: a) there are other opportunities to hit fatal errors in
sprintf, I assume they'll all need the same handling; b) the
'mutex_destroy or croak' pattern makes no sense after
PerlIO_teardown(), but I'm not sure what we can do instead - the bug
uncovered here certainly suggests we'd have been unwise simply to
ignore errors on mutex_destroy.

Hugo

Thanks. I realize your patch is preliminary, but to get some data
from other platforms, I've placed it in this branch​:

smoke-me/jkeenan/hv/134172-hack

Thank you very much.

Smoke results are looking good. I intend to merge this branch into blead within 24 hours unless we get bad results or someone objects.

TonyC, does this look okay to you?

Hugo, would you like to suggest a better commit message?

Thank you very much.
--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Jun 11, 2019

From @hvds

On Tue, 11 Jun 2019 05​:59​:02 -0700, jkeenan wrote​:

Smoke results are looking good. I intend to merge this branch into
blead within 24 hours unless we get bad results or someone objects.

TonyC, does this look okay to you?

Hugo, would you like to suggest a better commit message?

I don't think this should be merged at is, it was intended only as a proof of concept.

In particular, I strongly dislike the additional 5 lines (in a NOTREACHED area) simply to satisfy the macro's expectations of balance​: if we keep the current general structure, the solution here will also need to be wrapped around any other part of the code that can bypass the unlock, and I anticipate there will be quite a few such areas.

I've looked some more today, and I think a larger change would be merited, but I'd want some input from DaveM and Karl.

I feel the problems here are essentially a clash between the work from bc37e90​: "Perl_sv_vcatpvfn_flags​: set locale at most once" and from e9bc6d6​: "Add thread-safe locale handling". Because of the mutex requirements of the latter, attempting to retain the former will need us to identify every possible route out of sv_vcatpvfn - not only the explicit croaks, but also eg any attempt to read an SV that might be tied or overloaded.

I don't think that's a practical thing to do; as such, I'm not convinced there's a practical route to getting correct mutex-protected locale changes on the platforms that need it while retaining the "do this only once" performance benefits.

What I'm not sure of is whether _some_ of the "only once" benefits can be retained, eg to discover if switching is necessary - I suspect that subsidiary code (such as tied and overloaded arguments, again) might make that unsafe even in an unthreaded application.

I guess another possibility would be to put an unlock request on a save stack, but that feels like it's going even further in the wrong direction - you want to be shortening the time you hold a mutex rather than lengthening it.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jun 14, 2019

From @jkeenan

On Thu, 06 Jun 2019 15​:54​:29 GMT, jkeenan wrote​:

On Thu, 06 Jun 2019 15​:47​:02 GMT, hv wrote​:

On Thu, 06 Jun 2019 08​:16​:29 -0700, jkeenan wrote​:

(gdb) next
13092 Perl_croak(aTHX_ "Numeric format result
too
large");
2​: *PL_locale_mutex = {m_lock = {m_owner = 0, m_flags = 0,
m_ceilings
= 0x801e1c1e8, m_rb_lnk = 0,
m_spare = 0x801e1c1f8}, m_flags = 1, m_count = 0, m_spinloops

0,
m_yieldloops = 0, m_ps = 0,
m_qe = {tqe_next = 0x0, tqe_prev = 0x0}, m_pqe = {tqe_next =
0x0,
tqe_prev = 0x0}, m_rb_prev = 0x0}
1​: PL_locale_mutex = 0x801e1c1e0

Ok, so at this point it still looks valid ...

Program received signal SIGSEGV, Segmentation fault.
0x0000000000597ad9 in Perl_dounwind (my_perl=0x801e22000, cxix=-
1)
at
pp_ctl.c​:1550
1550 CX_LEAVE_SCOPE(cx);
2​: *PL_locale_mutex = Error accessing memory address 0x2​: Bad
address.
Disabling display 2 to avoid infinite recursion.

.. but by the time we get here it has been corrupted to 0x2.

I think it's worth another go at the 'watch', but if that fails we
may
have to step through with 'next' and 'step' until the displayed value
changes. If we set the watchpoint later, it may work better​:

shell% gdb --args ./perl -we 'my $x = sprintf("%7000000000E", 0)'
(gdb) break sv.c​:13092
(gdb) run
.. break at sv.c​:13092 (the croak call)
(gdb) watch PL_locale_mutex
(gdb) cont

As before, it should not take more than a few minutes after the
'cont'.

Hugo

'cont' returned within 1 second.

#####
$ gdb --args ./perl -we 'my $x = sprintf("%7000000000E", 0)'
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and
you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for
details.
This GDB was configured as "amd64-marcel-freebsd"...
(gdb) break sv.c​:13092
Breakpoint 1 at 0x55b2f0​: file sv.c, line 13092.
(gdb) run
Starting program​: /usr/home/jkeenan/gitwork/perl/perl -we my\ \$x\ =\
sprintf\(\"%7000000000E\",\ 0\)

Breakpoint 1, Perl_sv_vcatpvfn_flags (my_perl=<value optimized out>,
sv=<value optimized out>,
pat=<value optimized out>, patlen=<value optimized out>, args=0x0,
svargs=0x801e16528,
sv_count=1, maybe_tainted=0x801c02200, flags=0) at sv.c​:13092
13092 Perl_croak(aTHX_ "Numeric format result too
large");
(gdb) watch PL_locale_mutex
Watchpoint 2​: PL_locale_mutex
(gdb) cont
Continuing.

Breakpoint 1, Perl_sv_vcatpvfn_flags (my_perl=<value optimized out>,
sv=<value optimized out>,
pat=<value optimized out>, patlen=<value optimized out>, args=0x0,
svargs=0x801e16530,
sv_count=1, maybe_tainted=0x452591, flags=0) at sv.c​:13092
13092 Perl_croak(aTHX_ "Numeric format result too
large");
(gdb) bt
#0 Perl_sv_vcatpvfn_flags (my_perl=<value optimized out>, sv=<value
optimized out>,
pat=<value optimized out>, patlen=<value optimized out>, args=0x0,
svargs=0x801e16530,
sv_count=1, maybe_tainted=0x452591, flags=0) at sv.c​:13092
#1 0x0000000000553045 in Perl_sv_vsetpvfn (my_perl=0x801e22000,
sv=0x801e98610,
pat=0x801e20128 "%7000000000E", patlen=<value optimized out>,
args=<value optimized out>,
svargs=0x801e16530, sv_count=1, maybe_tainted=0x7fffffffe577) at
sv.c​:10984
#2 0x00000000005c4f31 in Perl_do_sprintf (my_perl=0x801e22000,
sv=0x801e98610,
len=<value optimized out>, sarg=<value optimized out>) at
doop.c​:734
#3 0x0000000000579ac0 in Perl_pp_sprintf (my_perl=0x801e22000) at
pp.c​:3559
#4 0x00000000004f0340 in Perl_runops_debug (my_perl=0x801e22000) at
dump.c​:2537
#5 0x0000000000453464 in S_run_body (my_perl=0x801e22000, oldscope=1)
at inline.h​:65
#6 0x00000000004532d9 in perl_run (my_perl=0x801e22000) at
perl.c​:2646
#7 0x00000000004213fa in main (argc=<value optimized out>,
argv=<value optimized out>,
env=0x7fffffffe7a0) at perlmain.c​:127
#####

We're getting the same test failure on threaded builds on OpenBSD-6.4 and 6.5.

On 6.5, this is not showing up in the regular smoke-testing reports, but if you go to the "log" version of Carlos's reports, you can see it. See, e.g., http​://perl5.test-smoke.org/logfile/89510, where t/op/sprintf2.t is graded '?????' on threaded builds.

I build a threaded perl on blead on an OpenBSD-6.4 VM. Some results for t/op/sprintf2.t and for Hugo's one-liner.

#####
$ uname -mrs
OpenBSD 6.4 amd64

$ ./perl -Ilib -V​:config_args
config_args='-des -Dusedevel -Duseithreads';

$ ./perl -Ilib -v | head -2 | tail -1
This is perl 5, version 31, subversion 1 (v5.31.1 (v5.31.0-159-g84ab3b6908)) built for OpenBSD.amd64-openbsd-thread-multi

$ cd t;./perl harness -v op/sprintf2.t;cd -
[snip]
ok 1699 - croak for very large numeric format results
pthread_mutex_destroy on mutex with waiters!
All 1699 subtests passed
  (less 30 skipped subtests​: 1669 okay)

Test Summary Report


op/sprintf2.t (Wstat​: 138 Tests​: 1699 Failed​: 0)
  Non-zero wait status​: 138
Files=1, Tests=1699, 1 wallclock secs ( 0.22 usr 0.15 sys + 0.11 cusr 0.53 csys = 1.01 CPU)
Result​: FAIL
#####

Note how on this platform an error message is printed​:

pthread_mutex_destroy on mutex with waiters!

#####
$ uname -mrs
OpenBSD 6.4 amd64

$ ./perl -Ilib -v | head -2 | tail -1
This is perl 5, version 31, subversion 1 (v5.31.1 (v5.31.0-159-g84ab3b6908)) built for OpenBSD.amd64-openbsd-thread-multi

$ ./perl -Ilib -V​:config_args
config_args='-des -Dusedevel -Duseithreads -DDEBUGGING';

$ gdb --args ./perl -we 'my $x = sprintf("%7000000000E", 0)'
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-unknown-openbsd6.4"...
(gdb) break sv.c​:13092
Breakpoint 1 at 0x15aa82​: file sv.c, line 13092.
(gdb) run
Starting program​: /home/jkeenan/gitwork/perl/perl -we my\ \$x\ =\ sprintf\(\"%7000000000E\",\ 0\)
Breakpoint 1 at 0x18a24b55aa82​: file sv.c, line 13092.

Breakpoint 1, Perl_sv_vcatpvfn_flags (my_perl=Variable "my_perl" is not available.
) at sv.c​:13095
13095 Perl_croak(aTHX_ "Numeric format result too large");
(gdb) watch PL_locale_mutex
Watchpoint 2​: PL_locale_mutex
(gdb) cont
Continuing.

Breakpoint 1, Perl_sv_vcatpvfn_flags (my_perl=Variable "my_perl" is not available.
) at sv.c​:13095
13095 Perl_croak(aTHX_ "Numeric format result too large");
(gdb) cont
Continuing.
Numeric format result too large at -e line 1.
Watchpoint 2​: PL_locale_mutex

Old value = 0x18a52d7d7160
New value = 0x0
0x000018a50c654f06 in _libc_pthread_mutex_destroy (mutexp=0x18a24ba98c40)
  at /usr/src/lib/libc/thread/rthread_mutex.c​:89
89 /usr/src/lib/libc/thread/rthread_mutex.c​: No such file or directory.
  in /usr/src/lib/libc/thread/rthread_mutex.c
Current language​: auto; currently minimal
(gdb) bt
#0 0x000018a50c654f06 in _libc_pthread_mutex_destroy (mutexp=0x18a24ba98c40)
  at /usr/src/lib/libc/thread/rthread_mutex.c​:89
#1 0x000018a24b430af5 in Perl_sys_term () at perl.c​:146
#2 0x000018a24b4015ee in main (argc=Variable "argc" is not available.
) at perlmain.c​:155
#####

Thank you very much.

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Jun 20, 2019

From @iabyn

On Tue, Jun 11, 2019 at 06​:30​:21AM -0700, Hugo van der Sanden via RT wrote​:

I've looked some more today, and I think a larger change would be
merited, but I'd want some input from DaveM and Karl.

I feel the problems here are essentially a clash between the work from
bc37e90​: "Perl_sv_vcatpvfn_flags​: set locale at most once" and from
e9bc6d6​: "Add thread-safe locale handling". Because of the mutex
requirements of the latter, attempting to retain the former will need us
to identify every possible route out of sv_vcatpvfn - not only the
explicit croaks, but also eg any attempt to read an SV that might be
tied or overloaded.

I don't think that's a practical thing to do; as such, I'm not convinced
there's a practical route to getting correct mutex-protected locale
changes on the platforms that need it while retaining the "do this only
once" performance benefits.

I have no objection to making it potentially switch locales for each
call to snprintf.

What I'm not sure of is whether _some_ of the "only once" benefits can
be retained, eg to discover if switching is necessary - I suspect that
subsidiary code (such as tied and overloaded arguments, again) might
make that unsafe even in an unthreaded application.

From my commit message in bc37e90, it appears that checking whether
we're within the lexical scope of 'use locale' is the expensive bit. That
can't change during the course of executing the function, so it looks like
a good candidate for checking only once. I guess it's the IN_LC() bit of
STORE_LC_NUMERIC_SET_TO_NEEDED.

--
Spock (or Data) is fired from his high-ranking position for not being able
to understand the most basic nuances of about one in three sentences that
anyone says to him.
  -- Things That Never Happen in "Star Trek" #19

@p5pRT
Copy link
Author

p5pRT commented Jul 22, 2019

From @hvds

On Tue, 11 Jun 2019 06​:30​:21 -0700, hv wrote​:

I feel the problems here are essentially a clash between the work from
bc37e90​: "Perl_sv_vcatpvfn_flags​: set locale at most once" and from
e9bc6d6​: "Add thread-safe locale handling". Because of the mutex
requirements of the latter, attempting to retain the former will need
us to identify every possible route out of sv_vcatpvfn - not only the
explicit croaks, but also eg any attempt to read an SV that might be
tied or overloaded.

I don't think that's a practical thing to do; as such, I'm not
convinced there's a practical route to getting correct mutex-protected
locale changes on the platforms that need it while retaining the "do
this only once" performance benefits.

Attached is an attempt to fix this by defining a new more tightly scoped
WITH_LC_NUMERIC_SET_TO_NEEDED macro, and using this around each point
(that I was able to identify) that needs the locale switching.

As Dave mentioned, the expensive part is the hints check of IN_LC(LC_NUMERIC);
I cache a copy of that for sv_vcatpvfn_flags (and format_hexfp) to use;
however I note that the same check is done multiple times within the
underlying STORE_LC_NUMERIC_SET_TO_NEEDED macro, so it may be worth
making variants of WITH_LC_x and STORE_LC_x that accept a cached boolean
for the hints check.

I'd appreciate if someone could look through for any location I might
have missed that needs the new macro, and we should probably give this
some smoking.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jul 22, 2019

From @hvds

0001-perl-134172-restrict-scope-of-locale-changes-during-.patch
From 8ced26e934d146385433fe888a9111f611f76d3d Mon Sep 17 00:00:00 2001
From: Hugo van der Sanden <[email protected]>
Date: Mon, 22 Jul 2019 16:29:13 +0100
Subject: [PATCH] [perl #134172] restrict scope of locale changes during
 sprintf

In some environments we must hold a mutex for the duration of a temporary
locale change, so we must ensure that mutex is released appropriately.
This means intervening code must not croak, or otherwise bypass the
unlock.

In sv_vcatpvfn_flags(), that requirement was violated when attempting
to avoid multiple temporary locale changes by collapsing them into
a single one. This partially undoes that to fix the problem, while
still attempting to retain some of the benefits by caching the
expensive hints check.
---
 perl.h |  32 +++++++++++++++++
 sv.c   | 111 +++++++++++++++++++++++++++++----------------------------
 2 files changed, 88 insertions(+), 55 deletions(-)

diff --git a/perl.h b/perl.h
index 37b2637ca1..3be4a53756 100644
--- a/perl.h
+++ b/perl.h
@@ -6426,6 +6426,29 @@ expression, but with an empty argument list, like this:
      ...
  }
 
+=for apidoc Amn|void|WITH_LC_NUMERIC_SET_TO_NEEDED
+
+This macro invokes the supplied statement or block within the context
+of a L</STORE_LC_NUMERIC_SET_TO_NEEDED> .. L</RESTORE_LC_NUMERIC> pair
+if required, so eg:
+
+  WITH_LC_NUMERIC_SET_TO_NEEDED(
+    SNPRINTF_G(fv, ebuf, sizeof(ebuf), precis)
+  );
+
+is equivalent to:
+
+  {
+#ifdef USE_LOCALE_NUMERIC
+    DECLARATION_FOR_LC_NUMERIC_MANIPULATION;
+    STORE_LC_NUMERIC_SET_TO_NEEDED();
+#endif
+    SNPRINTF_G(fv, ebuf, sizeof(ebuf), precis);
+#ifdef USE_LOCALE_NUMERIC
+    RESTORE_LC_NUMERIC();
+#endif
+  }
+
 =cut
 
 */
@@ -6548,6 +6571,14 @@ expression, but with an empty argument list, like this:
             __FILE__, __LINE__, PL_numeric_standard));                      \
         } STMT_END
 
+#  define WITH_LC_NUMERIC_SET_TO_NEEDED(block)                              \
+        STMT_START {                                                        \
+            DECLARATION_FOR_LC_NUMERIC_MANIPULATION;                        \
+            STORE_LC_NUMERIC_SET_TO_NEEDED();                               \
+            block;                                                          \
+            RESTORE_LC_NUMERIC();                                           \
+        } STMT_END;
+
 #else /* !USE_LOCALE_NUMERIC */
 
 #  define SET_NUMERIC_STANDARD()
@@ -6560,6 +6591,7 @@ expression, but with an empty argument list, like this:
 #  define RESTORE_LC_NUMERIC()
 #  define LOCK_LC_NUMERIC_STANDARD()
 #  define UNLOCK_LC_NUMERIC_STANDARD()
+#  define WITH_LC_NUMERIC_SET_TO_NEEDED(block) block;
 
 #endif /* !USE_LOCALE_NUMERIC */
 
diff --git a/sv.c b/sv.c
index 4315fe9b64..ef2c71126c 100644
--- a/sv.c
+++ b/sv.c
@@ -11562,7 +11562,9 @@ S_hextract(pTHX_ const NV nv, int* exponent, bool *subnormal,
  * The rest of the args have the same meaning as the local vars of the
  * same name within Perl_sv_vcatpvfn_flags().
  *
- * It assumes the caller has already done STORE_LC_NUMERIC_SET_TO_NEEDED();
+ * The caller's determination of IN_LC(LC_NUMERIC), passed as in_lc_numeric,
+ * is used to ensure we do the right thing when we need to access the locale's
+ * numeric radix.
  *
  * It requires the caller to make buf large enough.
  */
@@ -11571,7 +11573,7 @@ static STRLEN
 S_format_hexfp(pTHX_ char * const buf, const STRLEN bufsize, const char c,
                     const NV nv, const vcatpvfn_long_double_t fv,
                     bool has_precis, STRLEN precis, STRLEN width,
-                    bool alt, char plus, bool left, bool fill)
+                    bool alt, char plus, bool left, bool fill, bool in_lc_numeric)
 {
     /* Hexadecimal floating point. */
     char* p = buf;
@@ -11778,17 +11780,19 @@ S_format_hexfp(pTHX_ char * const buf, const STRLEN bufsize, const char c,
 
     if (hexradix) {
 #ifndef USE_LOCALE_NUMERIC
-            *p++ = '.';
+        *p++ = '.';
 #else
-            if (IN_LC(LC_NUMERIC)) {
-                STRLEN n;
+        if (in_lc_numeric) {
+            STRLEN n;
+            WITH_LC_NUMERIC_SET_TO_NEEDED({
                 const char* r = SvPV(PL_numeric_radix_sv, n);
                 Copy(r, p, n, char);
-                p += n;
-            }
-            else {
-                *p++ = '.';
-            }
+            });
+            p += n;
+        }
+        else {
+            *p++ = '.';
+        }
 #endif
     }
 
@@ -11894,9 +11898,10 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
     char ebuf[IV_DIG * 4 + NV_DIG + 32];
     bool no_redundant_warning = FALSE; /* did we use any explicit format parameter index? */
 #ifdef USE_LOCALE_NUMERIC
-    DECLARATION_FOR_LC_NUMERIC_MANIPULATION;
-    bool lc_numeric_set = FALSE; /* called STORE_LC_NUMERIC_SET_TO_NEEDED? */
+    bool have_in_lc_numeric = FALSE;
 #endif
+    /* we never change this unless USE_LOCALE_NUMERIC */
+    bool in_lc_numeric = FALSE;
 
     PERL_ARGS_ASSERT_SV_VCATPVFN_FLAGS;
     PERL_UNUSED_ARG(maybe_tainted);
@@ -12967,33 +12972,31 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
              * below, or implicitly, via an snprintf() variant.
              * Note also things like ps_AF.utf8 which has
              * "\N{ARABIC DECIMAL SEPARATOR} as a radix point */
-            if (!lc_numeric_set) {
-                /* only set once and reuse in-locale value on subsequent
-                 * iterations.
-                 * XXX what happens if we die in an eval?
-                 */
-                STORE_LC_NUMERIC_SET_TO_NEEDED();
-                lc_numeric_set = TRUE;
+            if (! have_in_lc_numeric) {
+                in_lc_numeric = IN_LC(LC_NUMERIC);
+                have_in_lc_numeric = TRUE;
             }
 
-            if (IN_LC(LC_NUMERIC)) {
-                /* this can't wrap unless PL_numeric_radix_sv is a string
-                 * consuming virtually all the 32-bit or 64-bit address
-                 * space
-                 */
-                float_need += (SvCUR(PL_numeric_radix_sv) - 1);
-
-                /* floating-point formats only get utf8 if the radix point
-                 * is utf8. All other characters in the string are < 128
-                 * and so can be safely appended to both a non-utf8 and utf8
-                 * string as-is.
-                 * Note that this will convert the output to utf8 even if
-                 * the radix point didn't get output.
-                 */
-                if (SvUTF8(PL_numeric_radix_sv) && !has_utf8) {
-                    sv_utf8_upgrade(sv);
-                    has_utf8 = TRUE;
-                }
+            if (in_lc_numeric) {
+                WITH_LC_NUMERIC_SET_TO_NEEDED({
+                    /* this can't wrap unless PL_numeric_radix_sv is a string
+                     * consuming virtually all the 32-bit or 64-bit address
+                     * space
+                     */
+                    float_need += (SvCUR(PL_numeric_radix_sv) - 1);
+
+                    /* floating-point formats only get utf8 if the radix point
+                     * is utf8. All other characters in the string are < 128
+                     * and so can be safely appended to both a non-utf8 and utf8
+                     * string as-is.
+                     * Note that this will convert the output to utf8 even if
+                     * the radix point didn't get output.
+                     */
+                    if (SvUTF8(PL_numeric_radix_sv) && !has_utf8) {
+                        sv_utf8_upgrade(sv);
+                        has_utf8 = TRUE;
+                    }
+                });
             }
 #endif
 
@@ -13068,7 +13071,9 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
                 && !fill
                 && intsize != 'q'
             ) {
-                SNPRINTF_G(fv, ebuf, sizeof(ebuf), precis);
+                WITH_LC_NUMERIC_SET_TO_NEEDED(
+                    SNPRINTF_G(fv, ebuf, sizeof(ebuf), precis)
+                );
                 elen = strlen(ebuf);
                 eptr = ebuf;
                 goto float_concat;
@@ -13113,7 +13118,7 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
             if (UNLIKELY(hexfp)) {
                 elen = S_format_hexfp(aTHX_ PL_efloatbuf, PL_efloatsize, c,
                                 nv, fv, has_precis, precis, width,
-                                alt, plus, left, fill);
+                                alt, plus, left, fill, in_lc_numeric);
             }
             else {
                 char *ptr = ebuf + sizeof ebuf;
@@ -13169,8 +13174,10 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
                     const char* qfmt = quadmath_format_single(ptr);
                     if (!qfmt)
                         Perl_croak_nocontext("panic: quadmath invalid format \"%s\"", ptr);
-                    elen = quadmath_snprintf(PL_efloatbuf, PL_efloatsize,
-                                             qfmt, nv);
+                    WITH_LC_NUMERIC_SET_TO_NEEDED(
+                        elen = quadmath_snprintf(PL_efloatbuf, PL_efloatsize,
+                                                 qfmt, nv);
+                    );
                     if ((IV)elen == -1) {
                         if (qfmt != ptr)
                             SAVEFREEPV(qfmt);
@@ -13180,11 +13187,15 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
                         Safefree(qfmt);
                 }
 #elif defined(HAS_LONG_DOUBLE)
-                elen = ((intsize == 'q')
-                        ? my_snprintf(PL_efloatbuf, PL_efloatsize, ptr, fv)
-                        : my_snprintf(PL_efloatbuf, PL_efloatsize, ptr, (double)fv));
+                WITH_LC_NUMERIC_SET_TO_NEEDED(
+                    elen = ((intsize == 'q')
+                            ? my_snprintf(PL_efloatbuf, PL_efloatsize, ptr, fv)
+                            : my_snprintf(PL_efloatbuf, PL_efloatsize, ptr, (double)fv))
+                );
 #else
-                elen = my_snprintf(PL_efloatbuf, PL_efloatsize, ptr, fv);
+                WITH_LC_NUMERIC_SET_TO_NEEDED(
+                    elen = my_snprintf(PL_efloatbuf, PL_efloatsize, ptr, fv)
+                );
 #endif
                 GCC_DIAG_RESTORE_STMT;
 	    }
@@ -13406,16 +13417,6 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
     }
 
     SvTAINT(sv);
-
-#ifdef USE_LOCALE_NUMERIC
-
-    if (lc_numeric_set) {
-        RESTORE_LC_NUMERIC();   /* Done outside loop, so don't have to
-                                   save/restore each iteration. */
-    }
-
-#endif
-
 }
 
 /* =========================================================================
-- 
2.17.1

@p5pRT
Copy link
Author

p5pRT commented Jul 22, 2019

From @hvds

On Mon, 22 Jul 2019 08​:47​:39 -0700, hv wrote​:

As Dave mentioned, the expensive part is the hints check of IN_LC(LC_NUMERIC);
I cache a copy of that for sv_vcatpvfn_flags (and format_hexfp) to use;
however I note that the same check is done multiple times within the
underlying STORE_LC_NUMERIC_SET_TO_NEEDED macro, so it may be worth
making variants of WITH_LC_x and STORE_LC_x that accept a cached boolean
for the hints check.

Something like the attached (not extensively tested).

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jul 22, 2019

From @hvds

0002-Avoid-multiple-checks-of-IN_LC-LC_NUMERIC.patch
From 025dca8448365565bb5121614ae17bd7b726642a Mon Sep 17 00:00:00 2001
From: Hugo van der Sanden <[email protected]>
Date: Mon, 22 Jul 2019 17:08:45 +0100
Subject: [PATCH] Avoid multiple checks of IN_LC(LC_NUMERIC)

---
 perl.h | 20 ++++++++++++++------
 sv.c   | 12 ++++++------
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/perl.h b/perl.h
index 3be4a53756..c4afb3f2c7 100644
--- a/perl.h
+++ b/perl.h
@@ -6476,12 +6476,12 @@ is equivalent to:
 #  define DECLARATION_FOR_LC_NUMERIC_MANIPULATION                           \
     void (*_restore_LC_NUMERIC_function)(pTHX) = NULL
 
-#  define STORE_LC_NUMERIC_SET_TO_NEEDED()                                  \
+#  define STORE_LC_NUMERIC_SET_TO_NEEDED_i(in_lc_numeric)                   \
         STMT_START {                                                        \
             LC_NUMERIC_LOCK(                                                \
-                    (   (  IN_LC(LC_NUMERIC) && _NOT_IN_NUMERIC_UNDERLYING) \
-                     || (! IN_LC(LC_NUMERIC) && _NOT_IN_NUMERIC_STANDARD)));\
-            if (IN_LC(LC_NUMERIC)) {                                        \
+                    (   (  in_lc_numeric && _NOT_IN_NUMERIC_UNDERLYING)     \
+                     || (! in_lc_numeric && _NOT_IN_NUMERIC_STANDARD)));    \
+            if (in_lc_numeric) {                                            \
                 if (_NOT_IN_NUMERIC_UNDERLYING) {                           \
                     Perl_set_numeric_underlying(aTHX);                      \
                     _restore_LC_NUMERIC_function                            \
@@ -6497,6 +6497,9 @@ is equivalent to:
             }                                                               \
         } STMT_END
 
+#  define STORE_LC_NUMERIC_SET_TO_NEEDED() \
+        STORE_LC_NUMERIC_SET_TO_NEEDED_i(IN_LC(LC_NUMERIC))
+
 #  define RESTORE_LC_NUMERIC()                                              \
         STMT_START {                                                        \
             if (_restore_LC_NUMERIC_function) {                             \
@@ -6571,14 +6574,17 @@ is equivalent to:
             __FILE__, __LINE__, PL_numeric_standard));                      \
         } STMT_END
 
-#  define WITH_LC_NUMERIC_SET_TO_NEEDED(block)                              \
+#  define WITH_LC_NUMERIC_SET_TO_NEEDED_i(in_lc_numeric, block)             \
         STMT_START {                                                        \
             DECLARATION_FOR_LC_NUMERIC_MANIPULATION;                        \
-            STORE_LC_NUMERIC_SET_TO_NEEDED();                               \
+            STORE_LC_NUMERIC_SET_TO_NEEDED_i(in_lc_numeric);                \
             block;                                                          \
             RESTORE_LC_NUMERIC();                                           \
         } STMT_END;
 
+#  define WITH_LC_NUMERIC_SET_TO_NEEDED(block) \
+        WITH_LC_NUMERIC_SET_TO_NEEDED_i(IN_LC(LC_NUMERIC), block)
+
 #else /* !USE_LOCALE_NUMERIC */
 
 #  define SET_NUMERIC_STANDARD()
@@ -6587,10 +6593,12 @@ is equivalent to:
 #  define DECLARATION_FOR_LC_NUMERIC_MANIPULATION
 #  define STORE_LC_NUMERIC_SET_STANDARD()
 #  define STORE_LC_NUMERIC_FORCE_TO_UNDERLYING()
+#  define STORE_LC_NUMERIC_SET_TO_NEEDED_i()
 #  define STORE_LC_NUMERIC_SET_TO_NEEDED()
 #  define RESTORE_LC_NUMERIC()
 #  define LOCK_LC_NUMERIC_STANDARD()
 #  define UNLOCK_LC_NUMERIC_STANDARD()
+#  define WITH_LC_NUMERIC_SET_TO_NEEDED_i(in_lc_numeric, block) block;
 #  define WITH_LC_NUMERIC_SET_TO_NEEDED(block) block;
 
 #endif /* !USE_LOCALE_NUMERIC */
diff --git a/sv.c b/sv.c
index ef2c71126c..dfe5f88e52 100644
--- a/sv.c
+++ b/sv.c
@@ -11784,7 +11784,7 @@ S_format_hexfp(pTHX_ char * const buf, const STRLEN bufsize, const char c,
 #else
         if (in_lc_numeric) {
             STRLEN n;
-            WITH_LC_NUMERIC_SET_TO_NEEDED({
+            WITH_LC_NUMERIC_SET_TO_NEEDED_i(TRUE, {
                 const char* r = SvPV(PL_numeric_radix_sv, n);
                 Copy(r, p, n, char);
             });
@@ -12978,7 +12978,7 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
             }
 
             if (in_lc_numeric) {
-                WITH_LC_NUMERIC_SET_TO_NEEDED({
+                WITH_LC_NUMERIC_SET_TO_NEEDED_i(TRUE, {
                     /* this can't wrap unless PL_numeric_radix_sv is a string
                      * consuming virtually all the 32-bit or 64-bit address
                      * space
@@ -13071,7 +13071,7 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
                 && !fill
                 && intsize != 'q'
             ) {
-                WITH_LC_NUMERIC_SET_TO_NEEDED(
+                WITH_LC_NUMERIC_SET_TO_NEEDED_i(in_lc_numeric,
                     SNPRINTF_G(fv, ebuf, sizeof(ebuf), precis)
                 );
                 elen = strlen(ebuf);
@@ -13174,7 +13174,7 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
                     const char* qfmt = quadmath_format_single(ptr);
                     if (!qfmt)
                         Perl_croak_nocontext("panic: quadmath invalid format \"%s\"", ptr);
-                    WITH_LC_NUMERIC_SET_TO_NEEDED(
+                    WITH_LC_NUMERIC_SET_TO_NEEDED_i(in_lc_numeric,
                         elen = quadmath_snprintf(PL_efloatbuf, PL_efloatsize,
                                                  qfmt, nv);
                     );
@@ -13187,13 +13187,13 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
                         Safefree(qfmt);
                 }
 #elif defined(HAS_LONG_DOUBLE)
-                WITH_LC_NUMERIC_SET_TO_NEEDED(
+                WITH_LC_NUMERIC_SET_TO_NEEDED_i(in_lc_numeric,
                     elen = ((intsize == 'q')
                             ? my_snprintf(PL_efloatbuf, PL_efloatsize, ptr, fv)
                             : my_snprintf(PL_efloatbuf, PL_efloatsize, ptr, (double)fv))
                 );
 #else
-                WITH_LC_NUMERIC_SET_TO_NEEDED(
+                WITH_LC_NUMERIC_SET_TO_NEEDED_i(in_lc_numeric,
                     elen = my_snprintf(PL_efloatbuf, PL_efloatsize, ptr, fv)
                 );
 #endif
-- 
2.17.1

@p5pRT
Copy link
Author

p5pRT commented Jul 22, 2019

From @jkeenan

On Mon, 22 Jul 2019 16​:11​:13 GMT, hv wrote​:

On Mon, 22 Jul 2019 08​:47​:39 -0700, hv wrote​:

As Dave mentioned, the expensive part is the hints check of
IN_LC(LC_NUMERIC);
I cache a copy of that for sv_vcatpvfn_flags (and format_hexfp) to
use;
however I note that the same check is done multiple times within the
underlying STORE_LC_NUMERIC_SET_TO_NEEDED macro, so it may be worth
making variants of WITH_LC_x and STORE_LC_x that accept a cached
boolean
for the hints check.

Something like the attached (not extensively tested).

Hugo

Hugo, I created this smoke branch just now after seeing our *first* patch​:

smoke-me/jkeenan/hv/134172-sprintf

Should this second patch be applied to that branch as well?

Thank you very much.

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Jul 22, 2019

From @hvds

On Mon, 22 Jul 2019 09​:15​:20 -0700, jkeenan wrote​:

On Mon, 22 Jul 2019 16​:11​:13 GMT, hv wrote​:

On Mon, 22 Jul 2019 08​:47​:39 -0700, hv wrote​:

As Dave mentioned, the expensive part is the hints check of
IN_LC(LC_NUMERIC);
I cache a copy of that for sv_vcatpvfn_flags (and format_hexfp) to
use;
however I note that the same check is done multiple times within the
underlying STORE_LC_NUMERIC_SET_TO_NEEDED macro, so it may be worth
making variants of WITH_LC_x and STORE_LC_x that accept a cached
boolean
for the hints check.

Something like the attached (not extensively tested).

Hugo, I created this smoke branch just now after seeing our *first* patch​:

smoke-me/jkeenan/hv/134172-sprintf

Should this second patch be applied to that branch as well?

Up to you​: I'd like a second opinion on it, particularly about the naming of the additional macros, but I think it's reasonably likely to work. (It passed tests here on a build with -DNO_THREAD_SAFE_LOCALE, but I haven't tried any other variations.)

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jul 22, 2019

From @jkeenan

On Mon, 22 Jul 2019 18​:14​:11 GMT, hv wrote​:

On Mon, 22 Jul 2019 09​:15​:20 -0700, jkeenan wrote​:

On Mon, 22 Jul 2019 16​:11​:13 GMT, hv wrote​:

On Mon, 22 Jul 2019 08​:47​:39 -0700, hv wrote​:

As Dave mentioned, the expensive part is the hints check of
IN_LC(LC_NUMERIC);
I cache a copy of that for sv_vcatpvfn_flags (and format_hexfp)
to
use;
however I note that the same check is done multiple times within
the
underlying STORE_LC_NUMERIC_SET_TO_NEEDED macro, so it may be
worth
making variants of WITH_LC_x and STORE_LC_x that accept a cached
boolean
for the hints check.

Something like the attached (not extensively tested).

Hugo, I created this smoke branch just now after seeing our *first*
patch​:

smoke-me/jkeenan/hv/134172-sprintf

Should this second patch be applied to that branch as well?

Up to you​: I'd like a second opinion on it, particularly about the
naming of the additional macros, but I think it's reasonably likely to
work. (It passed tests here on a build with -DNO_THREAD_SAFE_LOCALE,
but I haven't tried any other variations.)

Hugo

Applying to branch for smoke testing.

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Jul 24, 2019

From @jkeenan

On Mon, 22 Jul 2019 20​:06​:30 GMT, jkeenan wrote​:

On Mon, 22 Jul 2019 18​:14​:11 GMT, hv wrote​:

On Mon, 22 Jul 2019 09​:15​:20 -0700, jkeenan wrote​:

On Mon, 22 Jul 2019 16​:11​:13 GMT, hv wrote​:

On Mon, 22 Jul 2019 08​:47​:39 -0700, hv wrote​:

As Dave mentioned, the expensive part is the hints check of
IN_LC(LC_NUMERIC);
I cache a copy of that for sv_vcatpvfn_flags (and format_hexfp)
to
use;
however I note that the same check is done multiple times within
the
underlying STORE_LC_NUMERIC_SET_TO_NEEDED macro, so it may be
worth
making variants of WITH_LC_x and STORE_LC_x that accept a cached
boolean
for the hints check.

Something like the attached (not extensively tested).

Hugo, I created this smoke branch just now after seeing our *first*
patch​:

smoke-me/jkeenan/hv/134172-sprintf

Should this second patch be applied to that branch as well?

Up to you​: I'd like a second opinion on it, particularly about the
naming of the additional macros, but I think it's reasonably likely to
work. (It passed tests here on a build with -DNO_THREAD_SAFE_LOCALE,
but I haven't tried any other variations.)

Hugo

Applying to branch for smoke testing.

Smoke-test results in the branch appear satisfactory.

http​://perl.develop-help.com/?b=smoke-me%2Fjkeenan%2Fhv%2F134172-sprintf

What else do we have to decide before merging this branch and closing this ticket?

Thank you very much.
--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Jul 24, 2019

From @hvds

On Wed, 24 Jul 2019 14​:18​:01 -0700, jkeenan wrote​:

On Mon, 22 Jul 2019 20​:06​:30 GMT, jkeenan wrote​:

Applying to branch for smoke testing.

Smoke-test results in the branch appear satisfactory.

http​://perl.develop-help.com/?b=smoke-me%2Fjkeenan%2Fhv%2F134172-
sprintf

What else do we have to decide before merging this branch and closing
this ticket?

I'd like to get some feedback, particularly from Karl and Dave; Karl has already indicated he'll try to take a look this weekend. It also looks like Tony has been looking into the same issue, I just noticed he has created a branch tonyc/134172-in-lc which may have duplicated some of the work.

Given clean smokes, I'm already pretty comfortable with the first patch; the second patch ("Avoid multiple checks of IN_LC(LC_NUMERIC)") probably needs better documentation, possibly better naming, and due consideration whether to expose the new macros as part of the API.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jul 25, 2019

From @tonycoz

On Wed, Jul 24, 2019 at 03​:07​:48PM -0700, Hugo van der Sanden via RT wrote​:

On Wed, 24 Jul 2019 14​:18​:01 -0700, jkeenan wrote​:

On Mon, 22 Jul 2019 20​:06​:30 GMT, jkeenan wrote​:

Applying to branch for smoke testing.

Smoke-test results in the branch appear satisfactory.

http​://perl.develop-help.com/?b=smoke-me%2Fjkeenan%2Fhv%2F134172-
sprintf

What else do we have to decide before merging this branch and closing
this ticket?

I'd like to get some feedback, particularly from Karl and Dave; Karl has already indicated he'll try to take a look this weekend. It also looks like Tony has been looking into the same issue, I just noticed he has created a branch tonyc/134172-in-lc which may have duplicated some of the work.

Given clean smokes, I'm already pretty comfortable with the first patch; the second patch ("Avoid multiple checks of IN_LC(LC_NUMERIC)") probably needs better documentation, possibly better naming, and due consideration whether to expose the new macros as part of the API.

Hugo

It looks reasonable to me.

One thing I noticed in the original code was that
STORE_LC_NUMERIC_SET_TO_NEEDED() could call the possibly expensive
IN_LC(LC_NUMERIC) up to three times​:

+ ( ( in_lc_numeric && _NOT_IN_NUMERIC_UNDERLYING) \
+ || (! in_lc_numeric && _NOT_IN_NUMERIC_STANDARD))); \
+ if (in_lc_numeric) { \

(for a bare STORE_LC_NUMERIC_SET_TO_NEEDED() in_lc_numeric is
IN_LC(LC_NUMERIC) )

Tony

@p5pRT
Copy link
Author

p5pRT commented Jul 25, 2019

From @iabyn

On Wed, Jul 24, 2019 at 03​:07​:48PM -0700, Hugo van der Sanden via RT wrote​:

I'd like to get some feedback, particularly from Karl and Dave;

I've been dead to the world with flu for the last 8 days, so you probably
won't hear from me soon.

--
Any [programming] language that doesn't occasionally surprise the
novice will pay for it by continually surprising the expert.
  -- Larry Wall

@p5pRT
Copy link
Author

p5pRT commented Jul 25, 2019

From @hvds

On Wed, 24 Jul 2019 17​:38​:13 -0700, tonyc wrote​:

It looks reasonable to me.

Thanks.

One thing I noticed in the original code was that
STORE_LC_NUMERIC_SET_TO_NEEDED() could call the possibly expensive
IN_LC(LC_NUMERIC) up to three times​:

+ ( ( in_lc_numeric &&
_NOT_IN_NUMERIC_UNDERLYING) \
+ || (! in_lc_numeric &&
_NOT_IN_NUMERIC_STANDARD))); \
+ if (in_lc_numeric) {
\

(for a bare STORE_LC_NUMERIC_SET_TO_NEEDED() in_lc_numeric is
IN_LC(LC_NUMERIC) )

Yes, I addressed the same in my second patch, in a slightly different manner.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jul 30, 2019

From @khwilliamson

On 7/25/19 8​:17 AM, Hugo van der Sanden via RT wrote​:

On Wed, 24 Jul 2019 17​:38​:13 -0700, tonyc wrote​:

It looks reasonable to me.

Thanks.

One thing I noticed in the original code was that
STORE_LC_NUMERIC_SET_TO_NEEDED() could call the possibly expensive
IN_LC(LC_NUMERIC) up to three times​:

+ ( ( in_lc_numeric &&
_NOT_IN_NUMERIC_UNDERLYING) \
+ || (! in_lc_numeric &&
_NOT_IN_NUMERIC_STANDARD))); \
+ if (in_lc_numeric) {
\

(for a bare STORE_LC_NUMERIC_SET_TO_NEEDED() in_lc_numeric is
IN_LC(LC_NUMERIC) )

Yes, I addressed the same in my second patch, in a slightly different manner.

Hugo

---
via perlbug​: queue​: perl5 status​: open
https://rt-archive.perl.org/perl5/Ticket/Display.html?id=134172

The patches both look good to me.

I have two very minor concerns.

Is saying foo({...}) portable?

And on platforms that don't have locales enabled, is it ok to just say
"block;", or should this be wrapped in a STMT_START STMT_END

As far as the names, I don't know. It's not clear to me what the _i
suffix is supposed to mean, so if you go with that name, it should be
documented. If it is meant to mean 'internal', often using an
underscore prefix is the convention. But it's not really private,, so I
don't know.

@p5pRT
Copy link
Author

p5pRT commented Jul 31, 2019

From @hvds

On Mon, 29 Jul 2019 21​:13​:20 -0700, public@​khwilliamson.com wrote​:

The patches both look good to me.

Thanks.

I have two very minor concerns.

Is saying foo({...}) portable?

We do this in many places with DEBUG_foo() type macros.

And on platforms that don't have locales enabled, is it ok to just say
"block;", or should this be wrapped in a STMT_START STMT_END

By analogy with DEBUG_foo, I thought I'd found evidence it was; but I don't see that now, and DEBUG_r (for example) does wrap them, I'll modify this.

As far as the names, I don't know. It's not clear to me what the _i
suffix is supposed to mean, so if you go with that name, it should be
documented. If it is meant to mean 'internal', often using an
underscore prefix is the convention. But it's not really private,, so
I don't know.

I agree, but I didn't have a better name. I see Tony used ..._IN(in) in his closely equivalent version, so I'll go with that.

I plan to commit both patches with those changes, along with some docs for the _IN variants, later today.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jul 31, 2019

From @hvds

Now pushed​:

commit a06a4d4
  [perl #134172] restrict scope of locale changes during sprintf
 
  In some environments we must hold a mutex for the duration of a temporary
  locale change, so we must ensure that mutex is released appropriately.
  This means intervening code must not croak, or otherwise bypass the
  unlock.
 
  In sv_vcatpvfn_flags(), that requirement was violated when attempting
  to avoid multiple temporary locale changes by collapsing them into
  a single one. This partially undoes that to fix the problem, while
  still attempting to retain some of the benefits by caching the
  expensive hints check.

commit 39b0ad1
  [perl #134172] Avoid multiple checks of IN_LC(LC_NUMERIC)
 
  This adds new API macros STORE_LC_NUMERIC_SET_TO_NEEDED_IN and
  WITH_LC_NUMERIC_SET_TO_NEEDED_IN that accept a precalculated value
  for the hints checks of IN_LC(LC_NUMERIC).

commit 061637c
  [perl #134172] perldelta

@p5pRT
Copy link
Author

p5pRT commented Jul 31, 2019

@hvds - Status changed from 'open' to 'pending release'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant