Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault at long lines for regexp #17230

Open
prosaole opened this issue Oct 29, 2019 · 10 comments
Open

Segfault at long lines for regexp #17230

prosaole opened this issue Oct 29, 2019 · 10 comments

Comments

@prosaole
Copy link

prosaole commented Oct 29, 2019

This is a bug report for perl from [email protected].
generated with the help of perlbug 1.41 running under perl 5.30.0.


[Please describe your issue here]

This seg faults:

perl -e 'print "\0"x(2**31+50),"a"' |
  ./perl-5.30.0 -pe '$/=undef; s/(([^\0]+\0{0,10})+)/(length $1)."r\n"/ge; s/(\0+)/(length $1)."\n"/ge;'

I had expected it to not seg fault.

These give very different output:

(seq 10;perl -e 'print "\0"x(2**31-50),"a"') |
  ./perl-5.30.0 -pe '$/=undef; s/(([^\0]+\0{0,10})+)/(length $1)."r\n"/ge; s/(\0+)/(length $1)."\n"/ge;'
(seq 10;perl -e 'print "\0"x(2**31+50),"a"') |
  ./perl-5.30.0 -pe '$/=undef; s/(([^\0]+\0{0,10})+)/(length $1)."r\n"/ge; s/(\0+)/(length $1)."\n"/ge;'

I had expected them to give almost the same output.

It seems the regexp engine is unhappy about lines > 2GB.

[Please do not change anything below this line]


Flags:
category=core
severity=medium

Site configuration information for perl 5.30.0:

Configured by tange at Mon Oct 28 22:16:49 CET 2019.

Summary of my perl5 (revision 5 version 30 subversion 0) configuration:

Platform:
osname=linux
osvers=4.15.0-58-generic
archname=x86_64-linux
uname='linux aspire 4.15.0-58-generic #64-ubuntu smp tue aug 6 11:12:41 utc 2019 x86_64 x86_64 x86_64 gnulinux '
config_args='-des -Dprefix=/mnt/4tb/home/tange/localperl'
hint=recommended
useposix=true
d_sigaction=define
useithreads=undef
usemultiplicity=undef
use64bitint=define
use64bitall=define
uselongdouble=undef
usemymalloc=n
default_inc_excludes_dot=define
bincompat5005=undef
Compiler:
cc='cc'
ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
optimize='-O2'
cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
ccversion=''
gccversion='7.4.0'
gccosandvers=''
intsize=4
longsize=8
ptrsize=8
doublesize=8
byteorder=12345678
doublekind=3
d_longlong=define
longlongsize=8
d_longdbl=define
longdblsize=16
longdblkind=3
ivtype='long'
ivsize=8
nvtype='double'
nvsize=8
Off_t='off_t'
lseeksize=8
alignbytes=8
prototype=define
Linker and Libraries:
ld='cc'
ldflags =' -fstack-protector-strong -L/usr/local/lib'
libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/7/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
libs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
libc=libc-2.27.so
so=so
useshrplib=false
libperl=libperl.a
gnulibc_version='2.27'
Dynamic Linking:
dlsrc=dl_dlopen.xs
dlext=so
d_dlsymun=undef
ccdlflags='-Wl,-E'
cccdlflags='-fPIC'
lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'


@inc for perl 5.30.0:
/mnt/4tb/home/tange/localperl/lib/site_perl/5.30.0/x86_64-linux
/mnt/4tb/home/tange/localperl/lib/site_perl/5.30.0
/mnt/4tb/home/tange/localperl/lib/5.30.0/x86_64-linux
/mnt/4tb/home/tange/localperl/lib/5.30.0
/mnt/4tb/home/tange/localperl/lib/site_perl/5.24.0
/mnt/4tb/home/tange/localperl/lib/site_perl/5.22.2
/mnt/4tb/home/tange/localperl/lib/site_perl


Environment for perl 5.30.0:
HOME=/mnt/4tb/home/tange
LANG=C
LANGUAGE=C
LC_ALL=en_US.UTF-8
LC_TIME=C
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=.:/mnt/4tb/home/tange/bin:/mnt/4tb/home/tange/.cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/sbin:/usr/sbin:/mnt/4tb/home/tange/.local/bin:/mnt/4tb/home/tange/.cargo/bin:/usr/lib/oracle/xe/app/oracle/product/10.2.0/server/bin
PERL_BADLANG (unset)
PERL_MB_OPT=--install_base "/home/tange/perl5"
PERL_MM_OPT=INSTALL_BASE=/home/tange/perl5
SHELL=/bin/bash

@tonycoz
Copy link
Contributor

tonycoz commented Nov 5, 2019

With a debugging build I get an assertion without the second s///:

perl: regcomp.c:8692: Perl_reg_numbered_buff_fetch: Assertion `s >= rx->subbeg' failed.

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7c23535 in __GI_abort () at abort.c:79
#2  0x00007ffff7c2340f in __assert_fail_base (
    fmt=0x7ffff7d85ee0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=0x555555ad5e0d "s >= rx->subbeg", 
    file=0x555555ad2650 "regcomp.c", line=8692, function=<optimized out>)
    at assert.c:92
#3  0x00007ffff7c31102 in __GI___assert_fail (
    assertion=0x555555ad5e0d "s >= rx->subbeg", 
    file=0x555555ad2650 "regcomp.c", line=8692, 
    function=0x555555af5d20 <__PRETTY_FUNCTION__.22728> "Perl_reg_numbered_buff_fetch") at assert.c:101
#4  0x00005555556a49cc in Perl_reg_numbered_buff_fetch (r=0x555555c05cf8, 
    paren=1, sv=0x555555c05d40) at regcomp.c:8692
#5  0x0000555555728031 in Perl_magic_get (sv=0x555555c05d40, mg=0x555555c09ce0)
    at mg.c:919
#6  0x0000555555725ca7 in Perl_mg_get (sv=0x555555c05d40) at mg.c:201
#7  0x0000555555802bf8 in Perl_pp_length () at pp.c:3127
#8  0x0000555555711821 in Perl_runops_debug () at dump.c:2571
#9  0x00005555555edc06 in S_run_body (oldscope=1) at perl.c:2711
#10 0x00005555555ed184 in perl_run (my_perl=0x555555bdf260) at perl.c:2634
#11 0x00005555555a1132 in main (argc=5, argv=0x7fffffffe4a8, 
    env=0x7fffffffe4d8) at perlmain.c:127
(gdb) up 4
#4  0x00005555556a49cc in Perl_reg_numbered_buff_fetch (r=0x555555c05cf8, 
    paren=1, sv=0x555555c05d40) at regcomp.c:8692
8692        assert(s >= rx->subbeg);
(gdb) p s
$1 = 0x7ffe93287042 ""
(gdb) p rx->subbeg
$2 = 0x7fff13287010 ""
(gdb) p n
$4 = 1
(gdb) p rx->suboffset
$5 = 0
(gdb) p s1
$6 = -2147483598
(gdb) p rx->offs[n].start
$7 = -2147483598
(gdb) p rx->offs[n].end
$8 = -2147483597

@khwilliamson
Copy link
Contributor

I get this on 5.35.10
regcomp.c:8986: void Perl_reg_numbered_buff_fetch(PerlInterpreter*, REGEXP*, I32, SV*): Assertion `s >= rx->subbeg' failed.

@prosaole
Copy link
Author

I tested on CygWin perl-5.32.1.

It does not seg fault, but the result is clearly wrong.

$ (seq 10;perl -e 'print "\0"x(2**31+50),"a"') |   perl -pe '$/=undef; s/(([^\0]+\0{0,10})+)/(length $1)."r\n"/ge; s/(\0+)/(length $1)."\n"/ge;'
2r
29r
4294967297r

$ (seq 10;perl -e 'print "\0"x(2**31-50),"a"') |   perl -pe '$/=undef; s/(([^\0]+\0{0,10})+)/(length $1)."r\n"/ge; s/(\0+)/(length $1)."\n"/ge;'
2r
29r
2147483588
1r

Looks like a 31-bit overflow. So a workaround might be to simply changing an int to a long: I expect very few will have lines with 2^63 chars.

$ perl -V
Summary of my perl5 (revision 5 version 32 subversion 1) configuration:

  Platform:
    osname=cygwin
    osvers=3.2.0(0.34053)
    archname=x86_64-cygwin-threads-multi
    uname='cygwin_nt-10.0 cygwinpro 3.2.0(0.34053) 2021-03-29 08:42 x86_64 cygwin '
    config_args='-des -Dprefix=/usr -Dmksymlinks -Darchname=x86_64-cygwin-threads -Dlibperl=cygperl5_32.dll -Dcc=gcc -Dld=g++ -Accflags=-ggdb -O2 -pipe -Wall -Werror=format-security -D_FORTIFY_SOURCE=2 -fstack-protector-strong --param=ssp-buffer-size=4 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/build=/usr/src/debug/perl-5.32.1-2 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/src/perl-5.32.1=/usr/src/debug/perl-5.32.1-2 -fwrapv'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='gcc'
    ccflags ='-DPERL_USE_SAFE_PUTENV -U__STRICT_ANSI__ -D_GNU_SOURCE -ggdb -O2 -pipe -Wall -Werror=format-security -D_FORTIFY_SOURCE=2 -fstack-protector-strong --param=ssp-buffer-size=4 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/build=/usr/src/debug/perl-5.32.1-2 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/src/perl-5.32.1=/usr/src/debug/perl-5.32.1-2 -fwrapv -fno-strict-aliasing'
    optimize='-O3'
    cppflags='-DPERL_USE_SAFE_PUTENV -U__STRICT_ANSI__ -D_GNU_SOURCE -ggdb -O2 -pipe -Wall -Werror=format-security -D_FORTIFY_SOURCE=2 -fstack-protector-strong --param=ssp-buffer-size=4 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/build=/usr/src/debug/perl-5.32.1-2 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/src/perl-5.32.1=/usr/src/debug/perl-5.32.1-2 -fwrapv -fno-strict-aliasing'
    ccversion=''
    gccversion='10.2.0'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='g++'
    ldflags =' -Wl,--enable-auto-import -Wl,--export-all-symbols -Wl,--enable-auto-image-base -fstack-protector-strong'
    libpth=/usr/lib
    libs=-lpthread -lnsl -lgdbm -ldb -ldl -lcrypt -lgdbm_compat
    perllibs=-lpthread -lnsl -ldl -lcrypt
    libc=/usr/lib/libcygwin.a
    so=dll
    useshrplib=true
    libperl=cygperl5_32.dll
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=dll
    d_dlsymun=undef
    ccdlflags=' '
    cccdlflags=' '
    lddlflags=' --shared  -Wl,--enable-auto-import -Wl,--export-all-symbols -Wl,--enable-auto-image-base -fstack-protector-strong'


Characteristics of this binary (from libperl):
  Compile-time options:
    HAS_TIMES
    MULTIPLICITY
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_IMPLICIT_CONTEXT
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    PERL_USE_SAFE_PUTENV
    USE_64_BIT_ALL
    USE_64_BIT_INT
    USE_ITHREADS
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
    USE_REENTRANT_API
    USE_THREAD_SAFE_LOCALE
  Locally applied patches:
    Cygwin: README
    Cygwin: use auto-image-base instead of fixed DLL base address
    Cygwin: modify hints
    Cygwin: Configure correct libsearch
    Cygwin: Configure correct libpth
    Cygwin: Encode fix for CVE-2021-36770
    Cygwin: Win32 correct UTF8 handling
  Built under cygwin
  Compiled at Aug 14 2021 09:07:12
  @INC:
    /usr/local/lib/perl5/site_perl/5.32/x86_64-cygwin-threads
    /usr/local/share/perl5/site_perl/5.32
    /usr/lib/perl5/vendor_perl/5.32/x86_64-cygwin-threads
    /usr/share/perl5/vendor_perl/5.32
    /usr/lib/perl5/5.32/x86_64-cygwin-threads
    /usr/share/perl5/5.32

@demerphq
Copy link
Collaborator

demerphq commented Apr 2, 2022 via email

@khwilliamson
Copy link
Contributor

This still exists in 5.37.12

@tonycoz
Copy link
Contributor

tonycoz commented Apr 26, 2023

#21012 fixes the crash (or assertion failure when I tried it), but is still limited by the default REG_INFTY:

$ ./perl -e 'print "\0" x (2**31+50),"a"' | ./perl -pe '$/=undef; s/(([^\0]+\0{0,10})+)/(length $1)."r\n"/ge; s/(\0+)/(length $1)."\n"/ge;'
2147483647
51
1r

Unfortunately a lot of the regexp code assumes that REG_INFTY fits in an I32:

$ ./Configure -des -Dusedevel -DDEBUGGING -Doptimize=-O0\ -ggdb3 -Dusethreads -Accflags=-DREG_INFTY=0x200000000 && make test-prep
...
regcomp.c: In function ‘S_regpiece’:
<command-line>: warning: overflow in conversion from ‘long int’ to ‘I32’ {aka int’} changes value from ‘8589934592’ to ‘0’ [-Woverflow]
regcomp.c:4646:15: note: in expansion of macro ‘REG_INFTY’
 4646 |     I32 max = REG_INFTY;
      |               ^~~~~~~~~
<command-line>: warning: overflow in conversion from ‘long int’ to ‘I32’ {aka int’} changes value from ‘8589934592’ to ‘0’ [-Woverflow]
regcomp.c:4700:23: note: in expansion of macro ‘REG_INFTY’
 4700 |                 max = REG_INFTY;
      |                       ^~~~~~~~~
...
regcomp.c:4744:17: warning: comparison is always false due to limited range of data type [-Wtype-limits]
 4744 |         if (max > REG_INFTY/3) {
      |                 ^
regcomp.c:4769:17: warning: comparison is always false due to limited range of data type [-Wtype-limits]
 4769 |         if (max == REG_INFTY)
      |                 ^~
regcomp.c:4775:17: warning: comparison is always false due to limited range of data type [-Wtype-limits]
 4775 |         if (max == REG_INFTY) {
      |                 ^~
regcomp_debug.c: In function ‘Perl_regprop’:
regcomp_debug.c:480:16: warning: comparison is always false due to limited range of data type [-Wtype-limits]
  480 |         if (hi == REG_INFTY)
      |                ^~

When I was scanning* for I32 issues I assumed that was by design so I didn't consider them as something to fix, should they be fixed?

  • with git grep I32 and reviewing each match, yay

@demerphq
Copy link
Collaborator

When I was scanning* for I32 issues I assumed that was by design so I didn't consider them as something to fix, should they be fixed?

"Fixed" implies they are broken, which is a little debatable. Quantifier sizes are stored inside of the regops and the largest size value we can store inside of a regop is 32 bits, adding support for 64 bit values would be awkward, as the data in the compiled regexp program is 32 bit aligned. I am not convinced we should change this, and if we do it is going to involve a lot more than just changing some I32's to IV's. Every CURLY opcode would have to pay a price for the change, or we would have to bifurcate the CURLY opcodes. None of this would be straight forward.

Might be better to simply throw an exception if we are matching against a string longer than I32_MAX bytes. There are a lot of parts of the regex engine that simply arent expecting

Setting REG_INFTY to a value larger I32_MAX should throw an exception at perl startup or something like that.

Until there is a fix, may I suggest that Perl simply bails out with: "Lines > 2GB are currently not supported" or similar.

We could do it for strings longer than 2GB, but i am not sure how we would do it for /lines/.

@tonycoz
Copy link
Contributor

tonycoz commented Apr 27, 2023

I wasn't suggesting that regops support very large quantifier values, though I assume now that an infinite quantifier is currently represented by REG_INFTY in the op, which I hadn't realized. (I hadn't looked.)

To get the effect I was thinking of, when the regop specifies REG_INFTY we could treat that as SSize_t_MAX when matching.

As to line length, we can already match strings over 2GB (using the #21012 branch):

$ ./miniperl -we '$y = "abcd"; $x = $y x 0x8000_0000; printf "%x\n", length $x; $x =~ /((?:......)*)/; printf "%x\n", length $1'
200000000
1fffffffe

it's just that a single quantifier won't match more than 2G-1 times.

I'll take a closer look at actually trying to implement this, from a quick look over the code it doesn't look unreasonable to do.

@demerphq
Copy link
Collaborator

though I assume now that an infinite quantifier is currently represented by REG_INFTY in the op

Yes. In 0678333 I changed it from a U16 to I32.

To get the effect I was thinking of, when the regop specifies REG_INFTY we could treat that as SSize_t_MAX when matching.

Yeah, I was a bit slow yesterday, of course we can do that. We used to do that with the 16 bit REG_INFTY as well. The regops STAR and PLUS for the * and + quantifiers also need to be changed, I think they dont use REG_INFTY.

I'll take a closer look at actually trying to implement this

I might do the same, but don't let that stop you from doing it first. I have other priorities just at the moment.

Thanks for following up on this!

@khwilliamson
Copy link
Contributor

khwilliamson commented Apr 27, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants