Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OPCache with Enum and Callback functions results in segmentation fault #10914

Closed
miguelantoniosantos opened this issue Mar 23, 2023 · 28 comments
Closed

Comments

@miguelantoniosantos
Copy link

Description

The following code:

ExampleEnum.php

<?php

enum ExampleEnum: string
{
    case FIRST= 'FIRST_VALUE';
}

Trigger.php

<?php
function auxiliarFunction(string $enumEntry) {}

function triggerException(callable $callback) {
     $callback();
}

function main() {
    triggerException(function () {
          auxiliarFunction(ExampleEnum::FIRST->value); //This will not run and trigger a SIGSEGV - core dumped
    });
}

This was first spotted on Symfony 5.4 codebase, with a callback of the TagAwareCacheInterface.
If OPCache is disabled, the error doesn't occur.
Replacing the enum with a string variable fixes the issue.

PHP Version

PHP 8.1.11 (fpm-fcgi)

Operating System

Ubuntu 20.04.5 LTS

@miguelantoniosantos
Copy link
Author

miguelantoniosantos commented Mar 23, 2023

We also found out that ExampleEnum::cases() passed with the function () use (), also causes a segmentation fault.

$cases = ExampleEnum::cases();
triggerException(function () use ($cases) {
     auxiliarFunction($cases); //This will not run and trigger a SIGSEGV - core dumped
});

We mitigated the issue with

$casesWorkAround = ExampleEnum::cases();
$cases = [];
foreach ($casesWorkAround as $caseEntry) {
    $cases [] = $caseEntry->value;
}

If needed I can create a docker with a symfony project that recreates this issue.

@nielsdos
Copy link
Member

I've been unable to reproduce this so far. I tested this without Symfony though, just with the two provided files.
Are you using the JIT or not? If you're using the JIT, does it also reproduce without?
What extensions did you use when you could reproduce this?
Does this always reproduce reliably?

If needed I can create a docker with a symfony project that recreates this issue.

This would be very helpful!

Alternatively, are you able to provide a backtrace from GDB?
To find out how to generate a backtrace, please read http://bugs.php.net/bugs-generating-backtrace.php for Linux/macos and http://bugs.php.net/bugs-generating-backtrace-win32.php for Windows.

@miguelantoniosantos
Copy link
Author

We are able to recreate this with a 100% crash rate.
I will fetch the php -i details; backtrace and recreate the issue with docker.

I expect to have this ready at max in 7 days, will try today but I can't guarantee it

@miguelantoniosantos
Copy link
Author

miguelantoniosantos commented Mar 24, 2023

After some debugging it seems like this:

  • Write Enum::cases to redis cache, throws segsev
  • After that any access to the enum inside the callback used by redis cache function will throw segsev

I wasn't able to recreate a docker instance with it.
I believe I setup the core dump correctly here is the link (size: 349MB): <snip, removed coredump file>

Gets weirder in every debug, I am able to duplicate it constantly on our codebase but I am not allowed to share it...

Might be an issue with Symfony, or the redis, but xdebug dies on the Enum::cases or Enum::Entry->value.

PHP Ini: https://gist.github.com/miguelantoniosantos/200ebb03dbc1996cbbc6c7edc8510d18

@nielsdos
Copy link
Member

Thanks. It's definitely not a JIT issue because that's disabled. Since you mention this happens with redis, this makes me wonder if it's an issue with that extension instead of PHP itself.
I assume you're using https://github.com/phpredis/phpredis ?
I can open the coredump with a multiarch gdb build, but can't do much because I'm on x86 instead of ARM.
Is it possible to load the coredump on the machine it was generated on and run the gdb command: bt full ?

@miguelantoniosantos
Copy link
Author

miguelantoniosantos commented Mar 24, 2023

I am able to do bt full on a similar machine (machine that generate it is already terminated, build image is the same)
We are using predis and not PHPRedis.

If you need any more info from me, please guide me as my knowledge of gdb is far lost.

@nielsdos
Copy link
Member

Interesting how it's not a PECL extension causing the issue. bt full on a similar machine is probably fine too as their image is the same.

@miguelantoniosantos
Copy link
Author

miguelantoniosantos commented Mar 24, 2023

For more details on the machine we use a AWS c6g.large, with Ubuntu 20.04.5 ARM base image, nothing was compiled from source, all installed from ppa:ondrej/php

The Redis server is ElastiCache Redis, and we made sure the server had sufficient RAM and CPU to handle the requests.

@nielsdos
Copy link
Member

I'm not sure if I actually explained the previous instructions properly.
Can you please share the output of bt full in gdb with your core dump loaded? I don't have access to an ARM machine and so I can't do much on my system with the coredump you provided.
To load the core dump you can use gdb path/to/binary path/to/coredump. Then you can type in the bt full command.

@miguelantoniosantos
Copy link
Author

miguelantoniosantos commented Mar 24, 2023

My first core dump didn't had all the symbols, generated a new one.
With this I am updating the version to PHP 8.1.17 (fpm-fcgi)

Core file: <snip, removed coredump file>
BT Full: https://gist.github.com/miguelantoniosantos/1270a371cdf08d59f96e1fe322e46062

From my minimal understanding of the Backtrace, seems like the GC is trying to clear something he can't clear

@miguelantoniosantos
Copy link
Author

Bringing another update, in a completely different method using enums inside the callback for the cache, and it works as intended...

So this is only happening one some methods... And we can't find a common ground between them.

@nielsdos
Copy link
Member

It looks like some sort of memory corruption bug.
Does the problem also occur when you turn off preloading?

Notes for myself and other developers:
Somehow, the constant table looks corrupted. The string values are all okay, but the constant->value zvals have weird types. e.g. type 8 and 13 (IS_OBJECT & IS_ITERABLE). The memory behind the "supposedly object" pointer points to invalid memory (but the pointer looks kinda legit, so perhaps the zval was overwritten by something that's already freed).

@miguelantoniosantos
Copy link
Author

Disabling preloading also solves the issue.

Changed from:
opcache.preload=/var/www/PROJECT_DIR/config/preload.php to opcache.preload=

@nielsdos
Copy link
Member

Okay thank you this is very useful info! So it's related to preloading, which may explain why you can only reproduce it in some methods and some not (depending if the method was preloaded or not).
I'll see if I have time this weekend to dig deeper into this.

@miguelantoniosantos
Copy link
Author

If needed we can schedule, during the week, something and you have a look into our codebase, also the preloaded files and setup.

@nielsdos
Copy link
Member

I tried a bunch of different stuff, but unfortunately didn't manage to reproduce your issue. I don't have enough information at the moment to know what scenario happens.
The corruption happens some time before the application crashes and it's hard to guess what's going on.
If possible and you want to, can you compile PHP with the address sanitizer enabled? You'll also need to patch PHP-FPM to not close its stdio descriptors (i.e. https://gist.github.com/nielsdos/f8f77f73e9d50fc02f6a3b8e6fe257cf), and run PHP-FPM in foreground.

@miguelantoniosantos
Copy link
Author

This has not been forgotten, I am trying to free up my week schedule to work on this.
Will update when I can

@magnetik
Copy link

magnetik commented May 4, 2023

I'm facing this too. Not many things to add besides we are also disabling OPCache preload as a fix (fix found without this issue ^^).

Versions :

  • PHP 8.2 compiled from source in docker
  • Symfony 6.2
  • phpredis latest

@nielsdos
Copy link
Member

nielsdos commented May 4, 2023

@magnetik are you able to provide self-contained reproduction instructions? I haven't been able to reproduce the issue locally yet unfortunately

@nielsdos
Copy link
Member

Okay it took a very long time of looking at this on and off, but I managed to build a reproducer.
I'll leave the instructions here for myself and other maintainers.

I used these configure flags: ./configure --enable-debug --enable-sockets --enable-mbstring --enable-opcache --disable-opcache-jit --enable-address-sanitizer on PHP-8.1
I didn't have USE_ZEND_ALLOC set to 0

Save this as test.php:

<?php
require 'vendor/autoload.php';

Predis\Autoloader::register();

$client = new Predis\Client();

// HERE
//$x = new ReflectionEnum(ExampleEnum::class);
//var_dump($x->getCases()[0]->getValue());

var_dump($client->set('xx', serialize(ExampleEnum::cases())));
var_dump($client->get('xx'));

Save this as enum.php and set preloading to preload this file:

<?php

enum ExampleEnum: string
{
    case FIRST= "AAA"."b";
}

Now start a CLI server with opcache on, for me it looks like this: ../php-src/sapi/cli/php -c ../php-src -S localhost:8080

Now go to http://localhost:8080/test.php, it will work but now uncomment the lines under // HERE and refresh, it will again work, but if you now refresh again you'll segfault.
The exact ASAN complaint differs from time to time, which suggests it went wrong a long time ago, but last hit it showed like this:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==161947==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x564bfb596d23 bp 0x7ffd1824cc60 sp 0x7ffd1824cc10 T0)
==161947==The signal is caused by a READ memory access.
==161947==Hint: this fault was caused by a dereference of a high value address (see register values below).  Disassemble the provided pc to learn which register was used.
    #0 0x564bfb596d23 in zend_mm_alloc_small /home/niels/php-src/Zend/zend_alloc.c:1293
    #1 0x564bfb59719a in zend_mm_alloc_heap /home/niels/php-src/Zend/zend_alloc.c:1364
    #2 0x564bfb59dc13 in _emalloc /home/niels/php-src/Zend/zend_alloc.c:2574
    #3 0x564bfb6a0d5c in zend_string_alloc /home/niels/php-src/Zend/zend_string.h:152
    #4 0x564bfb74cd8b in ZEND_ROPE_END_SPEC_TMP_CONST_HANDLER /home/niels/php-src/Zend/zend_vm_execute.h:19592
    #5 0x564bfb820301 in execute_ex /home/niels/php-src/Zend/zend_vm_execute.h:57762
    #6 0x564bfb829fc0 in zend_execute /home/niels/php-src/Zend/zend_vm_execute.h:60163
    #7 0x564bfb6392c4 in zend_execute_scripts /home/niels/php-src/Zend/zend.c:1846
    #8 0x564bfb4ce89a in php_execute_script /home/niels/php-src/main/main.c:2542
    #9 0x564bfba17c9a in php_cli_server_dispatch_script /home/niels/php-src/sapi/cli/php_cli_server.c:2040
    #10 0x564bfba18f9d in php_cli_server_dispatch /home/niels/php-src/sapi/cli/php_cli_server.c:2212
    #11 0x564bfba1aa05 in php_cli_server_recv_event_read_request /home/niels/php-src/sapi/cli/php_cli_server.c:2534
    #12 0x564bfba1b51d in php_cli_server_do_event_for_each_fd_callback /home/niels/php-src/sapi/cli/php_cli_server.c:2621
    #13 0x564bfba0fe70 in php_cli_server_poller_iter_on_active /home/niels/php-src/sapi/cli/php_cli_server.c:869
    #14 0x564bfba1b72a in php_cli_server_do_event_for_each_fd /home/niels/php-src/sapi/cli/php_cli_server.c:2639
    #15 0x564bfba1b8ea in php_cli_server_do_event_loop /home/niels/php-src/sapi/cli/php_cli_server.c:2649
    #16 0x564bfba1c0d0 in do_cli_server /home/niels/php-src/sapi/cli/php_cli_server.c:2779
    #17 0x564bfba04fac in main /home/niels/php-src/sapi/cli/php_cli.c:1370
    #18 0x7f1dee00084f  (/usr/lib/libc.so.6+0x2384f) (BuildId: 2f005a79cd1a8e385972f5a102f16adba414d75e)
    #19 0x7f1dee000909 in __libc_start_main (/usr/lib/libc.so.6+0x23909) (BuildId: 2f005a79cd1a8e385972f5a102f16adba414d75e)
    #20 0x564bfac03ac4 in _start (/home/niels/php-src/sapi/cli/php+0x603ac4) (BuildId: 1613eaf504b5b4f64cfbd2cda881cd145f401cfb)

@nielsdos
Copy link
Member

nielsdos commented Jul 11, 2023

Enabling opcache.protect_memory reveals what's going on I think. The enum, whose first value is of type IS_CONSTANT_AST, is stored in SHM because of the preloading. ReflectionClassConstant::getValue() tries to modify it with this code:

if (Z_TYPE(ref->value) == IS_CONSTANT_AST) {
	zval_update_constant_ex(&ref->value, ref->ce);
}
ZVAL_COPY_OR_DUP(return_value, &ref->value);

This shouldn't be allowed because it's in SHM. A quick test by replacing it with this fixes it:

if (Z_TYPE(ref->value) == IS_CONSTANT_AST) {
	ZVAL_COPY(return_value, &ref->value);
	zval_update_constant_ex(return_value, ref->ce);
} else {
	ZVAL_COPY_OR_DUP(return_value, &ref->value);
}

Of course there are more places where this change needs to happen.
Also, I think that we may do the in-place modfication, as long as it's not in SHM, so the above fix is still even too conservative. I need to read some more of this code to fully grasp what the right solution would be.

Relevant ASAN complaint:

==171061==ERROR: AddressSanitizer: SEGV on unknown address 0x000041dc2218 (pc 0x5626ef9fa0ce bp 0x7fff3632ff00 sp 0x7fff3632fe00 T0)
==171061==The signal is caused by a WRITE memory access.
    #0 0x5626ef9fa0ce in zval_update_constant_ex /home/niels/php-src/Zend/zend_execute_API.c:705
    #1 0x5626ef55d9da in zim_ReflectionClassConstant_getValue /home/niels/php-src/ext/reflection/php_reflection.c:3880
    #2 0x5626efad2c7d in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER /home/niels/php-src/Zend/zend_vm_execute.h:1870
    #3 0x5626efc18192 in execute_ex /home/niels/php-src/Zend/zend_vm_execute.h:55823
    #4 0x5626efc2a009 in zend_execute /home/niels/php-src/Zend/zend_vm_execute.h:60163
    #5 0x5626efa3930d in zend_execute_scripts /home/niels/php-src/Zend/zend.c:1846
    #6 0x5626ef8ce89a in php_execute_script /home/niels/php-src/main/main.c:2542
    #7 0x5626efe17ce3 in php_cli_server_dispatch_script /home/niels/php-src/sapi/cli/php_cli_server.c:2040
    #8 0x5626efe18fe6 in php_cli_server_dispatch /home/niels/php-src/sapi/cli/php_cli_server.c:2212
    #9 0x5626efe1aa4e in php_cli_server_recv_event_read_request /home/niels/php-src/sapi/cli/php_cli_server.c:2534
    #10 0x5626efe1b566 in php_cli_server_do_event_for_each_fd_callback /home/niels/php-src/sapi/cli/php_cli_server.c:2621
    #11 0x5626efe0feb9 in php_cli_server_poller_iter_on_active /home/niels/php-src/sapi/cli/php_cli_server.c:869
    #12 0x5626efe1b773 in php_cli_server_do_event_for_each_fd /home/niels/php-src/sapi/cli/php_cli_server.c:2639
    #13 0x5626efe1b933 in php_cli_server_do_event_loop /home/niels/php-src/sapi/cli/php_cli_server.c:2649
    #14 0x5626efe1c119 in do_cli_server /home/niels/php-src/sapi/cli/php_cli_server.c:2779
    #15 0x5626efe04ff5 in main /home/niels/php-src/sapi/cli/php_cli.c:1370
    #16 0x7f8e80e6484f  (/usr/lib/libc.so.6+0x2384f) (BuildId: 2f005a79cd1a8e385972f5a102f16adba414d75e)
    #17 0x7f8e80e64909 in __libc_start_main (/usr/lib/libc.so.6+0x23909) (BuildId: 2f005a79cd1a8e385972f5a102f16adba414d75e)

@nielsdos
Copy link
Member

nielsdos commented Jul 11, 2023

With protect_memory enabled, this is the minimal reproducer, which will crash instantly if enum.php is preloaded:

<?php
$x = new ReflectionEnum(ExampleEnum::class);
var_dump($x->getCases()[0]->getValue());

No need to externals dependencies.

The external dependencies just created the "lucky" heap layout that made it able to hit the issue without protect_memory.

@iluuu1994
Copy link
Member

iluuu1994 commented Jul 11, 2023

@nielsdos I haven't tried it, but I think this line needs to be fixed:

ZEND_HASH_MAP_FOREACH_STR_KEY_PTR(&ce->constants_table, name, constant) {

We're accessing the class constants directly, instead of using CE_CONSTANTS_TABLE(ce), which separates the table if it contains constant ASTs.

Thank you a lot for the analysis!

Edit: And it looks like getCase will have the same issue.

@nielsdos
Copy link
Member

@iluuu1994 Yeah I also found that. I just wasn't sure about the overhead but if this is the general approach I'll go with that.
I'm checking all code now.

@iluuu1994
Copy link
Member

@nielsdos Class constants should only be evaluated once. If we copy the constant AST and evaluate it then, we'll lose the value and evaluate it again on the next access. I think, for enum cases this would mean getting non-unique case values. So I don't think we have another option.

@nielsdos
Copy link
Member

Both seemed to work so thanks for your explanation on why we need separation. I'll fix it using the separation.

@iluuu1994
Copy link
Member

Thank you a lot @nielsdos!

nielsdos added a commit to nielsdos/php-src that referenced this issue Jul 11, 2023
…segmentation fault

See linked issue for analysis.
@magnetik
Copy link

Awesome @nielsdos ! Thanks for the fix. looking forward to have it :)

nielsdos added a commit that referenced this issue Jul 11, 2023
* PHP-8.1:
  Fix GH-10914: OPCache with Enum and Callback functions results in segmentation fault
nielsdos added a commit that referenced this issue Jul 11, 2023
* PHP-8.2:
  Fix GH-10914: OPCache with Enum and Callback functions results in segmentation fault
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants