PHP-FPM 8.2 SIGSEGV #16432

Open
jruston opened this issue Oct 14, 2024 · 8 comments

@jruston

jruston commented Oct 14, 2024

Description

I am encountering an issue where very rarely, one of the child processes of PHP-FPM crashes like this:

[13-Oct-2024 21:37:31] WARNING: [pool www] child 3668047 exited on signal 11 (SIGSEGV) after 6678.827466 seconds from start

When this happens, not only does the child process exit, but PHP-FPM stops responding to all requests completely. This creates a massive server load as the requests are queued indefinitely, and is only resolved by restarting the PHP-FPM service.

I have periodically experienced this in various versions of PHP 8.2, including in the current 8.2.24.

PHP Version

PHP 8.2.24

Operating System

AlmaLinux 8.9

@devnexen
Member

Hi, and thanks for your report. Would you be able to provide a backtrace, e.g. by attaching gdb to the relevant process and then typing bt full?

$ gdb -p <fpm process id>
<requests>
(gdb) <crash>
(gdb) bt full

@jruston
Author

jruston commented Oct 21, 2024

@devnexen I finally have the backtrace. I had to wait for it to occur again. You can find it below:

Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00005571847466e6 in fpm_status_export_to_zval (status=0x7fd460e13070) at /usr/src/debug/php-8.2.24-1.el8.remi.x86_64/sapi/fpm/fpm/fpm_status.c:65
65 struct fpm_scoreboard_proc_s procs[scoreboard.nprocs];
(gdb) bt full
#0 0x00005571847466e6 in fpm_status_export_to_zval (status=0x7fd460e13070) at /usr/src/debug/php-8.2.24-1.el8.remi.x86_64/sapi/fpm/fpm/fpm_status.c:65
scoreboard = {{lock = 1, dummy = "\001", '\000' <repeats 14 times>}, pool = "www", '\000' <repeats 28 times>, pm = 1, start_epoch = 1729283343, idle = 11600,
active = 1200, active_max = 6029, requests = 50663485, max_children_reached = 0, lq = 0, lq_max = 0, lq_len = 0, nprocs = 12800, free_proc = 2999, slow_rq = 0,
shared = 0x0, procs = 0x7ffefe3f22f0}
scoreboard_p = 0x7fd43fda8000
fpm_proc_stats = {value = {lval = 206158430224, dval = 1.0185579797423812e-312, counted = 0x3000000010, str = 0x3000000010, arr = 0x3000000010, obj = 0x3000000010,
res = 0x3000000010, ref = 0x3000000010, ast = 0x3000000010, zv = 0x3000000010, ptr = 0x3000000010, ce = 0x3000000010, func = 0x3000000010, ww = {w1 = 16, w2 = 48}},
u1 = {type_info = 4265550624, v = {type = 32 ' ', type_flags = 35 '#', u = {extra = 65087}}}, u2 = {next = 32766, cache_slot = 32766, opline_num = 32766,
lineno = 32766, num_args = 32766, fe_pos = 32766, fe_iter_idx = 32766, property_guard = 32766, constant_flags = 32766, extra = 32766}}
fpm_proc_stat = {value = {lval = 140733163971168, dval = 6.9531421548697167e-310, counted = 0x7ffefe3f2260, str = 0x7ffefe3f2260, arr = 0x7ffefe3f2260,
obj = 0x7ffefe3f2260, res = 0x7ffefe3f2260, ref = 0x7ffefe3f2260, ast = 0x7ffefe3f2260, zv = 0x7ffefe3f2260, ptr = 0x7ffefe3f2260, ce = 0x7ffefe3f2260,
func = 0x7ffefe3f2260, ww = {w1 = 4265550432, w2 = 32766}}, u1 = {type_info = 1493901056, v = {type = 0 '\000', type_flags = 31 '\037', u = {extra = 22795}}}, u2 = {
next = 239762071, cache_slot = 239762071, opline_num = 239762071, lineno = 239762071, num_args = 239762071, fe_pos = 239762071, fe_iter_idx = 239762071,
property_guard = 239762071, constant_flags = 239762071, extra = 239762071}}
now_epoch =
duration =
now = {tv_sec = 140549600313354, tv_usec = 140550170278408}
cpu =
i =
func = "fpm_status_export_to_zval"
procs =
proc_p =
#1 0x00005571847413eb in zif_fpm_get_status (execute_data=, return_value=0x7fd460e13070)

@devnexen
Member

Thanks. Could you share the relevant parts of your FPM configuration?

@jruston
Author

jruston commented Oct 21, 2024

@devnexen I am happy to share whatever you need in the configuration :)

@devnexen
Member

devnexen commented Oct 21, 2024

Anything regarding the pool(s) (e.g. max children and related settings). I am asking because the backtrace shows something like this:

scoreboard = {{lock = 1, dummy = "\001", '\000' <repeats 14 times>}, pool = "www", '\000' <repeats 28 times>, pm = 1, start_epoch = 1729283343, idle = 11600,
active = 1200, active_max = 6029, requests = 50663485, max_children_reached = 0, lq = 0, lq_max = 0, lq_len = 0, nprocs = 12800, free_proc = 2999, slow_rq = 0,
shared = 0x0, procs = 0x7ffefe3f22f0}

Then, to clarify, we do this:

struct fpm_scoreboard_proc_s procs[scoreboard.nprocs];
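To make the concern concrete, here is a rough sketch of how the size of such a stack array can be compared with the process stack limit. The struct `demo_proc_entry` and the helpers `vla_bytes` and `fits_on_stack` are hypothetical stand-ins, not FPM code, and the real per-process entry has a different layout and size. Assuming an entry of about 1 KiB, 12800 entries would need roughly 12.5 MiB, which is more than the 8 MiB soft stack limit that is a common default on Linux.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical stand-in for one per-process scoreboard entry.
 * The real struct in FPM has different fields and a different size. */
struct demo_proc_entry {
    char request_uri[128];
    char query_string[512];
    char script_filename[256];
    long counters[16];
};

/* Bytes that a stack array of nprocs entries would occupy. */
size_t vla_bytes(size_t nprocs)
{
    return nprocs * sizeof(struct demo_proc_entry);
}

/* Returns 1 if the array is smaller than the soft stack limit,
 * 0 if it is not, and -1 if the limit cannot be read.
 * An unlimited stack is treated as always large enough. */
int fits_on_stack(size_t nprocs)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        return -1;
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        return 1;
    }
    return vla_bytes(nprocs) < (size_t)rl.rlim_cur;
}
```

Calling `fits_on_stack(12800)` on the affected host would give a quick indication of whether the configured `pm.max_children` can overflow the stack; the check ignores stack space already used by other frames, so it is optimistic.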

@jruston
Author

jruston commented Oct 21, 2024

@devnexen Most of the PHP-FPM config is just the default values for PHP 8.2 except for things such as max children:

pm = static
pm.max_children = 12800
pm.start_servers = 7000
pm.min_spare_servers = 2800
pm.max_spare_servers = 8400
pm.max_requests = 1000

These values are very high because this is a very high traffic server which is prone to huge, short bursts of traffic. The performance is excellent, except for when this occasional SIGSEGV occurs which breaks it all.

I can identify no pattern of when it occurs. It seems to happen with both high traffic and reasonably low traffic. Sometimes it happens every few days, but sometimes it only happens once per month.

@devnexen
Member

Thanks for confirming. I wonder whether a threshold should be enforced, or whether the variable should be allocated on the heap instead. Let's see what @bukka thinks.

@bukka
Member

bukka commented Oct 23, 2024

This is indeed a bug. The array needs to go on the heap, as the stack is too small when the number of children is that high. Fix in #16564.
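As an illustration of the heap approach, here is a minimal sketch. It is not the patch in #16564; `demo_entry` and `copy_procs_to_heap` are hypothetical names, and plain `calloc` is used only to keep the example self-contained.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one per-process scoreboard entry. */
struct demo_entry {
    int used;
    char request_uri[128];
};

/* Copy nprocs entries from src into a heap buffer instead of a stack array.
 * Returns NULL if nprocs is 0 or the allocation fails; the caller must
 * free() the result. calloc rejects sizes that would overflow. */
struct demo_entry *copy_procs_to_heap(const struct demo_entry *src, size_t nprocs)
{
    struct demo_entry *copy;

    if (nprocs == 0) {
        return NULL;
    }
    copy = calloc(nprocs, sizeof *copy);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, src, nprocs * sizeof *copy);
    return copy;
}
```

Unlike a variable-length array, a failed allocation here is reported as `NULL` and can be handled, rather than crashing the process when the stack is exhausted.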

The reason it freezes everything is that the lock is set and there is currently no unlocking mechanism if a child crashes before unlocking. This is a problem in its own right, and I have an initial fix for it in #15805, but it needs more work, and mainly testing, to be stable. I plan to look into it more in the coming weeks. That is a somewhat separate matter; to resolve this particular issue, the fix in #16564 should be enough.
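The freeze can be pictured with a small sketch of a lock word that is never cleared. This is neither FPM's locking code nor the approach in #15805; `demo_lock`, `demo_try_acquire` and `demo_release` are hypothetical, and the bounded retry exists only so that the example terminates.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a scoreboard lock word. In FPM the lock
 * would live in memory shared between the child processes. */
typedef struct {
    atomic_int locked;
} demo_lock;

/* Try to take the lock, giving up after max_spins attempts. */
bool demo_try_acquire(demo_lock *l, int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        int expected = 0;
        if (atomic_compare_exchange_strong(&l->locked, &expected, 1)) {
            return true;
        }
    }
    return false;
}

/* Clear the lock. A holder that crashes never reaches this call. */
void demo_release(demo_lock *l)
{
    atomic_store(&l->locked, 0);
}
```

If a holder takes the lock and dies before calling `demo_release`, every later `demo_try_acquire` fails no matter how long it spins, which matches the `lock = 1` value visible in the scoreboard dump from the backtrace.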

In addition, it might be worth checking why fpm_get_status is used and what it actually checks. I wonder whether it always needs all processes. We should maybe add an optional param full so that only the main scoreboard info can be provided for that API. That would be a feature though.
