
proxysql crashes, multiple times every week #3329

Closed
izzyquestion opened this issue Mar 2, 2021 · 16 comments

@izzyquestion

ProxySQL suddenly crashes, always at night, on most days of the week.

2021-02-16 04:07:25 [INFO] Dumping current MySQL Servers structures for hostgroup ALL
HID: 11 , address: 10.5.0.146 , port: 3306 , gtid_port: 0 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: multidb5 - test
HID: 11 , address: 10.5.0.128 , port: 3306 , gtid_port: 0 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: multidb2 - test
HID: 2 , address: 10.5.0.165 , port: 3306 , gtid_port: 0 , weight: 1 , status: OFFLINE_HARD , max_connections: 10000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: multidb4 - PROD

/usr/bin/proxysql(_Z13crash_handleri+0x1a)[0x4d5fda]
/lib64/libc.so.6(+0x363b0)[0x7fe48b6c03b0]
/usr/bin/proxysql[0x8614ca]
/usr/bin/proxysql[0x93164e]
/usr/bin/proxysql(mysql_close+0xe1)[0x931bfc]
/usr/bin/proxysql(_Z19monitor_ping_threadPv+0x4a0)[0x5982a0]
/usr/bin/proxysql(_ZN14ConsumerThread3runEv+0xf7)[0x5aa957]
/lib64/libpthread.so.0(+0x7e65)[0x7fe48c8a0e65]
/lib64/libc.so.6(clone+0x6d)[0x7fe48b78888d]
2021-02-18 03:26:07 main.cpp:1573:ProxySQL_daemonize_phase3(): [ERROR] ProxySQL crashed. Restarting!

A longer version of the error log is attached to this issue.

I have no information on why it is crashing, so I can't tell you how to reproduce the problem.
Please let me know if any information is missing from this issue.

ProxySQL version 2.0.15-20-g32bb92c
OS: CentOS Linux release 7.7.1908 (Core)
proxysql-log-before-crash.txt
core.23293.zip
core.30100.zip
core.31511.zip
core.2338.zip
core.17071.zip
core.17127.zip
proxysql_binary.zip

@izzyquestion
Author

izzyquestion commented Mar 4, 2021 via email

@renecannao
Contributor

@JavierJF: can you please look into this?
Thanks

@izzyquestion
Author

I'll add some more information which might be useful: we are using ProxySQL to access a 3-node InnoDB Cluster (single primary).
Each of these nodes has several IP addresses, and I added all of them to mysql_servers.
I also configured mysql_group_replication_hostgroups and set max_writers to 7 so that all IP addresses of the source node can be used for writes.
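
For reference, the relevant part of the configuration was applied through the admin interface, roughly along the lines of the sketch below. The hostgroup numbers (2/4/3/1) and the writer_is_also_reader value are illustrative placeholders, and only one example IP from the dump above is shown; the real config lists every IP of every node.

INSERT INTO mysql_servers (hostgroup_id, hostname, port, max_connections, comment)
VALUES (2, '10.5.0.165', 3306, 10000, 'multidb4 - PROD');  -- repeated for each IP of each node

INSERT INTO mysql_group_replication_hostgroups
  (writer_hostgroup, backup_writer_hostgroup, reader_hostgroup, offline_hostgroup,
   active, max_writers, writer_is_also_reader, max_transactions_behind)
VALUES (2, 4, 3, 1, 1, 7, 1, 0);  -- max_writers = 7 so all writer IPs remain usable

LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;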

It is also interesting that we have two ProxySQL instances (same version) running on different servers which we use to access this InnoDB Cluster; both of them crash multiple times a week, and not at the same time.

Please let me know if there is any more information that might be helpful in solving this issue.

@YRWYCTB

YRWYCTB commented Mar 15, 2021

We are also using a 3-node InnoDB Cluster (single primary) and two ProxySQL instances, and we are hitting the same problem.
Do you run any scripts that query ProxySQL information through the admin port, like "select * from runtime_mysql_servers"?

@izzyquestion
Author

Yes, we have a local script which checks the number of open client connections.
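
Concretely, the check is a query against the admin interface similar to the sketch below (the exact query used by the script may differ slightly):

-- run against the ProxySQL admin interface (port 6032 by default)
SELECT Variable_Value
FROM stats_mysql_global
WHERE Variable_Name = 'Client_Connections_connected';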

@YRWYCTB

YRWYCTB commented Mar 16, 2021

I think the scripts that query ProxySQL information through the admin port may be causing the crash. We changed our scripts; you can give that a try.

@izzyquestion
Author

izzyquestion commented Mar 16, 2021 via email

@YRWYCTB

YRWYCTB commented Mar 16, 2021

We just do not use the admin port to get that information any more.

@izzyquestion
Author

izzyquestion commented Mar 16, 2021 via email

@YRWYCTB

YRWYCTB commented Mar 17, 2021

Our scripts query the number of ONLINE MySQL servers.
Now we get the online server count from the MySQL servers directly instead of through the admin port.
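
In other words, roughly the following change (a sketch; the actual queries may differ):

-- before: counting ONLINE servers through the ProxySQL admin port
SELECT COUNT(*) FROM runtime_mysql_servers WHERE status = 'ONLINE';

-- after: asking a group replication member directly on port 3306
SELECT COUNT(*) FROM performance_schema.replication_group_members
WHERE MEMBER_STATE = 'ONLINE';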

@JavierJF
Collaborator

JavierJF commented Mar 23, 2021

Hi,

after inspecting all the provided coredumps, it looks like most of the memory issues can be traced back to the 'monitor_group_replication_thread', as can be seen in this backtrace:

#0  atomic_load_p (mo=atomic_memory_order_relaxed, a=0x199eb8) at include/jemalloc/internal/atomic.h:62
#1  rtree_leaf_elm_bits_read (tsdn=<optimized out>, rtree=<optimized out>, dependent=true, elm=0x199eb8) at include/jemalloc/internal/rtree.h:175
#2  rtree_szind_slab_read (r_slab=<synthetic pointer>, r_szind=<synthetic pointer>, dependent=true, key=15252645203506290, rtree_ctx=0x7fe482ffda10, rtree=<optimized out>, tsdn=0x7fe482ffd9e0) at include/jemalloc/internal/rtree.h:500
#3  ifree (slow_path=false, tcache=0x7fe482ffdbd0, ptr=0x363033333d7472, tsd=0x7fe482ffd9e0) at src/jemalloc.c:2490
#4  je_free_default (ptr=0x363033333d7472) at src/jemalloc.c:2710
#5  0x000000000093164e in mysql_close_options (mysql=0x7fe47e201900) at /opt/proxysql/deps/mariadb-client-library/mariadb_client/libmariadb/mariadb_lib.c:1866
#6  0x0000000000931bfc in mysql_close (mysql=0x7fe47e201900) at /opt/proxysql/deps/mariadb-client-library/mariadb_client/libmariadb/mariadb_lib.c:1970
#7  0x00000000005a6516 in monitor_group_replication_thread (arg=<optimized out>) at MySQL_Monitor.cpp:1508
#8  0x00000000005aa957 in ConsumerThread::run (this=0x7fe48300e000) at MySQL_Monitor.cpp:82
#9  0x00007fe48c8a0e65 in start_thread () from /lib64/libpthread.so.0
#10 0x00007fe48b78888d in clone () from /lib64/libc.so.6

and in particular they all refer to the following line from mariadb_lib:

free(mysql->options.extension->plugin_dir);

Since that value is never updated outside the MariaDB client library, this by itself points to a memory corruption error. Furthermore, the one crash that is different, whose backtrace is:

(gdb) bt
#0  0x00000000008f8e9e in re2::DFA::InlinedSearchLoop (this=<optimized out>, params=0x7fe476bfc1a0, have_first_byte=false, want_earliest_match=false, run_forward=true, this=<optimized out>) at re2/dfa.cc:1409
#1  0x00000000008fbc5c in FastSearchLoop (params=0x7fe476bfc1a0, this=0x7fe47fe22300) at re2/dfa.cc:1607
#2  Search (matches=0x0, epp=<synthetic pointer>, failed=0x7fe476bfc290, run_forward=<optimized out>, want_earliest_match=false, anchored=true, context=<synthetic pointer>, text=..., this=0x7fe47fe22300) at re2/dfa.cc:1800
#3  re2::Prog::SearchDFA (this=0x7fe48a43b200, text=..., const_context=..., anchor=anchor@entry=re2::Prog::kAnchored, kind=<optimized out>, kind@entry=re2::Prog::kFirstMatch, match0=match0@entry=0x7fe476bfc2d0, failed=failed@entry=0x7fe476bfc290, matches=matches@entry=0x0)
    at re2/dfa.cc:1900
#4  0x00000000008d65c2 in re2::RE2::Match (this=this@entry=0x7fe48a42a240, text=..., startpos=startpos@entry=0, endpos=<optimized out>, re_anchor=<optimized out>, submatch=submatch@entry=0x7fe476bfc4f0, nsubmatch=nsubmatch@entry=0) at re2/re2.cc:708
#5  0x00000000008d8271 in re2::RE2::DoMatch (this=0x7fe48a42a240, text=..., re_anchor=<optimized out>, consumed=0x0, args=0x0, n=0) at re2/re2.cc:805
#6  0x0000000000588710 in Apply<bool (*)(re2::StringPiece const&, re2::RE2 const&, re2::RE2::Arg const* const*, int), re2::StringPiece> (re=..., sp=..., f=<optimized out>) at ../deps/re2/re2/re2/re2.h:347
#7  re2::RE2::PartialMatch<>(re2::StringPiece const&, re2::RE2 const&) (text=..., re=...) at ../deps/re2/re2/re2/re2.h:370
#8  0x00000000005847eb in admin_session_handler (sess=0x7fe48723a900, _pa=0x7fe48a444c00, pkt=<optimized out>) at ProxySQL_Admin.cpp:3824
#9  0x0000000000536cfc in MySQL_Session::handler (this=this@entry=0x7fe48723a900) at MySQL_Session.cpp:3123
#10 0x000000000055088d in child_mysql (arg=<optimized out>) at ProxySQL_Admin.cpp:4522
#11 0x00007fe48c8a0e65 in start_thread () from /lib64/libpthread.so.0
#12 0x00007fe48b78888d in clone () from /lib64/libc.so.6

happens inside an RE2::PartialMatch on a completely valid query whose memory has been properly initialized, and the pa->match_regexes struct from Admin, whose memory is also properly initialized:

(gdb) p *(RE2*)(pa->match_regexes.re[0])
$4 = {pattern_ = "^SELECT\\s+@@max_allowed_packet\\s*", options_ = {static kDefaultMaxMem = 8388608, encoding_ = re2::RE2::Options::EncodingUTF8, posix_syntax_ = false, longest_match_ = false, log_errors_ = false, max_mem_ = 8388608, literal_ = false, never_nl_ = false,
    dot_nl_ = false, never_capture_ = false, case_sensitive_ = false, perl_classes_ = false, word_boundary_ = false, one_line_ = false}, prefix_ = "", prefix_foldcase_ = false, entire_regexp_ = 0x7fe48a53f140, suffix_regexp_ = 0x7fe48a53f140, prog_ = 0x7fe48a43b200,
  is_one_pass_ = true, rprog_ = 0x0, error_ = 0x7fe48a408318, error_code_ = re2::RE2::NoError, error_arg_ = "", num_captures_ = 0, named_groups_ = 0x0, group_names_ = 0x0, rprog_once_ = {_M_once = 0}, num_captures_once_ = {_M_once = 2}, named_groups_once_ = {_M_once = 0},
  group_names_once_ = {_M_once = 0}}

Everything points to a memory corruption error. Since your configuration involves mysql_group_replication_hostgroups, it is likely that these crashes are caused by the memory corruption described in this closed issue: #3261.
The fix for that issue has been available since v2.0.16, so my recommendation would be to update to at least v2.0.16 and check whether the issue is still present or can no longer be reproduced.
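
As a quick sanity check after the upgrade, the running version can be confirmed from the admin interface, for example:

SELECT version();  -- should report 2.0.16 or newer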

Hope the issue is solved with that, thanks.

@izzyquestion
Author

izzyquestion commented Mar 24, 2021 via email

@egezonberisha
Contributor

@JavierJF I believe this is also affecting ProxySQL 2.1.0. Will it be fixed in 2.1.1?

@bskllzh
Contributor

bskllzh commented Apr 2, 2021

@izzyquestion, can you say how to reproduce the problem?

@JavierJF
Collaborator

JavierJF commented Apr 2, 2021

@egezonberisha Yes, the fix was introduced for v2.1.1 via this PR: #3286, so it will be present in the v2.1.1 release.

@JavierJF
Collaborator

This issue was fixed in v2.1.1 via #3286. Closing.
