
proxysql crashes, multiple times every week #3329

Closed
izzyquestion opened this issue Mar 2, 2021 · 16 comments

@izzyquestion

ProxySQL suddenly crashes, always at night, on most days of the week.

2021-02-16 04:07:25 [INFO] Dumping current MySQL Servers structures for hostgroup ALL
HID: 11 , address: 10.5.0.146 , port: 3306 , gtid_port: 0 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: multidb5 - test
HID: 11 , address: 10.5.0.128 , port: 3306 , gtid_port: 0 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: multidb2 - test
HID: 2 , address: 10.5.0.165 , port: 3306 , gtid_port: 0 , weight: 1 , status: OFFLINE_HARD , max_connections: 10000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: multidb4 - PROD

/usr/bin/proxysql(_Z13crash_handleri+0x1a)[0x4d5fda]
/lib64/libc.so.6(+0x363b0)[0x7fe48b6c03b0]
/usr/bin/proxysql[0x8614ca]
/usr/bin/proxysql[0x93164e]
/usr/bin/proxysql(mysql_close+0xe1)[0x931bfc]
/usr/bin/proxysql(_Z19monitor_ping_threadPv+0x4a0)[0x5982a0]
/usr/bin/proxysql(_ZN14ConsumerThread3runEv+0xf7)[0x5aa957]
/lib64/libpthread.so.0(+0x7e65)[0x7fe48c8a0e65]
/lib64/libc.so.6(clone+0x6d)[0x7fe48b78888d]
2021-02-18 03:26:07 main.cpp:1573:ProxySQL_daemonize_phase3(): [ERROR] ProxySQL crashed. Restarting!

A longer version of the error log is attached to this issue.

I have no information on why it is crashing, so I can't tell you how to reproduce the problem.
Please let me know if any information is missing from this issue.

ProxySQL version 2.0.15-20-g32bb92c
OS: CentOS Linux release 7.7.1908 (Core)
proxysql-log-before-crash.txt
core.23293.zip
core.30100.zip
core.31511.zip
core.2338.zip
core.17071.zip
core.17127.zip
proxysql_binary.zip

@izzyquestion
Author

izzyquestion commented Mar 4, 2021 via email

@renecannao
Contributor

@JavierJF: can you please look into this?
Thanks

@izzyquestion
Author

I'll add some more information which might be useful: we are using ProxySQL to access a 3-node InnoDB Cluster (single primary).
Each of these nodes has several IP addresses, and I added all of them to mysql_servers.
I also configured mysql_group_replication_hostgroups and set max_writers to 7 so that all IP addresses of the source node can be used for writes.
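
For reference, the relevant part of the configuration was applied through the admin interface, roughly along the lines of the sketch below. The hostgroup numbers (2/4/3/1) and the writer_is_also_reader value are illustrative placeholders, and only one example IP from the dump above is shown; the real config lists every IP of every node.

INSERT INTO mysql_servers (hostgroup_id, hostname, port, max_connections, comment)
VALUES (2, '10.5.0.165', 3306, 10000, 'multidb4 - PROD');  -- repeated for each IP of each node

INSERT INTO mysql_group_replication_hostgroups
  (writer_hostgroup, backup_writer_hostgroup, reader_hostgroup, offline_hostgroup,
   active, max_writers, writer_is_also_reader, max_transactions_behind)
VALUES (2, 4, 3, 1, 1, 7, 1, 0);  -- max_writers = 7 so all writer IPs remain usable

LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;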

It is also interesting that we have two ProxySQL instances (same version) running on different servers which we use to access this InnoDB Cluster; both of them crash multiple times a week, and not at the same time.

Please let me know if there is any more information that might be helpful in solving this issue.

@YRWYCTB

YRWYCTB commented Mar 15, 2021

We are also using a 3-node InnoDB Cluster (single primary) and two ProxySQL instances, and we are hitting the same problem.
Do you run any scripts that query ProxySQL information through the admin port, like "select * from runtime_mysql_servers"?

@izzyquestion
Author

Yes, we have a local script which checks the number of open client connections.
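
Concretely, the check is a query against the admin interface similar to the sketch below (the exact query used by the script may differ slightly):

-- run against the ProxySQL admin interface (port 6032 by default)
SELECT Variable_Value
FROM stats_mysql_global
WHERE Variable_Name = 'Client_Connections_connected';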

@YRWYCTB

YRWYCTB commented Mar 16, 2021

I think the scripts that query ProxySQL information through the admin port may be causing the crash. We changed our scripts; you can give that a try.

@izzyquestion
Author

izzyquestion commented Mar 16, 2021 via email

@YRWYCTB

YRWYCTB commented Mar 16, 2021

We just do not use the admin port to get that information any more.

@izzyquestion
Author

izzyquestion commented Mar 16, 2021 via email

@YRWYCTB

YRWYCTB commented Mar 17, 2021

Our scripts query the number of ONLINE MySQL servers.
Now we get the online server count from the MySQL servers directly instead of through the admin port.
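
In other words, roughly the following change (a sketch; the actual queries may differ):

-- before: counting ONLINE servers through the ProxySQL admin port
SELECT COUNT(*) FROM runtime_mysql_servers WHERE status = 'ONLINE';

-- after: asking a group replication member directly on port 3306
SELECT COUNT(*) FROM performance_schema.replication_group_members
WHERE MEMBER_STATE = 'ONLINE';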

@JavierJF
Collaborator

JavierJF commented Mar 23, 2021

Hi,

after inspecting all the provided coredumps, it looks like most of the memory issues can be traced back to the 'monitor_group_replication_thread', as can be seen in this backtrace:

#0  atomic_load_p (mo=atomic_memory_order_relaxed, a=0x199eb8) at include/jemalloc/internal/atomic.h:62
#1  rtree_leaf_elm_bits_read (tsdn=<optimized out>, rtree=<optimized out>, dependent=true, elm=0x199eb8) at include/jemalloc/internal/rtree.h:175
#2  rtree_szind_slab_read (r_slab=<synthetic pointer>, r_szind=<synthetic pointer>, dependent=true, key=15252645203506290, rtree_ctx=0x7fe482ffda10, rtree=<optimized out>, tsdn=0x7fe482ffd9e0) at include/jemalloc/internal/rtree.h:500
#3  ifree (slow_path=false, tcache=0x7fe482ffdbd0, ptr=0x363033333d7472, tsd=0x7fe482ffd9e0) at src/jemalloc.c:2490
#4  je_free_default (ptr=0x363033333d7472) at src/jemalloc.c:2710
#5  0x000000000093164e in mysql_close_options (mysql=0x7fe47e201900) at /opt/proxysql/deps/mariadb-client-library/mariadb_client/libmariadb/mariadb_lib.c:1866
#6  0x0000000000931bfc in mysql_close (mysql=0x7fe47e201900) at /opt/proxysql/deps/mariadb-client-library/mariadb_client/libmariadb/mariadb_lib.c:1970
#7  0x00000000005a6516 in monitor_group_replication_thread (arg=<optimized out>) at MySQL_Monitor.cpp:1508
#8  0x00000000005aa957 in ConsumerThread::run (this=0x7fe48300e000) at MySQL_Monitor.cpp:82
#9  0x00007fe48c8a0e65 in start_thread () from /lib64/libpthread.so.0
#10 0x00007fe48b78888d in clone () from /lib64/libc.so.6

and in particular they all refer to the following line from mariadb_lib:

free(mysql->options.extension->plugin_dir);

Since that value is never updated outside the MariaDB client library, this by itself points to a memory corruption error. Furthermore, the one crash that is different, whose backtrace is:

(gdb) bt
#0  0x00000000008f8e9e in re2::DFA::InlinedSearchLoop (this=<optimized out>, params=0x7fe476bfc1a0, have_first_byte=false, want_earliest_match=false, run_forward=true, this=<optimized out>) at re2/dfa.cc:1409
#1  0x00000000008fbc5c in FastSearchLoop (params=0x7fe476bfc1a0, this=0x7fe47fe22300) at re2/dfa.cc:1607
#2  Search (matches=0x0, epp=<synthetic pointer>, failed=0x7fe476bfc290, run_forward=<optimized out>, want_earliest_match=false, anchored=true, context=<synthetic pointer>, text=..., this=0x7fe47fe22300) at re2/dfa.cc:1800
#3  re2::Prog::SearchDFA (this=0x7fe48a43b200, text=..., const_context=..., anchor=anchor@entry=re2::Prog::kAnchored, kind=<optimized out>, kind@entry=re2::Prog::kFirstMatch, match0=match0@entry=0x7fe476bfc2d0, failed=failed@entry=0x7fe476bfc290, matches=matches@entry=0x0)
    at re2/dfa.cc:1900
#4  0x00000000008d65c2 in re2::RE2::Match (this=this@entry=0x7fe48a42a240, text=..., startpos=startpos@entry=0, endpos=<optimized out>, re_anchor=<optimized out>, submatch=submatch@entry=0x7fe476bfc4f0, nsubmatch=nsubmatch@entry=0) at re2/re2.cc:708
#5  0x00000000008d8271 in re2::RE2::DoMatch (this=0x7fe48a42a240, text=..., re_anchor=<optimized out>, consumed=0x0, args=0x0, n=0) at re2/re2.cc:805
#6  0x0000000000588710 in Apply<bool (*)(re2::StringPiece const&, re2::RE2 const&, re2::RE2::Arg const* const*, int), re2::StringPiece> (re=..., sp=..., f=<optimized out>) at ../deps/re2/re2/re2/re2.h:347
#7  re2::RE2::PartialMatch<>(re2::StringPiece const&, re2::RE2 const&) (text=..., re=...) at ../deps/re2/re2/re2/re2.h:370
#8  0x00000000005847eb in admin_session_handler (sess=0x7fe48723a900, _pa=0x7fe48a444c00, pkt=<optimized out>) at ProxySQL_Admin.cpp:3824
#9  0x0000000000536cfc in MySQL_Session::handler (this=this@entry=0x7fe48723a900) at MySQL_Session.cpp:3123
#10 0x000000000055088d in child_mysql (arg=<optimized out>) at ProxySQL_Admin.cpp:4522
#11 0x00007fe48c8a0e65 in start_thread () from /lib64/libpthread.so.0
#12 0x00007fe48b78888d in clone () from /lib64/libc.so.6

happens inside an RE2::PartialMatch on a completely valid query whose memory has been properly initialized, and the pa->match_regexes struct from Admin, whose memory is also properly initialized:

(gdb) p *(RE2*)(pa->match_regexes.re[0])
$4 = {pattern_ = "^SELECT\\s+@@max_allowed_packet\\s*", options_ = {static kDefaultMaxMem = 8388608, encoding_ = re2::RE2::Options::EncodingUTF8, posix_syntax_ = false, longest_match_ = false, log_errors_ = false, max_mem_ = 8388608, literal_ = false, never_nl_ = false,
    dot_nl_ = false, never_capture_ = false, case_sensitive_ = false, perl_classes_ = false, word_boundary_ = false, one_line_ = false}, prefix_ = "", prefix_foldcase_ = false, entire_regexp_ = 0x7fe48a53f140, suffix_regexp_ = 0x7fe48a53f140, prog_ = 0x7fe48a43b200,
  is_one_pass_ = true, rprog_ = 0x0, error_ = 0x7fe48a408318, error_code_ = re2::RE2::NoError, error_arg_ = "", num_captures_ = 0, named_groups_ = 0x0, group_names_ = 0x0, rprog_once_ = {_M_once = 0}, num_captures_once_ = {_M_once = 2}, named_groups_once_ = {_M_once = 0},
  group_names_once_ = {_M_once = 0}}

Everything points to a memory corruption error. Since your configuration involves mysql_group_replication_hostgroups, it is likely that these crashes are caused by the memory corruption described in this closed issue: #3261.
The fix for that issue has been available since v2.0.16, so my recommendation would be to update to at least v2.0.16 and check whether the issue is still present or can no longer be reproduced.
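
As a quick sanity check after the upgrade, the running version can be confirmed from the admin interface, for example:

SELECT version();  -- should report 2.0.16 or newer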

Hope the issue is solved with that, thanks.

@izzyquestion
Author

izzyquestion commented Mar 24, 2021 via email

@egezonberisha
Contributor

@JavierJF I believe this is also affecting ProxySQL 2.1.0. Will it be fixed in 2.1.1?

@bskllzh
Contributor

bskllzh commented Apr 2, 2021

@izzyquestion, can you say how to reproduce the problem?

@JavierJF
Collaborator

JavierJF commented Apr 2, 2021

@egezonberisha Yes, the fix was introduced for v2.1.1 via this PR: #3286, so it will be present in the v2.1.1 release.

@JavierJF
Collaborator

This issue was fixed in v2.1.1 via #3286. Closing.
