crashes during cluster shutdown in CI testing #4542

Closed
mirostauder opened this issue May 8, 2024 · 1 comment

mirostauder commented May 8, 2024

Cluster nodes occasionally crash during shutdown.
Observed in CI testing, in k8s-testing Jenkins job 808.

The job is archived with all logs and crash dumps.
Backtraces from the crash dumps are below.

[2024-05-08 15:08:44] >>> WARN - Core file found 'test/cluster/node07/core.1397908' ...
[2024-05-08 15:08:44] >>> ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from '../../src/proxysql -D /var/lib/jenkins/workspace/ProxySQL-Automated-Build-Testi', real uid: 112, effective uid: 112, real gid: 120, effective gid: 120, execfn: '../../src/proxysql', platform: 'x86_64'
The program is not being run.
[2024-05-08 15:08:44] >>> Reading symbols from ./proxysql...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `../../src/proxysql -D /var/lib/jenkins/workspace/ProxySQL-Automated-Build-Testi'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000056c8ff803c78 in ProxySQL_HTTP_Server::~ProxySQL_HTTP_Server (
    this=0x5551494e55202c20, __in_chrg=<optimized out>)
    at ProxySQL_HTTP_Server.cpp:876
876		if (variables.proxysql_latest_version) {
(gdb) (gdb) #0  0x000056c8ff803c78 in ProxySQL_HTTP_Server::~ProxySQL_HTTP_Server (
    this=0x5551494e55202c20, __in_chrg=<optimized out>)
    at ProxySQL_HTTP_Server.cpp:876
#1  0x000056c8ff5a13f3 in ProxySQL_Admin::admin_shutdown (this=0x7d7cc1231800)
    at ProxySQL_Admin.cpp:6945
#2  0x000056c8ff5a1cae in ProxySQL_Admin::~ProxySQL_Admin (
    this=0x7d7cc1231800, __in_chrg=<optimized out>) at ProxySQL_Admin.cpp:7013
#3  0x000056c8ff291a9d in ProxySQL_Main_shutdown_all_modules () at main.cpp:975
#4  0x000056c8ff293b86 in ProxySQL_Main_init_phase4___shutdown ()
    at main.cpp:1243
#5  0x000056c8ff2a0556 in main (argc=5, argv=0x7ffec3dd4e28) at main.cpp:2526
(gdb) 
quit
[2024-05-08 15:08:47] >>> Compressing 'test/cluster/node07/core.1397908' ...
[2024-05-08 15:08:50] >>> WARN - Core file found 'test/cluster/node07/core.1355228' ...
[2024-05-08 15:08:50] >>> ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from '../../src/proxysql -D /var/lib/jenkins/workspace/ProxySQL-Automated-Build-Testi', real uid: 112, effective uid: 112, real gid: 120, effective gid: 120, execfn: '../../src/proxysql', platform: 'x86_64'
413	../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S: No such file or directory.
The program is not being run.
[2024-05-08 15:08:50] >>> Reading symbols from ./proxysql...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `../../src/proxysql -D /var/lib/jenkins/workspace/ProxySQL-Automated-Build-Testi'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __memcmp_avx2_movbe ()
    at ../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:413
(gdb) (gdb) #0  __memcmp_avx2_movbe ()
    at ../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:413
#1  0x00007d7cc154ea0c in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const ()
   from /lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x000056c8ff2b382f in std::operator< <char, std::char_traits<char>, std::allocator<char> > (
    __lhs=<error: Cannot access memory at address 0x6c75725f79726575>, 
    __rhs="mysql1:3306") at /usr/include/c++/11/bits/basic_string.h:6343
#3  0x000056c8ff2abc1d in std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::operator() (this=0x7d7cc05a0cd8, 
    __x=<error: Cannot access memory at address 0x6c75725f79726575>, 
    __y="mysql1:3306") at /usr/include/c++/11/bits/stl_function.h:400
#4  0x000056c8ff7187f0 in std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, MyGR_monitor_node*>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, MyGR_monitor_node*> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, MyGR_monitor_node*> > >::_M_lower_bound (
    this=0x7d7cc05a0cd8, __x=0x7d7cb76b25a0, __y=0x7d7cb76b23c0, 
    __k="mysql1:3306") at /usr/include/c++/11/bits/stl_tree.h:1905
#5  0x000056c8ff70c806 in std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, MyGR_monitor_node*>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, MyGR_monitor_node*> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, MyGR_monitor_node*> > >::find (this=0x7d7cc05a0cd8, 
    __k="mysql1:3306") at /usr/include/c++/11/bits/stl_tree.h:2523
#6  0x000056c8ff7010f7 in std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, MyGR_monitor_node*, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, MyGR_monitor_node*> > >::find (this=0x7d7cc05a0cd8, 
    __x="mysql1:3306") at /usr/include/c++/11/bits/stl_map.h:1170
#7  0x000056c8ff6c214f in gr_update_hosts_map (start_time=86789911026, 
    gr_srv_st=..., mmsd=0x7d7cb646e500) at MySQL_Monitor.cpp:3745
#8  0x000056c8ff6c3b1e in async_gr_mon_actions_handler (mmsd=0x7d7cb646e500)
    at MySQL_Monitor.cpp:3970
#9  0x000056c8ff6c49a6 in monitor_GR_thread_HG (arg=0x7d7cb95e7048)
    at MySQL_Monitor.cpp:4096
#10 0x00007d7cc1094ac3 in start_thread (arg=<optimized out>)
    at ./nptl/pthread_create.c:442
#11 0x00007d7cc1126850 in clone3 ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) 
quit
[2024-05-08 15:08:52] >>> Compressing 'test/cluster/node07/core.1355228' ...
[2024-05-08 15:08:56] >>> WARN - Core file found 'src/core.1354957' ...
[2024-05-08 15:08:56] >>> ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from './proxysql --clickhouse-server --sqlite3-server --idle-threads -f -c /var/lib/j', real uid: 112, effective uid: 112, real gid: 120, effective gid: 120, execfn: './proxysql', platform: 'x86_64'
44	./nptl/pthread_kill.c: No such file or directory.
The program is not being run.
[2024-05-08 15:08:56] >>> Reading symbols from ./proxysql...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./proxysql --clickhouse-server --sqlite3-server --idle-threads -f -c /var/lib/j'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=129351770711616)
    at ./nptl/pthread_kill.c:44
(gdb) (gdb) #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=129351770711616)
    at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=129351770711616)
    at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=129351770711616, signo=signo@entry=6)
    at ./nptl/pthread_kill.c:89
#3  0x000075a527042476 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/posix/raise.c:26
#4  0x000075a5270287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x000075a52702871b in __assert_fail_base (
    fmt=0x75a5271dd130 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=0x57c5dc3eb60c "prevflags != -1", 
    file=0x57c5dc3e9b12 "MySQL_Thread.cpp", line=2902, 
    function=<optimized out>) at ./assert/assert.c:92
#6  0x000075a527039e96 in __GI___assert_fail (
    assertion=0x57c5dc3eb60c "prevflags != -1", 
    file=0x57c5dc3e9b12 "MySQL_Thread.cpp", line=2902, 
    function=0x57c5dc3eb5c0 "MySQL_Session* MySQL_Thread::create_new_session_and_client_data_stream(int)") at ./assert/assert.c:101
#7  0x000057c5db8cc6c1 in MySQL_Thread::create_new_session_and_client_data_stream (this=0x75a523e0c000, _fd=7) at MySQL_Thread.cpp:2902
#8  0x000057c5db9e3397 in child_mysql (arg=0x75a5254783f0)
    at ProxySQL_Admin.cpp:5539
#9  0x000075a527094ac3 in start_thread (arg=<optimized out>)
    at ./nptl/pthread_create.c:442
#10 0x000075a527126850 in clone3 ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) 
quit
[2024-05-08 15:08:58] >>> Compressing 'src/core.1354957' ...
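
A note on the two SIGSEGV backtraces: the faulting addresses (`this=0x5551494e55202c20` in the first, `0x6c75725f79726575` for `__lhs` in the second) decode to printable ASCII when read in x86-64 little-endian byte order. That is consistent with the objects' memory having been overwritten or freed and reused for string data, but this is only an interpretation of the dumps and is not confirmed anywhere in this issue. A minimal sketch for decoding such values (`bytes_as_ascii` is a hypothetical helper, not part of ProxySQL):

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: render a 64-bit value in the order its bytes sit in
// x86-64 (little-endian) memory, least significant byte first.
// Non-printable bytes are shown as '.'.
std::string bytes_as_ascii(std::uint64_t value) {
    std::string out;
    for (int i = 0; i < 8; ++i) {
        unsigned char c = static_cast<unsigned char>((value >> (8 * i)) & 0xff);
        out += (c >= 0x20 && c < 0x7f) ? static_cast<char>(c) : '.';
    }
    return out;
}
```

`bytes_as_ascii(0x5551494e55202c20ULL)` yields `" , UNIQU"` and `bytes_as_ascii(0x6c75725f79726575ULL)` yields `"uery_rul"`, i.e. both "pointers" are fragments of text.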
@mirostauder changed the title from "crashes during cluster shutdown in testing" to "crashes during cluster shutdown in CI testing" on May 8, 2024
@JavierJF

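Mitigations for the first two types of crashes (the SIGSEGV in `~ProxySQL_HTTP_Server` and the SIGSEGV in `gr_update_hosts_map`) were introduced in this PR. There is no mitigation yet for the `prevflags != -1` assertion failure in `MySQL_Thread::create_new_session_and_client_data_stream`.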
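
For the assertion, the backtrace only shows the condition `prevflags != -1` at MySQL_Thread.cpp:2902, not the surrounding code. Assuming `prevflags` is the result of `fcntl(fd, F_GETFL)` (an assumption, the source line is not in this issue), the call returns -1 with `errno == EBADF` when the descriptor has already been closed, which could plausibly happen during shutdown. A sketch of handling that case without aborting; `set_nonblocking` is a hypothetical helper, not ProxySQL code and not a proposed patch:

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: switch fd to non-blocking mode and report failure
// to the caller instead of asserting on the fcntl() result.
bool set_nonblocking(int fd) {
    int prevflags = fcntl(fd, F_GETFL, 0);
    if (prevflags == -1) {
        // For example errno == EBADF if fd was closed before we got here.
        return false;
    }
    return fcntl(fd, F_SETFL, prevflags | O_NONBLOCK) != -1;
}
```

A caller could then drop the new session when the helper returns false rather than taking down the whole process.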
