Segmentation fault when supervisorctl start ccrelay #448

Closed
pbaranovsky opened this issue Aug 16, 2022 · 11 comments

@pbaranovsky

Installed latest version of ccrelay.
relay -v
carbon-c-relay v3.7.4 (d22cec-dirty)
enabled support for: gzip ssl
regular expressions library: PCRE

Running on Debian buster:
cat /etc/debian_version
10.12

When I run supervisorctl start ccrelay, I get:
Segmentation fault

In /var/log/messages:
kernel: [2756009.582926] relay[2969]: segfault at 0 ip 00007fe44ef1f11e sp 00007ffd79c87bf8 error 4 in libc-2.28.so[7fe44edea000+147000]

I'm attempting to run this command:
/opt/ccrelay/bin/relay -f /opt/ccrelay/etc/ccrelay.cfg [-S 30 -b 50000 -w 18 -q 15000000 -p -H -d
[2022-08-16 16:18:10] starting carbon-c-relay v3.7.4 (d22cec-dirty), pid=4523
configuration:
relay hostname =
workers = 18
send batch size = 50000
server queue size = 15000000
server max stalls = 4
listen backlog = 32
server connection IO timeout = 600ms
idle connections disconnect timeout = 10m
debug = true
configuration = /opt/ccrelay/etc/ccrelay.cfg

Would appreciate any pointers on how to look into this further, or what the issue might be.
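
A note on reading the kernel line quoted above: "segfault at 0" is the faulting address, which points to a NULL pointer dereference, and "error 4" is the x86 page-fault error code. The sketch below decodes that code and computes where in libc the faulting instruction sits. The helper names describe_fault and lib_offset are hypothetical, and the bit layout (bit 0 protection, bit 1 write, bit 2 user mode) is assumed to be the x86 one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* x86 page-fault error code bits, as printed in the kernel's "error N" field. */
#define PF_PROT  0x1u  /* 0: page not present, 1: protection violation */
#define PF_WRITE 0x2u  /* 0: read access,      1: write access */
#define PF_USER  0x4u  /* 0: kernel mode,      1: user mode */

/* Hypothetical helper: describe the error code in buf, return snprintf's result. */
int describe_fault(unsigned code, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%s-mode %s, %s",
                    (code & PF_USER) ? "user" : "kernel",
                    (code & PF_WRITE) ? "write" : "read",
                    (code & PF_PROT) ? "protection violation" : "page not present");
}

/* Hypothetical helper: offset of the faulting instruction inside the mapping. */
uint64_t lib_offset(uint64_t ip, uint64_t base)
{
    return ip - base;
}

/* Usage with the values from the log line in this issue. */
void fault_selfcheck(void)
{
    char buf[64];
    describe_fault(4, buf, sizeof buf);
    assert(buf[0] == 'u'); /* user-mode access */
    assert(lib_offset(0x7fe44ef1f11eULL, 0x7fe44edea000ULL) == 0x13511eULL);
}
```

With the values from the log, code 4 decodes to a user-mode read of a page that is not present, and the instruction offset is 0x13511e, which lies inside the 0x147000-byte libc-2.28.so mapping. That is consistent with a libc string routine reading through a NULL pointer.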

@grobian
Owner

grobian commented Aug 16, 2022

Is this running in a container or some env that is memory constrained?

If you could get a backtrace somehow, that would help. I'm afraid you would have to build from source for that if there are no debug symbols available.

@grobian
Owner

grobian commented Aug 16, 2022

To be precise, you never see "parsed configuration follows:" in the output, right? That restricts it somewhat. Does it also crash if you use a very minimal config or default options (basically just -f conf)?

@pbaranovsky
Author

pbaranovsky commented Aug 16, 2022

No, @grobian, this is running on physical hardware. I've tried the latest version and also 3.4, both compiled from source myself. There is plenty of memory available on the server.
free -m
              total        used        free      shared  buff/cache   available
Mem:         128709        1855      122334        1009        4519      124902
Swap:          7812           0        7812

Configuration does get parsed:
relay -f /path/to/ccrelay.cfg [-b <> -w <> -q <> -p -H
[date] starting carbon-c-relay v3.7.4 (d22cec-dirty), pid=37304
configuration:
relay hostname =
workers =
send batch size = <..>
server queue size = <>
server max stalls = <>
listen backlog = <>
server connection IO timeout = 600ms
idle connections disconnect timeout = 10m
configuration = /opt/ccrelay/etc/ccrelay.cfg

Segmentation fault

Minimal configuration (just the -f) fails with segmentation fault as well

@grobian
Owner

grobian commented Aug 16, 2022

Then if you could, please ./configure && make and run the config with gdb --args ./relay, type run after the (gdb) prompt and bt when it pops up for the segmentation fault. Paste the stack you get here, if possible. Thanks!

@pbaranovsky
Author

pbaranovsky commented Aug 16, 2022

gdb --args ./relay -f /opt/ccrelay/etc/ccrelay.cfg
GNU gdb (Debian 8.2.1-2+b3) 8.2.1
...
Reading symbols from ./relay...done.
(gdb) run
Starting program: ...relay -f /opt/ccrelay/etc/ccrelay.cfg
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[2022-08-16 19:59:38] starting carbon-c-relay v3.7.4 (d22cec-dirty), pid=39581
configuration:
relay hostname =
workers = 40
send batch size = 2500
server queue size = 25000
server max stalls = 4
listen backlog = 32
server connection IO timeout = 600ms
idle connections disconnect timeout = 10m
configuration = /opt/ccrelay/etc/ccrelay.cfg

Program received signal SIGSEGV, Segmentation fault.
__strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:102 <<<<<<<
102 ../sysdeps/x86_64/multiarch/strcmp-avx2.S: No such file or directory. <<<<<<<<<
(gdb) bt
#0 __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:102
#1 0x000055555556e70f in server_cmp (s=<optimized out>, saddr=saddr@entry=0x5555555a1280, ip=ip@entry=0x0)
    at server.c:1271
#2 0x0000555555565e7d in router_add_server (ret=ret@entry=0x5555555a31a0, ip=0x0, port=2003, inst=0x0,
    type=T_LINEMODE, transport=W_PLAIN, mtlspemcert=0x0, mtlspemkey=0x0, proto=CON_TCP, saddrs=0x5555555a1280,
    hint=0x0, useall=0 '\000', cl=0x7ffff7800060) at router.c:630
#3 0x00005555555617c2 in router_yyparse (yyscanner=0x5555555a4960, rtr=rtr@entry=0x5555555a31a0,
    ralloc=0x5555555a2580, palloc=palloc@entry=0x55555559f360) at conffile.y:244
#4 0x000055555556a2af in router_readconfig (orig=0x0, path=0x7fffffffe841 "/opt/ccrelay/etc/ccrelay.cfg",
    workercnt=<optimized out>, queuesize=<optimized out>, batchsize=<optimized out>, maxstalls=<optimized out>,
    iotimeout=600, sockbufsize=0, listenport=<optimized out>) at router.c:1340
#5 0x000055555555937c in main (argc=3, argv=<optimized out>) at relay.c:885
(gdb)
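
Reading the backtrace: frame #2 passes ip=0x0 into server_cmp, and frame #0 is strcmp, so the crash looks like a string comparison on a NULL pointer, which is undefined behaviour in C. A minimal sketch of a NULL-tolerant comparison follows. The name safe_strcmp is hypothetical, and this is not the code in server.c or the fix that was eventually committed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compare two strings, treating NULL as "no value".
 * Two NULLs compare equal; a NULL sorts before any non-NULL string. */
int safe_strcmp(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return (a != NULL) - (b != NULL);
    return strcmp(a, b);
}

/* Usage: the NULL cases return a result instead of crashing. */
void safe_strcmp_selfcheck(void)
{
    assert(safe_strcmp(NULL, NULL) == 0);
    assert(safe_strcmp(NULL, "host") < 0);
    assert(safe_strcmp("host", NULL) > 0);
    assert(safe_strcmp("host", "host") == 0);
}
```

Guarding the comparison only avoids the crash. Whether a destination without a host should be accepted at all is a separate question for the config parser.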

@pbaranovsky
Author

@grobian this was due to a malformed line in ccrelay.cfg, as frame #4 of the backtrace suggests:
#4 0x000055555556a2af in router_readconfig (orig=0x0, path=0x7fffffffe841 "/opt/ccrelay/etc/ccrelay.cfg",

@pbaranovsky
Author

...and thank you very much for suggesting how to use gdb to narrow down the source!

@grobian
Owner

grobian commented Aug 17, 2022

Can you share your clusters from your config? The crash should be fixed, but I'm trying to see what you're doing :)

@pbaranovsky
Author

One of the variables in ccrelay.cfg was not properly resolved, so the rendered cluster had a port but no host:
cluster cluster_name
    any_of
        :2003
    ;

@grobian
Owner

grobian commented Aug 17, 2022

Aha, nice! And you wanted this to mean ADDR_ANY or something?

@pbaranovsky
Author

There was supposed to be a variable containing the FQDNs of the nodes inserted there.

msaf1980 pushed a commit to msaf1980/carbon-c-relay that referenced this issue Feb 15, 2024
Throw an error when a cluster contains servers that are specified
without host.  That situation is OK for listeners, but not for
destinations, of course.

Closes: grobian#448
Signed-off-by: Fabian Groffen <[email protected]>
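
The commit message above describes rejecting cluster servers that have no host. As an illustration only, such a check could look roughly like the sketch below. The struct server_spec and the function check_destination are hypothetical simplifications, not the types or code used in carbon-c-relay.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of one parsed server entry. */
struct server_spec {
    const char *host;    /* NULL when the config line was just ":port" */
    unsigned short port;
};

/* Return NULL when the entry is usable as a destination,
 * otherwise a static error message for the config parser to report. */
const char *check_destination(const struct server_spec *s)
{
    if (s->host == NULL || s->host[0] == '\0')
        return "cluster destination is missing a host";
    return NULL;
}

/* Usage: ":2003" is rejected, "node1:2003" is accepted. */
void check_destination_selfcheck(void)
{
    struct server_spec bad  = { NULL, 2003 };
    struct server_spec good = { "node1", 2003 };
    assert(check_destination(&bad) != NULL);
    assert(check_destination(&good) == NULL);
}
```

Reporting the problem at parse time turns the unresolved template variable from this issue into a readable configuration error instead of a segmentation fault.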