I am writing a C++ program that runs on a cluster of machines, all of which talk to each other over TCP sockets. The program crashes randomly on one of the machines. I analyzed the core dump with gdb; here is the output:
```
$ gdb executable dump
Core was generated by `/home/user/experiments/files/executable 2 /home/user/'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fb76a084c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) backtrace
#0  0x00007fb76a084c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007fb76a088028 in __GI_abort () at abort.c:89
#2  0x00007fb76a0c12a4 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7fb76a1cd113 "*** %s ***: %s terminated\n") at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007fb76a158bbc in __GI___fortify_fail (msg=<optimized out>, msg@entry=0x7fb76a1cd0aa "buffer overflow detected") at fortify_fail.c:38
#4  0x00007fb76a157a90 in __GI___chk_fail () at chk_fail.c:28
#5  0x00007fb76a158b07 in __fdelt_chk (d=<optimized out>) at fdelt_chk.c:25
#6  0x000000000040a918 in LocalSenderPort::run() ()
#7  0x000000000040ae70 in LocalSenderPort::LocalSenderPort(unsigned int, std::string, std::vector<std::string, std::allocator<std::string> >, char*) ()
#8  0x00000000004033d5 in main ()
```
Any suggestions on what I should look at? How should I proceed? Any help is really appreciated.
I am not sharing the code right now, as it's a large codebase spread across several files, but I can share it if needed.
This error:
`__fdelt_chk (d=<optimized out>) at fdelt_chk.c:25`
means that your program violated a precondition of one of the `FD_*` macros. The source of `fdelt_chk.c` is quite simple, and there are only two conditions under which it fails: you pass in a negative file descriptor, or you pass in a file descriptor greater than 1023 (that is, one not below `FD_SETSIZE`, which is 1024 on Linux).
In this day and age, using `select` and/or `FD_SET` in any program that can have more than 1024 simultaneous connections (which Linux easily allows) can only end in tears. Use `epoll` instead.