gdb / backtrace / running process

  • Written by
    Walter Doekes
  • Published on

Sometimes you want a backtrace or a core dump from a process that you cannot afford to stall. Think of a multithreaded application in which some threads are still doing important work, like handling customer calls. An interactive gdb session halts the process for as long as you are gathering info, and sending a SIGABRT to get a core dump has the negative side effect of killing the process. Neither is acceptable in a production environment.

In comes the handy gdb(1) option -ex, which executes a gdb command given on the command line and may be repeated. Take this hanging.c example, which we will examine while leaving it running.

int c() {
  while(1);
  return 64;
}
int b() {
  return c();
}
int a() {
  return b();
}
int main() {
  return a();
}

Fire it up, gather info, and keep running:

$ gcc hanging.c -o hanging -g
$ ./hanging &
[1] 787
$ time gdb -p `pidof hanging` -ex bt -ex 'thread apply all bt full' -ex detach -ex quit
...
c () at hanging.c:2
2   while(1);
#0  c () at hanging.c:2
#1  0x00000000004004d8 in b () at hanging.c:6
#2  0x00000000004004e8 in a () at hanging.c:9
#3  0x00000000004004f8 in main () at hanging.c:12

Thread 1 (process 787):
#0  c () at hanging.c:2
No locals.
#1  0x00000000004004d8 in b () at hanging.c:6
No locals.
#2  0x00000000004004e8 in a () at hanging.c:9
No locals.
#3  0x00000000004004f8 in main () at hanging.c:12
No locals.
Detaching from program: /home/walter/hanging, process 787

real  0m0.128s
user  0m0.120s
sys   0m0.020s
$ fg
./hanging

Obviously the process is still stopped while gdb gathers the required information (for at most the 0.13 seconds of wall-clock time measured above), but it resumes as soon as gdb detaches, hopefully without your users noticing.
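If you need this more than once, the one-liner can be wrapped in a small shell function. The following is only a sketch: the name snapshot_bt is made up for this example, and the extra gdb options -batch (exit after the last command instead of waiting at a prompt) and -nx (skip any .gdbinit files) are additions that do not appear in the transcript above.

```shell
# Hypothetical helper (not part of gdb): print full backtraces of all
# threads of a running process, then detach so the process keeps running.
snapshot_bt() {
  pid=$1
  # Refuse anything that is not a plain decimal PID.
  case $pid in
    ''|*[!0-9]*)
      echo "usage: snapshot_bt PID" >&2
      return 2
      ;;
  esac
  # -nx:    do not read .gdbinit files, so personal settings cannot
  #         change the output or add delay.
  # -batch: exit after the last -ex command; no interactive prompt.
  gdb -nx -batch -p "$pid" \
      -ex 'thread apply all bt full' \
      -ex detach
}
```

Calling snapshot_bt 787 should then produce the same per-thread backtrace as shown above. Validating the PID first matters mostly when it comes from pidof, which prints nothing if the process is gone and several PIDs if there is more than one match.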

You can write a core dump too, if you like. This can take a bit longer, depending on how much memory your process is using.

# cat /proc/`pidof asterisk`/status | grep VmRSS
VmRSS:     13236 kB
# time gdb -p `pidof asterisk` -ex generate-core-file -ex detach -ex quit
...
0x00007f1f01e5ebd6 in poll () from /lib/libc.so.6
Saved corefile core.313
Detaching from program: /usr/local/sbin/asterisk, process 313

real  0m1.972s
user  0m0.192s
sys   0m0.332s
# ls -lh core.313
-rw-r--r-- 1 root root 15M 2011-09-06 08:43 core.313

A couple of notes about this last example:

  • RSS is only a rough indication of the dump size. The dump may very well turn out twice as large.
  • Most of the time spent here went into loading symbols, so a 30MB dump won’t take twice as long. An 800MB dump will take noticeably longer, though. Beware.
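Because the process stays stopped for as long as the dump is being written, it can be worth checking beforehand that the target directory has room for it. Below is a minimal sketch, assuming Linux with /proc mounted. The function names rss_kb, free_kb and core_fits are invented for this example, and the factor of two is just the rule of thumb from the notes above, not a guarantee.

```shell
# Print the resident set size of process PID in kB, as listed in /proc.
rss_kb() {
  awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Print the free space in kB on the filesystem that holds directory DIR.
# -P forces the one-line-per-filesystem POSIX format, -k selects kB units.
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# core_fits PID DIR
# Succeed only if DIR has room for roughly twice the RSS of PID.
core_fits() {
  rss=$(rss_kb "$1") || return 1
  # Kernel threads have no VmRSS line; treat that as "cannot tell".
  [ -n "$rss" ] || return 1
  [ "$(free_kb "$2")" -ge $((rss * 2)) ]
}
```

With these defined, something like core_fits 313 . && gdb -p 313 -ex generate-core-file -ex detach -ex quit would only start the dump when the current directory looks large enough.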
