One of our Asterisk telephony machines appeared to "leak" queue member agents: it refused to ring them because they were supposedly busy.

When trying to find the cause, I found that the CLI had no data dumping functions for the container I wanted to inspect: in this case pending_members, which is of type struct ao2_container. So we had to resort to using gdb to inspect the data.

The struct ao2_container data type itself looks like this:

(gdb) ptype struct ao2_container
type = struct ao2_container {
    ao2_hash_fn *hash_fn;
    ao2_callback_fn *cmp_fn;
    int n_buckets;
    int elements;
    int version;
    struct bucket buckets[];
}

Those buckets in turn contain linked lists of struct bucket_entry items, each pointing to a struct astobj2 which holds the user_data. In this case the user_data is of type struct member:

(gdb) ptype struct member
type = struct member {
    char interface[80];
...
    char state_interface[80];
...
    int status;
...
    struct call_queue *lastqueue;
...
}
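
To make that layout concrete, here is a toy pure-Python model of such a container. It is only a sketch of my mental picture, not Asterisk code: the names ToyEntry, ToyContainer, link and walk are made up for illustration, and I assume a bucket is picked as hash modulo n_buckets with new entries appended at the tail of the list.

```python
class ToyEntry(object):
    """Stands in for one linked-list node (struct bucket_entry)."""
    def __init__(self, user_data):
        self.user_data = user_data  # stands in for astobj->user_data
        self.next = None            # stands in for entry.next


class ToyContainer(object):
    """Stands in for struct ao2_container: an array of linked lists."""
    def __init__(self, n_buckets):
        self.n_buckets = n_buckets
        self.elements = 0
        self.buckets = [None] * n_buckets  # head entry per bucket

    def link(self, key_hash, user_data):
        # Assumption: bucket index is simply hash modulo n_buckets.
        idx = key_hash % self.n_buckets
        node = ToyEntry(user_data)
        if self.buckets[idx] is None:
            self.buckets[idx] = node
        else:
            tail = self.buckets[idx]
            while tail.next is not None:
                tail = tail.next
            tail.next = node
        self.elements += 1

    def walk(self):
        # Yield (bucket index, position in bucket, user_data) for everything.
        for idx, entry in enumerate(self.buckets):
            pos = 0
            while entry is not None:
                yield idx, pos, entry.user_data
                entry = entry.next
                pos += 1


toy = ToyContainer(4)
toy.link(1, 'member-a')
toy.link(5, 'member-b')  # 5 % 4 == 1, so this shares bucket 1 with member-a
toy.link(2, 'member-c')
for item in toy.walk():
    print(item)
```

The (bucket, position) pairs that walk() yields are the same kind of coordinates used further down to locate a member inside pending_members.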

The first thing I did was get a core dump of the running daemon. The machine had been taken out of the pool, so the 2-second delay while dumping was not a problem:

# gdb -p $(pidof asterisk) -batch -ex generate-core-file
Saved corefile core.17658

Next, examine the pending_members:

(gdb) print pending_members
$1 = (struct ao2_container *) 0x17837c8

(gdb) print *pending_members
$2 = {hash_fn = 0x7f294a63e4f0 <pending_members_hash>,
      cmp_fn = 0x7f294a638160 <pending_members_cmp>, n_buckets = 353,
      elements = 479, version = 51362, buckets = 0x17837e8}

We can check the individual elements:

(gdb) print pending_members->buckets[0]
$3 = {first = 0x7f285df57270, last = 0x7f285df57270}

(gdb) print *pending_members->buckets[0]->first
$4 = {entry = {next = 0x0}, version = 6626, astobj = 0x7f285ef154e8}

(gdb) print *pending_members->buckets[0]->first.astobj
$5 = {priv_data = {ref_counter = 2, destructor_fn = 0x0, data_size = 544,
      options = 0, magic = 2775626019}, user_data = 0x7f285ef15508}
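
As a sanity check that this really is an astobj2 header, the magic value can be compared against the marker Asterisk stamps on these objects. A small sketch: 0xa570b123 is what I believe AO2_MAGIC is defined as in astobj2.c, so treat the constant as an assumption and check it against your own source tree. The helper name looks_like_astobj2 is made up.

```python
# Assumption: the value of AO2_MAGIC in Asterisk's astobj2.c; verify this
# against the source of the version you are debugging.
ASSUMED_AO2_MAGIC = 0xa570b123


def looks_like_astobj2(magic):
    """Return True if a priv_data.magic value matches the assumed marker."""
    return (magic & 0xffffffff) == ASSUMED_AO2_MAGIC


magic_from_gdb = 2775626019  # the decimal value gdb printed for magic
print(hex(magic_from_gdb), looks_like_astobj2(magic_from_gdb))
```

If the magic does not match, the pointer you followed is probably not an astobj2 at all, or the object has been freed or overwritten.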

And we can get to the underlying data, as long as we cast it to the type we know it is supposed to have:

(gdb) print *pending_members->buckets[0]->first.astobj.user_data
$6 = (void *) 0x44492f6c61636f4c

(gdb) print *(struct member*)pending_members->buckets[0]->first.astobj.user_data
$7 = {interface = "Local/xxx@xxx", '\000' <repeats 40 times>,
      state_exten = '\000' <repeats 79 times>,
      state_context = '\000' <repeats 79 times>,
      state_interface = "SIP/xxx", '\000' <repeats 66 times>,
      membername = "Local/xxx@xxx", '\000' <repeats 40 times>,
      penalty = 0, calls = 0, dynamic = 0, realtime = 1,
      status = 0, paused = 0, queuepos = 1, lastcall = 0,
      in_call = 0, lastqueue = 0x0, dead = 0, delme = 0,
      rt_uniqueid = "242301", '\000' <repeats 73 times>, ringinuse = 0}
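
The odd-looking (void *) value in $6 is not garbage. Without the cast, gdb reads the first eight bytes of the struct as if they were a pointer. A quick decode, assuming a little-endian 64-bit machine (which the addresses suggest), shows that those bytes are simply the start of the interface string:

```python
raw_pointer = 0x44492f6c61636f4c  # the "(void *)" value gdb printed as $6

# Assumption: little-endian x86-64, so the lowest byte comes first in memory.
first_bytes = raw_pointer.to_bytes(8, byteorder='little')
print(first_bytes.decode('ascii'))
```

That prints Local/ID, presumably the real start of the interface field before I anonymized the output.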

However, stepping by hand through a hash table of close to 500 elements to find the entries of interest is not feasible.
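
A back-of-the-envelope count shows why. This is only rough arithmetic on the n_buckets and elements values printed earlier, assuming at least one print per bucket head and one per element:

```python
n_buckets = 353  # from the gdb output of *pending_members
elements = 479

# At least one print per bucket head, plus one per element to see its data.
min_print_commands = n_buckets + elements
average_chain_length = elements / float(n_buckets)

print(min_print_commands)
print(round(average_chain_length, 2))
```

That is more than 800 commands typed by hand, for lists that are on average only a bit over one entry long.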

Enter Python integration in gdb.

As documented here and here, you can call Python scripts from gdb, which in turn can inspect the gdb data for you.

For instance:

(gdb) python print('abc ' * 5)
abc abc abc abc abc

(gdb) python
>print(gdb.parse_and_eval(
>  '((struct member*)pending_members->buckets[0]'
>  '->first.astobj.user_data)->state_interface').string())
>^D
SIP/xxx

Good: we have access from Python. To load Python code from a file, use: source my_file.py.
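
Since everything handed to gdb.parse_and_eval is just a string, the expression can be assembled by ordinary Python code. A minimal sketch, where member_field_expr is a made-up helper name and the expression layout is copied from the example above:

```python
def member_field_expr(container, bucket_idx, field):
    """Build the gdb expression for one field of the first member in a bucket.

    Only string formatting happens here, so this runs outside gdb as well.
    """
    return ('((struct member*){0}->buckets[{1}]'
            '->first.astobj.user_data)->{2}').format(
                container, bucket_idx, field)


print(member_field_expr('pending_members', 0, 'state_interface'))
```

Inside gdb, the resulting string would be passed to gdb.parse_and_eval(), with .string() called on the result for char arrays.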

To find the members I was interested in, I hacked up the following little Python script in three parts.

First, a few helpers:

from __future__ import print_function  # py2 and py3 compatibility

import gdb  # already available inside gdb; imported here to be explicit

class simple_struct(object):
    def __init__(self, nameaddr):
        self._nameaddr = nameaddr
        self._cache = {}

    def __getattr__(self, name):
        if name not in self._cache:
            self._cache[name] = gdb.parse_and_eval(
                self._nameaddr + '->' + name)
        return self._cache[name]

    def __str__(self):
        return self._nameaddr

class simple_type(object):
    def __init__(self, datatype_name='void*'):
        self._datatype_name = datatype_name

    def __call__(self, nameaddr):
        return simple_struct(
            '((' + self._datatype_name + ')' + str(nameaddr) + ')')

Then, a reusable class to handle the ao2_container semantics:

ast_bucket_entry = simple_type('struct bucket_entry*')

class ast_ao2_container(object):
    def __init__(self, nameaddr):
        self._nameaddr = '((struct ao2_container*)' + nameaddr + ')'
        self.n_buckets = int(self.get(self.var('->n_buckets')))
        self.elements = int(self.get(self.var('->elements')))

    def get(self, name):
        return gdb.parse_and_eval(name)

    def var(self, tail='', head=''):
        return head + self._nameaddr + tail

    def var_bucket(self, idx, tail='', head=''):
        # head belongs in front of the whole expression, not after the name
        return self.var('->buckets[' + str(idx) + ']' + tail, head)

    def foreach(self, func):
        found = 0
        for idx in range(0, self.n_buckets):
            first = self.get(self.var_bucket(idx, '->first'))
            if not first:
                continue

            found += self.foreach_in_bucket(func, idx, first)

        if found != self.elements:
            raise ValueError('found {} elements, expected {}'.format(
                found, self.elements))

    def foreach_in_bucket(self, func, idx, nextaddr):
        # Walk the linked list of one bucket; return the number of entries.
        pos = 0
        while True:
            entry = ast_bucket_entry(nextaddr)
            # The dotted name is passed through to gdb as one expression.
            userdata = str(getattr(entry, 'astobj.user_data'))

            func(userdata, idx, pos)
            pos += 1

            nextaddr = getattr(entry, 'entry.next')
            if not nextaddr:
                break

        return pos
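
The same traversal can also be written as a generator that takes the evaluator as an argument, which makes it possible to dry-run the logic without gdb or a core dump. This is an alternative sketch, not what I ran against the real process: iter_container, FakeValue and the fake memory table are made-up names, and inside gdb you would pass gdb.parse_and_eval as the evaluate argument.

```python
def iter_container(evaluate, nameaddr):
    """Yield (bucket index, position, user_data address) for each element.

    evaluate is any callable that turns an expression string into a value
    supporting int(), str() and truth testing, like gdb.parse_and_eval.
    """
    base = '((struct ao2_container*)' + nameaddr + ')'
    n_buckets = int(evaluate(base + '->n_buckets'))
    for idx in range(n_buckets):
        entry = evaluate(base + '->buckets[' + str(idx) + '].first')
        pos = 0
        while entry:
            entry_expr = '((struct bucket_entry*)' + str(entry) + ')'
            yield idx, pos, str(evaluate(entry_expr + '->astobj.user_data'))
            entry = evaluate(entry_expr + '->entry.next')
            pos += 1


class FakeValue(object):
    """Minimal stand-in for gdb.Value holding a number or pointer."""
    def __init__(self, number):
        self.number = number

    def __int__(self):
        return self.number

    def __bool__(self):
        return self.number != 0

    def __str__(self):
        return hex(self.number)


# Fake "memory": two buckets, the second holding a list of two entries.
FAKE_MEMORY = {
    '((struct ao2_container*)pending_members)->n_buckets': 2,
    '((struct ao2_container*)pending_members)->buckets[0].first': 0,
    '((struct ao2_container*)pending_members)->buckets[1].first': 0x1000,
    '((struct bucket_entry*)0x1000)->astobj.user_data': 0xa000,
    '((struct bucket_entry*)0x1000)->entry.next': 0x2000,
    '((struct bucket_entry*)0x2000)->astobj.user_data': 0xb000,
    '((struct bucket_entry*)0x2000)->entry.next': 0,
}


def fake_evaluate(expression):
    return FakeValue(FAKE_MEMORY[expression])


for found in iter_container(fake_evaluate, 'pending_members'):
    print(found)
```

A generator keeps the search code free of callbacks: the caller can filter with a plain for loop and stop early once the interesting member has been found.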

Lastly, my search/print of the member structs I was interested in:

app_queue__member = simple_type('struct member*')
app_queue__call_queue = simple_type('struct call_queue*')

def my_print_bucket_member(bucket, pos, member):
    print(bucket, pos, member)
    print('  state_interface =', member.state_interface.string())
    print('  interface =', member.interface.string())
    print('  queuepos =', int(member.queuepos))
    if member.lastqueue:
        lastqueue = app_queue__call_queue(member.lastqueue)
        print('  lastqueue =', lastqueue.name)
    print()

def my_find_all(nameaddr, bucket, pos):
    member = app_queue__member(nameaddr)
    my_print_bucket_member(bucket, pos, member)

def my_find_state_interface(nameaddr, bucket, pos):
    member = app_queue__member(nameaddr)
    state_interface = member.state_interface.string()
    if state_interface.startswith('SIP/some-account'):
        my_print_bucket_member(bucket, pos, member)

pending_members = ast_ao2_container('pending_members')
#pending_members.foreach(my_find_all)
pending_members.foreach(my_find_state_interface)
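
To avoid editing the hard-coded prefix for every new search, the filter could be produced by a small factory. A sketch under assumptions: make_prefix_finder is a made-up name, and the Fake classes below only mimic the .string() method of gdb values so that the example runs outside gdb.

```python
def make_prefix_finder(prefix, to_member, report):
    """Return a foreach() callback reporting members matching prefix.

    to_member turns an address string into a member object and report is
    called as report(bucket, pos, member) for every match.
    """
    def finder(nameaddr, bucket, pos):
        member = to_member(nameaddr)
        if member.state_interface.string().startswith(prefix):
            report(bucket, pos, member)
    return finder


class FakeString(object):
    """Mimics the .string() method of a gdb char array value."""
    def __init__(self, text):
        self._text = text

    def string(self):
        return self._text


class FakeMember(object):
    def __init__(self, state_interface):
        self.state_interface = FakeString(state_interface)


fake_members = {
    '0x1': FakeMember('SIP/some-account-3'),
    '0x2': FakeMember('SIP/other-account-1'),
}
hits = []
finder = make_prefix_finder(
    'SIP/some-account', fake_members.get,
    lambda bucket, pos, member: hits.append((bucket, pos)))
finder('0x1', 80, 1)
finder('0x2', 81, 0)
print(hits)
```

Inside gdb this would be used as pending_members.foreach(make_prefix_finder('SIP/some-account', app_queue__member, my_print_bucket_member)).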

I was trying to find all members whose state_interface starts with SIP/some-account. As I suspected, they turned out to exist in the container: at bucket 80 as the second element, and at bucket 225 as the third element.

(gdb) source my_pending_members.py
80 1 ((struct member*)0x7f28c76adc88)
  state_interface = SIP/some-account-3
  interface = Local/xxx-3@context
  queuepos = 0
  lastqueue = 0x7f28c7820e32 "some-queue"

225 2 ((struct member*)0x7f28c6ce76f8)
  state_interface = SIP/some-account-6
  interface = Local/IDxxx-6@context
  queuepos = 2
  lastqueue = 0x7f28c7820e32 "some-queue"

Looking for those by hand would've been hopelessly tedious.

Now, continuing the investigation from gdb is easy. The second element of bucket 80 is indeed member 0x7f28c76adc88.

(gdb) print (struct member*)pending_members->buckets[80]\
->first.entry.next.astobj.user_data
$8 = (struct member *) 0x7f28c76adc88

(gdb) print *(struct member *) 0x7f28c76adc88
$9 = {interface = "Local/xxx-3@context", '\000' <repeats 40 times>,
      state_exten = '\000' <repeats 79 times>,
      state_context = '\000' <repeats 79 times>,
      state_interface = "SIP/some-account-3", '\000' <repeats 66 times>,
      membername = "Local/xxx-3@context", '\000' <repeats 40 times>,
      penalty = 0, calls = 2, dynamic = 0, realtime = 1, status = 0, paused = 0,
      queuepos = 0, lastcall = 1498069849, in_call = 0,
      lastqueue = 0x7f28c68f7678, dead = 0, delme = 0,
      rt_uniqueid = "48441", '\000' <repeats 74 times>, ringinuse = 0}
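
Raw values like lastcall are easier to reason about once decoded. A small sketch, assuming lastcall holds a Unix timestamp in seconds; format_lastcall is a made-up helper name:

```python
from datetime import datetime, timezone


def format_lastcall(epoch_seconds):
    """Render a Unix timestamp (seconds) as a readable UTC time string."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime('%Y-%m-%d %H:%M:%S')


print(format_lastcall(1498069849))  # the lastcall value gdb printed
```

Under that assumption, the member last took a call on 2017-06-21 at 18:30:49 UTC, which can then be lined up with the Asterisk logs.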

Nice going gdb! I think I can get used to this.
