asterisk / nat keepalive / round robin dns
Current Asterisk (telephony software) version 1.6.2.x (and probably 1.4 and 1.8) has an odd quirk with the qualify option.
The qualify option enables a function that checks the response times of the SIP peer. By default, it sends a SIP OPTIONS packet every 60 seconds. The quirk is that it sends the packet to the first A-record resolved for this peer's hostname at startup (or at sip reload).
This works fine in most cases, when the host has only one A-record. However, if you have a SIP server hostname (sip.myprovider.com) with multiple A-records for load-balancing purposes (round robin), this can cause problems.
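To check whether a hostname is affected, you can list every A-record it resolves to. The following is a minimal sketch using only the Python standard library; the function name resolve_a_records is made up for this example and the default port of 5060 is an assumption (the usual SIP port).

```python
import socket
import sys


def resolve_a_records(hostname, port=5060):
    """Return the distinct IPv4 addresses that hostname resolves to."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_DGRAM)
    addresses = []
    for info in infos:
        ip = info[4][0]  # info[4] is the (ip, port) socket address
        if ip not in addresses:
            addresses.append(ip)
    return addresses


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python resolve.py sip.myprovider.com
    for ip in resolve_a_records(sys.argv[1]):
        print(ip)
```

If this prints more than one address, the hostname is a candidate for the problem described here.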
Contact registration (using the SIP REGISTER request) happens alongside this, but less frequently than the qualify OPTIONS. For SIP registration, the A-record chosen can change at random.
What this means is that the OPTIONS will be sent to a single IP destination for the entire running life of the Asterisk server, while the REGISTERs will end up at a random server.
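What this also means is that the NAT-keepalive functionality of the qualify option breaks: the OPTIONS packets only keep the NAT mapping open toward that one IP, which is not necessarily the server that holds the registration.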
When you think about how Asterisk works, it is correct: the registration of contacts is decoupled from the peer definitions where the qualify options are set. But from a practical point of view, it’s bad because NAT-keepalive is broken.
A workaround: send UDP keepalives yourself. You don't need a response. Just sending some NULs is enough.
$ ./udp-keepalive.py sip.myprovider.com
sent '\x00\x00\x00\x00' to ('18.104.22.168', 5060)
sent '\x00\x00\x00\x00' to ('22.214.171.124', 5060)
sent '\x00\x00\x00\x00' to ('126.96.36.199', 5060)
... wait while ...
sent '\x00\x00\x00\x00' to ('188.8.131.52', 5060)
sent '\x00\x00\x00\x00' to ('184.108.40.206', 5060)
sent '\x00\x00\x00\x00' to ('220.127.116.11', 5060)
...
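For illustration, here is a rough sketch of what such a keepalive sender could look like. It is not the udp-keepalive.py script itself. The 30-second interval, the helper names and the optional source_port parameter are all assumptions made for this example. NAT mappings are generally keyed on the source address and port, so to refresh the mapping that Asterisk uses, the packets would presumably have to leave from Asterisk's own SIP port. Whether that bind succeeds next to a running Asterisk depends on the platform and on the socket options in play.

```python
import socket
import sys
import time

SIP_PORT = 5060                # usual SIP port (assumption)
PAYLOAD = b"\x00\x00\x00\x00"  # four NUL bytes, as in the output above


def resolve_all(hostname, port=SIP_PORT):
    """Return the distinct (ip, port) pairs for every A-record of hostname."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_DGRAM)
    destinations = []
    for info in infos:
        dest = info[4][:2]
        if dest not in destinations:
            destinations.append(dest)
    return destinations


def send_keepalives(sock, destinations, payload=PAYLOAD):
    """Send one payload to each destination; no response is expected."""
    for dest in destinations:
        sock.sendto(payload, dest)
        print("sent %r to %r" % (payload, dest))


def main(hostname, interval=30, source_port=None):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    if source_port is not None:
        # Hypothetical knob: try to share the source port with Asterisk.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", source_port))
    while True:
        # Re-resolve on every round so changed A-records are picked up.
        send_keepalives(sock, resolve_all(hostname))
        time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Resolving the hostname on every round, rather than once at startup, avoids repeating the quirk that causes the problem in the first place.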
By the way, the Linksys SPA series VoIP phones do NUL keepalives too — empty packets in their case — if you set the NAT Keep Alive Msg to empty instead of $NOTIFY, $OPTIONS or $PING.