Last month, our mail server (Postfix) started having trouble delivering mail to certain destinations, all of which appeared to use Microsoft Office 365 for their e-mail. What was wrong? Who was to blame? And how could we fix it?

The problem showed up in the mail log like this:

Nov 16 17:04:08 mail postfix/smtp[13330]: warning: no MX host for umcg.nl has a
  valid address record
Nov 16 17:04:08 mail postfix/smtp[13330]: 1D1D21422C2: to=<-EMAIL-@umcg.nl>,
  relay=none, delay=2257, delays=2256/0.02/0.52/0, dsn=4.4.3, status=deferred
  (Host or domain name not found. Name service error for
  name=umcg-nl.mail.protection.outlook.com type=A: Host not found, try again)
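
To see which destinations are affected, the failing lookup names can be pulled out of the mail log. This is a minimal sketch, assuming each log entry sits on a single line (the excerpt above is wrapped for readability) and that the log lives at /var/log/mail.log, which may differ on your system; failing_names is a hypothetical helper name:

```shell
# failing_names: hypothetical helper. Reads Postfix log lines on stdin and
# prints each name that hit a "Name service error", most frequent first.
failing_names() {
    sed -n 's/.*Name service error for name=\([^ ]*\) type=.*/\1/p' |
        sort | uniq -c | sort -rn
}

# Usage (the log path is an assumption):
#   failing_names < /var/log/mail.log
```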

If we looked up that domain normally, we'd get a result:

$ host umcg-nl.mail.protection.outlook.com
umcg-nl.mail.protection.outlook.com has address 213.199.154.23
umcg-nl.mail.protection.outlook.com has address 213.199.154.87

But when Postfix did the lookup, it failed with SERVFAIL. Interestingly, after the failed lookup from Postfix, the DNS recursor cached that failure response.

It turned out that Postfix did a lookup with EDNS + DNSSEC options because of the default smtp_tls_dane_insecure_mx_policy=dane setting.
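
The difference between the two kinds of lookup can probably be reproduced from the command line by asking the local recursor once without and once with the DO flag. This is a sketch only: dig_status and compare_lookups are hypothetical helper names, and the resolver address 127.0.0.1 is an assumption about where the recursor listens.

```shell
# dig_status: hypothetical helper. Reads dig output on stdin and prints the
# response code from the header line (NOERROR, SERVFAIL, FORMERR, ...).
dig_status() {
    sed -n 's/^;; ->>HEADER<<- .*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# compare_lookups: hypothetical helper. Sends the same A query to the
# resolver given as $2 (assumed default: 127.0.0.1), first without and
# then with the DO flag, and prints the status of each.
compare_lookups() {
    name=$1
    resolver=${2:-127.0.0.1}
    printf 'without DO: %s\n' "$(dig A "$name" "@$resolver" +nodnssec | dig_status)"
    printf 'with DO:    %s\n' "$(dig A "$name" "@$resolver" +dnssec | dig_status)"
}

# Usage:
#   compare_lookups umcg-nl.mail.protection.outlook.com.
```

On an affected recursor one would expect the second line to report SERVFAIL; since the failure is cached, the order of the two queries may influence what you see.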

DNSSEC relies on Extension Mechanisms for DNS (EDNS). The security-aware resolver, in this case Postfix, sets the EDNS0 OPT "DNSSEC OK" (DO) flag in the request to indicate that it wants to know whether the domain is secured by DNSSEC.

Note that the DNS recursor should always do DNSSEC validation where possible anyway, and discard the result if validation fails, but the DO flag indicates that the caller wants to know whether the answer was secured at all. Postfix uses the outcome to determine whether the destination path could be forged by bad DNS entries, and it adjusts its security posture for that path accordingly.
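
In practice, a validating recursor signals "this answer was secured" by setting the AD (Authenticated Data) bit in the response header. A rough way to check for it from the shell is sketched here; has_ad_flag is a hypothetical helper that inspects dig output, and the resolver address in the usage comment is an assumption:

```shell
# has_ad_flag: hypothetical helper. Reads dig output on stdin and succeeds
# (exit status 0) if the header flags line contains the AD bit.
has_ad_flag() {
    grep -Eq '^;; flags:[^;]* ad[ ;]'
}

# Usage (resolver address is an assumption):
#   if dig A example.org. @127.0.0.1 +dnssec | has_ad_flag; then
#       echo "validated by the recursor"
#   else
#       echo "not validated (unsigned zone, or recursor does not validate)"
#   fi
```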

EDNS lookups are not new. They work against almost all DNS servers; the exceptions are older DNS implementations, like the ones Microsoft uses for Office 365.

$ dig A umcg-nl.mail.protection.outlook.com.  \
       @ns1-proddns.glbdns.o365filtering.com. +edns +dnssec | grep FORMERR
;; ->>HEADER<<- opcode: QUERY, status: FORMERR, id: 46904
;; WARNING: EDNS query returned status FORMERR - retry with '+nodnssec +noedns'

Because EDNS is an extension, DNS servers are not obliged to understand it. A FORMERR response is acceptable. Your local DNS recursor should recognize that response and retry without EDNS.
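
The expected fallback can be illustrated with dig itself. This is only a sketch of the retry idea, not how any recursor implements it: lookup_with_fallback is a hypothetical helper, and the DIG variable is an assumption added so that the dig binary can be swapped out.

```shell
# lookup_with_fallback: hypothetical helper. Queries server $2 for the A
# record of $1 with EDNS and the DO flag; if the server answers FORMERR,
# repeats the query without EDNS. Prints the output of the last query.
# DIG (assumption) may name an alternative dig command.
lookup_with_fallback() (
    name=$1
    server=$2
    dig_bin=${DIG:-dig}

    out=$($dig_bin A "$name" "@$server" +edns +dnssec)
    status=$(printf '%s\n' "$out" |
        sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1)

    if [ "$status" = "FORMERR" ]; then
        # The server does not understand EDNS: retry in plain DNS.
        out=$($dig_bin A "$name" "@$server" +noedns +nodnssec)
    fi
    printf '%s\n' "$out"
)

# Usage:
#   lookup_with_fallback umcg-nl.mail.protection.outlook.com. \
#       ns1-proddns.glbdns.o365filtering.com.
```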

And that is where the alpha2 version of PowerDNS recursor on Ubuntu/Xenial went wrong. It did not do a second lookup. Instead, it returned SERVFAIL and cached that response.

This had already been fixed in commit 9d534f2, but that fix had not yet been applied to the Ubuntu LTS build.

Download the patch to the deb-package for Ubuntu/Xenial.

Patch instructions:

$ apt-get source pdns-recursor
$ patch -p0 < pdns-recursor_4.0.0~alpha2-2--osso1.patch
$ cd pdns-recursor-4.0.0~alpha2
$ DEB_BUILD_OPTIONS="parallel=15" dpkg-buildpackage -us -uc -sa

Relevant reading:
EDNS DANE trouble with Microsoft mail-protection-outlook-com;
pdns-recursor 4.0.0~alpha2-2 fails on FORMERR response to EDNS query.
