cumulus / postfix in the right vrf

Written by Walter Doekes

Cumulus Linux is a network operating system. It runs on network switches, but it is also a regular Linux OS, which lets us run our automation tools on it. We use those tools to automate the configuration of our network, a network in which we use VRFs (virtual routing and forwarding) to separate customer traffic. The presence of VRFs in the OS, however, means that we have to tell the daemons in which VRF to run. Sometimes that takes some tweaking, as in the case of postfix.

How to specify the VRF

You can ssh right into the Cumulus switch and use your regular tools there, like ping and curl. But if you want to end up in the right network, you have to tell the tools which VRF to use.

These examples are from Cumulus Linux 3.7:

# net show vrf

VRF              Table
---------------- -----
CUSTOMERX         1001
CUSTOMERY         1002
mgmt              1020
...

To ping an IP in the CUSTOMERX network, specify that VRF:

# ping -I CUSTOMERX -c 1 -w 1 10.20.30.40
ping: Warning: source address might be selected on device other than CUSTOMERX.
PING 10.20.30.40 (10.20.30.40) from 10.5.83.22 CUSTOMERX: 56(84) bytes of data.
64 bytes from 10.20.30.40: icmp_seq=1 ttl=60 time=0.522 ms

If you specify no VRF or the wrong one, you’ll get no reply. If you want to run applications or services in the management VRF, you have to specify mgmt. This is likely the only VRF with direct access to the internet.

When you log in, your shell is attached to the VRF you connected through:

# ip vrf identify $$
mgmt

This makes sense, as the ssh daemon you’re connected to is also in that VRF:

# ip vrf identify $(pidof sshd | tr ' ' '\n' | head -n1)
mgmt
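
The lookup above can be wrapped in a small reusable function, handy for checking where any daemon ended up. This is a minimal sketch; the name vrf_of_process is hypothetical, and it uses pidof -s (which prints a single PID) instead of the tr/head pipeline.

```shell
# Hypothetical helper: print the VRF of a running process, by name.
# Usage: vrf_of_process sshd
vrf_of_process() {
    # pidof -s ("single shot") prints only one PID; fail if none is running
    pid=$(pidof -s "$1") || return 1
    # Ask iproute2 which VRF that process belongs to
    ip vrf identify "$pid"
}
```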

Applications without native VRF support

You may have noticed that for ping you can specify a VRF using -I <interface>. Not all applications support that. For those, you can prefix the command with ip vrf exec <vrf>:

# nc 10.20.30.40 22 -v
10.20.30.40: inverse host lookup failed: Unknown host
^C
# ip vrf exec CUSTOMERX nc 10.20.30.40 22
SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.1
^C
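
Since child processes inherit the VRF (that is how your login shell ended up in mgmt), you can also start a whole shell in another VRF and drop the prefix for each command. A minimal sketch; vrf_shell is a hypothetical name, and it falls back to /bin/sh when $SHELL is unset.

```shell
# Hypothetical helper: open a shell bound to the given VRF. Everything
# started from that shell inherits the VRF, just like the login shell
# inherits mgmt from sshd.
# Usage: vrf_shell CUSTOMERX
vrf_shell() {
    ip vrf exec "$1" "${SHELL:-/bin/sh}"
}
```

Inside that shell, ip vrf identify $$ should then report CUSTOMERX, and nc 10.20.30.40 22 should work without the ip vrf exec prefix.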

Starting daemons in the right VRF

Cumulus Linux 3, based on Debian Jessie, uses systemd as its init system. PID 1 spawns the daemons, and since PID 1 itself is not in any VRF, the daemons won’t start in the management VRF by default:

# ip vrf identify 1
(void)

Cumulus ships a nifty little systemd generator that lets you run services in an appropriate VRF. Take your ntp time daemon, for instance, which needs access to the internet:

cat /etc/vrf/systemd.conf

# Systemd-based services that are expected to be run in a VRF context.
#
# If changes are made to this file run systemctl daemon-reload
# to re-generate systemd files.
...
ntp
...

systemctl cat ntp@mgmt.service

# /etc/systemd/system/ntp@.service
# created by vrf generator
...

[Service]
Type=simple
ExecStart=/usr/sbin/ntpd -n -u ntp:ntp -g
Restart=on-failure
...

# /run/systemd/generator/ntp@.service.d/vrf.conf
# created by vrf generator
...

[Service]
ExecStart=
ExecStart=/bin/ip vrf exec %i /usr/sbin/ntpd -n -u ntp:ntp -g

As you can see, ExecStart is prefixed with ip vrf exec %i, where %i is the instance name: mgmt for ntp@mgmt.service. So instead of starting/enabling ntp.service, you start/enable ntp@mgmt.service, so the time daemon runs in the expected VRF.
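
A minimal sketch of that switch-over as a hypothetical helper. It assumes, as the comment in /etc/vrf/systemd.conf suggests, that listing a service name there (one per line) and running systemctl daemon-reload is what makes the generator produce the name@.service template. VRF_CONF is a hypothetical override for the config path.

```shell
# Hypothetical helper: move a service into a VRF via the Cumulus vrf generator.
# Usage: run_service_in_vrf ntp mgmt
run_service_in_vrf() {
    svc=$1 vrf=$2
    conf=${VRF_CONF:-/etc/vrf/systemd.conf}
    # Assumption: one service name per line registers it with the generator
    grep -qxF "$svc" "$conf" 2>/dev/null || echo "$svc" >> "$conf"
    # Regenerate the units, as the comment in systemd.conf asks
    systemctl daemon-reload || return 1
    # Stop and disable the plain unit, which runs outside any VRF ...
    systemctl stop "$svc.service"
    systemctl disable "$svc.service"
    # ... and run the instance, whose ExecStart uses "ip vrf exec $vrf"
    systemctl enable "$svc@$vrf.service" &&
        systemctl start "$svc@$vrf.service"
}
```

For postfix, disabling postfix.service turns out not to be enough, as the next section shows.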

Getting postfix in the right VRF

The postfix (mailer) init script in this particular distribution has an annoying quirk: through the generated systemd units, it ends up depending on itself.

cat /etc/init.d/postfix

#!/bin/sh -e

### BEGIN INIT INFO
# Provides:          postfix mail-transport-agent
# Required-Start:    $local_fs $remote_fs $syslog $named $network $time
# Required-Stop:     $local_fs $remote_fs $syslog $named $network
# Should-Start:      postgresql mysql clamav-daemon postgrey spamassassin saslauthd dovecot
# Should-Stop:       postgresql mysql clamav-daemon postgrey spamassassin saslauthd dovecot
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Postfix Mail Transport Agent
# Description:       postfix is a Mail Transport agent
### END INIT INFO

...

The systemd-sysv-generator parses this and generates the unit below; the drop-ins come from the systemd-insserv-generator:

systemctl cat postfix.service

# /run/systemd/generator.late/postfix.service
# Automatically generated by systemd-sysv-generator

[Unit]
...
Before=mail-transport-agent.target shutdown.target
After=local-fs.target remote-fs.target ...
Wants=mail-transport-agent.target network-online.target
...

[Service]
...
ExecStart=/etc/init.d/postfix start
ExecStop=/etc/init.d/postfix stop
ExecReload=/etc/init.d/postfix reload

# /run/systemd/generator/postfix.service.d/50-postfix-$mail-transport-agent.conf
# Automatically generated by systemd-insserv-generator

[Unit]
Wants=mail-transport-agent.target
Before=mail-transport-agent.target

systemctl cat mail-transport-agent.target

# /lib/systemd/system/mail-transport-agent.target
...

# /run/systemd/generator/mail-transport-agent.target.d/50-hard-dependency-postfix-$mail-transport-agent.conf
# Automatically generated by systemd-insserv-generator

[Unit]
SourcePath=/etc/insserv.conf.d/postfix
Requires=postfix.service

That is, postfix.service provides mail-transport-agent.target (the target Requires postfix.service, as the last snippet shows), but postfix.service also pulls in that same target through its Wants= option and orders itself Before= it.

The Cumulus systemd VRF generator in turn generates this:

systemctl cat postfix@mgmt.service

# /etc/systemd/system/postfix@.service
# created by vrf generator
# Automatically generated by systemd-sysv-generator

[Unit]
...
Before=mail-transport-agent.target shutdown.target
After=local-fs.target remote-fs.target ...
Wants=mail-transport-agent.target network-online.target
...

[Service]
Environment=_SYSTEMCTL_SKIP_REDIRECT=true
...
ExecStart=/bin/ip vrf exec %i /etc/init.d/postfix start
ExecStop=/bin/ip vrf exec %i /etc/init.d/postfix stop
ExecReload=/bin/ip vrf exec %i /etc/init.d/postfix reload
...

Unfortunately, this means that postfix@mgmt.service now depends on postfix.service (the instance without a VRF), via mail-transport-agent.target. Starting one pulls in the other, and whichever starts second is likely to fail, because a competing postfix is already running:

# LC_ALL=C systemctl list-dependencies postfix@mgmt.service |
    grep -E 'postfix|mail-transport'
postfix@mgmt.service
* |-system-postfix.slice
* |-mail-transport-agent.target
* | `-postfix.service

Depending on luck or your configuration, the right process may end up running, but you will also see the other unit marked as failed:

# LC_ALL=C systemctl list-units --failed
  UNIT                                 LOAD   ACTIVE SUB    DESCRIPTION
* postfix.service                      loaded failed failed LSB: Postfix Mail Transport Agent

But we don’t want to rely on luck. Instead, make sure that starting the non-VRF postfix.service can never cause a conflict, by neutering its commands in a drop-in:

cat /etc/systemd/system/postfix.service.d/ignore.conf

[Service]
# Ensure dependencies on this do not conflict with the proper
# postfix@mgmt.service:
ExecStart=
ExecStop=
ExecReload=

ExecStart=/bin/true
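
A drop-in in /etc/systemd/system only takes effect after systemd re-reads its configuration. A minimal sketch of the follow-up steps, wrapped in a hypothetical helper; the reset-failed call merely clears the stale failed state of postfix.service shown earlier.

```shell
# Hypothetical helper: activate the ignore.conf drop-in for postfix.service.
apply_postfix_dropin() {
    # Re-read unit files so the ExecStart=/bin/true override is used
    systemctl daemon-reload || return 1
    # Clear the "failed" state of the plain, VRF-less postfix.service
    systemctl reset-failed postfix.service
    # Restart the real mailer, bound to mgmt by "ip vrf exec"
    systemctl restart postfix@mgmt.service
}
```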

With that in place, postfix starts cleanly in the right VRF, and a test mail should arrive without trouble:

# FROM=test@example.com && TO=yourself@example.com &&
  printf 'Subject: test\r\nDate: %s\r\nFrom: %s\r\nTo: %s\r\n\r\ntest\r\n' \
    "$(date -R)" "$FROM" "$TO" | /usr/sbin/sendmail -f "$FROM" "$TO"
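
To double-check which VRF the running postfix ended up in, you can look up its master process. A minimal sketch, assuming the Debian default queue directory /var/spool/postfix, where postfix keeps pid/master.pid; the helper name postfix_vrf and the POSTFIX_PIDFILE override are hypothetical.

```shell
# Hypothetical helper: print the VRF of the running postfix master process.
# Assumes the Debian default queue_directory, /var/spool/postfix.
postfix_vrf() {
    pidfile=${POSTFIX_PIDFILE:-/var/spool/postfix/pid/master.pid}
    # The pid file may be padded with spaces; keep only the digits
    pid=$(tr -cd '0-9' < "$pidfile") || return 1
    [ -n "$pid" ] || return 1
    ip vrf identify "$pid"
}
```

With the drop-in in place, this should print mgmt.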
