curious termios error / switching to asyncio serial

curious termios error / switching to asyncio serial

  • Written by
    Walter Doekes
  • Published on

My Python code that interfaces with a serial port stopped working when refactoring the code to use asyncio. It started raising Invalid argument exceptions from tcsetattr(3). Why would asynchronous Python misbehave? Was there a bug in serial_asyncio?

TL;DR: When interfacing with an openpty(3) pseudoterminal — which I used to emulate a real serial port — setting parity and bytesize is not supported. But an error would only show up when tcsetattr(3) was called twice, which happened only in the asyncio case.

I was working on some Python code to communicate with an IEC62056-21 device over a serial bus. I planned on using the UART of a Raspberry PI for this purpose.

The basics involve opening a serial.Serial('/dev/ttyACM0', baudrate) Python serial object, reading from it and writing to it. But, for serial communication, there is more than just the baud rate which needs to be agreed upon. For this IEC62056-21 device, it is:

  • baud rate 300 (initially),
  • 1 start bit (common default),
  • 7 bits per byte (only 7 bits ASCII is spoken),
  • 1 even parity bit (making sure all eight bits add up to a zero),
  • 1 stop bit (common default).

The Python serial object supports all that:

ser = serial.Serial(
    '/dev/ttyACM0', baudrate=300,
    parity=serial.PARITY_EVEN,
    bytesize=7)

That should work, and it does. But...

Testing serial code using an openpty pair

Developing against the real device is awkward and fragile. So, instead of developing against the hardware device directly, I decided it was worth the effort to emulate the IEC62056-21 device, a fake server if you will.

For this purpose, I needed a bridge between two serial devices. This can be done using socat:

$ socat -dd pty,rawer,link=server.dev pty,rawer,link=client.dev
2022/01/22 12:59:34 socat[700243] N PTY is /dev/pts/28
2022/01/22 12:59:34 socat[700243] N PTY is /dev/pts/29
2022/01/22 12:59:34 socat[700243] N starting data transfer loop with FDs [5,5] and [7,7]

socat spawns a bridge between two openpty(3) created pseudoterminals and creates two symlinks for easy access:

$ readlink client.dev server.dev
/dev/pts/29
/dev/pts/28

We can now connect our fake server to one end, and the client we're developing to the other.

(In my case, I'm using a custom serialproxy.py that does what socat does, and more, because it also simulates baud rate slowness and checks speed compatibility, relevant because the IEC62056-21 protocol uses negotiable/changing transfer rates.)

A simple test script connecting to both the server and client end might look as follows:

# For this example, run the following in
# another terminal (in the same directory):
# $ socat -dd pty,rawer,link=server.dev pty,rawer,link=client.dev
# WARNING: Restart this socat to reset bridge state.

import serial
import time

SERIAL_OPTIONS = {
    'baudrate': 300,
    'parity': serial.PARITY_EVEN,
    'bytesize': 7,
    'stopbits': 1,
}

def main_serial(write_dev, read_dev):
    reader = serial.Serial(
        read_dev, **SERIAL_OPTIONS)
    #reader.timeout = 0
    writer = serial.Serial(
        write_dev, **SERIAL_OPTIONS)

    messages = [
        b'foo\n', b'bar\n', b'baz\n']
    for msg in messages:
        writer.write(msg)
        print('send', msg)
        out = reader.read(4)
        print('recv', out)
        assert msg == out, (msg, out)
        time.sleep(1)

main_serial('server.dev', 'client.dev')

This example prints:

send b'foo\n'
recv b'foo\n'
send b'bar\n'
recv b'bar\n'
send b'baz\n'
recv b'baz\n'

That's nice. We can use serial.Serial() without requiring a hardware UART during development.

(However, if you run the same script again without restarting socat, you'll be greeted with an Invalid argument exception. We'll see why in a bit. Restarting socat resets the bridge state and makes the script work again.)

Testing asyncio serial code using an openpty pair

Because it is 2022, and I don't want to use threads in Python (they're a poor combo), and I wanted to be able to run multiple subsystems at once, I decided to convert this script to use Python Asynchronous I/O.

Surely converting this script to asyncio is easy:

# For this example, run the following in
# another terminal (in the same directory):
# $ socat -dd pty,rawer,link=server.dev pty,rawer,link=client.dev
# WARNING: Restart this socat to reset bridge state.

import asyncio
import serial
from serial_asyncio import open_serial_connection

SERIAL_OPTIONS = {
    'baudrate': 300,
    'parity': serial.PARITY_EVEN,
    'bytesize': 7,
    'stopbits': 1,
}

def main_aioserial(write_dev, read_dev):
    loop = asyncio.get_event_loop()
    main_coro = aioserial(write_dev, read_dev)
    loop.run_until_complete(main_coro)
    loop.close()

async def aioserial(write_dev, read_dev):
    async def send(ser, msgs):
        for msg in msgs:
            ser.write(msg)
            print('send (asyncio)', msg)
            await asyncio.sleep(1)

    async def recv(ser):
        for i in range(3):
            out = await ser.readuntil(b'\n')
            print('recv (asyncio)', out)

    reader, _ = await open_serial_connection(
        url=read_dev, **SERIAL_OPTIONS)
    _, writer = await open_serial_connection(
        url=write_dev, **SERIAL_OPTIONS)

    messages = [b'foo\n', b'bar\n', b'baz\n']
    send_coro = send(writer, messages)
    recv_coro = recv(reader)
    await asyncio.wait({
        asyncio.create_task(coro)
        for coro in (send_coro, recv_coro)})

main_aioserial('server.dev', 'client.dev')

This should print the following:

send (asyncio) b'foo\n'
recv (asyncio) b'foo\n'
send (asyncio) b'bar\n'
recv (asyncio) b'bar\n'
send (asyncio) b'baz\n'
recv (asyncio) b'baz\n'

But instead, it prints:

Traceback (most recent call last):
  File "test_aioserial.py", line 45, in <module>
    main_aioserial('server.dev', 'client.dev')
  File "test_aioserial.py", line 20, in main_aioserial
    loop.run_until_complete(coro)
  File "asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "test_aioserial.py", line 35, in aioserial
    reader, _ = await open_serial_connection(
  File "serial_asyncio/__init__.py", line 438, in open_serial_connection
    transport, _ = yield from create_serial_connection(
  File "asyncio/coroutines.py", line 124, in coro
    res = func(*args, **kw)
  File "serial_asyncio/__init__.py", line 412, in create_serial_connection
    transport = SerialTransport(loop, protocol, ser)
  File "serial_asyncio/__init__.py", line 63, in __init__
    self._serial.timeout = 0
  File "serial/serialutil.py", line 368, in timeout
    self._reconfigure_port()
  File "serial/serialposix.py", line 435, in _reconfigure_port
    termios.tcsetattr(
termios.error: (22, 'Invalid argument')

This had me stumped for quite some time. Why would serial_asyncio misbehave?

Debugging Python tcsetattr

I decided to add some tracing code in /usr/lib/python3/dist-packages/serial/serialposix.py:

...
        # activate settings
        if force_update or [iflag, oflag, cflag, lflag, ispeed, ospeed, cc] != orig_attr:
            # vvv-- one line of code added --vvv
            import os; os.write(2, b'DEBUG serialposix.py tcsetattr() 0o%o\n' % (cflag,))
            termios.tcsetattr(
                self.fd,
                termios.TCSANOW,
                [iflag, oflag, cflag, lflag, ispeed, ospeed, cc])
...

Running the synchronous code snippet now showed:

$ python3 test_serial.py
DEBUG serialposix.py tcsetattr() 0o4657
DEBUG serialposix.py tcsetattr() 0o4657
send b'foo\n'
...

That is, two calls because of the two Serial objects being set up.

The output of the asynchronous version was now:

$ python3 test_aioserial.py
DEBUG serialposix.py tcsetattr() 0o4657
DEBUG serialposix.py tcsetattr() 0o4647
Traceback (most recent call last):
...
termios.error: (22, 'Invalid argument')

Here, the second call to tcsetattr(3) is done when setting the timeout to 0 (to make it "non-blocking"). It's still setting up the first of the two Serial objects.

If we look at the first example, we can enable the reader.timeout = 0 line and notice how the synchronous version now also chokes immediately.

So, it's not a serial_asyncio problem. But it is still a problem.

An strace then? Calling strace will show us which system calls are being executed. Maybe insight in their failure can shed some light on this. After all, the tcsetattr(3) library call will get translated to one or more system calls:

$ strace -s 2048 python3 test_serial.py 2>&1 |
  grep -A15 'write(.*DEBUG serialposix'

We know the call comes right after our write(2).

write(2, "DEBUG serialposix.py tcsetattr() 0o4657\n", 40DEBUG serialposix.py tcsetattr() 0o4657
) = 40

There it is. And then immediately some ioctl(2) calls.

ioctl(3, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(3, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(3, SNDCTL_TMR_START or TCSETS, {B300 -opost -isig -icanon -echo ...}) = 0
ioctl(3, TCGETS, {B300 -opost -isig -icanon -echo ...}) = 0
ioctl(3, TIOCMBIS, [TIOCM_DTR])         = -1 ENOTTY (Inappropriate ioctl for device)
ioctl(3, TCFLSH, TCIFLUSH)              = 0

There was one error, but this is during the first call, so it is ignored. (I can tell you now that those first four ioctl calls are all caused by a single Python termios.tcsetattr() call.)

pipe2([4, 5], O_CLOEXEC)                = 0
pipe2([6, 7], O_CLOEXEC)                = 0
fcntl(4, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
fcntl(6, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
ioctl(3, TCGETS, {B300 -opost -isig -icanon -echo ...}) = 0

Some other code was called, and then, the second call to tcsetattr(3):

write(2, "DEBUG serialposix.py tcsetattr() 0o4647\n", 40DEBUG serialposix.py tcsetattr() 0o4647
) = 40
ioctl(3, TCGETS, {B300 -opost -isig -icanon -echo ...}) = 0
ioctl(3, TCGETS, {B300 -opost -isig -icanon -echo ...}) = 0
ioctl(3, SNDCTL_TMR_START or TCSETS, {B300 -opost -isig -icanon -echo ...}) = 0
ioctl(3, TCGETS, {B300 -opost -isig -icanon -echo ...}) = 0

And after this point in the trace, Python starts collecting values to show a backtrace.

Now that is unexpected. The last system call returned 0 (SUCCESS), yet Python considered it failed and reports "Invalid argument".

I'm certain Python is not making that up, so there must be some other explanation.

Checking the library sources

The CPython library tcsetattr function looks like:

static PyObject *
termios_tcsetattr_impl(PyObject *module, int fd, int when, PyObject *term)
...
    if (cfsetispeed(&mode, (speed_t) ispeed) == -1)
        return PyErr_SetFromErrno(state->TermiosError);
    if (cfsetospeed(&mode, (speed_t) ospeed) == -1)
        return PyErr_SetFromErrno(state->TermiosError);
    if (tcsetattr(fd, when, &mode) == -1)
        return PyErr_SetFromErrno(state->TermiosError);
...

Pretty straight forward.

The CPython function calls the GNU glibc tcsettatr library function, which looks like:

int
__tcsetattr (int fd, int optional_actions, const struct termios *termios_p)
...
  return INLINE_SYSCALL (ioctl, 3, fd, cmd, &k_termios);
}

One syscall. Only one! But why is Python turning this into an exception then?

The answer turned out to be in local-tcsetaddr.diff — a distro-specific patch deployed on Debian/Linux distributions and derivatives (like Ubuntu):

# All lines beginning with `# DP:' are a description of the patch.
# DP: Description: tcsetattr sanity check on PARENB/CREAD/CSIZE for ptys
# DP: Related bugs: 218131.
...
--- a/sysdeps/unix/sysv/linux/tcsetattr.c
+++ b/sysdeps/unix/sysv/linux/tcsetattr.c
@@ -75,7 +80,55 @@
   memcpy (&k_termios.c_cc[0], &termios_p->c_cc[0],
    __KERNEL_NCCS * sizeof (cc_t));

-  return INLINE_SYSCALL (ioctl, 3, fd, cmd, &k_termios);
+  retval = INLINE_SYSCALL (ioctl, 3, fd, cmd, &k_termios);
+
+  /* The Linux kernel silently ignores the invalid c_cflag on pty.
+     We have to check it here, and return an error.  But if some other
+     setting was successfully changed, POSIX requires us to report
+     success. */
...
+   /* It looks like the Linux kernel silently changed the
+      PARENB/CREAD/CSIZE bits in c_cflag. Report it as an
+      error. */
+   __set_errno (EINVAL);
+   retval = -1;
+ }
+    }
+   return retval;
 }

According to that patch, the relevant debian bug report 218131, and other relevant communication, setting invalid/unsupported c_flags on a pseudoterminal (pty) should return EINVAL. Linux apparently just ignores the bad c_flags.

POSIX tcsetattr needs to handle three conditions correctly:

  • If all changes are successful, return success (0).
  • If some changes are successful and some aren't, return success.
  • If no changes are successful, return error (-1, errno=EINVAL).

The problem occurs when setting certain flags (PARENB, CREAD, or one of the CSIZE parameters) on a pty. The kernel silently ignores those settings, so libc is responsible for doing the right thing.

And indeed, if we remove the PARENB (enable even parity) and CS7 (7 bit CSIZE), the Invalid argument error goes away:

SERIAL_OPTIONS = {
    'baudrate': 300,
    #DISABLED# 'parity': serial.PARITY_EVEN,
    #DISABLED# 'bytesize': 7,
    'stopbits': 1,
}
$ python3 test_serial.py
DEBUG serialposix.py tcsetattr() 0o4267
DEBUG serialposix.py tcsetattr() 0o4277
send b'foo\n'
...

And now we can run the script consecutive times without restarting socat.

Setting parity or bytesize on a pseudoterminal is a big no

Apparently, pseudoterminals, as opened by openpty(3), do not grok setting different parity or bytesize. This was silently ignored by the tcsetattr(3) library call, as long as something else was altered at the same time.

Setting the baud rate works fine, it is propagated across to the slave pty. But while testing we'll need to avoid setting unsupported parity or bytesize.

After figuring all that out, going back to the manpage of tcsetattr(3) revealed this:

RETURN VALUE
  Note that tcsetattr() returns success
  if any of the requested changes could
  be successfully carried out. Therefore,
  when making multiple changes it may be
  necessary to follow this call with a
  further call to tcgetattr() to check
  that all changes have been performed
  successfully.

So, Python might be setting multiple values at once. And, maybe, just maybe, Python should then always call tcsetattr(3) twice, like the manpage suggests. That would cause consistent behaviour. And that would've saved me hours of debugging as I would've identified the custom SERIAL_OPTIONS as culprits so much sooner.


Back to overview Newer post: rst tables with htmldjango / emoji two columns wide Older post: recap 2021