Today we ran into a dovecot/imap crash on a Xenial box. The Dovecot in question was the patched dovecot-2.2.22.

Due to an as of yet unexplained cause, reading mail through Thunderbird mail client worked fine, but when opening a message with Roundcube (webmail), most messages would give an odd error about a "message that could not be opened".

An IMAP trace of Roundcube revealed that the IMAP server stopped responding after the client A0004 UID FETCH command. dmesg revealed that this was due to a segfault of the dovecot/imap binary.

# dmesg | tail -n1
[2405902.912457] imap[5745]: segfault at 0 ip 00007f7a56b251f1 sp 00007fff7f6e9fd0
  error 4 in libdovecot.so.0.0.0[7f7a56aa0000+f4000]
# addr2line -Cfe /usr/lib/dovecot/libdovecot.so \
    $(printf %x $((0x7f7a56b251f1 - 0x7f7a56aa0000)))
i_stream_seek
??:?

On our box, the imap server runs chrooted inside the users homedir. After adding LimitCORE=infinity to the dovecot.service file. The segfault produced a nice core dump inside the virtual mail dir, in our case located in /var/mail/DOMAIN/USER.

# gdb /usr/lib/dovecot/imap /var/mail/DOMAIN/USER/core
...
Core was generated by `dovecot/imap'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  i_stream_seek (stream=0x0, v_offset=0) at istream.c:286
...

(gdb) bt
#0  i_stream_seek (stream=0x0, v_offset=0) at istream.c:286

  282   void i_stream_seek(struct istream *stream, uoff_t v_offset)
  ...
  286       if (v_offset >= stream->v_offset &&  // &stream->v_offset == 0x0
  287           i_stream_can_optimize_seek(_stream))

#1  0x00007f7a56b2a4c2 in i_stream_limit_read (stream=0x5632721f8270) at istream-limit.c:34

  27    static ssize_t i_stream_limit_read(struct istream_private *stream)
  ...
  34        i_stream_seek(stream->parent, lstream->istream.parent_start_offset +
  35                  stream->istream.v_offset);

#2  0x00007f7a56b252f3 in i_stream_read (stream=stream@entry=0x5632721f82e0) at istream.c:162
#3  0x00007f7a56b25d2d in i_stream_read_data (stream=0x5632721f82e0, data_r=data_r@entry=0x7fff7f6ea098, size_r=size_r@entry=0x7fff7f6ea0a0, threshold=threshold@entry=1)
    at istream.c:551
#4  0x00007f7a56b09920 in message_parse_header_next (ctx=0x5632721f8420, hdr_r=hdr_r@entry=0x7fff7f6ea0f0) at message-header-parser.c:82
#5  0x00007f7a56b0b37a in preparsed_parse_next_header (ctx=ctx@entry=0x5632721f7ea0, block_r=block_r@entry=0x7fff7f6ea1a0) at message-parser.c:938
#6  0x00007f7a56b0b518 in preparsed_parse_next_header_init (ctx=0x5632721f7ea0, block_r=0x7fff7f6ea1a0) at message-parser.c:987
...

(gdb) print stream.read
$1 = (ssize_t (*)(struct istream_private *)) 0x7f7a56b2a490 <i_stream_limit_read>
(gdb) print stream.seek
$2 = (void (*)(struct istream_private *, uoff_t, bool)) 0x7f7a56b24c70 <i_stream_default_seek_seekable>
(gdb) print stream.stat
$3 = (int (*)(struct istream_private *, bool)) 0x7f7a56b2a3f0 <i_stream_limit_stat>
(gdb) print stream.get_size
$4 = (int (*)(struct istream_private *, bool, uoff_t *)) 0x7f7a56b2a6d0 <i_stream_limit_get_size>

(gdb) print *stream
$5 = {iostream = {refcount = 1914672576,
    name = 0x7f7a56a9ab78 <main_arena+88> "\360\326*r2V", error = 0x0,
    close = 0x7f7a56b25050 <i_stream_default_close>,
    destroy = 0x7f7a56b2a690 <i_stream_limit_destroy>,
    set_max_buffer_size = 0x7f7a56b250a0
      <i_stream_default_set_max_buffer_size>,
    destroy_callbacks = {arr = {buffer = 0x0, element_size = 0}, v = 0x0,
      v_modifiable = 0x0}}, read = 0x7f7a56b2a490 <i_stream_limit_read>,
  seek = 0x7f7a56b24c70 <i_stream_default_seek_seekable>, sync = 0x0,
  stat = 0x7f7a56b2a3f0 <i_stream_limit_stat>,
  get_size = 0x7f7a56b2a6d0 <i_stream_limit_get_size>, switch_ioloop = 0x0,
  istream = {v_offset = 0, stream_errno = 0, mmaped = 0, blocking = 1,
    closed = 0, readable_fd = 1, seekable = 1, eof = 0,
    real_stream = 0x5632721f8270}, fd = 15, abs_start_offset = 0,
  statbuf = {st_dev = 0, st_ino = 0, st_nlink = 0, st_mode = 0, st_uid = 0,
    st_gid = 0, __pad0 = 0, st_rdev = 0, st_size = -1, st_blksize = 0,
    st_blocks = 0, st_atim = {tv_sec = 1515411215, tv_nsec = 0}, st_mtim = {
      tv_sec = 1515411215, tv_nsec = 0}, st_ctim = {tv_sec = 1515411215,
      tv_nsec = 0}, __glibc_reserved = {0, 0, 0}}, io = 0x0, buffer = 0x0,
  w_buffer = 0x0, buffer_size = 0, max_buffer_size = 8192,
  init_buffer_size = 8192, skip = 0, pos = 0, try_alloc_limit = 0,
  parent = 0x0, parent_start_offset = 0, parent_expected_offset = 0,
  access_counter = 0, line_str = 0x0, line_crlf = 0, return_nolf_line = 0,
  stream_size_passthrough = 0, nonpersistent_buffers = 0}

(gdb) print stream->parent
$6 = (struct istream *) 0x0

The above is wrong, it crashes because stream->parent is NULL. But it should never be NULL when the stream is of istream-limit type (the read virtual method being i_stream_limit_read).

According to the i_stream_create_limit() constructor the parent should've been set to something non-NULL (in i_stream_create()). So — unless it's legal to blank out the parent once it exists — it looks like we're looking at some kind of memory corruption.

A quick round of googling of the Dovecot mailing list turned up nothing useful. The Dovecot 2.3 changelog did show this:

v2.3.0 2017-12-22  Timo Sirainen <tss iki.fi>
...
    - Input streams are more reliable now when there are errors or when
      the maximum buffer size is reached. Previously in some situations
      this could have caused Dovecot to try to read already freed memory.

Perhaps that would fix things, but upgrading the entire Dovecot package was beyond the scope at this point.

A last resort was cleaning up the per-user cache/log dovecot files:

# ls -1 /var/mail/DOMAIN/USER/dovecot*
/var/mail/DOMAIN/USER/dovecot.index
/var/mail/DOMAIN/USER/dovecot.index.cache
/var/mail/DOMAIN/USER/dovecot.index.log
/var/mail/DOMAIN/USER/dovecot.index.log.2
/var/mail/DOMAIN/USER/dovecot-keywords
/var/mail/DOMAIN/USER/dovecot-uidlist
/var/mail/DOMAIN/USER/dovecot-uidvalidity
/var/mail/DOMAIN/USER/dovecot-uidvalidity.5a535834

# rm /var/mail/DOMAIN/USER/dovecot*

And at last, this fixed the issue. (Except not, see below!)

This also changed the IMAP login handshake from:

S: A0002 OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE
   IDLE SORT SORT=DISPLAY THREAD=REFERENCES THREAD=REFS
   THREAD=ONSELECT
   CHILDREN NAMESPACE UIDPLUS LIST-EXTENDED I18NLEVEL=1 CONDSTORE QRESYNC
   ESEARCH ESORT SEARCHRES WITHIN CONTEXT=SEARCH LIST-STATUS BINARY MOVE
   SPECIAL-USE] Logged in
...
S: * FLAGS (\Answered \Flagged \Deleted \Seen \Draft $Forwarded Junk NonJunk $label2 $label1 $ATTACHMENT)
S: * OK [PERMANENTFLAGS (\Answered \Flagged \Deleted \Seen \Draft $Forwarded Junk NonJunk $label2 $label1 $ATTACHMENT \*)] Flags permitted

To this:

S: A0002 OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE
   IDLE SORT SORT=DISPLAY THREAD=REFERENCES THREAD=REFS
   THREAD=ORDEREDSUBJECT MULTIAPPEND URL-PARTIAL CATENATE UNSELECT
   CHILDREN NAMESPACE UIDPLUS LIST-EXTENDED I18NLEVEL=1 CONDSTORE QRESYNC
   ESEARCH ESORT SEARCHRES WITHIN CONTEXT=SEARCH LIST-STATUS BINARY MOVE
   SPECIAL-USE] Logged in
...
S: * FLAGS (\Answered \Flagged \Deleted \Seen \Draft unknown-0 unknown-2 unknown-1)
S: * OK [PERMANENTFLAGS (\Answered \Flagged \Deleted \Seen \Draft unknown-0 unknown-2 unknown-1 \*)] Flags permitted.

That might help someone else debugging this further.

Update 2018-01-26

Turns out things weren't fixed at all. Large directories would still cause dovecot to dump core.

Also, removing all the dovecot* files has the drawback of clearing/hiding IMAP-client set labels (like "Work" and "Important"). Don't do it if you don't need to.

Instead, updating Dovecot to version 2.3.0 did the trick. There are sources, and some prebuilt packages on their website. Or you can build your own Ubuntu/Xenial package.

debug dovecot crash roundcube