supermicro / ikvm / sslv3 alert bad certificate

Written by Walter Doekes

Today I was asked to look at a machine that disallowed iKVM IPMI console access. It allowed access through the “iKVM/HTML5” client, but when connecting using “Console Redirection” (the Java client, see also ipmikvm) it would quit after 10 failed attempts.

TL;DR: The clock of the machine had been reset to a timestamp earlier than the first validity of the supplied client certificate. After changing the BMC time from 2015 to 2021, everything worked fine again.

If you're interested, here are some steps I took to debug the situation:

Debugging

My attempts at logging in started like this:

$ ipmikvm 10.1.2.3
...
Retry =1
Retry =2
Retry =3

Okay, no good. Luckily syslog had some info (some lines elided):

Service [ikvm] accepted (FD=3) from 127.0.0.1:40868
connect_blocking: connected 10.1.2.3:5900
Service [ikvm] connected remote server from 10.0.0.2:38222
Remote socket (FD=9) initialized
SNI: sending servername: 10.1.2.3
...
CERT: Locally installed certificate matched
Certificate accepted: depth=0,
  /C=US/ST=California/L=San Jose/O=Super Micro Computer
  /OU=Software/CN=IPMI/emailAddress=Email
SSL state (connect): SSLv3 read server certificate A
...
SSL state (connect): SSLv3 write client certificate A
...
SSL state (connect): SSLv3 write finished A
SSL state (connect): SSLv3 flush data
SSL alert (read): fatal: bad certificate
SSL_connect: 14094412: error:14094412:
  SSL routines:SSL3_READ_BYTES:sslv3 alert bad certificate
Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket
Remote socket (FD=9) closed
Local socket (FD=3) closed
Service [ikvm] finished (0 left)

The Java iKVM client uses an stunnel sidecar to take care of the TLS bits, hence the extra connection to 127.0.0.1. Right now, that's not important. What is important is the “SSL routines:SSL3_READ_BYTES:sslv3 alert bad certificate” message. Apparently someone disagrees with something in the TLS handshake.
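
As an aside: the hexadecimal code in that message can be fed to openssl errstr if you ever need to look one up the other way around (the exact output format differs a bit per OpenSSL version):

$ openssl errstr 14094412
error:14094412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate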

First question: is the client or the server to blame? The log isn't totally clear on that. Let's find out who disconnects.

Rerunning ipmikvm 10.1.2.3 but with sh -x shows us how to invoke the client:

$ sh -x /usr/bin/ipmikvm 10.1.2.3
...
+ dirname /home/walter/.local/lib/ipmikvm/iKVM__V1.69.38.0x0/iKVM__V1.69.38.0x0.jar
+ exec java -Djava.library.path=/home/walter/.local/lib/ipmikvm/iKVM__V1.69.38.0x0 \
    -cp /home/walter/.local/lib/ipmikvm/iKVM__V1.69.38.0x0/iKVM__V1.69.38.0x0.jar \
    tw.com.aten.ikvm.KVMMain 10.1.2.3 RXBoZW1lcmFsVXNlcg== TGlrZUknZFRlbGxZb3U= \
    null 63630 63631 0 0 1 5900 623

We can rerun that command and see what it does by running it through strace, redirecting the syscalls to tmp.log. (The output is large enough that you do not want it on your console, trust me.)

$ strace -s 8192 -ttf \
    java -Djava.library.path=/home/walter/.local/lib/ipmikvm/iKVM__V1.69.38.0x0 \
    -cp /home/walter/.local/lib/ipmikvm/iKVM__V1.69.38.0x0/iKVM__V1.69.38.0x0.jar \
    tw.com.aten.ikvm.KVMMain 10.1.2.3 RXBoZW1lcmFsVXNlcg== TGlrZUknZFRlbGxZb3U= \
    null 63630 63631 0 0 1 5900 623 \
    2> tmp.log
connect failed sd:18
Retry =1
^C

We expect a TCP connect() to 10.1.2.3:

[pid 130553] 08:50:30.418890 connect(
  9, {sa_family=AF_INET, sin_port=htons(5900),
  sin_addr=inet_addr("10.1.2.3")}, 16)
    = -1 EINPROGRESS (Operation now in progress)

Not quite connected yet, but it's apparently non-blocking. Scrolling downward in pid 130553 we see:

[pid 130553] 08:50:30.419037 poll(
  [{fd=9, events=POLLIN|POLLOUT}], 1, 10000)
    = 1 ([{fd=9, revents=POLLOUT}])

Good, it's connected. Now following the read/write (usually recv/send or a similar syscall, but not this time) on fd 9 shows us:

[pid 130553] 08:50:30.684609 read(
  9, "\25\3\3\0\2", 5) = 5
[pid 130553] 08:50:30.684664 read(
  9, "\2*", 2) = 2
[pid 130553] 08:50:30.684775 sendto(
  6, "<31>Jun 17 08:50:30 stunnel: LOG7[130546:139728830977792]:
      SSL alert (read): fatal: bad certificate",
  99, MSG_NOSIGNAL, NULL, 0) = 99
...
[pid 130553] 08:50:30.685333 close(9)   = 0

So, the client is the one closing the connection after receiving "\2" (2) and "*" (0x2A, 42 decimal).
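
For reference, the 5 bytes read just before that, "\25\3\3\0\2", are the TLS record header: content type 0x15 (alert), version 3.3 (TLS 1.2) and a 2-byte length of 2. The 2-byte alert payload that follows holds a level (2, fatal) and a description byte:

$ # 0x2a is the description byte from the alert payload
$ printf '%d\n' 0x2a
42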

OpenSSL errors

We can go back in the strace output and look for certificates used:

[pid 130519] 08:50:29.203145 openat(
  AT_FDCWD, "/tmp/1623912629200/client.crt",
  O_WRONLY|O_CREAT|O_TRUNC, 0666) = 18
[pid 130519] 08:50:29.203497 write(
  18, "-----BEGIN CERTIFICATE-----\nMIID7TCCAt...) = 1424

The sidecar setup writes a client.crt, client.key and server.crt.
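
Extracting them is mostly a matter of copying those write() payloads out of tmp.log and turning the escaped \n sequences back into real newlines, for instance with printf '%b'. A rough sketch (the PEM blob is obviously truncated here):

$ # paste the escaped payload of the client.crt write() between the quotes
$ printf '%b' '-----BEGIN CERTIFICATE-----\nMIID7TCCAt...' > client.crt
$ # repeat for client.key and server.crt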

If we extract their contents from the strace output and write them to files, we can use openssl s_client to connect directly and get additional debug info:

$ openssl s_client -connect 10.1.2.3:5900 -servername 10.1.2.3 \
    -showcerts -debug
...
read from 0x55bbada227a0 [0x55bbada2a708]
  (2 bytes => 2 (0x2))
0000 - 02 28    .(
140555689141568:error:14094410:SSL routines:ssl3_read_bytes:
  sslv3 alert handshake failure:../ssl/record/rec_layer_s3.c:1543:
  SSL alert number 40

So, not supplying a client certificate gets us an error 40 (0x28), followed by a disconnect from the server (read returns -1). This is fine. Error 40 (handshake_failure) means that one or more security parameters were bad. In this case because we didn't supply the client certificate.

What happens if we send a self-generated client certificate?
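
(The CUSTOMCERT pair used below is nothing special: a throwaway self-signed certificate, generated with made-up names along these lines.)

$ # generate a throwaway self-signed client certificate + key
$ openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -subj '/CN=throwaway-client' \
    -keyout CUSTOMCERT.key -out CUSTOMCERT.crt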

$ openssl s_client -connect 10.1.2.3:5900 -servername 10.1.2.3 \
    -showcerts -debug -cert CUSTOMCERT.crt -key CUSTOMCERT.key
...
read from 0x5604603d7750 [0x5604603dcd68]
  (2 bytes => 2 (0x2))
0000 - 02 30    .0
139773856281920:error:14094418:SSL routines:ssl3_read_bytes:
  tlsv1 alert unknown ca:../ssl/record/rec_layer_s3.c:1543:
  SSL alert number 48

Error 48 (unknown_ca). That makes sense, as the server does not know the CA of our custom-generated certificate.

Lastly with the correct certificate, we get an error 42 (0x2A):

$ openssl s_client -connect 10.1.2.3:5900 -servername 10.1.2.3 \
    -showcerts -debug -cert client.crt -key client.key
...
read from 0x556b27ca7cd0 [0x556b27cad2e8]
  (2 bytes => 2 (0x2))
0000 - 02 2a    .*
140701791647040:error:14094412:SSL routines:ssl3_read_bytes:
  sslv3 alert bad certificate:../ssl/record/rec_layer_s3.c:1543:
  SSL alert number 42

Error 42 is bad_certificate, with this description from RFC 5246 (7.2.2):

   bad_certificate
      A certificate was corrupt, contained signatures that did not
      verify correctly, etc.

We're now quite certain it's our client certificate that is being rejected. But we're no closer to the reason why. If we openssl-verify client.crt locally, it appears to be just fine.
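
Sanity checks along the following lines, for example, all come back clean:

$ # expiry check; prints "Certificate will not expire" and exits 0
$ openssl x509 -in client.crt -noout -checkend 0
$ # and eyeballing the subject, issuer and validity period
$ openssl x509 -in client.crt -noout -subject -issuer -dates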

Upgrading and inspecting the BMC firmware

This particular motherboard — X10SRD-F — already had the latest firmware according to the SuperMicro BIOS IPMI downloads: REDFISH_X10_388_20200221_unsigned.zip

As a last-ditch effort, we checked whether we could upgrade to a newer version. After all, the changelog for version 3.90 (and similarly for 3.89) said:

7. Corrected issues with KVM console.

Ignoring the fact that version 3.89 was not listed for our hardware, we went ahead and upgraded to 3.89. That went smoothly, but the problem persisted.

Upgrade to 3.90 then? Or maybe there is something else we're overlooking. Let's see if we can dissect the firmware:

$ unzip ../REDFISH_X10_390_20200717_unsigned.zip
Archive:  ../REDFISH_X10_390_20200717_unsigned.zip
  inflating: REDFISH_X10_390_20200717_unsigned.bin
...
$ binwalk REDFISH_X10_390_20200717_unsigned.bin

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
103360        0x193C0         CRC32 polynomial table, little endian
4194304       0x400000        CramFS filesystem, little endian, size: 15253504, version 2, sorted_dirs, CRC 0x4148A5CC, edition 0, 8631 blocks, 1100 files
20971520      0x1400000       uImage header, header size: 64 bytes, header CRC: 0xC3B2AF42, created: 2020-07-17 09:02:52, image size: 1537698 bytes, Data Address: 0x40008000, Entry Point: 0x40008000, data CRC: 0x4ACB7660, OS: Linux, CPU: ARM, image type: OS Kernel Image, compression type: gzip, image name: "21400000"
20971584      0x1400040       gzip compressed data, maximum compression, has original file name: "linux.bin", from Unix, last modified: 2020-07-17 08:56:49
24117248      0x1700000       CramFS filesystem, little endian, size: 7446528, version 2, sorted_dirs, CRC 0x1D3A953F, edition 0, 3089 blocks, 456 files

4194304 and 24117248 are both multiples of 4096 (0x1000), as is obvious from the zeroes in the hexadecimal column. That speeds up the dd step below a bit:

$ dd if=REDFISH_X10_390_20200717_unsigned.bin \
    bs=$(( 0x1000 )) skip=$(( 0x400000 / 0x1000 )) \
    count=$(( (0x1400000 - 0x400000) / 0x1000 )) \
    of=cramfs1.bin
$ dd if=REDFISH_X10_390_20200717_unsigned.bin \
    bs=$(( 0x1000 )) skip=$(( 0x1700000 / 0x1000 )) \
    of=cramfs2.bin
$ du -b cramfs*
16777216  cramfs1.bin
9437184   cramfs2.bin

We can mount these and inspect their contents:

$ sudo mkdir /mnt/cramfs{1,2}
$ sudo mount -t cramfs ./cramfs1.bin /mnt/cramfs1
$ sudo mount -t cramfs ./cramfs2.bin /mnt/cramfs2

cramfs1.bin contains a Linux filesystem with an stunnel configuration:

$ ls /mnt/cramfs1/
bin  linuxrc     proc   SMASH  var
dev  lost+found  run    sys    web
etc  mnt         sbin   tmp
lib  nv          share  usr
$ ls /mnt/cramfs1/etc/stunnel/
client.crt  server.key
server.crt  stunnel.conf

This also looks sane. The server.key matches the server.crt we already had. And the client.crt also matches what we had. And any and all validation on these just succeeds.
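
(“Matches” as in: the public key in the certificate is the same as the one belonging to the private key, which can be checked by comparing their public key fingerprints, for example:)

$ openssl x509 -in /mnt/cramfs1/etc/stunnel/server.crt -noout -pubkey | md5sum
$ openssl pkey -in /mnt/cramfs1/etc/stunnel/server.key -pubout | md5sum
$ # identical hashes mean the key belongs to the certificate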

cramfs2.bin then?

$ ls /mnt/cramfs2/
cgi
cgi-bin
css
extract.in
iKVM__V1.69.38.0x0.jar.pack.gz
...

This looks like the webserver contents on https://10.1.2.3. (Interestingly the iKVM__V1.69.38.0x0.jar.pack.gz file differs between 3.88 and 3.89 and 3.90, but that turned out to be of no significance.)

Peering into the jar yielded no additional clues unfortunately:

$ unpack200 /mnt/cramfs2/iKVM__V1.69.38.0x0.jar.pack.gz iKVM__V1.69.38.0x0.jar
$ unzip iKVM__V1.69.38.0x0.jar
Archive:  iKVM__V1.69.38.0x0.jar
PACK200
  inflating: META-INF/MANIFEST.MF
...
$ ls res/*.crt res/*.key
res/client.crt
res/client.key
res/server.crt

Same certificates. Everything matches.

Date and Time

At this point I was about ready to give up. I had tried the Syslog option in the BMC, which had given me zero output thus far. I had tried replacing the web server certificate. Upgrading the BMC...

Out of ideas, I mindlessly clicked around in the web interface. This landed me at Configuration -> Date and Time. Apparently the local date was set to somewhere in the year 2015.

We might as well fix that and try one last time.

Yes! After fixing the date, connecting suddenly worked.

Immediately all pieces fit together:

$ openssl x509 -in client.crt -noout -text |
    grep Validity -A3

        Validity
            Not Before: May 19 09:46:36 2016 GMT
            Not After : May 17 09:46:36 2026 GMT
        Subject: C = US, ST = California, L = San Jose,
          O = Super Micro Computer, OU = Software,
          CN = IPMI, emailAddress = Email

Crap. The server had been looking at a “not yet valid” certificate the whole time. The certificate would be valid between 2016 and 2026, but the server was still living in the year 2015.

I wonder why stunnel/openssl did not send error 45 (certificate_expired). After all, the RFC says “a certificate has expired or is not currently valid” (my emphasis). That would've pointed us to the cause immediately.
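
If you want to reproduce the server's point of view locally, openssl verify takes an -attime Unix timestamp, so you can backdate the verification without touching any clocks. (The CA file name below is a placeholder for whatever actually issued client.crt.)

$ # 1433116800 is 2015-06-01 00:00:00 UTC; expect "certificate is not yet valid"
$ openssl verify -attime 1433116800 -CAfile THE_ISSUING_CA.crt client.crt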

This problem was one giant time sink. But we did learn a few things about the structure of the BMC firmware. And, also important: after the 17th of May in the year 2026, the iKVM connections will stop working unless we upgrade the firmware or fiddle with the time.

Maybe set a reminder for that event, in case these servers are still around by then...

