So, hardware trouble caused a VPS to go down. This VPS was running a MySQL server in a slave setup. Not surprisingly, the unclean shutdown broke succesful slaving.

There are several possibly causes for slave setup breakage. This time it was the local relay log file (mysqld-relay-bin.xxxx) that was out of sync.

SHOW SLAVE STATUS\G looked like this:

...
       Master_Log_File: mysql-bin.001814  <-- remote/master file (IO thread)
   Read_Master_Log_Pos: 33453535          <-- remote/master pos  (IO thread)
        Relay_Log_File: mysqld-relay-bin.001383  <-- local/slave file (SQL thread)
         Relay_Log_Pos: 34918332                 <-- local/slave pos  (SQL thread)
 Relay_Master_Log_File: mysql-bin.001812  <-- remote/master file (SQL thread)
...
            Last_Errno: 1594
            Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
...
   Exec_Master_Log_Pos: 34918187          <-- remote/master pos  (SQL thread)
...

Step one was to find out where we were in the local and on the remote end. Luckily, most queries ran during the failure period were UPDATEs on the same table.

  • Remote position was ok. On the master, /var/log/mysql/mysql-bin.001812 contained these lines:

    # at 34918187
    #120125  1:16:05 server id 1  end_log_pos 34918531 ...
    SET TIMESTAMP=1327450565/*!*/;
    UPDATE mytable ....
    
  • The statements before that had been ran on the slave and this statement hadn't.

  • On the slave, /var/lib/mysql/mysqld-relay-bin.001383 did contain the previous line, but did not contain position 34918332.
  • Looking further, I could see that mysqld-relay-bin.001384 was practically empty, but mysqld-relay-bin.001385 contained already executed statements, and after a bit of browsing there it was too:
    # at 21491
    #120125  1:16:05 server id 1  end_log_pos 34918531 ...
    SET TIMESTAMP=1327450565/*!*/;
    UPDATE mytable ...
    

Good. So we need only move the relay log file pointer a bit to the front.

mysql> CHANGE MASTER TO RELAY_LOG_FILE='mysqld-relay-bin.001385', RELAY_LOG_POS=21491;
ERROR 1380 (HY000): Failed initializing relay log position: Could not find target log during relay log initialization

What? Searching for that error pointed to a document about copying slave data to another slave and about modifying files. Hmm.. modifying files. I can do that too...

# cat /var/lib/mysql/relay-log.info
./mysqld-relay-bin.001383
34918332
mysql-bin.001812
34918187
0

With a little speed — /etc/init.d/mysql stop ; vim /var/lib/mysql/relay-log.info ; /etc/init.d/mysql start — I edited the relay log file and relay log position in relay-log.info by hand.

Voilà! It worked. Slave replication was running again like it should.

shell error slave mysql