So, hardware trouble caused a VPS to go down. This VPS was running a MySQL server in a slave setup. Not surprisingly, the unclean shutdown broke succesful slaving.
There are several possibly causes for slave setup breakage. This time it
was the local relay log file (
mysqld-relay-bin.xxxx) that was out of
SHOW SLAVE STATUS\G looked like this:
... Master_Log_File: mysql-bin.001814 <-- remote/master file (IO thread) Read_Master_Log_Pos: 33453535 <-- remote/master pos (IO thread) Relay_Log_File: mysqld-relay-bin.001383 <-- local/slave file (SQL thread) Relay_Log_Pos: 34918332 <-- local/slave pos (SQL thread) Relay_Master_Log_File: mysql-bin.001812 <-- remote/master file (SQL thread) ... Last_Errno: 1594 Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. ... Exec_Master_Log_Pos: 34918187 <-- remote/master pos (SQL thread) ...
Step one was to find out where we were in the local and on the remote
end. Luckily, most queries ran during the failure period were
on the same table.
Remote position was ok. On the master,
/var/log/mysql/mysql-bin.001812contained these lines:
# at 34918187 #120125 1:16:05 server id 1 end_log_pos 34918531 ... SET TIMESTAMP=1327450565/*!*/; UPDATE mytable ....
The statements before that had been ran on the slave and this statement hadn't.
- On the slave,
/var/lib/mysql/mysqld-relay-bin.001383did contain the previous line, but did not contain position 34918332.
- Looking further, I could see that
mysqld-relay-bin.001384was practically empty, but
mysqld-relay-bin.001385contained already executed statements, and after a bit of browsing there it was too:
# at 21491 #120125 1:16:05 server id 1 end_log_pos 34918531 ... SET TIMESTAMP=1327450565/*!*/; UPDATE mytable ...
Good. So we need only move the relay log file pointer a bit to the front.
mysql> CHANGE MASTER TO RELAY_LOG_FILE='mysqld-relay-bin.001385', RELAY_LOG_POS=21491; ERROR 1380 (HY000): Failed initializing relay log position: Could not find target log during relay log initialization
What? Searching for that error pointed to a document about copying slave data to another slave and about modifying files. Hmm.. modifying files. I can do that too...
# cat /var/lib/mysql/relay-log.info ./mysqld-relay-bin.001383 34918332 mysql-bin.001812 34918187 0
With a little speed —
/etc/init.d/mysql stop ; vim /var/lib/mysql/relay-log.info ; /etc/init.d/mysql start
— I edited the relay log file and relay log position in
Voilà! It worked. Slave replication was running again like it should.