Posted on November 1, 2015
Tags: nerd, email, dovecot

So I host my own e-mail on a virtual server. This is partly for fun and partly because I don’t want Google building/selling an advertising profile from the text of my e-mails.

It’s not all good: it costs money to run a virtual server, there is maintenance involved, and the spam filter isn’t as good as Google’s. But it’s how I want it, so there.

Perhaps I’m not one of the people who should be running their own server because I’m all lazy about little details like keeping frequent backups.

Nevertheless, I was making a little improvement in preparation for a backup, the first in several months. The important detail was that my mail folders had all been backed up recently, but my inbox, with all the good mails from 2015, had not.

Convert inbox and mail folders from mbox format to maildir

Read the instructions on two different pages on the Dovecot wiki. Apparently they provide a reliable and easy-to-use tool for such migrations.

http://wiki2.dovecot.org/Migration/MailFormat
http://wiki2.dovecot.org/Tools/Doveadm/Sync

First set mail_location=maildir:~/Maildir in /etc/dovecot/conf.d/10-mail.conf

doveadm backup -u rodney mirror mbox:~/mail:INBOX=/var/mail/rodney

Expected result: after several minutes, my mbox folder tree and inbox would be faithfully replicated as maildirs in ~/Maildir/.

Actual result: after several seconds, all mails were removed from my mbox folder tree and INBOX and replaced with an empty skeleton of folder names, leaving behind only log files, caches and other detritus.

Panic

Cue the increased heart rate, flushed cheeks, and suddenly sweaty underarms usually associated with accidental data loss. Check and double check that it’s really gone, not just moved to some other location. Release a pathetic moan of anguish.

Rescue job

It’s well known that when you delete a file, the data usually stays on disk until overwritten by another file. The trick is finding the file in one piece. Time is of the essence, and disks are getting quite big nowadays.

At this point I remember that there is a surprising lack of good text-mode hex editors for Linux. I find dhex, which is satisfactory.

To search the 48G of btrfs for my deleted mails:

dhex -sa 'From: Rodney' /dev/sda
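For anyone without dhex to hand, the same search can be sketched in Python. This is not what I ran, just a minimal illustration: the name find_offsets and the 1M chunk size are made up for the example, and reading /dev/sda needs root. It reads the device in chunks and keeps a small overlap so that a match straddling two chunks is still found.

```python
def find_offsets(path, needle, chunk_size=1024 * 1024):
    """Yield the byte offset of every occurrence of needle in the file at path."""
    if not needle:
        raise ValueError("needle must not be empty")
    overlap = len(needle) - 1  # bytes carried over between chunks
    with open(path, "rb") as f:
        tail = b""  # end of the previous buffer, kept for straddling matches
        pos = 0     # file offset of the next chunk to be read
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            base = pos - len(tail)  # file offset of buf[0]
            start = 0
            while True:
                i = buf.find(needle, start)
                if i < 0:
                    break
                yield base + i
                start = i + 1
            # The tail is shorter than the needle, so no match is reported twice.
            tail = buf[-overlap:] if overlap else b""
            pos += len(chunk)
```

Something like `for off in find_offsets("/dev/sda", b"From: Rodney"): print(format(off, "x"))` would then print hex offsets, one per line.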

And a little Python function to generate the dd command that copies off 100M around each matching location (50M either side of the hex offset).

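def offsets(hx):
    """Print a dd command that copies 100M centred on the hex offset hx."""
    a = int(hx, 16)
    context = 50 * 1024 * 1024  # 50M on each side of the match
    skip = max(0, a - context)  # don't seek before the start of the disk
    count = 2 * context
    print("dd if=/dev/sda of=rescue-%s.bin bs=1M iflag=skip_bytes,count_bytes skip=%d count=%d" % (hx, skip, count))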

Now run the generated dd commands and rsync the chunks off the server for further analysis.

Emacs is happy to open 100M files if you have the memory, so I search through the chunks and trim off the binary junk surrounding e-mail text. I then open the trimmed file in mutt to check what’s in there.
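I did the trimming by hand in Emacs, but a rough first pass could be automated along these lines. This is only a sketch under two assumptions of mine: the mail text begins at an mbox-style "From " separator line, and the binary junk after it contains a NUL byte where the mail text does not. The name trim_to_mbox is invented for the example.

```python
import re

# An mbox message starts with a separator line such as
# "From rodney@example.org Sun Nov  1 12:00:00 2015".
FROM_LINE = re.compile(rb"^From \S+ .*\d{4}\r?$", re.MULTILINE)

def trim_to_mbox(chunk):
    """Return chunk from its first mbox separator line up to the next NUL byte."""
    m = FROM_LINE.search(chunk)
    if m is None:
        return b""  # nothing that looks like the start of a mailbox
    start = m.start()
    end = chunk.find(b"\0", start)  # assumed to mark the start of the junk
    if end < 0:
        end = len(chunk)
    return chunk[start:end]
```

The result still needs checking by eye (or in mutt), since a chunk may hold several mailbox fragments and this only keeps the first.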

After several tries finding partial mailboxes, I find one which is almost complete, about three quarters of the way through the disk. It’s just missing a few of the most recent mails. This is enough for me, because it’s really boring searching through a disk for lost e-mails.