emacsen's journal - Mutt, Procmail and Bogofilter (a guide to how I do mail)
06:58 am
In the last week, I've switched from Thunderbird and IMAP to Procmail and Mutt.
Mutt has a mixed reputation in the GNU community. It's considered by some to be the Vim of mail user agents, and that reputation is not entirely undeserved. Mutt's design is very flexible, much like Emacs, with a focus on being an adaptable platform for mail. One of Mutt's greatest strengths is that it supports keybindings and macros, though its macro system is a poor substitute for true elisp à la Emacs.
This post is meant as an illustration of what I'm doing, rather than advocacy. I'd love to go back to using Emacs, and maybe someone reading this can suggest ways to work with Emacs again to make it behave more like my current setup.
I do a large amount of mail processing with Procmail. This is the core of my .procmailrc:
DATE=`date +%Y-%m`
PATH=/bin:/usr/bin:/usr/local/bin
MAILDIR=$HOME/Mail/ # all mailboxes are under ~/Mail/
DEFAULT=$HOME/Mail/In.$DATE/
LOGFILE=$HOME/procmail.log
SHELL=/bin/sh
#### bogofilter passthrough-update ####
:0fw: bogofilter.lock
| bogofilter -p -u -l -e -v
# -p)assthrough -u)pdate, -l)og -e)xitcode 0 for spam and ham
# -v)erbose

# m-a 2002-10-28
# If bogofilter failed, return the mail to the queue.
# Better put this after _EACH_ delivering recipe (not shown here).
# Later, the MTA will try again to deliver it.
# 75 is the value for EX_TEMPFAIL in /usr/include/sysexits.h
#
# Originally published by Philip Guenther on the postfix-users
# mailing list.

:0e
{ EXITCODE=75 HOST }
#### end error catcher ####
#:0c:
:0
* ^X-Bogosity: (Spam|Yes)
In.Spam/

# If it's not spam, try to get the email address
:0hc
| lbdb-fetchaddr
As you can see, I use Maildir. I find Maildir much easier to work with. In order to make archiving simpler, all my mail is in my ~/Mail directory, with lots of maildirs underneath. For each month, I have an In maildir named after that month. For example, for this month, my inbox is Mail/In.2006-11.
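The monthly layout described above can be sketched in a few lines of shell. This is only an illustration, not part of my setup: the function name make_month_maildir and its arguments are made up for the example, and it assumes nothing beyond the usual new/cur/tmp structure of a Maildir.

```shell
#!/bin/sh
# Create an empty Maildir for one month under a mail root.
# Usage: make_month_maildir ROOT PREFIX [YYYY-MM]
# (hypothetical helper, named here for illustration only)
make_month_maildir() {
    root=$1
    prefix=$2
    month=${3:-$(date +%Y-%m)}   # default to the current month
    dir="$root/$prefix.$month"
    # A Maildir is a directory containing new/, cur/ and tmp/
    mkdir -p "$dir/new" "$dir/cur" "$dir/tmp"
    printf '%s\n' "$dir"
}
```

Calling make_month_maildir "$HOME/Mail" In 2006-11 would create and print $HOME/Mail/In.2006-11.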
This makes sorting my mail very easy, as well as archiving. All incoming mail goes into my inbox, and my sent mail (which is defined elsewhere) goes into Sent.2006-11. At the end of the month, I can collapse both Maildirs into 2006-11, but that's getting ahead of myself.
The first thing I do with my mail is check it for spam. I've gotten no spam directly to my server, but several other mail services forward mail to me and most of them are far more conservative about what they block, so I end up with spam, especially from my LiveJournal account. In addition to the extensive mail server checks, I'm using bogofilter for final analysis. If I were running mail for lots of people, bogofilter probably wouldn't be good enough, but since I'm the only user and my use is fairly "close to the metal", I find bogofilter to be a great tool. In just a week of training I've seen very few false positives and nearly 90% identification of spam.
In the first step, I use bogofilter to add new headers to the mail. The headers include the statistical probability of spamicity as well as a small graph showing the statistical breakdown of tokens in the mail and their spam probability. In the end, bogofilter makes one of three determinations: Spam, Unsure, or Ham. Since receiving all my mail is more important than receiving less spam, I'm willing to deal with false negatives, so in the next recipe, I move mail to a special In.Spam maildir only if the probability of the message being spam is very high. I've received very few false positives, even without pre-training, and I've been very happy. If I'd started with a corpus of spam and a corpus of ham, I could have let bogotune analyze both and pre-generate the wordlists, but even with the current setup, I'm finding it to be very accurate.
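To make the filing decision concrete, here is a small shell sketch of the same test the procmail recipe performs on the X-Bogosity header. The function name bogosity_folder and the "inbox" label are made up for the example, and unlike procmail the match here is case-sensitive. It reads a message that bogofilter has already tagged and does not call bogofilter itself.

```shell
#!/bin/sh
# Read one tagged message on stdin and print where it would be filed.
# Only the header block (everything before the first blank line) is inspected.
# (hypothetical helper, named here for illustration only)
bogosity_folder() {
    verdict=$(sed -n '
        /^$/q
        s/^X-Bogosity: \([A-Za-z]*\).*/\1/p
    ' | head -n 1)
    case $verdict in
        Spam|Yes) echo "In.Spam" ;;  # same alternatives as the recipe: (Spam|Yes)
        *)        echo "inbox" ;;    # Ham, Unsure, or no header at all
    esac
}
```

Treating Unsure the same as Ham is what keeps false positives rare: only messages bogofilter is confident about leave the inbox.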
Lastly, I use lbdb to store addresses. lbdb is a lesser clone of the popular Big Brother Database from Emacs. While lbdb lacks many of the features of the original, it adds a few others, such as the ability to easily integrate queries to other systems such as LDAP right into the address book. By piping all messages through lbdb-fetchaddr, my address book is automatically populated by each and every message I receive.
After this, I do a bit of work to separate out automated messages and the like and put them in a separate folder. That's neither unique nor interesting.
The mail user agent I'm using currently is Mutt. Mutt is best described as a power user's MUA. Its design philosophy seems to be to provide a simple interface with a lot of flexibility, allowing you to create new functions, run shell commands from within the configuration file and remap keys. It's not as powerful as Emacs, but then again, nothing is.
I won't go through my entire .muttrc file, but here are the interesting bits.
set mbox=~/Mail/In.`date +%Y-%m`
set record="+Sent".`date +%Y-%m`
set spoolfile=~/Mail/In.`date +%Y-%m`
This sets the current inbox to correspond with the inbox from our Procmail recipe. In addition, we save all sent mail to a Sent.YYYY-MM folder. The only disadvantage of working with mail this way is that on the first of the month, we have to remember to manually check the previous month's folder for mail. Since months change over infrequently (about once a month), I don't consider this to be a problem.
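For the first-of-the-month check, a quick count of what is still sitting in last month's inbox is enough to tell whether it needs a look. The following is a rough sketch rather than something I run; the function name maildir_count is made up for the example, and it simply counts regular files in new/ and cur/.

```shell
#!/bin/sh
# Print how many messages remain in a Maildir's new/ and cur/ directories.
# Usage: maildir_count /path/to/Maildir
# (hypothetical helper, named here for illustration only)
maildir_count() {
    dir=$1
    count=0
    for f in "$dir"/new/* "$dir"/cur/*; do
        # An unmatched glob stays literal, so count only real files
        if [ -f "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

For example, maildir_count "$HOME/Mail/In.2006-10" run in early November shows whether anything was delivered there after the last time it was read.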
I have several years of mail archived off by date. For the archive, I don't care whether mail is mail I received or mail I sent, so I keep it in simple folders categorized by date sent/date received, such as 2004-07. Putting my current mail in In.YYYY-MM and Sent.YYYY-MM makes moving the mail to this format easy. At the same time, I like to keep my mailbox free of clutter. I'd like to keep some mails without seeing them. AFAIK Mutt doesn't have a 'hide' function, so I've written my own. When I encounter a mail I want to keep, but not see again, I archive it off to its future destination folder. This has the effect of hiding it without using anything fancy or relying on the MUA to remember its state.
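# Archive mail you want to keep but not see
macro index A "<enter-command>unset confirmcreate confirmappend<enter><save-message>=`date +%Y-%m`/<enter><enter-command>set confirmcreate confirmappend<enter>"
macro pager A "<enter-command>unset confirmcreate confirmappend<enter><save-message>=`date +%Y-%m`/<enter><enter-command>set confirmcreate confirmappend<enter>"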
Going back to LBDB, this is how you tell Mutt to use LBDB as an externally queryable address book:
# Use LBDB for address book
set query_command="lbdbq '%s'"
Mutt still keeps its own aliases, which are a little easier to use, but I've been considering remapping the keys so that Tab is mapped to the external query.
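The external query interface is simple enough to imitate, which helps when debugging it. Mutt expects the query command to print a one-line message first and then one match per line, with the address and the name separated by a tab. Below is a minimal stand-in for illustration only: the function name query_addresses and the tab-separated address file it searches are inventions for this sketch, not how lbdb stores its data.

```shell
#!/bin/sh
# Minimal stand-in for an external address query.
# Usage: query_addresses FILE PATTERN
# FILE holds one "address<TAB>name" pair per line (a made-up format for this sketch).
query_addresses() {
    file=$1
    pattern=$2
    # Case-insensitive fixed-string search over addresses and names
    matches=$(grep -i -F -e "$pattern" "$file")
    if [ -z "$matches" ]; then
        echo "No matches for '$pattern'"
        return 1
    fi
    # First line is a status message; every following line is address<TAB>name
    echo "Matches for '$pattern':"
    printf '%s\n' "$matches"
}
```

Wrapped in a small script, something like this could be tried in place of lbdbq in query_command to see exactly what Mutt does with each line.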
And now I integrate bogofilter into Mutt:
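### BOGOFILTER HELP COMMANDS ###
macro index s "<enter-command>unset wait_key\n<pipe-message>bogofilter -MSn\n<enter-command>set wait_key\n<save-message>"
macro pager s "<enter-command>unset wait_key\n<pipe-message>bogofilter -MSn\n<enter-command>set wait_key\n<save-message>"
macro index r "<enter-command>unset wait_key\n<pipe-message>bogofilter -Mn\n<enter-command>set wait_key\n<reply>"
macro pager r "<enter-command>unset wait_key\n<pipe-message>bogofilter -Mn\n<enter-command>set wait_key\n<reply>"
macro index g "<enter-command>unset wait_key\n<pipe-message>bogofilter -Mn\n<enter-command>set wait_key\n<group-reply>"
macro pager g "<enter-command>unset wait_key\n<pipe-message>bogofilter -Mn\n<enter-command>set wait_key\n<group-reply>"
macro index l "<enter-command>unset wait_key\n<pipe-message>bogofilter -Mn\n<enter-command>set wait_key\n<list-reply>"
macro pager l "<enter-command>unset wait_key\n<pipe-message>bogofilter -Mn\n<enter-command>set wait_key\n<list-reply>"
macro index X "<enter-command>unset wait_key\n<pipe-message>bogofilter -MNs\n<enter-command>set wait_key\n<delete-message>"
macro pager X "<enter-command>unset wait_key\n<pipe-message>bogofilter -MNs\n<enter-command>set wait_key\n<delete-message>"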
I took this directly from a Linux Journal article, but what it does is help populate bogofilter's wordlist. If you save or reply to a mail, Mutt will send that mail to Bogofilter to be classified as ham. If you encounter a spam, instead of the normal Delete, this configuration adds a new "X" key which sends the mail to bogofilter to be classified as spam and then deleted.
If spam is caught, it'll be moved into the In.Spam folder by procmail. When a legitimate message ends up there by mistake, saving it to the inbox (or to the archive folder) with "s" re-registers it as ham, which reduces the number of false positives over time.
Finally, I wrote this script to archive old mails:
#!/bin/sh
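# Merge In.YYYY-MM and Sent.YYYY-MM into a single YYYY-MM Maildir.
# The month to archive (YYYY-MM) is the only argument.
MAIL=$HOME/Mail
MDATE=$1
IN=$MAIL/In.$MDATE
SENT=$MAIL/Sent.$MDATE

# Create the destination Maildir if it doesn't exist yet
if [ ! -d "$MAIL/$MDATE" ]; then
    mkdir "$MAIL/$MDATE"
    mkdir "$MAIL/$MDATE/new" "$MAIL/$MDATE/cur" "$MAIL/$MDATE/tmp"
fi

for y in "$IN" "$SENT"; do
    for x in new cur tmp; do
        # -b (GNU mv) keeps a backup instead of overwriting on a name clash
        mv -fb "$y/$x"/* "$MAIL/$MDATE/$x"
        rmdir "$y/$x"
    done
    # Remove the source Maildir once all three subdirectories are gone
    rmdir "$y"
done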
In the future, I'll write a wrapper script to run this command for "two months ago", but right now it works fine manually.
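The only fiddly part of that wrapper is working out "two months ago" across a year boundary. Here is one way it might look, using plain shell arithmetic so it doesn't depend on GNU date. The function name months_ago is made up for this sketch, and archive.sh in the note below is just a placeholder name for the script above.

```shell
#!/bin/sh
# Print the month N months before a given YYYY-MM.
# Usage: months_ago YYYY-MM N
# (hypothetical helper, named here for illustration only)
months_ago() {
    year=${1%-*}
    month=${1#*-}
    month=${month#0}     # strip a leading zero so 08 and 09 are not read as octal
    # Count months since year 0, step back N, then split again
    total=$((year * 12 + month - 1 - $2))
    printf '%04d-%02d\n' $((total / 12)) $((total % 12 + 1))
}
```

A cron job could then run something like sh archive.sh "$(months_ago "$(date +%Y-%m)" 2)" at the start of each month.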
So, that's it, my mail setup. I'd love to go back to using Emacs, and bbdb, but VM mode doesn't support Maildir, Wanderlust is... odd.. and Gnus is far too complex. Maybe if someone knows a way to tie Gnus down, or to customize RMail to make it behave better, I'd switch back. In the meantime, this seems to work for me.
From: 808
Date: November 27th, 2006 06:21 pm (UTC)
Thanks for this. I use a very similar system and you have given me lots of tips where I can make it easier/more elegant.
From: node
Date: January 10th, 2007 02:05 pm (UTC)
Gnus is complex, but it's the only well-maintained mail client for Emacs. These days, it even supports calling bogofilter internally.
From: emacsen
Date: January 10th, 2007 02:14 pm (UTC)
Anything emacs supports calling anything :)
The problem with Gnus was that I ended up needing to turn all the Gnus features off. I don't want auto-expire or any variant of auto-expire.
I'd love to use it, but the initial learning curve is high, even for someone who used it for over a year (Gnus is not like riding a bicycle).
It's for a similar reason that I use EmacsWiki at work rather than Muse.
Muse looks great, but seems to have no sane defaults, so I'm going to spend a long time configuring it. And the documentation isn't really clear about it.
I have a set of "I love Emacs but..." rants I'm about to post.
Thanks for reading!
From: node
Date: January 10th, 2007 03:12 pm (UTC)
I've never had to turn off auto-expire!