Sean’s Obsessions

Sean Walberg’s blog

Spam Spam Spam Spam

I’ve been running SpamAssassin for a while, ever since I wrote this article. Last week, I enabled a couple of new features: Bayesian filtering and the auto-whitelist (AWL).

The auto-whitelist was easy enough to set up: I simply added ‘-a’ to the spamassassin command in my .procmailrc. This FAQ entry explains the feature well, but the summary is that the AWL records the normal score of each sender, and if the score of a particular message deviates greatly from that, it averages the two to bring the score closer to normal. That is, if you regularly correspond with someone and they send you a joke that happens to trip the filters, the AWL might lower the message’s score enough that it passes.
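
As a rough sketch of that averaging, here is the arithmetic in shell. It assumes the formula given in the SpamAssassin AWL documentation, final = score + (mean - score) * factor, with the documented default factor of 0.5. The helper name awl_adjust and the example scores are made up for illustration and are not part of SpamAssassin.

```shell
# Hypothetical helper (not part of SpamAssassin): awl_adjust SCORE MEAN [FACTOR]
# Pulls SCORE toward the sender's historical MEAN by FACTOR (assumed default 0.5).
awl_adjust() {
    awk -v score="$1" -v mean="$2" -v factor="${3:-0.5}" \
        'BEGIN { printf "%.1f\n", score + (mean - score) * factor }'
}

# A regular correspondent who usually scores -1.0 sends a joke that scores 7.0.
awl_adjust 7.0 -1.0   # prints 3.0
```

With a required score of 5.0, the raw 7.0 would have been tagged as spam, while the adjusted 3.0 gets through.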

Bayesian filtering was also pretty easy. I already had my spam-free inbox in /var/spool/mail/sean, and a collection of spam in ~/mail/caughtspam. I then gave my Bayesian filter an initial training session:

sa-learn --mbox --spam ~/mail/caughtspam
sa-learn --mbox --ham $MAIL
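
One way to check that the training took is to ask sa-learn how many messages it has learned. The sketch below wraps sa-learn --dump magic and pulls out the spam and ham counts. It assumes the column layout printed by recent SpamAssassin releases (count in the third column, label at the end of the line), which may differ on older versions. The function name bayes_counts and the SA_LEARN override variable are hypothetical.

```shell
# Hypothetical helper: print how many spam and ham messages the Bayes database has learned.
# SA_LEARN can be set to point at a different sa-learn binary (assumed variable name).
bayes_counts() {
    "${SA_LEARN:-sa-learn}" --dump magic | awk '
        /non-token data: nspam$/ { print "nspam=" $3 }
        /non-token data: nham$/  { print "nham=" $3 }
    '
}

# Usage, after the initial training run:
#   bayes_counts
```

By default SpamAssassin does not use the Bayes score until it has learned at least 200 spam and 200 ham messages, so small counts here would explain why nothing seems to change.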

After that, SpamAssassin will auto-learn from the messages it filters. It only learns from messages with very high or very low scores, which helps it reinforce itself. If a message gets miscategorized, you have to correct the filter by piping the message through sa-learn --spam (or --ham) to tell it what the message really was. From pine, it’s pretty easy to pipe the current message through sa-learn.
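
A small wrapper can make that correction step harder to get wrong. This is only a sketch: the function name relearn and the SA_LEARN override variable are hypothetical, and it assumes sa-learn reads a single message from standard input when no file is given.

```shell
# Hypothetical wrapper: relearn spam|ham, reading one message on stdin.
# SA_LEARN can be set to point at a different sa-learn binary (assumed variable name).
relearn() {
    case "$1" in
        spam|ham)
            "${SA_LEARN:-sa-learn}" "--$1"
            ;;
        *)
            echo "usage: relearn spam|ham < message" >&2
            return 2
            ;;
    esac
}

# Usage from a shell:
#   relearn spam < missed-spam.txt
#   relearn ham  < false-positive.txt
```

Saved as a script, the same thing could serve as the target of pine’s pipe command, so the message currently being viewed goes straight to the right sa-learn option.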

Since I implemented these two features, I’ve noticed a reduction in miscategorized spam. Bayesian filtering and the auto-whitelist are well worth the time to research, set up, and maintain.
