SpamAssassin does not rely on fixed rules alone; its Bayesian classifier also learns what spam looks like from your own mail.

Let’s get SpamAssassin set up on a typical mail server, say one running Postfix. The core idea is to have Postfix hand off incoming mail to SpamAssassin for analysis before delivering it to the user’s inbox.

Here’s the basic Postfix configuration to achieve this. In your main.cf file, you’ll add or modify these lines:

# Enable the content filter
content_filter = spamassassin

# Hand mail to the spamassassin filter one recipient at a time
spamassassin_destination_recipient_limit = 1

Next, define the spamassassin service in your master.cf file. This tells Postfix how to pipe each message to spamc.

# SpamAssassin service definition
spamassassin unix -     n       n       -       -       pipe
  user=spamd argv=/usr/bin/spamc -u sysuser -e /usr/sbin/sendmail -oi -f ${sender} -- ${recipient}

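The user=spamd part sets the unprivileged account that Postfix's pipe daemon uses to run the command, so the filter never runs as root. The argv line invokes spamc, the lightweight client that passes the message to the SpamAssassin daemon (spamd). The -u sysuser option tells spamd to process the message with the preferences and Bayes data of the user sysuser, and -e /usr/sbin/sendmail ... makes spamc pipe the scanned message to sendmail, which hands it back to Postfix for delivery. One caveat: a content_filter set globally in main.cf also applies to the mail that sendmail re-injects, which can route messages through the filter again. A common way to avoid this is to set -o content_filter=spamassassin on the smtp listener in master.cf instead of in main.cf.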

The SpamAssassin daemon, spamd, must be running. You normally start it through your init system (systemd or init.d); the service name varies by distribution (commonly spamassassin, sometimes spamd). With systemd, the command might look something like:

systemctl start spamassassin

Or, if you’re using an older system:

service spamassassin start

For it to start automatically on boot:

systemctl enable spamassassin

Now, let’s talk about tuning. SpamAssassin assigns scores to different spam tests. The higher the score, the more likely the email is spam. The default threshold for marking an email as spam is usually 5.0. This is configured in /etc/spamassassin/local.cf.

# Set the score at which an email is considered spam
required_score 5.0

You can adjust this. Lowering it (e.g., to 4.0) makes filtering more aggressive, catching more spam but also increasing the risk of false positives (legitimate emails being marked as spam). Raising it (e.g., to 7.0) makes it less aggressive, reducing false positives but letting more spam through.
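To make the trade-off concrete, here is a minimal Python sketch of the decision itself: the scores of the rules that hit are summed and compared against required_score. The function name is_spam and the rule names and scores are invented for illustration; this is a simplified model, not SpamAssassin's own code.

```python
def is_spam(rule_scores, required_score=5.0):
    """Return True when the summed rule scores reach the threshold."""
    total = sum(rule_scores.values())
    return total >= required_score


# Hypothetical rule hits for one message; names and scores are invented.
hits = {"EXAMPLE_RULE_A": 2.5, "EXAMPLE_RULE_B": 1.8}

for threshold in (4.0, 5.0, 7.0):
    print(threshold, is_spam(hits, threshold))
```

With these invented scores the message totals about 4.3 points, so it is flagged at a threshold of 4.0 but not at 5.0 or 7.0.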

SpamAssassin uses a set of rules to identify spam. These rules are updated regularly. To update them manually:

sa-update

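This command downloads the latest rule sets from the SpamAssassin project's update channel. spamd only loads rules when it starts, so restart it after a successful update. Running sa-update periodically, perhaps via a cron job, is essential for maintaining effective filtering.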
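A scheduled job could wrap the update along these lines. This is a sketch under assumptions: the function name update_rules is hypothetical, the service is assumed to be called spamassassin as in the commands earlier, and it relies on sa-update's documented convention that exit code 0 means new rules were installed and 1 means nothing new was available. Any other code is simply treated as needing attention.

```python
import subprocess


def update_rules(run=subprocess.run):
    """Run sa-update and restart spamd only when new rules were installed.

    `run` is injectable so the logic can be exercised without the real tools.
    """
    result = run(["sa-update"])
    if result.returncode == 0:
        # New rules were installed; spamd must restart to load them.
        # Assumption: the service unit is named "spamassassin".
        run(["systemctl", "restart", "spamassassin"])
        return "updated"
    if result.returncode == 1:
        # No fresh updates were available; nothing to do.
        return "no-updates"
    # Anything else needs a human to look at the sa-update output.
    return "error"
```

Calling update_rules() from a small script that cron runs once a day would be one way to use it; the script has to run with enough privileges to write the rules directory and restart the service.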

The real power comes from training SpamAssassin. It learns from your mail. You can feed it both spam and non-spam (ham) emails.

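To train SpamAssassin with a mailbox full of ham (legitimate email), point sa-learn at it. If the path is a single mbox file rather than a directory of messages, add the --mbox option: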

sa-learn --ham /var/mail/user/INBOX

To train it with a mailbox full of spam:

sa-learn --spam /var/mail/user/SPAM

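These commands update SpamAssassin’s Bayesian classifier, a statistical model that learns how likely words or phrases are to appear in spam versus ham. Run them as the user whose Bayes database spamd consults (sysuser in the master.cf example), otherwise the training ends up in a different database. By default the classifier is not used at all until it has learned at least 200 spam and 200 ham messages; beyond that, the more mail you feed it, the better it gets.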
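The idea behind the classifier can be shown with a toy model. The sketch below is not SpamAssassin's algorithm (its tokenizer and the way it combines evidence are considerably more elaborate); it only estimates, for a single word, how strongly that word points toward spam, assuming equal priors and add-one smoothing. All names and the four sample messages are invented.

```python
from collections import Counter


def train(counts, text):
    """Count each distinct lowercase word once per message."""
    counts.update(set(text.lower().split()))


def token_spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | token) from per-class message frequencies.

    Add-one smoothing keeps the estimate away from exactly 0 or 1.
    """
    spam_rate = (spam_counts[token] + 1) / (n_spam + 2)
    ham_rate = (ham_counts[token] + 1) / (n_ham + 2)
    return spam_rate / (spam_rate + ham_rate)


# Invented training data: two spam and two ham messages.
spam_messages = ["cheap pills online", "cheap watches online"]
ham_messages = ["meeting notes attached", "lunch meeting tomorrow"]

spam_counts, ham_counts = Counter(), Counter()
for message in spam_messages:
    train(spam_counts, message)
for message in ham_messages:
    train(ham_counts, message)

for word in ("cheap", "meeting"):
    p = token_spam_probability(
        word, spam_counts, ham_counts, len(spam_messages), len(ham_messages)
    )
    print(word, round(p, 2))
```

Words seen only in spam come out well above 0.5 and words seen only in ham well below it, which is the signal that training with sa-learn builds up over thousands of messages.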

You can also drive training from your mail client. Many clients have "Junk" or "Spam" buttons that move a message to a designated folder. The client itself does not run sa-learn; a server-side hook or scheduled job that you set up then runs sa-learn on that folder.

Custom rules and their scores also live in local.cf. For instance, you might add a rule of your own that matches a subject line you keep seeing in spam:

# Match messages whose Subject contains "Amazon Deal"
header LOCAL_AMAZON_DEAL Subject =~ /Amazon Deal/
describe LOCAL_AMAZON_DEAL Amazon Deal in subject

# Set a score for this specific rule
score LOCAL_AMAZON_DEAL 3.0

This example adds 3.0 points if the subject line contains "Amazon Deal" (the match is case-sensitive as written). The rule name deliberately does not start with a double underscore: SpamAssassin treats names beginning with __ as sub-rules meant for use inside meta rules, and they do not contribute a score of their own.
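The header rule above boils down to a regular expression test against one header plus a score. A rough Python equivalent, useful for trying a pattern against sample messages before putting it in local.cf, might look like this. The names RULE_PATTERN, RULE_SCORE and score_subject are invented, and Python's regex dialect is close to, but not identical to, the Perl syntax SpamAssassin uses.

```python
import re
from email import message_from_string

# Same pattern and score as the local.cf example; case-sensitive as written.
RULE_PATTERN = re.compile(r"Amazon Deal")
RULE_SCORE = 3.0


def score_subject(raw_message):
    """Return the rule's score if the Subject header matches, else 0.0."""
    subject = message_from_string(raw_message)["Subject"] or ""
    return RULE_SCORE if RULE_PATTERN.search(subject) else 0.0


sample = "Subject: Huge Amazon Deal today\n\nBuy now.\n"
print(score_subject(sample))
```

A subject written as "amazon deal" would not match here, which is a reminder to decide whether a rule should ignore case before relying on it.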

A common pitfall is a spamd daemon that is not running or is misconfigured. In this setup Postfix never talks to spamd directly; it only runs spamc. If spamd isn’t listening, spamc typically logs connection errors such as "Connection refused" to the mail log (/var/log/mail.log or similar) and, by default, passes the message through unscanned, so the visible symptom is often mail arriving without any spam headers.
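One way to confirm that mail is really being scanned is to look for the X-Spam-Status header that SpamAssassin adds by default, which begins with Yes or No followed by score= and required= values. The sketch below parses that header from a raw message; the function name parse_spam_status is invented, and the sample header is made up for illustration.

```python
import re
from email import message_from_string

# Matches the start of a default-style header, e.g. "Yes, score=7.2 required=5.0 ..."
STATUS_RE = re.compile(r"^(Yes|No),\s+score=(-?[\d.]+)\s+required=(-?[\d.]+)")


def parse_spam_status(raw_message):
    """Return the parsed X-Spam-Status fields, or None if absent or unparseable."""
    header = message_from_string(raw_message)["X-Spam-Status"]
    if header is None:
        return None  # The message was probably never scanned.
    match = STATUS_RE.match(header)
    if match is None:
        return None
    flag, score, required = match.groups()
    return {
        "is_spam": flag == "Yes",
        "score": float(score),
        "required": float(required),
    }


# Invented sample message for illustration.
sample = (
    "X-Spam-Status: Yes, score=7.2 required=5.0 tests=EXAMPLE_RULE autolearn=no\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)
print(parse_spam_status(sample))
```

A result of None for freshly delivered mail suggests the message bypassed the filter, for example because spamc could not reach spamd.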

Another common issue is the spamc client not being able to connect to the spamd daemon. This can be caused by an incorrect socket path, or by firewall rules if spamd is listening on a network port instead of a Unix socket. The spamc options in master.cf (host, port, or socket path) need to match how spamd is configured to listen.
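When spamd listens on TCP, a quick reachability test can rule out the simplest causes. The sketch below only checks that something accepts connections on the given host and port; the function name spamd_reachable is invented, 783 is spamd's default port, and the check does not cover Unix sockets or verify that the listener is actually spamd.

```python
import socket


def spamd_reachable(host="127.0.0.1", port=783, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("spamd reachable:", spamd_reachable())
```

If this reports False on the mail server itself, check that the service is started and which address and port it was told to listen on before looking at the spamc options.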

One thing that surprises many people about SpamAssassin’s Bayesian classifier is how sensitive it is to the mix of spam and ham it is trained on. If you train it almost entirely on spam, words that also appear in ordinary mail have little ham evidence to balance them, so legitimate messages can start to look spammy. Feed it both kinds of mail in roughly realistic proportions, and if you notice a significant shift in its accuracy, reset the database (sa-learn --clear) and retrain.

Once SpamAssassin is running and configured, the next thing you’ll likely encounter is dealing with false positives. You’ll need to whitelist senders or specific patterns that are being incorrectly flagged.
