Euthanasia for promising ideas

Spamassassin bayes folder/DB permissions for autolearning (and enabling the same when manually moving mails to Junk through a dovecot sieve)

2022-06-11 – 13:14

This is another reminder to myself in case I ever have to set up the mail system again. Fingers crossed that this doesn’t happen any time soon! But if it does, this post will hopefully spare me another few weeks of searching through outdated discussion board and mailing list posts before finally reading the proper documentation… :/

I am running a Debian 11 VPS with postfix, dovecot and (daemonized) spamassassin. Only recently did I discover that spamassassin’s bayes filtering and autolearn functionality were not working properly, for multiple reasons:

  1. By default the bayes database lives in /root/.spamassassin which is … suboptimal.
  2. The bayes database needs to be trained with at least 200 spam and 200 ham mails before it is used at all.
  3. In my setup, not only does the system user debian-spamd access the bayes DB; the dovecot/vmail user also tries to access it whenever a regular mail user manually moves a missed spam mail from the inbox to the Junk folder.
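
Whether point 2 applies can be checked from the counters that "sa-learn --dump magic" prints. The following is a minimal sketch, not something from my actual setup: the function name bayes_ready is made up, and it assumes the usual dump layout in which the counter sits in the third column of the "non-token data: nspam" and "non-token data: nham" lines.

```shell
#!/bin/sh
# Hypothetical helper: reads `sa-learn --dump magic` output on stdin and
# succeeds only if both the spam and the ham counter reach the minimum
# (first argument, 200 if omitted).
bayes_ready() {
    awk -v min="${1:-200}" '
        /non-token data: nspam/ { nspam = $3 }
        /non-token data: nham/  { nham  = $3 }
        END {
            # Print both counters, then signal the result via the exit code.
            printf "nspam=%d nham=%d\n", nspam, nham
            exit ((nspam + 0 >= min + 0 && nham + 0 >= min + 0) ? 0 : 1)
        }'
}
```

Run as a user that can read the bayes DB, it could be used like this: sa-learn --dump magic | bayes_ready && echo "bayes is active"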

Fortunately, each step was easier to solve than I’d like to admit:

  1. Searching the web reveals that /root/.spamassassin is a very unfortunate place for the bayes database because most likely the daemonized spamassassin is not able to write there. This appears to be a common problem as I have found this issue quite often during my “research”.

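    1.1 Move the bayes DB somewhere else, e.g. /var/lib/spamassassin. The spamassassin option you are looking for is “bayes_path” in /etc/spamassassin/. Its value is a path plus file name prefix rather than a bare directory, which is why the files listed further down are called bayes_db_seen and bayes_db_toks.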

    1.2 There already exists a dedicated user to run spamd: debian-spamd, but for some reason it is not used by default. This can be fixed in /etc/default/spamassassin by appending “-u debian-spamd” to the OPTIONS line. After restarting spamassassin it should now run as the correct user.
  2. “Luckily” one of my mail users had a large number of spam mails in his Junk folder. They had been correctly classified and moved there by the default non-bayesian spamassassin rules, so they could be used to train the bayes filter with the more than 200 spam mails it requires.
    The same was done for the ham mails.
  3. This was the tricky part: After moving the bayes_path and setting up spamassassin to run as the correct user, the /var/lib/spamassassin folder had these permissions:
    4,0K drwxrwxr-x 8 debian-spamd debian-spamd 4,0K Jun 11 12:57 spamassassin

    This was good enough for mails recognized as “ham” by the now-running bayes filter to be auto-learned as such. However, there was another player in the field that would try to update the bayes database: dovecot/vmail.
    I had set up an IMAPSieve for dovecot which is supposed to automatically run “sa-learn --spam [mail]” or “sa-learn --ham [mail]” when a mail is moved manually from Inbox to Junk or vice versa.
    While spamd itself is allowed to write to /var/lib/spamassassin thanks to the updated permissions (1.1), the dovecot sieve was not. It turns out that the sieve invoked on a manual move runs as user “vmail”, which resulted in error messages in the dovecot log saying that the sieve execution had failed.
    Searching the web again showed many suboptimal solutions, including recursively setting the permissions for /var/lib/spamassassin to 777… That’s not what I want.
    3.1 Add user vmail to the mail group (if it isn’t already)
    3.2 Change ownership of /var/lib/spamassassin to debian-spamd:mail
    3.3 Change ownership of /var/lib/spamassassin/bayes_* to debian-spamd:mail
    3.4 Change permissions on /var/lib/spamassassin to 775/770
    3.5 Change permissions on /var/lib/spamassassin/bayes_* to 664/660
    3.6 Bonus: If you have enabled the spamassassin CRON job in /etc/default/spamassassin, change permissions on /var/lib/spamassassin/sa-update-keys to 700 to avoid warnings about unsafe permissions.
    Your /var/lib/spamassassin folder contents should now look like this:
    > ls -l /var/lib/
    4,0K drwxrwxr-x 8 debian-spamd mail 4,0K Jun 11 12:57 spamassassin
    > ls -l /var/lib/spamassassin
    356K -rw-rw-r-- 1 debian-spamd mail 360K Jun 11 12:57 bayes_db_seen
    7,6M -rw-rw-r-- 1 debian-spamd mail 10M Jun 11 12:57 bayes_db_toks
    4,0K drwx------ 3 debian-spamd debian-spamd 4,0K Jun 9 00:33 sa-update-keys
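
Steps 3.2 to 3.6 can be sketched as a small shell function. This is a rough sketch under assumptions rather than the exact commands I typed: the function name fix_bayes_permissions and the SA_USER/SA_GROUP variables are made up for illustration, and it expects the bayes_* files to exist already.

```shell
#!/bin/sh
# Illustrative defaults; override them in the environment if your setup differs.
SA_USER="${SA_USER:-debian-spamd}"
SA_GROUP="${SA_GROUP:-mail}"

# Hypothetical helper: applies steps 3.2-3.6 to the directory given as $1.
fix_bayes_permissions() {
    dir="$1"
    # 3.2 + 3.3: hand group ownership to "mail". chown needs root and an
    # existing account, so it is skipped when either is missing.
    if [ "$(id -u)" -eq 0 ] && id "$SA_USER" >/dev/null 2>&1; then
        chown "$SA_USER:$SA_GROUP" "$dir" "$dir"/bayes_*
    fi
    # 3.4: directory writable by owner and group
    chmod 775 "$dir"
    # 3.5: bayes files writable by owner and group
    chmod 664 "$dir"/bayes_*
    # 3.6: keep the sa-update key directory private, if it exists
    if [ -d "$dir/sa-update-keys" ]; then
        chmod 700 "$dir/sa-update-keys"
    fi
}
```

As root, step 3.1 would be "usermod -aG mail vmail", followed by "fix_bayes_permissions /var/lib/spamassassin". Use 770 and 660 instead if other users should not be able to read the DB.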

For me this allows spamd itself to learn ham from legitimate mails. It also allows the bayes DB to be updated when manually moving missed spam mails from Inbox to Junk using the dovecot sieve.
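
For illustration, the learning step triggered by the sieve could be wrapped roughly as below. This is not my actual sieve script, only a sketch: the function name learn_message and the SA_LEARN variable are hypothetical, and it assumes that the moved mail arrives on standard input and that sa-learn reads it from there when no file is given.

```shell
#!/bin/sh
# Path to sa-learn; overridable so the wrapper can be tried without touching
# the real bayes DB (assumption: Debian installs it as /usr/bin/sa-learn).
SA_LEARN="${SA_LEARN:-/usr/bin/sa-learn}"

# Hypothetical wrapper: learn_message spam|ham, message on stdin.
learn_message() {
    case "$1" in
        spam|ham)
            # Runs as whatever user invoked it (vmail for the dovecot sieve),
            # which is why the group permissions above matter.
            "$SA_LEARN" "--$1"
            ;;
        *)
            echo "usage: learn_message spam|ham" >&2
            return 2
            ;;
    esac
}
```

A quick manual check as the vmail user would be something like: learn_message spam < some-missed-spam.eml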
