Spamassassin bayes folder/DB permissions for autolearning (and enabling the same when manually moving mails to Junk through a dovecot sieve)
2022-06-11 – 13:14
This is another reminder to myself in case I ever have to set up the mail system again. Fingers crossed that this doesn’t happen any time soon! But if it does, this post will hopefully spare me another few weeks of searching outdated discussion board and mailing list posts until finally reading the proper documentation… :/
I am running a Debian 11 VPS with postfix, dovecot and (daemonized) spamassassin. Only recently did I discover that spamassassin’s bayes filtering and autolearn functionality was not working properly, for multiple reasons:
- By default the bayes database lives in /root/.spamassassin, which is … suboptimal.
- The bayes database needs to be trained with at least 200 spam and 200 ham mails before it is used at all.
- In my setup not only the system user debian-spamd accesses the bayes DB; the dovecot/vmail user also attempts to access it when regular mail users manually move a missed spam mail from their inbox to the Junk folder.
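Whether the database has already passed the training threshold from the second point can be checked with “sa-learn --dump magic”, which prints the learned spam and ham counts. The helper below is only a sketch: the function name bayes_ready is made up for illustration, and it assumes the usual output layout in which the count is the third column and the line ends in nspam or nham. The limit of 200 is spamassassin's default for bayes_min_spam_num and bayes_min_ham_num.

```shell
#!/bin/sh
# Hypothetical helper: reads "sa-learn --dump magic" output on stdin,
# prints the learned counts and succeeds only if both reach 200.
bayes_ready() {
    awk '
        $NF == "nspam" { nspam = $3 }   # number of learned spam mails
        $NF == "nham"  { nham  = $3 }   # number of learned ham mails
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            exit !(nspam >= 200 && nham >= 200)
        }
    '
}

# Usage, run as the user that owns the bayes DB, for example:
#   sudo -u debian-spamd sa-learn --dump magic | bayes_ready
```

If the function reports fewer than 200 for either count, the bayes rules will not fire yet, no matter how the permissions are set.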
Fortunately, each step was easier to solve than I’d like to admit:
- Searching the web reveals that /root/.spamassassin is a very unfortunate place for the bayes database because the daemonized spamassassin is most likely not able to write there. This appears to be a common problem, as I came across it quite often during my “research”.
1.1 Move the bayes DB somewhere else, e.g. /var/lib/spamassassin. The spamassassin option you are looking for is “bayes_path” in /etc/spamassassin/local.cf: https://spamassassin.apache.org/full/3.0.x/dist/doc/Mail_SpamAssassin_Conf.html
Note that bayes_path is a path plus file name prefix, not a directory: a value of /var/lib/spamassassin/bayes_db yields files named bayes_db_toks and bayes_db_seen.
1.2 A dedicated user for running spamd already exists (debian-spamd), but for some reason it is not used by default. This can be fixed in /etc/default/spamassassin by appending “-u debian-spamd” to the OPTIONS line. After restarting spamassassin it should run as the correct user.
- “Luckily” one of my mail users had a large number of spam mails in his Junk folder, which had been correctly classified and moved there by the default non-bayesian spamassassin rules. These could be fed to sa-learn to give the bayes filter the 200+ spam mails it requires:
https://spamassassin.apache.org/full/3.0.x/dist/doc/sa-learn.html
The same was done for the ham mails.
- This was the tricky part: After moving the bayes_path and setting up spamassassin to run as the correct user, the /var/lib/spamassassin folder had these permissions:
4,0K drwxrwxr-x 8 debian-spamd debian-spamd 4,0K Jun 11 12:57 spamassassin
This was good enough for mails recognized as “ham” by the now running bayes filter to be auto-learned as such. However, there was another player in the field that would try to update the bayes database: dovecot/vmail.
I had set up an IMAPSieve for dovecot which is supposed to automatically run “sa-learn --spam/--ham [mail]” on a mail when it is moved manually from Inbox to Junk or vice versa: https://doc.dovecot.org/configuration_manual/howto/antispam_with_sieve/
While spamd itself is allowed to write to /var/lib/spamassassin thanks to the updated permissions (1.1), the dovecot sieve was not. It turns out the sieve invoked on a manual move runs as user “vmail”, resulting in error messages in the dovecot log that the sieve execution has failed.
Searching the web again showed many suboptimal solutions, including recursively setting the permissions for /var/lib/spamassassin to 777… That’s not what I want.
Instead:
3.1 Add user vmail to the mail group (if it isn’t already)
3.2 Change ownership of /var/lib/spamassassin to debian-spamd:mail
3.3 Change ownership of /var/lib/spamassassin/bayes_* to debian-spamd:mail
3.4 Change permissions on /var/lib/spamassassin to 775/770
3.5 Change permissions on /var/lib/spamassassin/bayes_* to 664/660
3.6 Bonus: If you have enabled the spamassassin CRON job in /etc/default/spamassassin:
Change permissions on /var/lib/spamassassin/sa-update-keys to 700
to avoid warnings about unsafe permissions.
Your /var/lib/spamassassin folder contents should now look like this:
> ls -l /var/lib/
[…]
4,0K drwxrwxr-x 8 debian-spamd mail 4,0K Jun 11 12:57 spamassassin
[…]
> ls -l /var/lib/spamassassin
[…]
356K -rw-rw-r-- 1 debian-spamd mail 360K Jun 11 12:57 bayes_db_seen
7,6M -rw-rw-r-- 1 debian-spamd mail 10M Jun 11 12:57 bayes_db_toks
4,0K drwx------ 3 debian-spamd debian-spamd 4,0K Jun 9 00:33 sa-update-keys
For me this allows spamd itself to learn ham from legitimate mails. It also allows the bayes DB to be updated when manually moving missed spam mails from Inbox to Junk using the dovecot sieve.
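For reference, steps 3.1 to 3.6 can be sketched as a small shell script. This is a sketch of what I described above, not a tested installer: the function names set_bayes_owner and set_bayes_modes are made up for illustration, the user and group names are the ones from my Debian setup, and it uses the more permissive 775/664 variants. Everything in the usage comment has to run as root.

```shell
#!/bin/sh
# Sketch of steps 3.1 to 3.6. Names below match the setup in this post;
# adjust user, group and path for your own system.

# 3.2 + 3.3: hand the directory and the bayes files to debian-spamd:mail
set_bayes_owner() {
    dir=$1
    chown debian-spamd:mail "$dir"
    for f in "$dir"/bayes_*; do
        if [ -e "$f" ]; then chown debian-spamd:mail "$f"; fi
    done
}

# 3.4 to 3.6: group-writable directory and bayes files, private key folder
set_bayes_modes() {
    dir=$1
    chmod 775 "$dir"
    for f in "$dir"/bayes_*; do
        if [ -e "$f" ]; then chmod 664 "$f"; fi
    done
    if [ -d "$dir/sa-update-keys" ]; then chmod 700 "$dir/sa-update-keys"; fi
}

# Usage (as root):
#   usermod -a -G mail vmail                  # 3.1
#   set_bayes_owner /var/lib/spamassassin     # 3.2, 3.3
#   set_bayes_modes /var/lib/spamassassin     # 3.4 to 3.6
```

A changed group membership only applies to newly started processes, so dovecot probably needs a restart before the sieve running as vmail picks up the mail group.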