| General Plugin info | Add Spam Bayesian filtering to your weblog to catch spam |
|---|---|
| Author: | xiffy |
| Current Version: | 1.1.0 (beta) |
| Download: | SpamBayes version 1.1.0; older: SpamBayes version 1.0.5, SpamBayes version 1.0.4, SpamBayes version 1.0.3, SpamBayes version 1.0.2, SpamBayes version 1.0.1, SpamBayes version 1.0 |
| Code: | not available; please download |
| Demo: | none |
| Forum Thread: | Plugin announced |
This plugin adds Bayesian spam filtering to your weblog. It can be an effective tool against comment and trackback spam, but you need to understand the basic principles before you consider using it. Bayesian filtering is not the answer to every spam problem. The first problem you will encounter is the need for a trained filter, and only you can train your filter effectively. That’s why this plugin does not come with any pre-installed filters. I considered shipping one, but it’s impossible to determine whether you would consider particular words spam or not.
So let’s see what Bayesian spam filtering can do for you. Suppose that you have trained your filter. The plugin provides tools to feed the filter with both ‘ham’ (legitimate messages) and ‘spam’.
What happens when SpamBayes receives a message for checking? It determines two probabilities, both based on word frequencies. The first is the probability that the message is ham; the second is the probability that it is spam. That’s why you need to train SpamBayes with both ham and spam messages: the word frequencies can only be determined after you have taught it enough words to decide which category a message most likely belongs to.
You should by now be aware of the importance of training. After installation, the plugin menu shows an option to train all available comments as ham: ‘Train HAM (not spam) with all comments’. Use this option to feed the filter all your existing comments as ham. Make sure you don’t have any spam comments on your site first; otherwise you will feed the filter spam labelled as ham.
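The two-probability comparison described above can be illustrated with a small naive Bayes sketch. This is not the plugin's code, and the source does not state its exact formula; the function names, the tokenizer, the add-one smoothing and the sample messages are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(messages):
    """Count how often each word occurs across a list of messages."""
    counts = Counter()
    for message in messages:
        counts.update(tokenize(message))
    return counts


def log_score(text, counts, vocabulary_size, prior):
    """Log of (prior * product of smoothed word frequencies) for one category."""
    total = sum(counts.values())
    score = math.log(prior)
    for word in tokenize(text):
        # Add-one smoothing (an assumption): an unseen word lowers the
        # score instead of reducing the probability to zero.
        score += math.log((counts[word] + 1) / (total + vocabulary_size))
    return score


def classify(text, ham_counts, spam_counts, ham_docs, spam_docs):
    """Return the category with the higher score.

    Both categories must have been trained with at least one message.
    """
    vocabulary_size = len(set(ham_counts) | set(spam_counts))
    total_docs = ham_docs + spam_docs
    ham = log_score(text, ham_counts, vocabulary_size, ham_docs / total_docs)
    spam = log_score(text, spam_counts, vocabulary_size, spam_docs / total_docs)
    return "spam" if spam > ham else "ham"


# Hypothetical training data, invented for this example.
ham_messages = ["nice post thanks for sharing",
                "I agree with your point about templates"]
spam_messages = ["cheap pills buy now", "buy cheap watches online now"]
ham_counts = train(ham_messages)
spam_counts = train(spam_messages)

verdict = classify("buy cheap pills", ham_counts, spam_counts,
                   len(ham_messages), len(spam_messages))
```

With only one category trained, one of the two scores has nothing to be based on, which is why the filter needs both ham and spam before its verdicts mean anything.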
You can undo this action with the option ‘Remove all comments from the HAM (not spam)’.
Once you’ve trained your filter with your comments, you can remove the menu item from the list so you won’t accidentally click this option again. To do this, choose ‘Spam Bayes Options’ from the menu and set ‘Show SpamBayes train all ham in menu?’ to no.
At this moment there are two ways to train your filter with spam messages.
If you accidentally choose the wrong category for a training session, you can undo this. Choose ‘Spam Bayes untraining’ from the menu. You’ll see all trained messages, except the trained comments, which are invisible in the overview (you can untrain those as explained earlier). To untrain a document, simply click its ‘untrain’ link: the document is deleted from the list and all the word counts are updated to reflect the untraining.
When logging is enabled (it defaults to no), all events captured by the plugin are logged. The log is kept in a separate database table and contains all the information the plugin receives. You can view the events from the administration interface; the view is limited to 10 events per page. Each logged event has two handy links to train it as either spam or ham. That way you can quickly train the filter when a message is considered spam but isn’t, or is considered ham but isn’t.
Remember that although SpamBayes might capture spam events, they are not automatically added to the filter. So even when SpamBayes has categorised a message correctly, it is still a good idea to feed the filter some of the correct guesses as well.
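To make the effect of untraining concrete, here is a minimal sketch of a training store in which removing a document also subtracts its words from the counts. The class `TrainingStore` and its methods are hypothetical names for this example; the plugin's real storage is a database and may well work differently.

```python
from collections import Counter


class TrainingStore:
    """Toy in-memory store of trained documents and per-category word counts."""

    def __init__(self):
        self.documents = {}  # doc_id -> (category, list of words)
        self.word_counts = {"ham": Counter(), "spam": Counter()}
        self.next_id = 1

    def train(self, text, category):
        """Remember the document and add its words to the category counts."""
        words = text.lower().split()
        doc_id = self.next_id
        self.next_id += 1
        self.documents[doc_id] = (category, words)
        self.word_counts[category].update(words)
        return doc_id

    def untrain(self, doc_id):
        """Delete the document and subtract its words from the counts."""
        category, words = self.documents.pop(doc_id)
        counts = self.word_counts[category]
        counts.subtract(words)
        # Drop words that no longer occur in any trained document.
        for word in set(words):
            if counts[word] <= 0:
                del counts[word]


store = TrainingStore()
mistake = store.train("buy cheap pills", "ham")  # trained in the wrong category
store.train("cheap watches", "spam")
store.untrain(mistake)                           # undo the mistaken training
```

After the `untrain` call the ham counts are empty again, while the spam counts are untouched, so the mistaken session leaves no trace in later classifications.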
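The paged log view can be pictured roughly as follows. This sketch uses an in-memory SQLite database so that it runs on its own; the table name `event_log`, its columns and the helper functions are assumptions for illustration, not the plugin's actual schema. Only the page size of 10 comes from the text.

```python
import sqlite3

EVENTS_PER_PAGE = 10  # page size stated in the text


def open_log():
    """Create an in-memory database with a hypothetical event log table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event_log ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "body TEXT NOT NULL, "
        "verdict TEXT NOT NULL)"
    )
    return conn


def log_event(conn, body, verdict):
    """Store one captured event together with the filter's verdict."""
    conn.execute(
        "INSERT INTO event_log (body, verdict) VALUES (?, ?)",
        (body, verdict),
    )


def fetch_page(conn, page):
    """Return one page of events, newest first; pages are numbered from 1."""
    offset = (page - 1) * EVENTS_PER_PAGE
    cursor = conn.execute(
        "SELECT id, body, verdict FROM event_log "
        "ORDER BY id DESC LIMIT ? OFFSET ?",
        (EVENTS_PER_PAGE, offset),
    )
    return cursor.fetchall()


conn = open_log()
for n in range(25):
    log_event(conn, f"message {n}", "spam" if n % 2 else "ham")

first_page = fetch_page(conn, 1)
last_page = fetch_page(conn, 3)
```

With 25 logged events, the first two pages hold 10 events each and the third holds the remaining 5.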
There are a couple of options available for this plugin.
My current wishlist for the plugin. Maybe I’ll implement these in a next version, maybe I’ll never get round to it. You can leave a note if you’d like to see one of these (or something I haven’t thought of) implemented.
06 sept 2006 : Initial Release (1.0)
10 sept 2006 : Small update (1.0.1)
15 sept 2006 : Small update (1.0.2)
All this functionality was added for my own benefit. I know I get a lot of spam (10,000 logged events on a weekly basis), and this way I can quickly scan all logged events to see if there are any false positives of one type or the other. The default log screen became unusable with over 200 events or so.
19 sept 2006 : Small update (1.0.3)
26 sept 2006 : Bugfix (1.0.4)
10 oct 2006 : Small update (1.0.5)
07 jan 2007 : Huge logging overhaul (1.1)
NP_SpamBayes version 1.1.0 works with Nucleus CMS 3.31 - 2007-10-29 admun