SpamCheck API 2.0

The SpamCheck API allows plugins to interact in a standardized way with anti-spam plugins. Thanks to the SpamCheck API, Nucleus can request a spam check without knowing whether an anti-spam plugin is installed or which one it is. The same applies to plugins: they can also call the SpamCheck API themselves if they want to request a spam check. One of the more prominent examples of a plugin calling the SpamCheck API is the NP_Trackback plugin.

In short, the SpamCheck API allows multiple anti-spam plugins to work together and offers support for many types of spam, such as comment spam, trackback spam and referer spam.

Version 2.0 of the SpamCheck API adds a number of features, such as marking false-positives and false-negatives. The specification of version 2 is also much stricter and now defines the data structures you should use.

The SpamCheck event

The SpamCheck API builds on Nucleus’ built-in event system. The plugin that requires a spam check creates a SpamCheck event and passes a simple data structure that contains the data to be checked. After the check is finished, the plugin reads the return value embedded in the data structure.

The anti-spam plugin listens to the SpamCheck event and every time such an event is created, the anti-spam plugin receives the data structure. The plugin will perform all of its tests based on the contents of the data structure. The result will be stored inside of the data structure after which the anti-spam plugin will give control back to the event system.

Example of calling the API:

$spamcheck = array (
    /* data structure */
);
 
$manager->notify('SpamCheck', array ('spamcheck' => & $spamcheck));
    
if (isset($spamcheck['result']) && $spamcheck['result'] == true) 
{
    /* this is spam */
}
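
The listening side can be sketched in the same way. This is a minimal sketch, assuming the event system hands the handler the same array that was passed to notify(). The function name example_event_spamcheck and the keyword test are hypothetical; in a real Nucleus plugin this logic would normally live in the plugin's event_SpamCheck method.

```php
<?php
// Hypothetical handler: receives the array passed to notify() by reference.
function example_event_spamcheck(&$data)
{
    // Work directly on the caller's data structure.
    $spamcheck = &$data['spamcheck'];

    // Placeholder test (an assumption for illustration): flag a keyword in the body.
    $body = isset($spamcheck['body']) ? $spamcheck['body'] : '';
    if (stripos($body, 'casino') !== false) {
        $spamcheck['result'] = true;
    }
}

// Simulate what the event system does when a plugin creates the event.
$spamcheck = array(
    'type'   => 'comment',
    'body'   => 'Visit my casino',
    'live'   => false,
    'return' => true
);
$data = array('spamcheck' => &$spamcheck);
example_event_spamcheck($data);
```

Because the data structure is passed by reference, the verdict written by the handler is visible in the caller's $spamcheck array afterwards.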

The data structure

The data structure is an array. The values of the array can vary between types of spam. To distinguish between the different types of spam there is a required value with the key type. It can contain any of the following values:

  • comment
  • referer
  • trackback

It is possible to use values other than the ones defined above, but if you do you must realize that there is no guarantee that a spam check plugin will be able to use the data structure.
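
An anti-spam plugin may want to check the type before it runs any tests. A minimal sketch, in which the helper name example_supports_type is hypothetical:

```php
<?php
// Types defined by this specification.
$known_types = array('comment', 'referer', 'trackback');

// Hypothetical helper: true when the structure carries a type this plugin understands.
function example_supports_type($spamcheck, $known_types)
{
    return isset($spamcheck['type'])
        && in_array($spamcheck['type'], $known_types, true);
}
```

A plugin that receives an unknown type could simply return without touching the data structure.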

Generic values

Apart from the type specific values there are also some generic values that apply to all types of spam.

  • live
  • return

live
The live value is a boolean and can be set to true or false. If this value is set to true, the check is called for an event that is happening in real time.

If live is set to true, the anti-spam plugin is allowed to use the values of the global $_SERVER and $_REQUEST arrays to retrieve any extra information. If live is set to false, the spam check was called manually by a user or administrator. This means that the $_SERVER and $_REQUEST arrays do not contain any information about a possible spam attempt, but information about the administrator. If live is set to false, the anti-spam plugin should never use these global arrays in its spam detection routines.

return
The return value is a boolean and can be set to true or false. If this value is set to true, the anti-spam plugin is not allowed to take any counter measures against the spam attack. Once it has determined whether it is dealing with a spam attack, it should return control to the calling plugin.

If return is set to false the anti-spam plugin is allowed (but not required) to take any counter measure it wants, such as keeping the HTTP session alive. In that case the anti-spam plugin does not need to return control to the calling plugin.
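
A minimal sketch of how an anti-spam plugin might honour both generic values. The helper names are hypothetical, and treating a missing live value as false and a missing return value as true is an assumption of this sketch, not something this specification defines. It follows the reading in which counter measures are only allowed when the caller does not expect control back.

```php
<?php
// Hypothetical helper: use request data only when the check is live.
// $server stands in for the global $_SERVER array.
function example_client_ip($spamcheck, $server)
{
    $live = isset($spamcheck['live']) && $spamcheck['live'] === true;
    if ($live && isset($server['REMOTE_ADDR'])) {
        return $server['REMOTE_ADDR'];
    }
    // Not live: the request belongs to the administrator, so ignore it.
    return null;
}

// Hypothetical helper: counter measures are only allowed when the caller
// explicitly set return to false.
function example_may_take_counter_measures($spamcheck)
{
    return isset($spamcheck['return']) && $spamcheck['return'] === false;
}
```

In a plugin the first helper would be called as example_client_ip($spamcheck, $_SERVER).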

Comments

If you want to check a comment for spam you should provide the following values in the data structure array:

  • body
  • author
  • email
  • url
  • id

Example of the data structure:

$spamcheck = array (
    'type' => 'comment',
    'body' => 'This is a test comment',
    'author' => 'rakaz',
    'url' => 'http://www.rakaz.nl',
    'id' => 32,
    'live' => true,
    'return' => true
);

body
The body value is required and must contain a raw and unfiltered copy of the comment body. This means that any processing that Nucleus does before the PostAddComment event must not be applied to this value.

Note: If you rely on the PostAddComment event you might run into a problem. You cannot use the $data parameter of the PostAddComment event, because it has already been processed by Nucleus. Use requestVar('body') instead, which should contain the raw data provided by the comment form.

author
The author value is optional and should contain the name of the user who posted the comment.

email
The email value is optional and should contain the email address of the user who posted the comment. If the email address contains the mailto: protocol prefix, it should be removed before calling the SpamCheck API.

url
The url value is optional and should contain the URL provided by the user who posted the comment. The URL should not be relative and should start with either http:// or https://.

Note: The current version of Nucleus uses one field to provide either the email address or the URL of the user who posted the comment. The plugin calling the SpamCheck API should determine what this combined field contains and use it to fill the appropriate value in the data structure.

id
The id value is optional and should contain the id of the article on which the user has commented.
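
The comment rules above can be sketched as a small builder. The function name example_build_comment_check is hypothetical, and the way the combined field is classified (a leading http:// or https:// means URL, otherwise an @ sign means email address) is an assumption of this sketch. The commenter's name is stored under the author key, as in the example data structure.

```php
<?php
// Hypothetical helper: build a comment data structure from raw form input.
// $contact is the combined field that holds either an email address or a URL.
function example_build_comment_check($body, $author, $contact, $itemid)
{
    $spamcheck = array(
        'type'   => 'comment',
        'body'   => $body,
        'author' => $author,
        'id'     => $itemid,
        'live'   => true,
        'return' => true
    );

    $contact = trim($contact);
    if (preg_match('#^https?://#i', $contact)) {
        $spamcheck['url'] = $contact;
    } elseif (strpos($contact, '@') !== false) {
        // Strip a leading mailto: prefix before passing the address on.
        $spamcheck['email'] = preg_replace('/^mailto:/i', '', $contact);
    }

    return $spamcheck;
}
```

Anything in the combined field that is neither an absolute URL nor an email address is left out, since both values are optional.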

Trackback

If you want to check a trackback for spam you should provide the following values in the data structure array:

  • title
  • excerpt
  • blogname
  • url
  • id

Example of the data structure:

$spamcheck = array (
    'type' => 'trackback',
    'title' => 'apuesta dinero',
    'excerpt' => 'In your free time, check the sites about ruleta',
    'blogname' => 'apuesta dinero',
    'url' => 'http://www.poker4spain.com/apuesta-dinero.html',
    'id' => 19,
    'live' => true,
    'return' => true
);

title
The title value is optional and corresponds directly with the title value defined in the Trackback specification.

excerpt
The excerpt value is required and corresponds directly with the excerpt value defined in the Trackback specification.

blogname
The blogname value is optional and corresponds directly with the blog_name value defined in the Trackback specification.

url
The url value is required and corresponds directly with the url value defined in the Trackback specification.

id
The id value is optional and should contain the id of the article. This value is also defined in the Trackback specification as tb_id.

Note: It is possible that the trackback is sent using a character encoding that is different from the encoding used by your Nucleus installation. If this is the case, the trackback data should be converted to the character encoding of your Nucleus installation before checking for trackback spam.
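
A minimal sketch of such a conversion using PHP's iconv() function. The helper name example_convert_trackback is hypothetical, and how the sender's encoding is detected is outside the scope of this sketch; it is simply passed in as $from.

```php
<?php
// Hypothetical helper: convert the text fields of a trackback to the site encoding.
// $from must be the encoding the sender actually used.
function example_convert_trackback($spamcheck, $from, $to)
{
    foreach (array('title', 'excerpt', 'blogname') as $key) {
        if (isset($spamcheck[$key])) {
            $converted = iconv($from, $to, $spamcheck[$key]);
            // iconv() returns false on failure; keep the original text then.
            if ($converted !== false) {
                $spamcheck[$key] = $converted;
            }
        }
    }
    return $spamcheck;
}
```

Only the free-text fields are converted; url and id are passed through unchanged.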

Referer

If you want to check a referer for spam you should provide the following values in the data structure array:

  • url
  • id

Example of the data structure:

$spamcheck = array (
    'type' => 'referer',
    'url' => 'http://www.poker4spain.com/apuesta-dinero.html',
    'id' => 74,
    'live' => true,
    'return' => true
);

url
The url value is required and should contain the full URL that apparently was used to refer the visitor to your website. This value is usually present in the $_SERVER['HTTP_REFERER'] global variable. Relative URLs are not allowed and URLs should always start with http:// or https://.

id
The id value is optional and should contain the id of the article on which the visitor entered the website.
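
A minimal sketch of building the referer data structure while enforcing the URL rule. The function name example_build_referer_check is hypothetical, and returning null for an unusable URL is a choice made for this sketch.

```php
<?php
// Hypothetical helper: build a referer data structure, or return null when the
// URL is relative or does not start with http:// or https://.
function example_build_referer_check($referer, $itemid, $live)
{
    if (!preg_match('#^https?://#i', $referer)) {
        return null;
    }
    return array(
        'type'   => 'referer',
        'url'    => $referer,
        'id'     => $itemid,
        'live'   => $live,
        'return' => true
    );
}
```

The caller would only create the SpamCheck event when the helper returns an array.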

Backwards compatibility

If you want to maintain compatibility with older implementations of the SpamCheck API you can include one additional value with the key data. This value contains all the other type-specific fields concatenated into a single string.

For example:

$spamcheck = array (
    'type' => 'comment',
    'body' => 'This is a test comment',
    'author' => 'rakaz',
    'url' => 'http://www.rakaz.nl',
    'id' => 32,
    'live' => true,
    'return' => true,
    'data' => 'rakaz This is a test comment http://www.rakaz.nl'
);
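
The data value for a comment could be derived from the other fields as sketched here. The helper name example_add_legacy_data is hypothetical. The field order and the single space separator follow the example above; the position of email in that order is an assumption, because the example does not include it.

```php
<?php
// Hypothetical helper: add the legacy data value to a comment structure.
function example_add_legacy_data($spamcheck)
{
    $parts = array();
    // Order taken from the example: author, body, url. The place of email is assumed.
    foreach (array('author', 'body', 'email', 'url') as $key) {
        if (isset($spamcheck[$key]) && $spamcheck[$key] !== '') {
            $parts[] = $spamcheck[$key];
        }
    }
    $spamcheck['data'] = implode(' ', $parts);
    return $spamcheck;
}
```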

The result

result
If return is set to true, the anti-spam plugin is expected to give the verdict back to the calling plugin. The anti-spam plugin should add a new value to the data structure called result.

This value is a boolean and can be true or false. If the value is true the anti-spam plugin thinks it is spam. If the anti-spam plugin thinks the comment, trackback or referer is normal, it will set result to false.

Note: It is possible to install more than one anti-spam plugin. The anti-spam plugins are called in the order in which they were installed. This also means that the result may already have been set by another anti-spam plugin.

If this is the case you should follow these rules:

  • If the result is false, you must run your own test. If your own test indicates that we are dealing with spam you should overwrite the result with true.
  • If the result is false, you must run your own test. If your own test does not indicate that we are dealing with spam you should simply leave result set to false.
  • If the result is true, you can run your own test, but if your test indicates that we are not dealing with spam, you should NEVER overwrite the existing value of result. Another plugin already marked this as spam and you should honour that.
  • If the result is true, you could simply skip any tests of your own because another plugin already marked this as spam.
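
These rules can be sketched as a single helper. The function name example_apply_verdict and the callable $own_test are hypothetical. The sketch takes the option of skipping its own test when result is already true.

```php
<?php
// Hypothetical helper: combine this plugin's verdict with an earlier result.
// $own_test is a callable that returns true when the content looks like spam.
function example_apply_verdict(&$spamcheck, $own_test)
{
    // Another plugin already marked this as spam: keep it and skip our test.
    if (isset($spamcheck['result']) && $spamcheck['result'] === true) {
        return;
    }
    // Result is false or not set yet: run our own test and record the verdict.
    $spamcheck['result'] = (call_user_func($own_test, $spamcheck) === true);
}
```

With this shape a result of true can never be turned back into false by a later plugin.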

In addition to this, if the anti-spam plugin marks this comment, trackback or referer as spam, the anti-spam plugin should also add its own name and a status message to the data structure. This way other plugins could use this information to show why the comment was marked as spam.

plugin
The plugin value contains the name of the plugin. This is usually equal to the return value of the getName() function of the plugin. This value should only be set when the comment, trackback or referer is marked as spam.

message
The message value contains some additional information about the reason why the comment was marked as spam. This message should not be more than 255 characters and may be translated into the language of the Nucleus installation.

Examples of messages are:
  • Score 66 out of 100
  • Contains spamvertised URL according to Rbl.bulkfeeds.jp
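
A minimal sketch of recording a spam verdict together with the plugin and message values. The helper name example_mark_as_spam and the plugin name used in the usage line are hypothetical.

```php
<?php
// Hypothetical helper: record a spam verdict together with the plugin name and reason.
function example_mark_as_spam(&$spamcheck, $plugin_name, $message)
{
    $spamcheck['result'] = true;
    $spamcheck['plugin'] = $plugin_name;
    // Keep the message within the 255 character limit. Note that substr()
    // counts bytes, so text in a multi-byte encoding needs a character-aware cut.
    $spamcheck['message'] = substr($message, 0, 255);
}

$spamcheck = array('type' => 'comment', 'body' => 'This is a test comment');
example_mark_as_spam($spamcheck, 'ExamplePlugin', 'Score 66 out of 100');
```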

The SpamMark event

Some anti-spam plugins have the ability to learn and improve their spam detection. To make this possible we have an event that can tell anti-spam plugins about their mistakes.

For example, if the anti-spam plugin failed to detect a comment as spam you can use the SpamMark event to tell the plugin it is spam. Mistakes of this kind are called false-negatives. The other way around is also possible. If your anti-spam plugin marked a normal comment as spam you can use the SpamMark event to tell the plugin it is not spam. Mistakes of this kind are called false-positives.

Note: The SpamMark event is optional for both calling plugins and anti-spam plugins. However, if you do implement it, you should implement it in the way described in this specification.

Example of calling the API:

$spammark = array (
    /* data structure */
);
 
$manager->notify('SpamMark', array ('spammark' => & $spammark));

The data structure

The data structure is similar to the data structure of the SpamCheck event. The differences are explained below.

The generic values live and return are not valid in the SpamMark data structure. Instead, the anti-spam plugin should always return control back to the plugin initiating the SpamMark event.

There is one new generic value called result. This value should contain a boolean that indicates what the SpamCheck event should have returned. If this value is true the comment, trackback or referer is marked as spam, which corrects a false-negative. If this value is false the comment, trackback or referer is marked as not being spam, which corrects a false-positive.
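
A minimal sketch of deriving a SpamMark data structure from an earlier SpamCheck data structure. The helper name example_build_spammark is hypothetical, and dropping the plugin and message values of the earlier verdict is an assumption of this sketch.

```php
<?php
// Hypothetical helper: turn a SpamCheck structure into a SpamMark structure.
function example_build_spammark($spamcheck, $is_spam)
{
    // live and return are not valid in the SpamMark data structure.
    unset($spamcheck['live'], $spamcheck['return']);
    // Drop details of the earlier verdict (an assumption of this sketch).
    unset($spamcheck['plugin'], $spamcheck['message']);
    // result states what the SpamCheck event should have returned.
    $spamcheck['result'] = (bool) $is_spam;
    return $spamcheck;
}

// Report a false-positive: the comment was flagged but is not spam.
$spammark = example_build_spammark(
    array(
        'type'   => 'comment',
        'body'   => 'This is a test comment',
        'author' => 'rakaz',
        'id'     => 32,
        'live'   => true,
        'return' => true,
        'result' => true
    ),
    false
);
```

The resulting $spammark array is what would be passed by reference when creating the SpamMark event.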

 
spamcheck_api.txt · Last modified: 2006/07/05 13:03