====== SpamCheck API 2.0 ======
The SpamCheck API allows plugins to interact in a standardized way with
anti-spam plugins. Thanks to the SpamCheck API Nucleus can request a spam
check without knowing if an anti-spam plugin is installed or which anti-spam
plugin is installed. The same applies to plugins: they can also call the SpamCheck API themselves if they want to request a spam check. One of the more prominent examples of a plugin calling the SpamCheck API is the NP_Trackback plugin.
In short, the SpamCheck API allows multiple anti-spam plugins to work together and offers support for many types of spam, such as comment spam, trackback spam
and referer spam.
Version 2.0 of the SpamCheck API adds a number of features, such as marking
false-positives and false-negatives. The specification of version 2 is also
much stricter and now defines the data structures you should use.
====== The SpamCheck event ======
The SpamCheck API builds on Nucleus' built-in event system. The plugin that
requires a spam check simply creates a **SpamCheck** event and passes a simple
data structure which contains the data that must be checked. After the check
is finished the plugin simply checks the return value embedded in the
data structure.
The anti-spam plugin listens to the **SpamCheck** event and every time such an
event is created, the anti-spam plugin receives the data structure. The plugin
will perform all of its tests based on the contents of the data structure.
The result will be stored inside of the data structure after which the
anti-spam plugin will give control back to the event system.
Example of calling the API:
$spamcheck = array (
/* data structure */
);
$manager->notify('SpamCheck', array ('spamcheck' => & $spamcheck));
if (isset($spamcheck['result']) && $spamcheck['result'] == true)
{
/* this is spam */
}
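On the receiving side, a handler might look roughly like the sketch below. The function name ''exampleSpamCheckHandler'' and the keyword test are made up for illustration; in a real anti-spam plugin this logic would presumably live in the method that handles the **SpamCheck** event. The event system is simulated here with a direct function call.

```php
<?php
// Hypothetical handler, not part of Nucleus or of any existing plugin.
// $data mirrors the array passed to notify(): its 'spamcheck' element is a
// reference to the caller's data structure.
function exampleSpamCheckHandler(array $data)
{
    $spamcheck = &$data['spamcheck'];

    // Placeholder test (assumption): flag bodies that mention "casino".
    $body = isset($spamcheck['body']) ? $spamcheck['body'] : '';
    if (stripos($body, 'casino') !== false) {
        $spamcheck['result'] = true;
    } elseif (!isset($spamcheck['result'])) {
        $spamcheck['result'] = false;
    }
}

// Simulate what the event system does with the payload.
$spamcheck = array(
    'type'   => 'comment',
    'body'   => 'Visit my casino',
    'live'   => false,
    'return' => true
);
$payload = array('spamcheck' => &$spamcheck);
exampleSpamCheckHandler($payload);
```

Because the ''spamcheck'' element is passed as a reference, the verdict written by the handler is visible in the caller's ''$spamcheck'' array after the call.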
===== The data structure =====
The data structure is an array. The values of the array can vary between types
of spam. To distinguish between the different types of spam
there is a required value with the key ''type''. This value can contain any of
the following values:
* comment
* referer
* trackback
It is possible to use values other than the ones defined above, but if you do
you must realize that there is no guarantee that a spam check plugin will
be able to use the data structure.
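An anti-spam plugin could guard against unknown types with a check along these lines. This is only a sketch; ''exampleIsKnownType'' is a hypothetical helper and not part of the API.

```php
<?php
// Hypothetical helper: returns true when the data structure carries one of
// the three types defined by this specification.
function exampleIsKnownType(array $spamcheck)
{
    $known = array('comment', 'referer', 'trackback');
    return isset($spamcheck['type'])
        && in_array($spamcheck['type'], $known, true);
}

$isKnown   = exampleIsKnownType(array('type' => 'trackback'));
$isUnknown = exampleIsKnownType(array('type' => 'pingback'));
```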
==== Generic values ====
Apart from the type specific values there are also some generic values that
apply to all types of spam.
* live
* return
**live**\\
The ''live'' value is a boolean and can be set to true or false. If this value
is set to true, it means the check is called for an event that is happening in
real time.
If ''live'' is set to true, the anti-spam plugin is allowed to use the values of the global ''$_SERVER'' and ''$_REQUEST'' arrays to retrieve any extra information.
If ''live'' is set to false, the spam check was called manually by a user or administrator. In that case the ''$_SERVER'' and ''$_REQUEST'' arrays do not contain information about a possible spam attempt, but information about the administrator, and the anti-spam plugin should never use these global arrays in its spam detection routines.
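The effect of ''live'' on a plugin can be sketched as follows. The helper ''exampleCollectRequestInfo'' and the choice of the two request fields are assumptions made for this example; a fixed array stands in for ''$_SERVER'' so the snippet runs on its own.

```php
<?php
// Hypothetical helper: collects request details only for live checks.
function exampleCollectRequestInfo(array $spamcheck, array $server)
{
    // For a manual check the request belongs to the administrator, so
    // nothing from it may be used.
    if (empty($spamcheck['live'])) {
        return array();
    }

    return array(
        'ip'         => isset($server['REMOTE_ADDR']) ? $server['REMOTE_ADDR'] : '',
        'user_agent' => isset($server['HTTP_USER_AGENT']) ? $server['HTTP_USER_AGENT'] : ''
    );
}

// In a plugin the second argument would be $_SERVER.
$fakeServer = array(
    'REMOTE_ADDR'     => '192.0.2.10',
    'HTTP_USER_AGENT' => 'ExampleBot/1.0'
);
$liveInfo   = exampleCollectRequestInfo(array('live' => true), $fakeServer);
$manualInfo = exampleCollectRequestInfo(array('live' => false), $fakeServer);
```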
**return**\\
The ''return'' value is a boolean and can be set to true or false. If this value
is set to true, the anti-spam plugin is not allowed to take any
countermeasures against the spam attack. Once it has determined that we are dealing
with a spam attack it should return control to the calling plugin.
If ''return'' is set to false the anti-spam plugin is allowed (but not required)
to take any countermeasure it wants, such as keeping the HTTP session alive.
In that case the anti-spam plugin does not need to return control to the calling plugin.
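A plugin's decision after reaching a verdict might be sketched like this, following the reading that ''return'' set to true means the verdict must be handed back to the caller. The helper name, the returned labels, and the handling of a missing ''return'' value (treated as true here) are assumptions of this example, not part of the specification.

```php
<?php
// Hypothetical helper: decides what a plugin does once it has a verdict.
function exampleNextAction(array $spamcheck, $isSpam)
{
    // Assumption: a missing 'return' value is treated like true.
    $mustReturn = !isset($spamcheck['return']) || $spamcheck['return'];

    if (!$isSpam || $mustReturn) {
        // Store the verdict and hand control back to the calling plugin.
        return 'return-to-caller';
    }

    // Only reachable for spam with 'return' set to false.
    return 'counter-measure';
}

$whenTrue  = exampleNextAction(array('return' => true), true);
$whenFalse = exampleNextAction(array('return' => false), true);
```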
==== Comments ====
If you want to check a comment for spam you should provide the following values
in the data structure array:
* body
* name
* email
* url
* id
Example of the data structure:
$spamcheck = array (
'type' => 'comment',
'body' => 'This is a test comment',
'name' => 'rakaz',
'url' => 'http://www.rakaz.nl',
'id' => 32,
'live' => true,
'return' => true
);
**body**\\
The ''body'' value is required and must contain a raw and unfiltered copy of the
comment body. This means that any processing that Nucleus does before the
''PostAddComment'' event must be ignored.
//Note: If you rely on the ''PostAddComment'' event you might run into a
problem. You cannot use the ''$data'' parameter of the ''PostAddComment'' event,
because it is already processed by Nucleus. You should use
''requestVar('body')'' instead which should contain the raw data provided by
the comment form.//
**name**\\
The ''name'' value is optional and should contain the name of the user who
posted the comment.
**email**\\
The ''email'' value is optional and should contain the email address of the user
who posted the comment. If the email address contains the mailto: protocol
prefix, it should be removed before calling the SpamCheck API.
**url**\\
The ''url'' value is optional and should contain the URL provided by the user
who posted the comment. The URL should not be relative and should start with
either ''%%http://%%'' or ''%%https://%%''.
//Note: The current version of Nucleus uses one field to provide either the
email address or the url of the user who posted the comment. The plugin
calling the SpamCheck API should determine what this combined field
contains and use it to fill the appropriate value in the data structure.//
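One possible way to split the combined field is sketched below. ''exampleSplitContactField'' is a hypothetical helper, and the detection rules (a leading ''%%http://%%'' or ''%%https://%%'' means URL, a leading ''mailto:'' or an ''@'' sign means email address) are assumptions of this example.

```php
<?php
// Hypothetical helper: splits the combined e-mail/URL field into the
// 'email' and 'url' values of the data structure.
function exampleSplitContactField($value)
{
    $value  = trim($value);
    $fields = array();

    if (preg_match('#^https?://#i', $value)) {
        $fields['url'] = $value;
    } elseif (stripos($value, 'mailto:') === 0) {
        // Remove the protocol prefix before calling the SpamCheck API.
        $fields['email'] = substr($value, strlen('mailto:'));
    } elseif (strpos($value, '@') !== false) {
        $fields['email'] = $value;
    }

    return $fields;
}

$fromMail = exampleSplitContactField('mailto:someone@example.com');
$fromUrl  = exampleSplitContactField('http://www.rakaz.nl');
```

Values that match none of the rules, such as a host name without a protocol, are left out of the data structure in this sketch.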
**id**\\
The ''id'' value is optional and should contain the id of the article on which
the user has commented.
==== Trackback ====
If you want to check a trackback for spam you should provide the following values in the data structure array:
* title
* excerpt
* blogname
* url
* id
Example of the data structure:
$spamcheck = array (
'type' => 'trackback',
'title' => 'apuesta dinero',
'excerpt' => 'In your free time, check the sites about ruleta',
'blogname' => 'apuesta dinero',
'url' => 'http://www.poker4spain.com/apuesta-dinero.html',
'id' => 19,
'live' => true,
'return' => true
);
**title**\\
The ''title'' value is optional and corresponds directly with the ''title'' value defined in the Trackback specification.
**excerpt**\\
The ''excerpt'' value is required and corresponds directly with the ''excerpt''
value defined in the Trackback specification.
**blogname**\\
The ''blogname'' value is optional and corresponds directly with the ''blog_name'' value defined in the Trackback specification.
**url**\\
The ''url'' value is required and corresponds directly with the ''url'' value
defined in the Trackback specification.
**id**\\
The ''id'' value is optional and should contain the id of the article. This value is also defined in the Trackback specification as ''tb_id''.
//Note: It is possible that the trackback is sent using a character encoding that is different from the encoding used by your Nucleus installation. If this is the case, the trackback data should be converted to the character encoding of your Nucleus installation before checking for trackback spam.//
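The conversion could be done along these lines. This sketch assumes PHP's ''iconv'' extension is available and that the sender's encoding is already known; ''exampleConvertTrackback'' is a hypothetical helper.

```php
<?php
// Hypothetical helper: converts the text fields of a trackback to the
// character encoding of the installation.
function exampleConvertTrackback(array $trackback, $from, $to)
{
    foreach (array('title', 'excerpt', 'blogname') as $key) {
        if (isset($trackback[$key])) {
            $converted = iconv($from, $to, $trackback[$key]);
            // iconv() returns false on failure; keep the original then.
            if ($converted !== false) {
                $trackback[$key] = $converted;
            }
        }
    }

    return $trackback;
}

$incoming = array(
    'type'    => 'trackback',
    'title'   => "caf\xE9",          // "café" encoded as ISO-8859-1
    'excerpt' => 'plain text',
    'url'     => 'http://example.com/'
);
$converted = exampleConvertTrackback($incoming, 'ISO-8859-1', 'UTF-8');
```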
==== Referer ====
If you want to check a referer for spam you should provide the following values
in the data structure array:
* url
* id
Example of the data structure:
$spamcheck = array (
'type' => 'referer',
'url' => 'http://www.poker4spain.com/apuesta-dinero.html',
'id' => 74,
'live' => true,
'return' => true
);
**url**\\
The ''url'' value is required and should contain the full URL that apparently was used to refer the visitor to your website. This value is usually present in
the ''$_SERVER['HTTP_REFERER']'' global variable. Relative URLs are not allowed
and URLs should always start with ''%%http://%%'' or ''%%https://%%''.
**id**\\
The ''id'' value is optional and should contain the id of the article on which
the visitor entered the website.
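Building the referer data structure might look like the sketch below. ''exampleBuildRefererCheck'' is a hypothetical helper; it takes the server array as a parameter so that the example runs without a web request, whereas a plugin would pass ''$_SERVER''.

```php
<?php
// Hypothetical helper: builds a referer data structure, or returns null when
// the referer is missing or is not an absolute http(s) URL.
function exampleBuildRefererCheck(array $server, $itemId = null)
{
    $url = isset($server['HTTP_REFERER']) ? trim($server['HTTP_REFERER']) : '';
    if (!preg_match('#^https?://#i', $url)) {
        return null;
    }

    $spamcheck = array(
        'type'   => 'referer',
        'url'    => $url,
        'live'   => true,
        'return' => true
    );
    if ($itemId !== null) {
        $spamcheck['id'] = $itemId;
    }

    return $spamcheck;
}

$check   = exampleBuildRefererCheck(array('HTTP_REFERER' => 'http://example.com/page.html'), 74);
$skipped = exampleBuildRefererCheck(array('HTTP_REFERER' => '/relative/path'));
```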
==== Backwards compatibility ====
If you want to maintain compatibility with older implementations of the
SpamCheck API you can include one additional ''data'' value. This value contains
all the other type specific fields concatenated into a single string.
For example:
$spamcheck = array (
'type' => 'comment',
'body' => 'This is a test comment',
'name' => 'rakaz',
'url' => 'http://www.rakaz.nl',
'id' => 32,
'live' => true,
'return' => true,
'data' => 'rakaz This is a test comment http://www.rakaz.nl'
);
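The ''data'' value could be generated from the other fields, as in the following sketch. The helper ''exampleBuildLegacyData'' is hypothetical, and the field order is an assumption derived from the comment example above (name, body, url); the specification does not define an order.

```php
<?php
// Hypothetical helper: concatenates the type specific text fields into the
// single 'data' string expected by older implementations.
function exampleBuildLegacyData(array $spamcheck)
{
    // Assumed order; 'id', 'live' and 'return' are deliberately left out.
    $keys  = array('name', 'email', 'title', 'blogname', 'body', 'excerpt', 'url');
    $parts = array();

    foreach ($keys as $key) {
        if (isset($spamcheck[$key]) && $spamcheck[$key] !== '') {
            $parts[] = $spamcheck[$key];
        }
    }

    return implode(' ', $parts);
}

$spamcheck = array(
    'type'   => 'comment',
    'body'   => 'This is a test comment',
    'name'   => 'rakaz',
    'url'    => 'http://www.rakaz.nl',
    'id'     => 32,
    'live'   => true,
    'return' => true
);
$spamcheck['data'] = exampleBuildLegacyData($spamcheck);
```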
==== The result ====
**result**\\
If ''return'' is set to true, the anti-spam plugin is expected to give the verdict back to the calling plugin. The anti-spam plugin should add a new value to the data structure called ''result''.
This value is a boolean and can be ''true'' or ''false''. If the value is ''true'' the anti-spam plugin thinks it is spam. If the anti-spam plugin thinks the comment, trackback or referer is normal, it will set ''result'' to ''false''.
//Note: It is possible to install more than one anti-spam plugin. Every
anti-spam plugin will be called in the order that they were installed.
This also means that it is possible the ''result'' is already set by
another anti-spam plugin.//
//If this is the case you should follow the following rules://
* //If the ''result'' is ''false'', you must run your own test. If your own test indicates that we are dealing with spam you should overwrite the ''result'' with ''true''.//
* //If the ''result'' is ''false'', you must run your own test. If your own test does not indicate that we are dealing with spam you should simply leave ''result'' set to ''false''.//
* //If the ''result'' is ''true'', you can run your own test, but if your test indicates that we are not dealing with spam, you should **NEVER** overwrite the existing value of ''result''. Another plugin already marked this as spam and you should honour that.//
* //If the ''result'' is ''true'', you could simply skip any tests of your own because another plugin already marked this as spam.//
In addition to this, if the anti-spam plugin marks this comment, trackback or referer as spam, the anti-spam plugin should also add its own name and a status message to the data structure. This way other plugins can use this information to show why the comment was marked as spam.
**plugin**\\
The ''plugin'' value contains the name of the plugin. This is usually equal to the return value of the ''getName()'' function of the plugin. This value should only be set when the comment, trackback or referer is marked as spam.
**message**\\
The ''message'' value contains some additional information about the reason why the comment was marked as spam. This message should not be more than 255 characters and may be translated into the language of the Nucleus installation.
Examples of messages are:\\
- Score 66 out of 100\\
- Contains spamvertised URL according to Rbl.bulkfeeds.jp\\
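The rules for an existing ''result'', together with the ''plugin'' and ''message'' values, can be sketched as one helper. ''exampleStoreVerdict'' and the plugin names ''NP_First'' and ''NP_Second'' are hypothetical.

```php
<?php
// Hypothetical helper: merges this plugin's verdict into the data structure.
function exampleStoreVerdict(array &$spamcheck, $isSpam, $pluginName, $message)
{
    // Another plugin already marked this as spam: never overwrite that.
    if (isset($spamcheck['result']) && $spamcheck['result'] === true) {
        return;
    }

    $spamcheck['result'] = (bool) $isSpam;

    if ($isSpam) {
        $spamcheck['plugin'] = $pluginName;
        // substr() counts bytes; a multibyte-aware cut is left out of this sketch.
        $spamcheck['message'] = substr($message, 0, 255);
    }
}

// An earlier plugin said "spam"; our own test says "not spam".
$flagged = array(
    'type'    => 'comment',
    'body'    => 'example',
    'result'  => true,
    'plugin'  => 'NP_First',
    'message' => 'Score 66 out of 100'
);
exampleStoreVerdict($flagged, false, 'NP_Second', '');

// An earlier plugin said "not spam"; our own test says "spam".
$clean = array('type' => 'comment', 'body' => 'example', 'result' => false);
exampleStoreVerdict($clean, true, 'NP_Second', 'Contains spamvertised URL');
```

In the first case the structure is left untouched; in the second case ''result'' is overwritten and the name and message of the second plugin are added.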
====== The SpamMark event ======
Some anti-spam plugins have the ability to learn and improve their spam
detection. To make this possible we have an event that can tell anti-spam
plugins about their mistakes.
For example, if the anti-spam plugin failed to detect a comment as spam
you can use the SpamMark event to tell the plugin it is spam. The term
for these kinds of mistakes is false-negatives. The other way around is
also possible. If your anti-spam plugin marked a normal comment as spam
you can use the SpamMark event to tell the plugin it is not spam. The
term for these kinds of mistakes is false-positives.
//Note: The SpamMark event is optional for both calling plugins and anti-spam
plugins. However, if you do implement it, you should implement it in the
way described in this specification.//
Example of calling the API:
$spammark = array (
/* data structure */
);
$manager->notify('SpamMark', array ('spammark' => & $spammark));
===== The data structure =====
The data structure is similar to the data structure of the SpamCheck event.
The differences are explained below.
The generic values ''live'' and ''return'' are not valid in the SpamMark data
structure. Instead, the anti-spam plugin should always return control back to
the plugin initiating the SpamMark event.
There is one new generic value called ''result''. This value should contain a
boolean that indicates what the SpamCheck event should have returned. If this
value is ''true'' the comment, trackback or referer will be marked as being
spam, which corrects a false-negative. If this value is ''false'' the comment, trackback or referer will be
marked as not being spam, which corrects a false-positive.
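A calling plugin could derive the SpamMark data structure from an earlier SpamCheck data structure, as in this sketch. ''exampleBuildSpamMark'' is a hypothetical helper, and dropping the ''plugin'' and ''message'' values of the earlier verdict is an assumption of this example.

```php
<?php
// Hypothetical helper: turns a SpamCheck data structure into a SpamMark one.
function exampleBuildSpamMark(array $spamcheck, $shouldBeSpam)
{
    $spammark = $spamcheck;

    // 'live' and 'return' are not valid in the SpamMark data structure.
    unset($spammark['live'], $spammark['return']);
    // Assumption: details of the earlier verdict are not carried over.
    unset($spammark['plugin'], $spammark['message']);

    // What the SpamCheck event should have returned.
    $spammark['result'] = (bool) $shouldBeSpam;

    return $spammark;
}

// A comment that was wrongly accepted is reported as spam.
$spamcheck = array(
    'type'   => 'comment',
    'body'   => 'This is a test comment',
    'name'   => 'rakaz',
    'live'   => true,
    'return' => true,
    'result' => false
);
$spammark = exampleBuildSpamMark($spamcheck, true);
```

The resulting array can then be passed to the **SpamMark** event in the same way as shown in the calling example above.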