WikiPatches/UseBogoSpamNotify


The following patch (diff -u) uses WikiSpam/BogoFilter to classify pages as spam or not spam. For admins, it adds a spam toolbar to the bottom of every content page, which looks like this:

 Spam Administration: Mark as spam | Mark as ham

Clicking "Mark as spam" sends the text of the page (or of the specific revision, if you are viewing an old one) to bogofilter to update its spam index. Clicking "Mark as ham" tells bogofilter that the page's content is not spam.
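
The marking step boils down to piping the page text into bogofilter with -s (spam) or -n (non-spam). Here is a minimal sketch of that idea, using Perl's list-form pipe open so that no shell is involved. pipe_text_to is a hypothetical helper, not part of the patch:

```perl
use strict;
use warnings;

# Hypothetical helper: feed $text to a command on its stdin without
# going through a shell, and return the command's exit code.
sub pipe_text_to {
    my ($text, @cmd) = @_;

    # Avoid being killed if the command exits before reading its input.
    local $SIG{PIPE} = 'IGNORE';

    # List-form pipe open: each argument is passed to the program as-is.
    open(my $fh, '|-', @cmd) or die "Can't run $cmd[0]: $!\n";
    print {$fh} $text;
    close($fh);    # waits for the child and sets $?

    return $? >> 8;
}
```

With the patch's variables, marking a page as spam could then be written as pipe_text_to($text, $BogoFilter, '-d', $BogoFilterDir, '-s'), and marking it as ham would use '-n' instead.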

On the RecentChanges page, each entry is shown with its spam rating (for example "Spamish" or "Hamish"), so that admins can see which pages need spam removed.
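
The rating comes from bogofilter's exit code. A small sketch of that mapping follows. The labels for 0 to 2 mirror those used by GetSpamStatus in the patch. Treating every other code as an error is an assumption of this sketch, based on bogofilter's documented exit codes (0 = spam, 1 = non-spam, 2 = unsure, 3 = error). spam_label is a hypothetical name:

```perl
use strict;
use warnings;

# Labels for the three classification results bogofilter can report.
my %label_for = (
    0 => 'Spamish',
    1 => 'Hamish',
    2 => 'Might Be Spam',
);

# Hypothetical helper: turn an exit code into a display label.
# Anything outside 0..2 is reported as an error rather than as a rating.
sub spam_label {
    my ($rc) = @_;
    return exists $label_for{$rc} ? $label_for{$rc} : 'Bogofilter Error';
}
```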

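To enable the patch, set $UseBogo to 1. You will also need to install bogofilter and its dependencies, and set $BogoFilter to the full path of the bogofilter executable. Bogofilter's data is kept in $BogoFilterDir, which the patch sets to "$DataDir/bogofilter".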
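
For illustration, an enabled configuration might look like the following lines in the options section of wiki.pl. The variable names come from the patch; the path shown is only an assumed install location and should be adjusted to your system:

```perl
$UseBogo    = 1;                            # 1 = spam control via bogofilter
$BogoFilter = "/usr/local/bin/bogofilter";  # assumed path to the executable
```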

As an example, http://sosaith.org/cgi-bin/wiki-bogo.pl?HomePage has been set up for testing. The admin password is "test". Please remove any real spam you see there, as I have not patched it to keep bots out of the history. --TomScanlan


--- wiki.pl     (revision 16)
+++ wiki.pl     (working copy)
@@ -53,7 +53,8 @@
   @IsbnNames @IsbnPre @IsbnPost $EmailFile $FavIcon $RssDays $UserHeader
   $UserBody $StartUID $ParseParas $AuthorFooter $UseUpload $AllUpload
   $UploadDir $UploadUrl $LimitFileUrl $MaintTrimRc $SearchButton
-  $EditNameLink $UseMetaWiki @ImageSites $BracketImg );
+  $EditNameLink $UseMetaWiki @ImageSites $BracketImg
+  $BogoFilter $BogoFilterDir $UseBogo );
 # Note: $NotifyDefault is kept because it was a config variable in 0.90
 # Other global variables:
 use vars qw(%Page %Section %Text %InterSite %SaveUrl %SaveNumUrl
@@ -89,6 +90,7 @@
 $NotFoundPg  = "";              # Page for not-found links ("" for blank pg)
 $EmailFrom   = "Wiki";          # Text for "From: " field of email notes.
 $SendMail    = "/usr/sbin/sendmail";  # Full path to sendmail executable
+$BogoFilter  = "/usr/local/bin/bogofilter";  # Full path to bogofilter executable
 $FooterNote  = "";              # HTML for bottom of every page
 $EditNote    = "";              # HTML notice above buttons on edit page
 $MaxPost     = 1024 * 210;      # Maximum 210K posts (about 200K for pages)
@@ -132,6 +134,7 @@
 $TableSyntax = 1;           # 1 = wiki syntax tables, 0 = no table syntax
 $NewFS       = 0;           # 1 = new multibyte $FS,  0 = old $FS
 $UseUpload   = 0;           # 1 = allow uploads,      0 = no uploads
+$UseBogo     = 0;           # 1 = spam control via bogofilter, 0 = no bogofilter

 # Minor options:
 $LogoLeft     = 0;      # 1 = logo on left,       0 = logo on right
@@ -208,6 +211,7 @@
 $RcOldFile   = "$DataDir/oldrclog"; # Old RecentChanges logfile
 $IndexFile   = "$DataDir/pageidx";  # List of all pages
 $EmailFile   = "$DataDir/emails";   # Email notification lists
+$BogoFilterDir = "$DataDir/bogofilter";   # Stores data for use by bogofilter

 if ($RepInterMap) {
   push @ReplaceableFiles, $InterFile;
@@ -865,6 +869,9 @@
   $link .= &GetPageLink($pagename);
   $html .= "<li>$link ";
   $html .=  &CalcTime($timestamp) . " $count$edit" . " $sum";
+  if ($UseBogo) {
+         $html .= &GetSpamStatus($pagename, $revision);
+  }
   $html .= ". . . . . $author\n";
   return $html;
 }
@@ -1140,6 +1147,27 @@
   return &ScriptLink("action=delete&id=$id&confirm=$confirm", $name);
 }

+sub GetMarkAsSpamLink {
+  my ($id, $revision, $name) = @_;
+
+  if ($FreeLinks) {
+    $id = &FreeToNormal($id);
+    $name =~ s/_/ /g;
+  }
+  return &ScriptLink(&GetOldPageParameters("bogomarkasspam", $id, $revision), $name) if ($revision);
+  return &ScriptLink("action=bogomarkasspam&id=$id", $name);
+}
+sub GetMarkAsHamLink {
+  my ($id, $revision, $name) = @_;
+
+  if ($FreeLinks) {
+    $id = &FreeToNormal($id);
+    $name =~ s/_/ /g;
+  }
+  return &ScriptLink(&GetOldPageParameters("bogomarkasham", $id, $revision), $name) if ($revision);
+  return &ScriptLink("action=bogomarkasham&id=$id", $name);
+}
+
 sub GetOldPageParameters {
   my ($kind, $id, $revision) = @_;

@@ -3128,6 +3156,10 @@
       &DoConvert();
     } elsif ($action eq "trimusers") {
       &DoTrimUsers();
+    } elsif ($action eq "bogomarkasham") {
+      &DoBogoMarkAsHam();
+    } elsif ($action eq "bogomarkasspam") {
+      &DoBogoMarkAsSpam();
     } else {
       &ReportError(Ts('Invalid action parameter %s', $action));
     }
@@ -4053,6 +4085,88 @@
   close(SENDMAIL) or warn "sendmail didn't close nicely";
 }

+sub DoBogoMarkAsSpam {
+       print &GetHeader('', T('Marking page as spam'), '');
+
+       return  if (!&UserIsAdminOrError());
+       my $id = &GetParam("id", "");
+
+       if ($id eq "") {
+               print '<p>', T('Missing page id to mark as spam...');
+               return;
+       }
+       return  if (!&ValidIdOrDie($id));       # Consider nicer error?
+
+       &OpenPage($id);
+       &OpenDefaultText();
+
+       # Old revision handling
+       my $revision = &GetParam('revision', '');
+       $revision =~ s/\D//g;  # Remove non-numeric chars
+
+       if ($revision ne '') {
+               &OpenKeptRevisions('text_default');
+               if (!defined($KeptRevisions{$revision})) {
+                       print '<p>', Ts('Missing revision %s', $revision), ' ', Ts('for page id %s...', $id);
+                       $revision = '';
+                       return;
+               } else {
+                       &OpenKeptRevision($revision);
+               }
+       }
+       my $pageTime = $Section{'ts'};
+       my $text = $Text{'text'};
+
+       open (BOGO, "| $BogoFilter -d $BogoFilterDir -s")
+               or die "Can't mark $id as spam: $!\n";
+       print BOGO $text;
+       close(BOGO) or warn "$BogoFilter didn't close nicely";
+
+       print "Done.";
+       print &GetCommonFooter();
+}
+
+sub DoBogoMarkAsHam {
+       print &GetHeader('', T('Marking page as ham'), '');
+
+       return  if (!&UserIsAdminOrError());
+       my $id = &GetParam("id", "");
+
+       if ($id eq "") {
+               print '<p>', T('Missing page id to mark as ham...');
+               return;
+       }
+       return  if (!&ValidIdOrDie($id));       # Consider nicer error?
+
+       &OpenPage($id);
+       &OpenDefaultText();
+
+       # Old revision handling
+       my $revision = &GetParam('revision', '');
+       $revision =~ s/\D//g;  # Remove non-numeric chars
+
+       if ($revision ne '') {
+               &OpenKeptRevisions('text_default');
+               if (!defined($KeptRevisions{$revision})) {
+                       print '<p>', Ts('Missing revision %s', $revision), ' ', Ts('for page id %s...', $id);
+                       $revision = '';
+                       return;
+               } else {
+                       &OpenKeptRevision($revision);
+               }
+       }
+       my $pageTime = $Section{'ts'};
+       my $text = $Text{'text'};
+
+       open (BOGO, "| $BogoFilter -d $BogoFilterDir -n")
+               or die "Can't mark $id as ham: $!\n";
+       print BOGO $text;
+       close(BOGO) or warn "$BogoFilter didn't close nicely";
+
+       print "Done.";
+       print &GetCommonFooter();
+}
+
 ## Email folks who want to know a note that a page has been modified. - JimM.
 sub EmailNotify {
   local $/ = "\n";   # don't slurp whole files in this sub.
@@ -4425,6 +4539,16 @@
   print &GetMinimumFooter();
 }

+sub AppendToBanned {
+  my $addr = shift;
+  my $fname = "$DataDir/banlist";
+
+
+  if ($addr ne "") {
+    &AppendStringToFile($fname, $addr);
+  }
+}
+
 sub DoUpdateBanned {
   my ($newList, $fname);

@@ -4862,6 +4986,17 @@
   } else {
     $result .= " | " . &ScriptLink("action=editlock&set=1", T("Lock site"));
   }
+
+       if ($UseBogo) {
+               my $revision = &GetParam('revision', '');
+               $revision =~ s/\D//g;           # Remove non-numeric chars
+
+               $result .= '<BR>';
+               $result .= T('Spam Administration') . ': ';
+               $result .= &GetMarkAsSpamLink($id, $revision, "Mark as spam");
+               $result .= " | " . &GetMarkAsHamLink($id, $revision, "Mark as ham");
+       }
+
   return $result;
 }

@@ -5089,6 +5224,48 @@
   print Ts('Recommended $StartUID setting is %s.', $maxID + 100) . '<br>';
   print &GetCommonFooter();
 }
+
+sub GetSpamStatus {
+       my $id = shift;
+       my $revision = shift;
+
+       return  if (!&ValidIdOrDie($id));       # Consider nicer error?
+
+       &OpenPage($id);
+       &OpenDefaultText();
+
+       if ($revision ne '') {
+#              print "<p>rev: " . $Page{'revision'} . ",$revision";
+               if ($Page{'revision'} != $revision) {
+
+                       &OpenKeptRevisions('text_default');
+                       if (!defined($KeptRevisions{$revision})) {
+                               print "<p>Missing revision $revision for page id $id...";
+                               $revision = '';
+                               return;
+                       } else {
+                               &OpenKeptRevision($revision);
+                       }
+               }
+       }
+
+       my $pageTime = $Section{'ts'};
+       my $text = $Text{'text'};
+
+       $text =~ s![;'"`]! !gs;
+
+#      return qq(my status = `echo '$text' | $BogoFilter -vvv -p -d $BogoFilterDir"`);
+       my $status = `echo '$text' | $BogoFilter -vvv -p -d $BogoFilterDir`;
+       my $rc = ($? >> 8);
+#print " -- $status -- $rc --\n\n";
+#print "revision: $rev<br>\n";
+       return "Spamish" if $rc == 0;
+       return "Hamish" if $rc == 1;
+       return "Might Be Spam" if $rc == 2;
+       return "Bogofilter Error" if $rc == 3;  # exit code 3 means an I/O or other error
+}
+
+
 #END_OF_OTHER_CODE

 &DoWikiRequest()  if ($RunCGI && ($_ ne 'nocgi'));   # Do everything.

(Added in April and August 2008 by LaurentDaverio):

If you trust your spam filter enough to enforce editing control, you can do so by inserting the following lines at the beginning of the DoPost subroutine (around line 3950; your line numbers might not match, as I'm using a patched UseModWiki):

--- wiki (original version)
+++ wiki (modified version with blocking filter added)
@@ -3950,6 +3950,30 @@
   my $editTime = $Now;
   my $authorAddr = $ENV{REMOTE_ADDR};
 
+  # Check edit using Bogofilter
+  my $text = $string;
+  $text =~ s![;'"`]! !gs;
+  my $status = `echo '$text' | $BogoFilter -vvv -p -d $BogoFilterDir`;
+  my $rc = ($? >> 8);
+  if ($rc == 0) {
+      # if 'Spamish', deny editing
+      print &GetHeader("", T('Editing Denied'), "");
+      print T("Spam content detected, editing is denied.");
+      print &GetCommonFooter();
+
+      # Log blocked attempt, just in case
+      if ($BogoBlockedLog =~ /^\|/) {
+          open BLOCKED_LOG, $BogoBlockedLog; # Log to a process
+      } else {
+          open BLOCKED_LOG, ">>$BogoBlockedLog"; # Log to a file
+      }
+      print BLOCKED_LOG "-----\n";
+      print BLOCKED_LOG $ENV{'REMOTE_ADDR'}. " [" . localtime()." CET] $id\n";
+      print BLOCKED_LOG "$string\n";
+      close BLOCKED_LOG;
+
+      return;
+  }
   if (!&UserCanEdit($id, 1)) {
     # This is an internal interface--we don't need to explain
     &ReportError(Ts('Editing not allowed for %s.', $id));

Also, you will have to define a new variable, $BogoBlockedLog, at the beginning of your script (e.g. just after $UseBogo), and declare it in the "use vars" list in the same way the first patch declares $UseBogo. Blocked submissions can be written to a log file or piped through an external command such as Cronolog, e.g.:

$BogoBlockedLog  = '|/usr/bin/cronolog -l .../blocked_log .../logs/blocked_log.%Y%m%d'; # Blocked edits log
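
The patch opens the log without checking whether the open succeeded. As a hedged sketch, the same file-or-pipe choice could be wrapped with error handling as follows. open_blocked_log is a hypothetical helper, not part of the patch:

```perl
use strict;
use warnings;

# Hypothetical helper: open a log destination for writing.
# A destination starting with "|" is treated as a command to pipe into;
# anything else is treated as a file to append to.
# Returns a filehandle, or nothing if the open fails.
sub open_blocked_log {
    my ($dest) = @_;
    my $fh;

    if (my ($cmd) = $dest =~ /^\|\s*(.+)/s) {
        open($fh, '|-', $cmd) or return;     # log to a process
    } else {
        open($fh, '>>', $dest) or return;    # log to a file
    }
    return $fh;
}
```

A caller would then skip logging when no handle comes back, for example: if (my $log = open_blocked_log($BogoBlockedLog)) { print {$log} "$string\n"; close($log); }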


Last edited October 12, 2007 4:34 pm by MarkusLude