WikiPatches/SpambotPoison


Spambot Poison Patch

Description

This patch adds a number of randomly generated e-mail addresses to the footer of wiki pages. These addresses are invisible to regular users and exist for the sole purpose of poisoning the harvest of spambots. The point is not to prevent spambots from collecting e-mail addresses from the wiki, but to make them collect so many bogus addresses that their harvest becomes nearly useless.

If you wish to learn more about spambots and e-mail harvesting, a good place to start is http://www.neilgunton.com/spambot_trap/

This patch is inspired by Wpoison (http://www.monkeys.com/wpoison/) and is in fact based on the Wpoison code.

You can see the patch in action at http://apocalypse.rulez.org/kozos or http://apocalypse.rulez.org/maliusz.

Notes

The patch needs a word list and looks for it in the following locations, in this order: $DataDir/words, /usr/dict/words and /usr/share/dict/words. Most Linux distributions package such word lists; Debian users, for example, can install any package that provides wordlist, such as wbritish.
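Before installing, it can help to check which word list the patch will pick up. The standalone sketch below checks the same three locations in the same order as GetRandomWord. The script, the FindDictFile name and the default path are hypothetical and not part of the patch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns the first word list found, checking the same locations in the
# same order as GetRandomWord in the patch, or undef if none exists.
sub FindDictFile {
  my ($data_dir) = @_;
  for my $file ("$data_dir/words", '/usr/dict/words', '/usr/share/dict/words') {
    return $file if -f $file;
  }
  return;
}

# Hypothetical default: pass your wiki's $DataDir as the first argument.
my $data_dir = @ARGV ? $ARGV[0] : '/path/to/wiki/data';
my $dictfile = FindDictFile($data_dir);
if (defined $dictfile) {
  print "Word list found: $dictfile\n";
} else {
  print "No word list found; install one (e.g. wbritish) or create $data_dir/words\n";
}
```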

This modification works best under mod_perl. On first use it reads the whole dictionary and caches a random sample of 4000 words; reading the dictionary is slow, but under mod_perl it happens only once instead of on every request. Without mod_perl, the alternative is to create a relatively small (~4000 words) word list as $DataDir/words, which the patch checks first.
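One possible way to create such a list is sketched below. It is a standalone helper (hypothetical, not part of the patch; the names SampleWords and WriteWordList are made up) that keeps only the letters a-z of each dictionary entry, as GetRandomWord does, and writes up to 4000 distinct, randomly chosen words, one per line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Returns up to $count distinct words chosen at random from @lines.
# Each entry is lowercased and reduced to the letters a-z, like the patch does;
# entries that end up empty are dropped.
sub SampleWords {
  my ($count, @lines) = @_;
  my %seen;
  my @words = grep { length($_) && !$seen{$_}++ }
              map { my $w = lc $_; $w =~ tr/a-z//cd; $w } @lines;
  @words = shuffle(@words);
  return @words > $count ? @words[0 .. $count - 1] : @words;
}

# Reads $source, writes the sample to $target, returns the number of words written.
sub WriteWordList {
  my ($source, $target, $count) = @_;
  open(my $in, '<', $source) or die "Can't read $source: $!\n";
  my @sample = SampleWords($count, <$in>);
  close $in;
  open(my $out, '>', $target) or die "Can't write $target: $!\n";
  print $out "$_\n" for @sample;
  close $out or die "Can't write $target: $!\n";
  return scalar @sample;
}

# Runs only when given a source dictionary and a target file.
if (@ARGV == 2) {
  my $n = WriteWordList($ARGV[0], $ARGV[1], 4000);
  print "Wrote $n words to $ARGV[1]\n";
}
```

For example: perl make_words.pl /usr/share/dict/words /path/to/wiki/data/words (script name and paths are placeholders). GetRandomWord finds $DataDir/words first, so each uncached start then reads this short file instead of the full dictionary.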

Installation

Add the following to your script (a good place to add it is before "sub GetHeader"):

# SpambotPoison code starts

use vars qw($SpambotPoison @TlDomains1 @TlDomains2 @RandomWords);

$SpambotPoison  = 1;        # 1 = add spambot poison to the page, 0 = no poison

# Returns one random lowercase letter (a-z).
sub GetRandomLetter {
  return chr(ord('a') + int(rand 26));
}

# Returns a random word. On first use, reads the dictionary and caches a
# random sample of 4000 words in @RandomWords (under mod_perl the cache
# persists between requests).
sub GetRandomWord {
  my ($dictfile, @words);
  unless (@RandomWords) {
    if (-f "$DataDir/words") {
      $dictfile = "$DataDir/words";
    } elsif (-f "/usr/dict/words") {
      $dictfile = "/usr/dict/words";
    } elsif (-f "/usr/share/dict/words") {
      $dictfile = "/usr/share/dict/words";
    } else {
      die(T("Couldn't find dictionary file."));
    }
    open(my $dict, '<', $dictfile) or die(T("Couldn't read dictionary file."));
    # Keep only the letters a-z of each (lowercased) entry; skip entries
    # that end up empty.
    @words = grep { length($_) } map { my $w = lc $_; $w =~ tr/a-z//cd; $w } <$dict>;
    close $dict;
    die(T("Couldn't read dictionary file.")) unless @words;
    # int(rand @array) yields 0..$#array, so every entry can be chosen.
    push @RandomWords, $words[int(rand @words)] for 1..4000;
  }
  return $RandomWords[int(rand @RandomWords)];
}

@TlDomains1 = qw(com com com com net net net org org edu edu gov mil int);
@TlDomains2 = qw(uk su af al dz as ad ao ai aq ag ar am aw au at az bs bh bd
 bb by be bz bj bm bt bo ba bw bv br io bn bg bf bi kh cm ca cv ky cf td cl cn
 cx cc co km cg ck cr ci hr cu cy cz dk dj dm do tp ec eg sv gq er ee et fk fo
 fj fi fr fx gf pf tf ga gm ge de gh gi gr gl gd gp gu gt gn gw gy ht hm hn hk
 hu is in id ir iq ie il it jm jp jo kz ke ki kp kr kw kg la lv lb ls lr ly li
 lt lu mo mk mg mw my mv ml mt mh mq mr mu yt mx fm md mc mn ms ma mz mm na nr
 np nl an nc nz ni ne ng nu nf mp no om pk pw pa pg py pe ph pn pl pt pr qa re
 ro ru rw kn lc vc ws sm st sa sn sc sl sg sk si sb so za gs es lk sh pm sd sr
 sj sz se ch sy tw tj tz th tg tk to tt tn tr tm tc tv ug ua ae gb us um uy uz
 vu va ve vn vg vi wf eh ye yu zr zm zw);

# Returns a top-level domain: one of @TlDomains1 three times out of four,
# otherwise one of the country codes in @TlDomains2.
sub GetRandomDomain {
  if (int(rand 4) == 0) {
    return $TlDomains2[int(rand @TlDomains2)];
  }
  return $TlDomains1[int(rand @TlDomains1)];
}

# Builds a hidden "guestbook" of 2 to 17 fake addresses, each of the form
# word@[word.]wordX.tld, where X is a random letter and the optional
# "word." part appears about one time in four.
sub GetSpambotPoison {
  my ($result, $email_addr, $num_addresses);

  $num_addresses = 2 + int(rand 16);
  $result = '<div id="guestbook"><h1>Guestbook</h1>';
  for (1..$num_addresses) {
    $email_addr = &GetRandomWord() . '@';
    if (int(rand 4) == 0) {
      $email_addr .= &GetRandomWord() . '.';
    }
    $email_addr .= &GetRandomWord() . &GetRandomLetter() . '.' . &GetRandomDomain();
    $result .= "<A HREF=\"mailto:$email_addr\">$email_addr</A><BR>\n";
  }
  $result .= '</div>';
  return $result;
}

# SpambotPoison code ends

Also, in sub GetFooterText, add the line marked with "+" near the end (the surrounding lines are shown for context):

     $result .= T($FooterNote);
   }
   $result .= '</div>';
+  $result .= &GetSpambotPoison() if $SpambotPoison;
   $result .= &GetMinimumFooter();
   return $result;
 }

Also, add the following to your site's StyleSheet:

 #guestbook { display: none; }

This will hide the fake guestbook from your users.
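Note that display: none only affects how a browser renders the page; the addresses are still present in the HTML. The sketch below is a deliberately naive harvester (hypothetical, written only for illustration; real spambots differ) that scans raw HTML for mailto: links and therefore collects the hidden guestbook addresses just like visible ones. The sample address uses the reserved .example domain.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A naive harvester: collect every distinct mailto: address in raw HTML.
# CSS rules such as "#guestbook { display: none; }" are never applied here,
# so hidden addresses are collected just like visible ones.
sub HarvestAddresses {
  my ($html) = @_;
  my %seen;
  return grep { !$seen{$_}++ } ($html =~ /mailto:([\w.+-]+\@[\w.-]+)/gi);
}

my $page = '<div id="guestbook"><h1>Guestbook</h1>'
  . '<A HREF="mailto:apple@pearq.example">apple@pearq.example</A><BR>' . "\n"
  . '</div>';
print "$_\n" for HarvestAddresses($page);    # prints apple@pearq.example
```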


UngarPeter


Some consider it bad behaviour to burn email addresses from domains which one doesn't own or have permission to use. -- MarkusLude
True enough. Still, this method seems unlikely to produce usable addresses, probably fewer than one in a thousand. I tried to resolve a few hundred of the generated domains, and none of them were in use. This is, of course, no proof, and a few valid addresses might in fact be generated as well. -- UngarPeter
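A check like the one described above can be sketched as follows, assuming the generated addresses have been saved one per line to a text file. The script and its function names are hypothetical. It relies on Perl's built-in gethostbyname, which looks up address records only, not MX records, so a domain that fails this check could still, in principle, accept mail; treat the result as a rough indication.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Domain part of an address (everything after the last "@"), lowercased.
sub DomainOf {
  my ($addr) = @_;
  return $addr =~ /\@([^\@]+)\z/ ? lc $1 : undef;
}

# True if the name has an address record. This does NOT look up MX records,
# so it only approximates "is this domain in use".
sub Resolves {
  my ($domain) = @_;
  return defined(scalar gethostbyname($domain));
}

# Reads addresses from the files named on the command line and reports
# how many distinct domains resolve.
if (@ARGV) {
  my ($total, $live) = (0, 0);
  my %checked;
  while (my $addr = <>) {
    $addr =~ s/\s+\z//;
    my $domain = DomainOf($addr) or next;
    next if $checked{$domain}++;
    $total++;
    if (Resolves($domain)) {
      $live++;
      print "resolves: $domain\n";
    }
  }
  print "$live of $total distinct domains resolve\n";
}
```

For example: perl check_domains.pl addresses.txt (file names are placeholders).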

What about getting a list of known spammer domains and using those as the domains? You could also use "mailsiphon.com" (and maybe provide a link to that site), though I'll bet many spammers have excluded that domain. -- Trent

You could do that too, but the point is to fill the spambot's address list with bad food. Either approach will work. -- UngarPeter
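Trent's variant can be sketched as a replacement for the domain part of each address. Everything here is hypothetical: @PoisonDomains would be a list you maintain yourself (it starts with mailsiphon.com from the comment above), and GetListedDomain is not part of the patch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use vars qw(@PoisonDomains);

# Hypothetical, admin-maintained list of whole domains to use in fake addresses.
@PoisonDomains = qw(mailsiphon.com);

# Returns one domain from @PoisonDomains, chosen at random.
sub GetListedDomain {
  return $PoisonDomains[int(rand @PoisonDomains)];
}

print 'word@', GetListedDomain(), "\n";
```

To use it, the line in GetSpambotPoison ending in &GetRandomDomain(); would append GetListedDomain() instead of the random word, letter and top-level domain.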

The generated e-mail addresses are hidden via CSS. Browsers that do not support CSS will still display them. -- MarkusLude

That is correct. This patch is not recommended if your user base includes people with first-generation web browsers that lack CSS support.

Last edited August 12, 2006 1:45 pm by UngarPeter