This patch adds a batch of randomly generated e-mail addresses to the footer of wiki pages. The addresses are hidden from regular users and exist solely to poison the harvest of spambots. The goal is not to prevent spambots from collecting e-mail addresses from the wiki, but to make them collect so many bogus addresses that their harvest becomes nearly useless.
If you wish to learn more about spambots and e-mail harvesting, a good starting point is http://www.neilgunton.com/spambot_trap/
This patch is inspired by Wpoison (http://www.monkeys.com/wpoison/), and in fact is based upon the Wpoison code.
You can see the patch in action at http://apocalypse.rulez.org/kozos or http://apocalypse.rulez.org/maliusz.
The patch needs a word list and looks for it in the following locations, in this order: $DataDir/words, /usr/dict/words and /usr/share/dict/words. Most Linux distributions have packages that include such word lists; for example, Debian users can install any package that provides wordlist, e.g. wbritish.
If you use this modification, note that it works best with mod_perl. Reading the dictionary is slow, but the patch caches the words it picks, so under mod_perl the file is read only once per server process instead of on every page view. The alternative is to create a relatively small (~4000 words) word database in $DataDir.
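One way to build such a small word database is sketched below. It is only an illustration: the script name makewords.pl, the helper sample_words and the paths in the usage comment are assumptions, not part of the patch. It reads a large dictionary, reduces each entry to the letters a-z (as the patch itself does), and writes a random sample of up to 4000 distinct words.

```perl
#!/usr/bin/perl
# Sketch: build a small word file for the patch to read.
# Usage (example paths only):
#   perl makewords.pl /usr/share/dict/words /path/to/DataDir/words
use strict;
use warnings;

# Pick up to $count distinct words from @$lines, lowercased and
# reduced to the letters a-z.
sub sample_words {
    my ($lines, $count) = @_;
    my %seen;
    for my $line (@$lines) {
        my $word = lc $line;
        $word =~ tr/a-z//cd;          # drops newline, apostrophes, digits
        $seen{$word} = 1 if length $word;
    }
    my @words = keys %seen;
    # Fisher-Yates shuffle, so the first $count entries are a random sample
    for (my $i = $#words; $i > 0; $i--) {
        my $j = int(rand($i + 1));
        @words[$i, $j] = @words[$j, $i];
    }
    $count = @words if $count > @words;
    return @words[0 .. $count - 1];
}

# Only touch files when both paths are given on the command line.
if (@ARGV == 2) {
    my ($source, $target) = @ARGV;
    open(my $in, '<', $source) or die "Cannot read $source: $!";
    my @lines = <$in>;
    close $in;
    open(my $out, '>', $target) or die "Cannot write $target: $!";
    print $out "$_\n" for sample_words(\@lines, 4000);
    close $out;
}
```

Because the patch checks $DataDir/words before the system locations, a file written there is picked up without further changes.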
Add the following to your script (a good place to add it is before "sub GetHeader"):
 # SpambotPoison code starts
 use vars qw($SpambotPoison @TlDomains1 @TlDomains2 @RandomWords);

 $SpambotPoison = 1;   # 1 = add spambot poison to the page, 0 = no poison

 # Return a random lowercase letter.
 sub GetRandomLetter() {
   return chr(ord('a') + int(rand 26));
 }

 # Return a random word. On first use the dictionary is read and a
 # sample of up to 4000 words is cached in @RandomWords.
 sub GetRandomWord() {
   my ($dictfile, @words, $word);

   unless (scalar @RandomWords) {
     if (-f "$DataDir/words") {
       $dictfile = "$DataDir/words";
     } elsif (-f "/usr/dict/words") {
       $dictfile = "/usr/dict/words";
     } elsif (-f "/usr/share/dict/words") {
       $dictfile = "/usr/share/dict/words";
     } else {
       die(T("Couldn't find dictionary file."));
     }
     open(DICTFILE, "<$dictfile") or die(T("Couldn't read dictionary file."));
     @words = (<DICTFILE>);
     close DICTFILE;
     for (1..4000) {
       $word = lc $words[int(rand @words)];
       $word =~ tr/a-z//cd;          # keep letters only; also drops the newline
       next unless length $word;     # skip entries that contained no letters
       push @RandomWords, $word;
     }
   }
   return $RandomWords[int(rand @RandomWords)];
 }

 @TlDomains1 = qw(com com com com net net net org org edu edu gov mil int);
 @TlDomains2 = qw(uk su af al dz as ad ao ai aq ag ar am aw au at az bs bh bd
   bb by be bz bj bm bt bo ba bw bv br io bn bg bf bi kh cm ca cv ky cf td cl
   cn cx cc co km cg ck cr ci hr cu cy cz dk dj dm do tp ec eg sv gq er ee et
   fk fo fj fi fr fx gf pf tf ga gm ge de gh gi gr gl gd gp gu gt gn gw gy ht
   hm hn hk hu is in id ir iq ie il it jm jp jo kz ke ki kp kr kw kg la lv lb
   ls lr ly li lt lu mo mk mg mw my mv ml mt mh mq mr mu yt mx fm md mc mn ms
   ma mz mm na nr np nl an nc nz ni ne ng nu nf mp no om pk pw pa pg py pe ph
   pn pl pt pr qa re ro ru rw kn lc vc ws sm st sa sn sc sl sg sk si sb so za
   gs es lk sh pm sd sr sj sz se ch sy tw tj tz th tg tk to tt tn tr tm tc tv
   ug ua ae gb us um uy uz vu va ve vn vg vi wf eh ye yu zr zm zw);

 # One time in four return a country-code domain, otherwise a generic one.
 sub GetRandomDomain() {
   if (int(rand 4) == 0) {
     return $TlDomains2[int(rand @TlDomains2)];
   } else {
     return $TlDomains1[int(rand @TlDomains1)];
   }
 }

 # Build the fake guestbook: 2 to 17 mailto links inside div#guestbook.
 sub GetSpambotPoison() {
   my ($result, $email_addr, $num_addresses);

   $num_addresses = 2 + int(rand 16);
   $result = '<div id=guestbook><h1>Guestbook</h1>';
   for (1..$num_addresses) {
     $email_addr = &GetRandomWord() . '@';
     if (int(rand 4) == 0) {
       $email_addr .= &GetRandomWord() . '.';   # occasional subdomain
     }
     $email_addr .= &GetRandomWord() . &GetRandomLetter() . "."
                    . &GetRandomDomain();
     $result .= "<A HREF=\"mailto:$email_addr\">$email_addr</A><BR>\n";
   }
   $result .= '</div>';
   return $result;
 }
 # SpambotPoison code ends
Also, in sub GetFooterText, add the line marked with + near the end:
     $result .= T($FooterNote);
   }
   $result .= '</div>';
 + $result .= &GetSpambotPoison() if $SpambotPoison;
   $result .= &GetMinimumFooter();
   return $result;
 }
Also, add the following to your site's StyleSheet:
#guestbook { display: none; }
This will hide the fake guestbook from your users.
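To preview the shape of the generated addresses without installing the patch, a standalone sketch along the following lines may help. It is not part of the patch: the short word and domain lists and the helpers pick and fake_address are hypothetical stand-ins for the dictionary and the domain tables, chosen only to keep the example self-contained.

```perl
#!/usr/bin/perl
# Standalone preview of the address format; not part of the patch.
use strict;
use warnings;

# Hypothetical stand-ins for the dictionary and the domain lists.
my @words   = qw(garden window orange copper silver);
my @domains = qw(com net org edu uk de);

# Return a random element of the argument list.
sub pick { return $_[int(rand @_)] }

# Build one address of the form word@[word.]word<letter>.domain
sub fake_address {
    my $addr = pick(@words) . '@';
    $addr .= pick(@words) . '.' if int(rand 4) == 0;   # optional subdomain
    $addr .= pick(@words) . chr(ord('a') + int(rand 26)) . '.' . pick(@domains);
    return $addr;
}

print fake_address(), "\n" for 1 .. 5;
```

The random letter appended to the last word makes the domain less likely to be a real registered one, which is presumably why the patch adds it.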
What about getting a list of known spammer domains and using those as the domains? You could also use "mailsiphon.com" (and maybe provide a link to that site), though I'll bet many spammers have excluded that domain. -- Trent
The hiding of the generated email addresses is done via CSS. All browsers not handling CSS still display these addresses. -- MarkusLude