This patch adds a robots meta tag to every page generated by a UseMod wiki. It updates the /NoIndex patch from 0.92 to 1.0, incorporates the features of the /RobotsNoFollow patch, and includes a few tweaks to make both patches more search engine friendly.
The purpose of this patch is to mark certain pages as off-limits to search engines in order to discourage spammers. In particular, it reduces the effectiveness of SEO spamming, that is, the insertion of WikiSpam to raise the spammer's ranking in Google. As UseMod 1.0 stands, reverting a spammer's changes to a well-indexed wiki does little to remove the PageRank bonus the spammer gained, because a link in a historical revision reached through the page history is nearly as effective as a link in the current revision. This part of the patch is largely based on the /RobotsNoFollow patch.
This patch also includes the functionality of the 0.92 /NoIndex patch. Once Google and other search engines index a page, they keep re-spidering it indefinitely, so a deleted page that still appears in search results for attractive keywords can keep attracting pornographers, vandals, and spammers long after its deletion.
Additionally, this patch slightly tweaks the two earlier patches to make the result more search engine friendly: RecentChanges and the full page index are marked index,follow so that robots are led into the rest of the wiki, while non-existent pages are marked noindex,nofollow.
-- RichardP
I think this is a bare-minimum anti-spam tactic, and it should be included in the default install ASAP. The reason I say this is that the patch is unlikely to have any noticeable effect on any one particular wiki where it is installed. Spammers are generally too stupid to know the difference between one wiki and another. However, they may start to notice if all UseMod wikis (or a significant proportion of them) carry these meta tags. If spammers start to give up, we all benefit; at the moment the problem is only going to get worse.
As UseMod fans will know, this wiki software is popular in large part because it is easy to install. The result is that there are many UseMod administrators all over the internet who don't have the technical skills or the inclination to apply patches that have no obvious benefit, which is why I think robots meta tags should be included in the default UseMod install. -- Halz 18th Jan 2005
I applied this patch. It still allows robots to index and follow old revisions, and likewise embedded versions (embed=1). Shouldn't it deny these two cases? --StefanTrcek
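One possible way to close both holes is sketched below. It is untested and not part of RichardP's patch: it would replace the if/elsif chain added to GetHtmlHeader so that old revisions, diffs, history listings, and embedded output are all denied. The revision, diff, and embed parameter names are assumed to match the usual UseMod request parameters; check your wiki.pl before relying on them.

  # Untested sketch: deny robots on old revisions, diffs, history listings and
  # embed=1 output, in addition to the cases handled by the patch above.
  my $action = lc(&GetParam('action', ''));
  if (&GetParam('revision', '') ne ''            # old revision of a page
      || &GetParam('diff', '') ne ''             # diff view
      || $action eq 'history'                    # page-history listing
      || &GetParam('embed', 0)) {                # embedded (embed=1) output
    $html .= qq(<meta name="robots" content="noindex,nofollow">\n);
  } elsif (($id eq $RCName) || (T($RCName) eq $id) || (T($id) eq $RCName)
           || $action eq 'index') {              # RecentChanges and page index
    $html .= qq(<meta name="robots" content="index,follow">\n);
  } elsif (($id ne '') && -f &GetPageFile($id)) {  # current revision of a real page
    $html .= qq(<meta name="robots" content="index,follow">\n);
  } else {                                       # empty id or missing page file
    $html .= qq(<meta name="robots" content="noindex,nofollow">\n);
  }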
Hi, I just found this after adding a second patch at WikiPatches/RobotsNoFollow. Mine is a smaller change but has the same goal in mind: no trickery should be needed to get the useful content indexed. I also made it a feature you can turn on or off, and it only affects diff and history pages. Finally, it doesn't require changing the arguments passed to any functions (see the sketch below). Let me know which patch fixes the problem better. --TomScanlan
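For reference, a toggle along those lines might look like the hypothetical sketch below. It is not TomScanlan's actual patch (see WikiPatches/RobotsNoFollow for that), the $RobotsNoIndexHistory option name is made up here, and it relies only on &GetParam, so no function arguments need to change.

  # In the configuration section of wiki.pl (hypothetical option name; it
  # would also need to be added to the script's "use vars" list under strict):
  $RobotsNoIndexHistory = 1;   # set to 0 to leave diff/history views indexable

  # In GetHtmlHeader:
  if ($RobotsNoIndexHistory
      && (lc(&GetParam('action', '')) eq 'history'   # page-history listing
          || &GetParam('diff', '') ne '')) {         # diff view
    $html .= qq(<meta name="robots" content="noindex,nofollow">\n);
  }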
( from WikiSpam/StopIt ): I just added a third implementation at WikiPatches/RobotsNoFollow. I didn't see RichardP's shot at it. If mine isn't useful, kill it and go with his, as it is better than the first WikiPatches/RobotsNoFollow patch. --TomScanlan
What do you think of the idea in Wiki:WikiWikiSystemNotice? Would this be a good standard to add to the UseMod base code? If anyone thinks it is a good idea, how would one implement it in UseMod? --PaulMorrison
 --- wiki.pl	2003-09-11 05:21:02.000000000 -0800
 +++ wiki-robotsmeta.pl	2004-12-09 03:10:00.000000000 -0800
 @@ -1291,7 +1291,7 @@
    if ($FreeLinks) {
      $title =~ s/_/ /g;      # Display as spaces
    }
 -  $result .= &GetHtmlHeader("$SiteName: $title");
 +  $result .= &GetHtmlHeader("$SiteName: $title", $id);
    return $result  if ($embed);
 
    $result .= '<div class=wikiheader>';
 @@ -1342,7 +1342,7 @@
  }
 
  sub GetHtmlHeader {
 -  my ($title) = @_;
 +  my ($title, $id) = @_;
    my ($dtd, $html, $bodyExtra, $stylesheet);
 
    $html = '';
 @@ -1367,6 +1367,23 @@
    if ($stylesheet ne '') {
      $html .= qq(<LINK REL="stylesheet" HREF="$stylesheet">\n);
    }
 +  # Use a ROBOTS meta tag with INDEX,FOLLOW on just the current version
 +  # of wiki pages so that a robot doesn't index historical versions of pages.
 +  # Use INDEX,FOLLOW tag for RecentChanges and the index of all pages so
 +  # that these pages are followed into the database.  Non-existant wiki
 +  # pages get NOINDEX,NOFOLLOW so that they are not indexed.
 +  if (($id eq $RCName) || (T($RCName) eq $id) || (T($id) eq $RCName)) {
 +    $html .= qq(<meta name="robots" content="index,follow">\n);
 +  } elsif (lc(GetParam('action', '')) eq 'index') {
 +    $html .= qq(<meta name="robots" content="index,follow">\n);
 +  } elsif ($id eq '') {
 +    $html .= qq(<meta name="robots" content="noindex,nofollow">\n);
 +  } elsif ( $id && !-f &GetPageFile($id) ) {
 +    $html .= qq(<meta name="robots" content="noindex,nofollow">\n);
 +  } else {
 +    $html .= qq(<meta name="robots" content="index,follow">\n);
 +  }
 +  #finish
    $html .= $UserHeader;
    $bodyExtra = '';
    if ($UserBody ne '') {
 *** wiki.pl	2003-09-11 05:21:02.000000000 -0800
 --- wiki-robotsmeta.pl	2004-12-09 03:10:00.000000000 -0800
 ***************
 *** 1291,1297 ****
     if ($FreeLinks) {
       $title =~ s/_/ /g;      # Display as spaces
     }
 !   $result .= &GetHtmlHeader("$SiteName: $title");
     return $result  if ($embed);
 
     $result .= '<div class=wikiheader>';
 --- 1291,1297 ----
     if ($FreeLinks) {
       $title =~ s/_/ /g;      # Display as spaces
     }
 !   $result .= &GetHtmlHeader("$SiteName: $title", $id);
     return $result  if ($embed);
 
     $result .= '<div class=wikiheader>';
 ***************
 *** 1342,1348 ****
   }
 
   sub GetHtmlHeader {
 !   my ($title) = @_;
     my ($dtd, $html, $bodyExtra, $stylesheet);
 
     $html = '';
 --- 1342,1348 ----
   }
 
   sub GetHtmlHeader {
 !   my ($title, $id) = @_;
     my ($dtd, $html, $bodyExtra, $stylesheet);
 
     $html = '';
 ***************
 *** 1367,1372 ****
 --- 1367,1389 ----
     if ($stylesheet ne '') {
       $html .= qq(<LINK REL="stylesheet" HREF="$stylesheet">\n);
     }
 +   # Use a ROBOTS meta tag with INDEX,FOLLOW on just the current version
 +   # of wiki pages so that a robot doesn't index historical versions of pages.
 +   # Use INDEX,FOLLOW tag for RecentChanges and the index of all pages so
 +   # that these pages are followed into the database.  Non-existant wiki
 +   # pages get NOINDEX,NOFOLLOW so that they are not indexed.
 +   if (($id eq $RCName) || (T($RCName) eq $id) || (T($id) eq $RCName)) {
 +     $html .= qq(<meta name="robots" content="index,follow">\n);
 +   } elsif (lc(GetParam('action', '')) eq 'index') {
 +     $html .= qq(<meta name="robots" content="index,follow">\n);
 +   } elsif ($id eq '') {
 +     $html .= qq(<meta name="robots" content="noindex,nofollow">\n);
 +   } elsif ( $id && !-f &GetPageFile($id) ) {
 +     $html .= qq(<meta name="robots" content="noindex,nofollow">\n);
 +   } else {
 +     $html .= qq(<meta name="robots" content="index,follow">\n);
 +   }
 +   #finish
     $html .= $UserHeader;
     $bodyExtra = '';
     if ($UserBody ne '') {
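To try either form of the patch, save one of the diffs above to a file (the name robotsmeta.diff here is arbitrary) and apply it to a backed-up copy of the script with the standard patch utility, for example: patch wiki.pl robotsmeta.diff. Afterwards, the HTML source of an existing page should contain <meta name="robots" content="index,follow">, while a non-existent page should contain the noindex,nofollow variant.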
(I tried, but got a software error: Global symbol "$id" requires explicit package name at wiki.pl; what it means is beyond my knowledge.) -- DanKoehl
A: Take a look at WikiPatches/RobotsNoFollow
You will have to change both of the following lines: under use strict, $id must also be added to GetHtmlHeader's argument list, otherwise Perl reports exactly that "Global symbol" error.
 sub GetHeader {
   [...]
   $result .= &GetHtmlHeader("$SiteName: $title", $id);

 sub GetHtmlHeader {
   my ($title, $id) = @_;
   [...]

-- BernhardZechmann
 ***************
 *** 1367,1372 ****
 --- 1367,1377 ----
     if ($stylesheet ne '') {
       $html .= qq(<LINK REL="stylesheet" HREF="$stylesheet">\n);
     }
 +   my $bots = '';
 +   # actions and non-existant page views don't get indexed or followed by robots
 +   if ( ($id eq '') || ( $id && !-f &GetPageFile($id) ) ) { $bots = "no"; }
 +   $bots = $bots . 'index,' . $bots . 'follow';
 +   $html .= qq(<meta name="robots" content="$bots" />\n);
     $html .= $UserHeader;
     $bodyExtra = '';
     if ($UserBody ne '') {

-- AdamKatz
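For anyone unsure what the string building above produces, here is a tiny standalone check of the same trick, prefixing both directives with "no" when the page should be hidden. It is plain Perl, independent of UseMod; the robots_content helper and its $hide flag are made up for illustration and merely stand in for the "empty id or missing page file" test.

 #!/usr/bin/perl
 use strict;
 use warnings;

 # Rebuild the $bots string the same way the hunk above does.
 sub robots_content {
     my ($hide) = @_;
     my $bots = $hide ? 'no' : '';
     return $bots . 'index,' . $bots . 'follow';
 }

 print robots_content(0), "\n";   # prints "index,follow"
 print robots_content(1), "\n";   # prints "noindex,nofollow"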