WikiPatches/RobotsMetaTag


Patch Description

This patch adds a robots meta tag to every page generated by a UseMod wiki. It updates the /NoIndex patch from 0.92 to 1.0, incorporates the features of the /RobotsNoFollow patch, and includes a few tweaks to make both patches more search engine friendly.

The purpose of this patch is to mark certain pages as off-limits to search engines in order to discourage spammers. In particular, it reduces the effectiveness of SEO spamming, that is, spammers who insert WikiSpam to increase their rank in Google. As it stands in UseMod 1.0, reverting changes made by a spammer to a well-indexed wiki has little effect on the PageRank bonus acquired by the spammer, since a link in a historical revision of a page, accessed via the page history, is nearly as effective as a link in the current revision. This feature is largely based on the /RobotsNoFollow patch.

This patch also includes the functionality of the 0.92 /NoIndex patch. Once Google and other search engines index a page, they tend to keep spidering it indefinitely. Pages that have been deleted but still appear in search engine results for attractive keywords can attract pornographers, vandals, and spammers long after the deletion.

Additionally, this patch slightly tweaks the earlier two patches to make the resulting patch more search engine friendly. Changes include:

-- RichardP


Comments

I have linked to this patch from the WikiSpam page.

I think this is a bare-minimum anti-spam tactic, and it should be included in the default install ASAP. The reason I say this is that the patch is unlikely to have any noticeable effect on any one particular wiki where it is installed; spammers are generally too stupid to know the difference between one wiki and another. However, they may start to notice if all usemod wikis (or a significant proportion) have these meta tags. If spammers start to give up, then we all benefit, but at the moment the problem is only going to get worse.

As usemod fans will know, this wiki software is very popular particularly because it is easy to install. The result is that there are many usemod administrators all over the internet who don't have the technical skills or the inclination to go about applying patches that have no obvious benefits, which is why I think it is important that robots meta tags should be included in the default usemod install. -- Halz 18th Jan 2005


I applied this patch. It allows robots to index and follow old revisions, and likewise embedded versions (embed=1). Shouldn't it deny these two cases? --StefanTrcek

Stefan, with regard to old revisions: since the history page is marked 'nofollow', a search engine shouldn't discover links to old revisions. Similarly, with regard to embedded versions: since no page generated by the wiki links to them, they won't be discovered by a search engine. In both cases these pages won't be indexed unless the search engine already knows about the links, someone explicitly links to them from another site, or the admin deliberately submits them to the engine. In each of those cases I thought they probably should be indexed. Do you disagree? On the other hand, it does make sense that someone with a pre-existing wiki might want to stop search engines from indexing links to old revisions that the engines already know about. It wouldn't be hard to modify the patch to explicitly deny these requests. What do you think? -- RichardP

Ahh, I see; that makes sense to me. Since I rebuild the complete search index nightly, this does no harm. --StefanTrcek

It would be better if old revisions were explicitly NOINDEXed, I think. People discussing a spam problem might like to link to an old revision without accidentally exposing it to search engines. I guess it's a subtle point, though, and wouldn't make much difference either way. People might equally be discussing (and linking to) an old revision with legitimate content, but on the whole I think if content is worth discussing it tends to get rescued and put back on a main article/discussion page somewhere. -- Halz 19th Jan 2005
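
For what it's worth, a minimal sketch of that explicit deny, assuming the patch below is already applied: add one more branch at the top of the robots logic in GetHtmlHeader so that requests for old revisions (the revision parameter) or embedded views (embed=1) always get noindex,nofollow. &GetParam() and &GetPageFile() are the helpers already in wiki.pl; everything else here is hypothetical and not part of RichardP's patch (the $RCName and action=index branches are omitted for brevity).

  # Sketch only -- not part of the patch below.
  # Deny old revisions and embedded views outright, then fall back to
  # roughly the same checks the patch makes.
  if ((&GetParam('revision', '') ne '') || &GetParam('embed', 0)) {
    $html .= qq(<meta name="robots" content="noindex,nofollow">\n);
  } elsif (($id eq '') || !-f &GetPageFile($id)) {
    $html .= qq(<meta name="robots" content="noindex,nofollow">\n);
  } else {
    $html .= qq(<meta name="robots" content="index,follow">\n);
  }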


Hi, I just found this after adding a second patch at WikiPatches/RobotsNoFollow. Mine is a smaller change, but has the same goal in mind. There shouldn't be any trickery needed to get the useful content indexed. I also made it a feature you can turn on or off, and it only affects diff and history pages. Finally, it doesn't require changing the arguments passed to functions. Let me know which patch fixes the problem better. --TomScanlan

(from WikiSpam/StopIt): I just added a third implementation at WikiPatches/RobotsNoFollow. I didn't see RichardP's shot at it. If mine isn't useful, kill it and go with his, as it is better than the first WikiPatches/RobotsNoFollow patch. --TomScanlan
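
For illustration only, here is a rough sketch of the on/off idea TomScanlan describes above; it is not his actual code (see WikiPatches/RobotsNoFollow for that). $RobotsNoFollow is a hypothetical configuration flag, and &GetParam() is the existing wiki.pl helper.

  # Sketch only: a config switch that marks diff and history pages for robots.
  # $RobotsNoFollow is hypothetical; set it to 1 in the config to enable.
  if ($RobotsNoFollow) {
    my $action = lc(&GetParam('action', ''));
    if (($action eq 'history') || (&GetParam('diff', '') ne '')) {
      $html .= qq(<meta name="robots" content="noindex,nofollow">\n);
    }
  }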


What do you think of the idea in Wiki:WikiWikiSystemNotice? Would this be a good standard to add to the UseMod base code? If anyone thinks it is a good idea, how would one implement it in UseMod? --PaulMorrison


Patch code (unified output format)

--- wiki.pl	2003-09-11 05:21:02.000000000 -0800
+++ wiki-robotsmeta.pl	2004-12-09 03:10:00.000000000 -0800
@@ -1291,7 +1291,7 @@
   if ($FreeLinks) {
     $title =~ s/_/ /g;   # Display as spaces
   }
-  $result .= &GetHtmlHeader("$SiteName: $title");
+  $result .= &GetHtmlHeader("$SiteName: $title", $id);
   return $result  if ($embed);
 
   $result .= '<div class=wikiheader>';
@@ -1342,7 +1342,7 @@
 }
 
 sub GetHtmlHeader {
-  my ($title) = @_;
+  my ($title, $id) = @_;
   my ($dtd, $html, $bodyExtra, $stylesheet);
 
   $html = '';
@@ -1367,6 +1367,23 @@
   if ($stylesheet ne '') {
     $html .= qq(<LINK REL="stylesheet" HREF="$stylesheet">\n);
   }
+  # Use a ROBOTS meta tag with INDEX,FOLLOW on just the current version
+  # of wiki pages so that a robot doesn't index historical versions of pages.  
+  # Use INDEX,FOLLOW tag for RecentChanges and the index of all pages so
+  # that these pages are followed into the database.  Non-existent wiki
+  # pages get NOINDEX,NOFOLLOW so that they are not indexed.
+  if (($id eq $RCName) || (T($RCName) eq $id) || (T($id) eq $RCName)) {
+    $html .= qq(<meta name="robots" content="index,follow">\n);
+  } elsif (lc(GetParam('action', '')) eq 'index') {
+    $html .= qq(<meta name="robots" content="index,follow">\n);
+  } elsif ($id eq '') {
+    $html .= qq(<meta name="robots" content="noindex,nofollow">\n);
+  } elsif ( $id && !-f &GetPageFile($id) ) {
+    $html .= qq(<meta name="robots" content="noindex,nofollow">\n);
+  } else {
+    $html .= qq(<meta name="robots" content="index,follow">\n);
+  }
+  #finish
   $html .= $UserHeader;
   $bodyExtra = '';
   if ($UserBody ne '') {

Patch code (context output format)

*** wiki.pl	2003-09-11 05:21:02.000000000 -0800
--- wiki-robotsmeta.pl	2004-12-09 03:10:00.000000000 -0800
***************
*** 1291,1297 ****
    if ($FreeLinks) {
      $title =~ s/_/ /g;   # Display as spaces
    }
!   $result .= &GetHtmlHeader("$SiteName: $title");
    return $result  if ($embed);
  
    $result .= '<div class=wikiheader>';
--- 1291,1297 ----
    if ($FreeLinks) {
      $title =~ s/_/ /g;   # Display as spaces
    }
!   $result .= &GetHtmlHeader("$SiteName: $title", $id);
    return $result  if ($embed);
  
    $result .= '<div class=wikiheader>';
***************
*** 1342,1348 ****
  }
  
  sub GetHtmlHeader {
!   my ($title) = @_;
    my ($dtd, $html, $bodyExtra, $stylesheet);
  
    $html = '';
--- 1342,1348 ----
  }
  
  sub GetHtmlHeader {
!   my ($title, $id) = @_;
    my ($dtd, $html, $bodyExtra, $stylesheet);
  
    $html = '';
***************
*** 1367,1372 ****
--- 1367,1389 ----
    if ($stylesheet ne '') {
      $html .= qq(<LINK REL="stylesheet" HREF="$stylesheet">\n);
    }
+   # Use a ROBOTS meta tag with INDEX,FOLLOW on just the current version
+   # of wiki pages so that a robot doesn't index historical versions of pages.  
+   # Use INDEX,FOLLOW tag for RecentChanges and the index of all pages so
+   # that these pages are followed into the database.  Non-existent wiki
+   # pages get NOINDEX,NOFOLLOW so that they are not indexed.
+   if (($id eq $RCName) || (T($RCName) eq $id) || (T($id) eq $RCName)) {
+     $html .= qq(<meta name="robots" content="index,follow">\n);
+   } elsif (lc(GetParam('action', '')) eq 'index') {
+     $html .= qq(<meta name="robots" content="index,follow">\n);
+   } elsif ($id eq '') {
+     $html .= qq(<meta name="robots" content="noindex,nofollow">\n);
+   } elsif ( $id && !-f &GetPageFile($id) ) {
+     $html .= qq(<meta name="robots" content="noindex,nofollow">\n);
+   } else {
+     $html .= qq(<meta name="robots" content="index,follow">\n);
+   }
+   #finish
    $html .= $UserHeader;
    $bodyExtra = '';
    if ($UserBody ne '') {
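
In short, this hunk makes GetHtmlHeader emit

  <meta name="robots" content="index,follow">

for existing wiki pages, for $RCName (RecentChanges), and for action=index, and

  <meta name="robots" content="noindex,nofollow">

when there is no page id or the page's file does not exist; see the discussion above for how old revisions and history pages are meant to be kept out of the engines.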


Seems a good idea, but is this patch also usable for 0.92?

(I tried, but got a software error: Global symbol "$id" requires explicit package name at wiki.pl; it's beyond my knowledge what it means.) DanKoehl

A: Take a look at WikiPatches/RobotsNoFollow. The error means that $id is used in GetHtmlHeader without being declared there (wiki.pl runs under use strict), so you will have to change the following lines too:

sub GetHeader {
  [...]
  $result .= &GetHtmlHeader("$SiteName: $title",$id);

sub GetHtmlHeader {
  my ($title,$id) = @_;
  [...]
BernhardZechmann


Cleaner code: Make "index,follow" default and specify all the cases for "noindex,nofollow" in one statement.
***************
*** 1367,1372 ****
--- 1367,1377 ----
    if ($stylesheet ne '') {
      $html .= qq(<LINK REL="stylesheet" HREF="$stylesheet">\n);
    }
+   my $bots='';
+   # actions and non-existent page views don't get indexed or followed by robots
+   if ( ($id eq '') || ( $id && !-f &GetPageFile($id) ) ) { $bots = "no"; }
+   $bots = $bots . 'index,' . $bots . 'follow';
+   $html .= qq(<meta name="robots" content="$bots" />\n);
    $html .= $UserHeader;
    $bodyExtra = '';
    if ($UserBody ne '') {
-AdamKatz
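
One difference from RichardP's patch above worth noting: this variant drops the explicit $RCName and action=index branches, so the full page index (and RecentChanges, if its page file doesn't exist yet) would appear to come out as noindex,nofollow rather than index,follow. Whether that matters depends on whether you want robots to reach pages through those listings.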
