WikiPatches/NoIndex

Note: This patch is superseded by WikiPatches/RobotsMetaTag.

Once Google and other search engines index a page, they will keep spidering it indefinitely. For pages like MeatBall:AdultCheck, this quickly becomes problematic, as it attracts pornographers, vandals, and spammers. It is therefore important to place a MeatBall:MetaTag in the header to inform well-behaved robots not to index the page.

The patch is very simple.

In GetHeader, add "$id" to the following line as indicated:

    $result .= &GetHtmlHeader("$SiteName: $title", $id);

In GetHtmlHeader, add the clause:

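    # Add the tag only when the page's file does not exist (i.e. the page has not been created yet):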
    if( $id && !-f &GetPageFile($id) ) {
        $html .= "<META NAME='robots' CONTENT='noindex,nocache,noarchive,nofollow'>\n";
    }

and change the line that unpacks the arguments so that it also receives $id:

  my ($title, $id) = @_;
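
For orientation, here is a minimal, self-contained sketch of the patched logic. It assumes nothing about wiki.pl beyond what the patch itself uses: GetPageFile is stubbed with a made-up path layout, and the header is reduced to the title line, so this only illustrates the condition rather than reproducing GetHtmlHeader.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Stand-in for the real &GetPageFile used by the patch, which returns
    # the path of the page's stored file. The "page/$id.db" layout here is
    # made up for the sketch.
    sub GetPageFile {
        my ($id) = @_;
        return "page/$id.db";
    }

    # The patched header logic, reduced to the relevant lines: the robots
    # tag is emitted only when the page's file is missing.
    sub GetHtmlHeader {
        my ($title, $id) = @_;
        my $html = "<HTML><HEAD><TITLE>$title</TITLE>\n";
        if ($id && !-f &GetPageFile($id)) {
            $html .= "<META NAME='robots' CONTENT='noindex,nocache,noarchive,nofollow'>\n";
        }
        return $html;
    }

    # GetHeader builds the title and passes the page id as the second argument:
    print &GetHtmlHeader("MyWiki: SandBox", "SandBox");

Run directly, with no page/SandBox.db present, this prints the title line followed by the robots META tag; when the page's file exists, only the title line is printed.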


It would also be useful to add the same "robots" header to "diff" and "history" pages and to old revisions, as another anti-spam measure (a sketch covering history pages appears at the end of this page).


One alternative (others may point out why this is bad) would be:

sub GetHtmlHeader

 ...
 $html .= "<HTML><HEAD><TITLE>$title</TITLE>\n";
 ## START PATCH 
 if ( ($ENV{'QUERY_STRING'} =~ /revision=\d+/) or ($ENV{'QUERY_STRING'} =~ /diff=1/) ) {
   $html .= "<META NAME='robots' CONTENT='noindex,nocache,noarchive,nofollow'>\n";
 }
 ## END PATCH
 if ($FavIcon ne '') {
 ...

This should catch both older revisions and diff pages, I believe. To see what this would leave indexed on the usemod.com site if implemented:

  http://www.google.com/search?hl=en&lr=&q=site%3Ausemod.com+-inurl%3Arevision%3D+-inurl%3Adiff%3D1&btnG=Search
  (around 54,000 pages)

versus including older revisions and diffs:

  http://www.google.com/search?hl=en&lr=&q=site%3Ausemod.com&btnG=Search
  (around 63,000 pages)
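
Neither the main patch nor the alternative above tags history pages. If those should carry the header too, as suggested earlier, one possible extension (assuming history views are requested as ?action=history&id=..., so that action=history appears in QUERY_STRING) is to widen the test in the alternative patch:

 ## START PATCH
 if ( ($ENV{'QUERY_STRING'} =~ /revision=\d+/)
      or ($ENV{'QUERY_STRING'} =~ /diff=1/)
      or ($ENV{'QUERY_STRING'} =~ /action=history/) ) {
   $html .= "<META NAME='robots' CONTENT='noindex,nocache,noarchive,nofollow'>\n";
 }
 ## END PATCH

This is only a sketch; it looks at QUERY_STRING alone, so requests that carry the action some other way would not be caught.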
