Once Google and other search engines index a page, they keep re-spidering it indefinitely. For pages like MeatBall:AdultCheck, this quickly becomes a problem, because an indexed page attracts pornographers, vandals, and spammers. It is therefore worth placing a MeatBall:MetaTag in the header to tell well-behaved robots not to index the page. The patch below emits the tag only when the requested page has no page file, that is, when the page does not exist.
The patch is very simple.
In GetHeader, add "$id" to the following line as indicated:
$result .= &GetHtmlHeader("$SiteName: $title", $id);
In GetHtmlHeader, add the clause:
if( $id && !-f &GetPageFile($id) ) { $html .= "<META NAME='robots' CONTENT='noindex,nocache,noarchive,nofollow'>\n"; }
and change the line that reads the arguments so that it also accepts $id:
my ($title, $id) = @_;
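The check added to GetHtmlHeader can be tried on its own with a small sketch like the one below. It is not the UseModWiki code: GetPageFile here is a simplified stand-in, and $PageDir, the ".db" suffix, and the helper name RobotsMetaFor are assumptions made only for illustration.

```perl
use strict;
use warnings;

# Assumed location of page files; the real wiki computes this itself.
my $PageDir = "/tmp/no-such-wiki-data/page";

# Simplified stand-in for UseModWiki's GetPageFile (assumption):
# maps a page id to the file that would hold the page.
sub GetPageFile {
    my ($id) = @_;
    return "$PageDir/$id.db";
}

# Hypothetical helper: returns the robots META tag when an id is given
# and no page file exists for it, otherwise an empty string.
sub RobotsMetaFor {
    my ($id) = @_;
    return '' unless $id;                   # no id passed: leave header alone
    return '' if -f GetPageFile($id);       # page exists: let robots index it
    return "<META NAME='robots' CONTENT='noindex,nocache,noarchive,nofollow'>\n";
}

print RobotsMetaFor('NoSuchPage');
```

Keeping the test in one place makes it easy to see that existing pages are unaffected and that callers which pass no $id get the old header.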
As another anti-spam measure, it would be useful to add the same "robots" header to "diff" and "history" pages and to old revisions.
One alternative (others may point out why this is bad) would be:
sub GetHtmlHeader
...
$html .= "<HTML><HEAD><TITLE>$title</TITLE>\n";
## START PATCH
if ( ($ENV{'QUERY_STRING'} =~ /revision=\d+/) or ($ENV{'QUERY_STRING'} =~ /diff=1/) ) {
  $html .= "<META NAME='robots' CONTENT='noindex,nocache,noarchive,nofollow'>\n";
}
## END PATCH
if ($FavIcon ne '') {
...
I believe this will catch both older revisions and diff pages. To see what this would leave indexed on the usemod site if implemented:
http://www.google.com/search?hl=en&lr=&q=site%3Ausemod.com+-inurl%3Arevision%3D+-inurl%3Adiff%3D1&btnG=Search (around 54,000 pages)
versus the count when older revisions and diffs are included:
http://www.google.com/search?hl=en&lr=&q=site%3Ausemod.com&btnG=Search (around 63,000 pages)
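To check which requests the query-string test in the alternative patch would mark as noindex, the two patterns can be pulled into a small standalone sketch. WantsNoIndex is a hypothetical helper name, not part of UseModWiki, and the sample query strings and the extra action=history pattern are assumptions about how such views are requested.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the query string names an old revision
# or a diff view, using the same two patterns as the patch above.
sub WantsNoIndex {
    my ($query) = @_;
    return 0 unless defined $query;
    return 1 if $query =~ /revision=\d+/;    # old revisions
    return 1 if $query =~ /diff=1/;          # diff views, as in the patch
    # Assumption: history views carry action=history in the query string.
    return 1 if $query =~ /action=history/;
    return 0;
}

# Sample query strings (illustrative only).
for my $q ('action=browse&id=HomePage&revision=3',
           'action=browse&diff=1&id=HomePage',
           'action=history&id=HomePage',
           'HomePage') {
    printf "%-40s %s\n", $q, WantsNoIndex($q) ? 'noindex' : 'index';
}
```

Note that /diff=1/ matches only that one value; if an installation also serves diffs under other values, a pattern such as /diff=\d+/ may be needed instead.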