# apt-get install htdig
/etc/htdig/wiki.conf
database_dir: /var/lib/htdig
start_url: http://...yoursite.../cgi-bin/wiki.pl?action=index
limit_urls_to: ${start_url} http://...yoursite.../cgi-bin/wiki.pl
exclude_urls: ~~nothing~~
Put the following commands in your crontab (probably root's). The htdig indexer has an option to limit how many "hops" it takes when crawling the website. Since we start on the "action=index" page, which lists every page of the wiki, it is important to set the number of hops to 1, so that only the actual text of each page is indexed rather than all the old revisions, edit pages, etc. Search is a *very* difficult problem... think about it: how does Google index these UseMod wikis?
# index the wiki (allow cgi-bin, wiki URLs, etc., and
# start at the action=index page with 1 level only)
htdig -c /etc/htdig/wiki.conf -h 1
# merge found changes with the existing database
htmerge
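The two commands above could be wrapped in a small script so that cron never merges after a failed crawl. This is only a sketch, not part of the original setup: the function name reindex_wiki and the override variables HTDIG, HTMERGE and WIKI_CONF are made up for illustration, while the htdig and htmerge invocations are the same ones shown above.

```shell
#!/bin/sh
# Sketch of a cron wrapper for the two indexing commands.
# HTDIG, HTMERGE and WIKI_CONF are hypothetical override variables,
# introduced here only so that paths can be changed or stubbed out.

reindex_wiki() {
    htdig_bin="${HTDIG:-htdig}"
    htmerge_bin="${HTMERGE:-htmerge}"
    conf="${WIKI_CONF:-/etc/htdig/wiki.conf}"

    # crawl from start_url, following links one hop only
    "$htdig_bin" -c "$conf" -h 1 || {
        echo "reindex_wiki: htdig failed, skipping merge" >&2
        return 1
    }

    # merge the freshly dug data into the search database
    "$htmerge_bin" || {
        echo "reindex_wiki: htmerge failed" >&2
        return 1
    }
}
```

To use it, add a line calling reindex_wiki at the end of the script, save it somewhere like /usr/local/sbin/reindex-wiki (a hypothetical path), make it executable, and point a crontab entry such as "15 3 * * * /usr/local/sbin/reindex-wiki" at it. Adjust the schedule to how often your wiki changes.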
This involves patching the wiki. I only made changes to the "bottom" search form, because UseMod's internal search functionality does have a useful purpose for backlinks and exact matches. Unfortunately, Clifford had an "interesting" idea on how to start the <form action=...> part of this wiki. In order to be minimally invasive (to the 0.92 code anyway), I had to replace the sub GetSearchForm code with the following:
sub GetSearchForm {
  my $text = <<EOT
</form> <!-- close the form that was started at top of page -->
<form method="get" action="/cgi-bin/htsearch"> <!-- make sure this URL is right for you -->
<div style="font-size: small;">
Search Wiki:
<input type="hidden" name="restrict" value="wiki.pl"> <!-- only return links with wiki.pl -->
<input type="text" size="30" name="words" value="">
<input type="submit" value="Search">
</div>
</form>
<form>
EOT
;
# make sure that the above "EOT" line is strictly
# left-justified, (touching the left-hand margin)
# as that is the way Perl needs it.
  return $text;
}
...this is moderately hackish, but seems to work reasonably well.
Any questions or updates, please contact me: ramses (zero) at yahoo.com
--Robert
I wanted to improve search capacity, but HTDig is for Unix only. In the end I installed Perlfect, a free Perl-based search engine that is very easy to install.
You can get it at:
http://www.perlfect.com/freescripts/search/
Installing:
http://yourhost/cgi-bin/wiki.pl?RecentChanges
http://yourhost/cgi-bin/wiki.pl?action=editprefs
http://yourhost/cgi-bin/wiki.pl?action=edit&id=*
http://yourhost/cgi-bin/wiki.pl?action=history&id=*
http://yourhost/cgi-bin/wiki.pl?back=*
http://yourhost/cgi-bin/wiki.pl?action=browse&diff=*
-- Albert.
Htdig for Windows is available at http://www.htdig.org/files/binaries/ (look for htdig316_nt.zip).
-- Rog