WikiPatches/HTDigSearching

UseModWiki | WikiPatches | RecentChanges | Preferences

Assuming you're using Debian (because that's what I'm using):

Install ht://Dig

 # apt-get install htdig

Make a "wiki" configuration for ht://Dig

Copy the default htdig.conf to a new file (such as wiki.conf) and change the settings listed below. The changes are necessary because:

  1. You need to give htdig a place to start.
  2. limit_urls_to needs to know that wiki.pl is ok, not just the action=index branch.
  3. exclude_urls is normally set to exclude .cgi files and the cgi-bin directory; we want to remove that behaviour in order to index the wiki.

 # /etc/htdig/wiki.conf
 database_dir:           /var/lib/htdig
 start_url:              http://...yoursite.../cgi-bin/wiki.pl?action=index
 limit_urls_to:          ${start_url} http://...yoursite.../cgi-bin/wiki.pl
 exclude_urls:           ~~nothing~~
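
If you would rather script the copy than edit by hand, something like the following might do. It is only a sketch: the function name make_wiki_conf and the example host are made up for illustration, and it assumes each of the four attributes sits on a single line in the stock htdig.conf (no backslash continuations).

```shell
#!/bin/sh
# Sketch: derive wiki.conf from the stock htdig.conf.
# make_wiki_conf is a hypothetical helper, not part of ht://Dig.
# Assumption: the overridden attributes are single-line in the source file.
make_wiki_conf() {
    src=$1    # stock config, e.g. /etc/htdig/htdig.conf
    dst=$2    # new config, e.g. /etc/htdig/wiki.conf
    site=$3   # your site's base URL, without a trailing slash

    # Copy everything except the attributes we are about to override.
    grep -v -E '^(database_dir|start_url|limit_urls_to|exclude_urls):' "$src" > "$dst"

    # Append the wiki-specific values from the listing above.
    # \${start_url} is escaped so that htdig, not the shell, expands it.
    cat >> "$dst" <<EOF
database_dir:           /var/lib/htdig
start_url:              $site/cgi-bin/wiki.pl?action=index
limit_urls_to:          \${start_url} $site/cgi-bin/wiki.pl
exclude_urls:           ~~nothing~~
EOF
}
```

Call it with the source file, the destination file and your own site's base URL as the three arguments.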

Set up the 'htdig' indexer

Add the commands below to your crontab (probably root's). htdig can limit how many "hops" it follows when crawling the website. Because we start at the "action=index" page, which links to every page of the wiki, it is important to set the number of hops to 1 so that only the actual text of each page is indexed, and not all the old revisions, the edit pages, etc. Search is a *very* difficult problem... think about it: how does Google index these UseMod wikis?

 # index the wiki (allow cgi-bin, wiki URLs, etc., and
 # start at the action=index page with 1 level only)
 htdig -c /etc/htdig/wiki.conf -h 1
 # merge the changes found into the existing database
 htmerge
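
One way to get this into cron is a small wrapper script. This is just a sketch: the script name reindex-wiki, its path and the schedule in the comment are arbitrary choices of mine, and it runs the same two commands as above, skipping the merge if the crawl fails.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. saved as /usr/local/sbin/reindex-wiki.
# Example crontab entry (the schedule is an arbitrary choice):
#   15 3 * * * /usr/local/sbin/reindex-wiki

reindex_wiki() {
    conf=${1:-/etc/htdig/wiki.conf}

    # Crawl one hop out from the action=index page.
    # If the crawl fails, do not merge a partial result.
    htdig -c "$conf" -h 1 || return 1

    # Merge the changes found into the existing database.
    htmerge
}

# Run only when invoked under the name reindex-wiki,
# so the file can also be sourced without side effects.
case $0 in
    *reindex-wiki) reindex_wiki "$@" ;;
esac
```

Keeping the two steps in one function means a failed crawl stops the run before htmerge touches the database.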

Make UseMod use the ht://Dig search database

This involves patching the wiki. I only changed the "bottom" search form, because UseMod's internal search is still useful for backlinks and exact matches. Unfortunately, Clifford had an "interesting" idea about how to start the <form action=...> part of this wiki. In order to be minimally invasive (to the 0.92 code, anyway), I had to replace the sub GetSearchForm code with the following:

 sub GetSearchForm {
   # The first line of the here-document closes the form opened at the
   # top of the page; the trailing <form> presumably keeps the page's
   # own closing </form> balanced.
   my $text = <<EOT;
   </form>   <!-- close the form that was started at top of page -->
   <form method="get" action="/cgi-bin/htsearch">  <!-- make sure this URL is right for you -->
   <div style="font-size: small;">
   Search Wiki:
   <input type="hidden" name="restrict" value="wiki.pl">   <!-- only return links with wiki.pl -->
   <input type="text" size="30" name="words" value="">
   <input type="submit" value="Search">
   </div>
   </form>
   <form>
 EOT
   # Make sure that the above "EOT" line is strictly
   # left-justified (touching the left-hand margin),
   # as that is the way Perl needs it.
   return $text;
 }

This is moderately hackish, but it seems to work reasonably well.

If you have any questions or updates, please contact me: ramses (zero) at yahoo.com
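
Before patching the wiki, it may help to check what the form will actually request. The sketch below builds the same GET URL the form submits, using the "restrict" and "words" fields from the form above. The function name search_url and the example host are made up, and only spaces are encoded, so treat it as a quick test aid and not a proper URL encoder.

```shell
#!/bin/sh
# Sketch: build the URL that the patched search form would submit.
# search_url is a hypothetical helper; it only turns spaces into "+"
# and does not escape any other special characters.
search_url() {
    base=$1   # your site's base URL, without a trailing slash
    shift
    # Join the remaining arguments with spaces, then encode the spaces.
    words=$(printf '%s' "$*" | tr ' ' '+')
    printf '%s/cgi-bin/htsearch?restrict=wiki.pl&words=%s\n' "$base" "$words"
}
```

Pasting the printed URL into a browser should show the htsearch results page if the index was built correctly.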

--Robert

Windows Alternative

I wanted to improve search capacity, but HTDig is for Unix only. In the end I installed Perlfect, a free Perl-based search engine that is very easy to install.

You can get it at:

http://www.perlfect.com/freescripts/search/

Installing:

 http://yourhost/cgi-bin/wiki.pl?RecentChanges
 http://yourhost/cgi-bin/wiki.pl?action=editprefs
 http://yourhost/cgi-bin/wiki.pl?action=edit&id=*
 http://yourhost/cgi-bin/wiki.pl?action=history&id=*
 http://yourhost/cgi-bin/wiki.pl?back=*
 http://yourhost/cgi-bin/wiki.pl?action=browse&diff=*

-- Albert.

ht://Dig for Windows is available at http://www.htdig.org/files/binaries/ (look for htdig316_nt.zip).

-- Rog


Last edited November 13, 2016 10:00 pm by MarkusLude