Many of my users have pointed out that the page titles returned by a search often do not give enough information to tell whether a page contains what they are looking for. So I've implemented a Google-like search results page that shows snippets of each document containing the search text. The code is here for your amusement.
First, add the following to sub DoSearch():
Q: Where exactly in that sub should it go? The snippet below is fine, but where does it fit into the existing code of sub DoSearch?
A: You can safely insert it before the line &PrintPageList(&SearchTitleAndBody($string)); in sub DoSearch, and then comment out that original line itself. -- UrbanSheep
    if ( $XSearchDisp ) {   # managed by config file (?)
        &PrintSearchResults($string, &SearchTitleAndBody($string)) ;
    } else {
        &PrintPageList(&SearchTitleAndBody($string));
    }
And here is sub PrintSearchResults(): [updated 4/17 with a better algorithm and fixes for JM; see below for details]
    sub PrintSearchResults {
        my ( $searchstring, @results ) = @_ ;   # inputs
        my ( $output ) ;
        my ( $name ) ;
        my ( $pageText, $pageSize ) ;
        my ( $t, $j, $jsnippet, $start, $end ) ;
        my ( $snippetlen, $maxsnippets ) = ( 100, 4 ) ;   # these seem nice.

        print "\n<h2>", ($#results + 1), " pages found:</h2>";
        foreach $name (@results) {
            # get the page, filter it, remove all tags (since we're presenting in
            # plaintext, not HTML, a la google(tm)).
            &OpenPage($name);
            &OpenDefaultText();
            $pageText = $Text{'text'};
            $pageText =~ s/$FS//g;                          # Remove separators (paranoia)
            $pageText =~ s/[\s]+/ /g;                       # Shrink whitespace
            $pageText =~ s/([-_=*.])\1{9,}/$1$1$1$1$1/g ;   # e.g. shrink "----------" to "-----"
            # Strip tags BEFORE quoting: after &QuoteHtml every "<" is "&lt;",
            # so these patterns would never match.
            foreach $t (@HtmlPairs, "pre", "nowiki", "code" ) {
                $pageText =~ s/\<$t(\s[^<>]+?)?\>(.*?)\<\/$t\>/$2/gis;
            }
            foreach $t (@HtmlSingle) {
                $pageText =~ s/\<$t(\s[^<>]+?)?\>//gi;
            }
            $pageText = &QuoteHtml($pageText);
            $pageSize = length( $pageText ) ;   # measure before $pageText is truncated below

            # entry header
            $output = "\n\n" ;
            $output .= "... " if ($name =~ m|/|);
            $output .= "<font size=+1>" . &GetPageLink($name) . "</font><br>\n" ;

            # show a snippet from the top of the document
            $j = index( $pageText, " ", $snippetlen ) ;   # end on word boundary
            $j = length( $pageText ) if ( $j == -1 ) ;    # short page: show all of it
            $output .= substr( $pageText, 0, $j ) . " <b>...</b> " ;
            $pageText = substr( $pageText, $j ) ;         # to avoid rematching

            # search for occurrences of searchstring
            $jsnippet = 0 ;
            while ( $jsnippet < $maxsnippets && $pageText =~ m/($searchstring)/i ) {   # captures match as $1
                $jsnippet++ ;   # paranoid about looping
                if ( ($j = index( $pageText, $1 )) > -1 ) {   # get index of match
                    # get substr containing (start of) match, ending on word boundaries
                    $start = index( $pageText, " ", $j-($snippetlen/2) ) ;
                    $start = 0 if ( $start == -1 ) ;
                    $end = index( $pageText, " ", $j+($snippetlen/2) ) ;
                    $end = length( $pageText ) if ( $end == -1 ) ;
                    $t = substr( $pageText, $start, $end-$start ) ;
                    # highlight occurrences and tack on to output stream.
                    $t =~ s/($searchstring)/<b>$1<\/b>/gi ;
                    $output .= $t . " <b>...</b> " ;
                    # truncate text to avoid rematching the same string.
                    $pageText = substr( $pageText, $end ) ;
                }
            }

            # entry trailer
            $output .= "<br><i><font size=-1>" . int(($pageSize/1024)+1)
                . "K - last updated " . &TimeToText($Section{ts})
                . " by " . &GetAuthorLink( $Section{'host'}, $Section{'username'}, $Section{'id'} )
                . "</font></i><br><br>" ;
            print $output ;
        }
    }
Then add the initialisation of $XSearchDisp to the config section under "Major options":
$XSearchDisp = 1; # 1 = extra text output on search, 0 = normal search output
Then add the $XSearchDisp variable itself to the list under "Configuration/constant variables".
Comments, critiques, etc. are welcome. Particularly if anyone can come up with a better way to do the searching (and grabbing of scalar substrings). --MikeDalessio
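On the question of a better way to do the searching: Perl records the offsets of the last successful match in the arrays @- and @+, so the start of a match can be read from $-[0] instead of being looked up again with index(), and setting pos() before a scalar-context m//g lets the search resume at an offset instead of truncating the string with substr(). A minimal sketch, assuming the page text has already been filtered to plain text as in the sub above; find_snippets is a hypothetical helper, not part of UseMod:

```perl
use strict;
use warnings;

# Hypothetical helper: return up to $max plain-text snippets, each containing
# one match of $re padded by about $len/2 characters on each side and
# widened outward to whole words.
sub find_snippets {
    my ($text, $re, $len, $max) = @_;
    my @snippets;
    my $from = 0;    # search resumes here, so snippets never overlap
    while (@snippets < $max && $from < length($text)) {
        pos($text) = $from;
        last unless $text =~ /$re/g;             # scalar //g starts at pos($text)
        my ($mstart, $mend) = ($-[0], $+[0]);    # match offsets, no index() needed
        my $start = $mstart - int($len / 2);
        $start = 0 if $start < 0;
        my $end = $mend + int($len / 2);
        $end = length($text) if $end > length($text);
        # Move the start back to just after the preceding space (0 if none) ...
        if ($start > 0) {
            $start = rindex($text, ' ', $start - 1) + 1;
        }
        # ... and the end forward to the next space (or the end of the text).
        if ($end < length($text)) {
            my $sp = index($text, ' ', $end);
            $end = $sp < 0 ? length($text) : $sp;
        }
        push @snippets, substr($text, $start, $end - $start);
        $from = $end > $mend ? $end : $mend + 1;    # always make progress
    }
    return @snippets;
}
```

Each returned snippet could then be highlighted and joined with " <b>...</b> " as PrintSearchResults does. Because the snippet boundaries are widened outward from the match offsets, every snippet is guaranteed to contain its whole match.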
Try searching for

    hide.*source

or maybe

    back.*button

and check the results! I detuned the font sizes (h2 changed to h3, no font size +1) to get a denser page, but Mike did all the hard work. Wonderful! -- JerryMuelver
One problem: the highlighting is also applied to the page link, so a page named

    Include_one_file_in_another

becomes

    <b>Include</b>_one_file_in_another

which gives a 404. See http://allmyfaqs.com/cgi-bin/wiki.pl?HTML_FAQs, search for "include", and scroll to "Include one file in another". -- JerryMuelver
A fix: instead of

    $output = &GetPageLink($name) . "<br>\n";

highlight first and prepend the link afterwards:

    $output =~ s/($searchstring)/<b>$1<\/b>/gi ;
    $output = &GetPageLink($name) . "<br>\n" . $output;
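Another way to avoid the broken link is to let the highlighter skip over anything that looks like a tag, so the order of the statements no longer matters. A minimal sketch; highlight_outside_tags is a hypothetical helper, and it assumes tags never contain a literal ">" inside attribute values:

```perl
use strict;
use warnings;

# Hypothetical helper: wrap matches of $re in <b>...</b>, but copy anything
# inside <...> through unchanged, so link targets are never altered.
sub highlight_outside_tags {
    my ($html, $re) = @_;
    # Group 1 is a whole tag, group 2 is a search match; the tag alternative
    # is tried first at each position.
    $html =~ s{(<[^>]*>)|($re)}{defined $1 ? $1 : "<b>$2</b>"}ge;
    return $html;
}
```

When a "<" starts a tag, the whole tag is consumed by the first alternative and copied back as-is, so the search pattern never gets a chance to match inside an href.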
Blush ... Glad you like it. Thanks much for the bugfix (I must have slipped up when my caffeine levels fell below normal ;). I'm still not completely happy with how the searching is being done (i.e., via the index() function), but I can't think of any other way to do it so that we can subsequently grab a substring. My momma told me that, in perl, There's More Than One Way To Do It, so somebody must have a better idea. Bueller? Bueller? --MikeDalessio
Improved algorithm: the new code above supports regexps correctly, using a one-two combo of m// and index(). It also addresses JM's highlighting bug. Now I'm happy with it. --MikeDalessio
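Because $searchstring is used directly as a regexp, input that is not a valid pattern (an unbalanced "(", for instance) makes m// die with a compile error. One possible guard, sketched under the assumption that falling back to a literal search is acceptable; compile_search is a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical helper: compile the user's search string as a case-insensitive
# regexp; if it is not a valid regexp, fall back to matching it literally.
sub compile_search {
    my ($searchstring) = @_;
    my $re = eval { qr/$searchstring/i };    # undef if the pattern fails to compile
    return $re if defined $re;
    return qr/\Q$searchstring\E/i;           # quotemeta'd literal match
}
```

The returned qr// object can be interpolated wherever $searchstring is used now, e.g. m/($re)/. This only guards against syntax errors; a valid but pathological pattern can still be slow.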
Small fix: to avoid HTML validation errors, change

    $output .= "<font size=+1>" . &GetPageLink($name) . "</font><br>\n" ;

to

    $output .= "<font size=\"+1\">" . &GetPageLink($name) . "</font><br>\n" ;

- Richard
How about showing the last header before the match?
That could also be nice for the index-page (in this case, showing the first header from the page)...
--HaJoGurt
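For the header idea, the nearest preceding heading can be found by scanning the raw page text (before whitespace is collapsed, since the pattern relies on line starts) for heading lines that begin before the match offset. A minimal sketch, assuming UseMod-style "== Heading ==" markup on lines of their own; heading_before is a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of the last "== Heading ==" line that
# starts before character offset $pos in $text, or undef if there is none.
sub heading_before {
    my ($text, $pos) = @_;
    my $head;
    while ($text =~ /^(=+)[ \t]*(.+?)[ \t]*\1[ \t]*$/mg) {
        last if $-[0] >= $pos;    # this heading starts at or after the match
        $head = $2;
    }
    return $head;
}
```

Called with the offset $-[0] of a search match in the raw text, it returns the heading of the section the match falls in.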
Implementation: a variable shortresult controls the output; the Google-like output is shown if this variable is missing.
/SearchWithOperators has the code for the OddMuse search which allows "and" and "or" operators, and which adapts the patch on this page for appropriate highlighting. -- AlexSchroeder
But here it is (I use the known T(..) and Ts(..) translation functions):
    sub PrintSearchResults {
        my ( $searchstring, @results ) = @_ ;   # inputs
        my ( $output ) ;
        my ( $name ) ;
        my ( $pageText, $pageSize ) ;
        my ( $t, $j, $jsnippet, $start, $end ) ;
        my ( $snippetlen, $maxsnippets ) = ( 100, 4 ) ;   # these seem nice.

        # print "\n<h2>", ($#results + 1), " ", T('pages found:'), "</h2>";
        print "\n<h2>", Ts('%s pages found:', ($#results + 1)), "</h2>";
        foreach $name (@results) {
            # get the page, filter it, remove all tags (since we're presenting in
            # plaintext, not HTML, a la google(tm)).
            &OpenPage($name);
            &OpenDefaultText();
            $pageText = $Text{'text'};
            $pageText =~ s/$FS//g;                          # Remove separators (paranoia)
            $pageText =~ s/[\s]+/ /g;                       # Shrink whitespace
            $pageText =~ s/([-_=*.])\1{9,}/$1$1$1$1$1/g ;   # e.g. shrink "----------"
            # Strip tags BEFORE quoting, otherwise "<" is already "&lt;".
            foreach $t (@HtmlPairs, "pre", "nowiki", "code" ) {
                $pageText =~ s/\<$t(\s[^<>]+?)?\>(.*?)\<\/$t\>/$2/gis;
            }
            foreach $t (@HtmlSingle) {
                $pageText =~ s/\<$t(\s[^<>]+?)?\>//gi;
            }
            $pageText = &QuoteHtml($pageText);
            $pageSize = length( $pageText ) ;   # before truncation below

            # entry header
            $output = "\n\n" ;
            $output .= "... " if ($name =~ m|/|);
            $output .= "<b>" . &GetPageLink($name) . "</b><br>\n" ;

            # show a snippet from the top of the document
            $j = index( $pageText, " ", $snippetlen ) ;   # end on word boundary
            $j = length( $pageText ) if ( $j == -1 ) ;    # short page: show all of it
            $output .= substr( $pageText, 0, $j ) . " <b>...</b> " ;
            $pageText = substr( $pageText, $j ) ;         # to avoid rematching

            # search for occurrences of searchstring
            $jsnippet = 0 ;
            while ( $jsnippet < $maxsnippets && $pageText =~ m/($searchstring)/i ) {   # captures match as $1
                $jsnippet++ ;   # paranoid about looping
                if ( ($j = index( $pageText, $1 )) > -1 ) {   # get index of match
                    # get substr containing (start of) match, ending on word boundaries
                    $start = index( $pageText, " ", $j-($snippetlen/2) ) ;
                    $start = 0 if ( $start == -1 ) ;
                    $end = index( $pageText, " ", $j+($snippetlen/2) ) ;
                    $end = length( $pageText ) if ( $end == -1 ) ;
                    $t = substr( $pageText, $start, $end-$start ) ;
                    # highlight occurrences and tack on to output stream.
                    $t =~ s/($searchstring)/<b style='background: #FFFFCC'>$1<\/b>/gi ;
                    $output .= $t . " <b>...</b> " ;
                    # truncate text to avoid rematching the same string.
                    $pageText = substr( $pageText, $end ) ;
                }
            }

            # entry trailer
            $output .= "<br><i><font color=gray>" . int(($pageSize/1024)+1) . "K - "
                . T('Last edited') . " " . &TimeToText($Section{ts}) . " "
                . T('by') . " " . &GetAuthorLink( $Section{'host'}, $Section{'username'}, $Section{'id'} )
                . "</font></i><br><br>" ;
            print $output ;
        }
    }

--KnutK