WikiPatches/PerlDiff

This patch implements a replacement for the built-in document version diff.

This patch and the accompanying files were originally created for the UnrealWiki and are currently in use there. (In fact, UnrealWiki uses a slightly enhanced version that allows for a couple of per-user settings related to diff output. That's a separate patch on top of this one, though.)

I've confirmed that this patch works as advertised on an unaltered installation of UseModWiki 0.92 (and, before that, on our own heavily patched UnrealWiki, of course). --MichaelBuschbeck

I've installed this mod into DomesuWiki. Excellent patch, which I prefer over Algorithm::Diff, since it's more self-contained, and it's some of the prettiest Perl code I've ever looked at. -- ChuckAdams


Patch Instructions

Download http://mb.link-m.de/download/wiki-perldiff.zip. This archive contains two files, Diff.pm and wiki-update-diff.cgi; copy them into the directory where wiki.cgi resides.

Note: The text files in this archive use Windows linebreaks, not Unix ones. Recode them accordingly before you use them on systems that expect Unix linebreaks or use text mode when you transfer them via FTP from a Windows system to a web server running a Unix-derived system.
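
If no conversion tool is at hand, the recoding can also be done in Perl itself. The following is only a minimal sketch, not part of the archive; the sub names crlf_to_lf and convert_file are made up for this example.

```perl
use strict;
use warnings;

# Convert Windows (CRLF) line breaks in a string to Unix (LF) ones.
sub crlf_to_lf {
    my $text = shift;
    $text =~ s/\r\n/\n/g;
    return $text;
}

# Rewrite one file in place. binmode stops Perl from translating
# line breaks itself on platforms where it otherwise would.
sub convert_file {
    my $path = shift;
    local $/;    # slurp mode: read the whole file in one go

    open(my $in, '<', $path) or die "Cannot read $path: $!";
    binmode $in;
    my $text = <$in>;
    close $in;
    $text = '' unless defined $text;

    open(my $out, '>', $path) or die "Cannot write $path: $!";
    binmode $out;
    print {$out} crlf_to_lf($text);
    close $out or die "Cannot finish writing $path: $!";
}

# Convert every file named on the command line.
convert_file($_) for @ARGV;
```

Saved under a name of your choice (say, crlf2lf.pl), it would be called as "perl crlf2lf.pl Diff.pm wiki-update-diff.cgi". Keep a copy of the originals, since the files are overwritten.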

Under Unix I used the following commands:

 unzip wiki-perldiff.zip
 dos2unix -ascii Diff.pm
 dos2unix -ascii wiki-update-diff.cgi
 chmod ugo+r Diff.pm
 chmod ugo+r wiki-update-diff.cgi

 { in wiki-update-diff.cgi, adapt the statement $wiki = 'wiki.cgi'; to the actual location of your wiki.pl }

 { run wiki.pl in your web browser as admin and lock the site }

 cp -r wikidatabase wikidatabase.backup
 patch wiki.pl PerlDiff.diff

 { run wiki-update-diff.cgi in your web browser }

 { check the patched wiki.pl for the correct "use Diff;" statement }

 { Now, everything should work. TomGries, 14.01.2004 }


Add use Diff; above the package UseModWiki; statement.

The diff function in that Diff module of mine is admittedly a reinvented wheel (see Wiki:ReinventingTheWheel), but I knew or at least expected it before I even started coding it. The Algorithm::Diff module available from http://www.cpan.org serves the same purpose. You might consider my venture a display of Hubris along the lines of Wiki:LazinessImpatienceHubris, even though it lacks a considerable amount of Impatience and Laziness (or I'd simply have used said Algorithm::Diff). I just found it an interesting problem to think about. --MichaelBuschbeck


Replace the sub GetDiff with the code given below.

The %format hash specifies details about the diff output's formatting. Its elements paraIdent etc. are templates and parameters; see the comment header of the diffText function in Diff.pm for details of their meaning. The format below is set up for being formatted with the help of CSS.

Note that the CSS classes "diff-para-changed-old", "diff-para-changed-new", "diff-span-added" and "diff-span-deleted" are required by the code following the Diff::diffText call, which postprocesses the diff output. Don't alter the CSS markup in the templates, even if you don't use CSS (it won't hurt).

sub GetDiff {

  my $textOld = shift;
  my $textNew = shift;

  my %format = (
    paraIdent     => '<tr valign=top><td class="diff-para-ident"><p>%text%</p></td><td></td><td class="diff-para-ident"><p>%text%</p></td></tr>',
    paraAdded     => '<tr valign=top><td class="diff-para-ident"></td><td></td><td class="diff-para-added"><p>%text%</p></td></tr>',
    paraDeleted   => '<tr valign=top><td class="diff-para-deleted"><p>%text%</p></td><td></td><td class="diff-para-ident"></td></tr>',
    paraChanged   => '<tr valign=top><td class="diff-para-changed-old"><p>%text%</p></td><td></td><td class="diff-para-changed-new"><p>%text%</p></td></tr>',
    paraReplaced  => '<tr valign=top><td class="diff-para-deleted"><p>%textDeleted%</p></td><td></td><td class="diff-para-added"><p>%textAdded%</p></td></tr>',
  
    changeContext => 1,
    changeHeader  => '<tr valign=top><td class="diff-header" width="48%">Paragraph %oldFrom%</td><td width="4%"></td><td class="diff-header" width="48%">Paragraph %newFrom%</td></tr>',
  
    spanIdent     => '<span class="diff-span-ident">%text%</span>',
    spanAdded     => '<span class="diff-span-added">%text%</span>',
    spanDeleted   => '<span class="diff-span-deleted">%text%</span>',

    processText => sub {

      my $text = shift;

      $text =~ s[&]               [&amp;]g;
      $text =~ s[<]               [&lt;]g;
      $text =~ s[>]               [&gt;]g;
      $text =~ s[\n]              [<br>\n]g;
      $text =~ s[\r]              []g;
      $text =~ s[([\t ]+)([\t ])] [('&nbsp;' x length($1)) . $2]ge;
      $text =~ s[^[\t ]]          [&nbsp;];

      return $text;
    }
  );

  my $diff = Diff::diffText($textOld, $textNew, %format);
  
  if ($diff ne '') {
    $diff =~ s[<td class="diff-para-changed-old">(.*?)</td>] [
      my $textChanged = $1;
      $textChanged =~ s[<span class="diff-span-added">.*?</span>] []gs;
      qq[<td class="diff-para-changed">$textChanged</td>];
    ]ges;
    
    $diff =~ s[<td class="diff-para-changed-new">(.*?)</td>] [
      my $textChanged = $1;
      $textChanged =~ s[<span class="diff-span-deleted">.*?</span>] []gs;
      qq[<td class="diff-para-changed">$textChanged</td>];
    ]ges;
  
    $diff = qq[<table width="100%" border=0 cellspacing=0 cellpadding=0>$diff</table>];
  }

  return $diff;
}
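
To see what the postprocessing step at the end of GetDiff does without installing Diff.pm, here is a small stand-alone sketch. The sample row is hand-written and only assumed to resemble what Diff::diffText produces for a changed paragraph; the helper names strip_spans and postprocess are invented for this example.

```perl
use strict;
use warnings;

# Remove every span of the given kind ("added" or "deleted") from a cell.
sub strip_spans {
    my ($html, $kind) = @_;
    $html =~ s{<span class="diff-span-$kind">.*?</span>}{}gs;
    return $html;
}

# Same idea as the two substitutions in GetDiff: the "old" cell loses
# the added spans, the "new" cell loses the deleted spans, and both end
# up with the common class "diff-para-changed".
sub postprocess {
    my $diff = shift;
    $diff =~ s{<td class="diff-para-changed-old">(.*?)</td>}
              {'<td class="diff-para-changed">' . strip_spans($1, 'added') . '</td>'}ges;
    $diff =~ s{<td class="diff-para-changed-new">(.*?)</td>}
              {'<td class="diff-para-changed">' . strip_spans($1, 'deleted') . '</td>'}ges;
    return $diff;
}

# Hypothetical input: both cells carry the same merged text, as the
# paraChanged template inserts %text% twice.
my $merged = '<p><span class="diff-span-ident">The </span>'
           . '<span class="diff-span-deleted">old </span>'
           . '<span class="diff-span-added">new </span>'
           . '<span class="diff-span-ident">text</span></p>';
my $row = '<tr valign=top>'
        . qq[<td class="diff-para-changed-old">$merged</td><td></td>]
        . qq[<td class="diff-para-changed-new">$merged</td></tr>];

print postprocess($row), "\n";
```

This also shows why the class names in the templates must stay as they are: the substitutions find the cells and spans by exactly those names.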


Replace the sub DiffToHTML with the following dummy code:

sub DiffToHTML { shift }

This is the place where you could take care of per-user options. Over at the UnrealWiki, we're applying the user's settings for "Strike through/underline markup" and "Show paragraph marks" here. (That's an additional patch; I want to keep this one tight.)
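
As a rough idea of what such a hook could look like (this is not the UnrealWiki patch, just a sketch): the %userPrefs hash and its diffTagMarkup key are made up for this example and would have to be replaced by however your wiki stores user settings.

```perl
use strict;
use warnings;

# Hypothetical per-user preference; not an existing UseModWiki option.
my %userPrefs = (diffTagMarkup => 1);

sub DiffToHTML {
    my $html = shift;

    # Without the preference, behave like the dummy version.
    return $html unless $userPrefs{diffTagMarkup};

    # Wrap the span contents in <u>/<s> so additions and deletions stay
    # distinguishable without CSS. Assumes the spans are never nested.
    $html =~ s{(<span class="diff-span-added">)(.*?)(</span>)}{$1<u>$2</u>$3}gs;
    $html =~ s{(<span class="diff-span-deleted">)(.*?)(</span>)}{$1<s>$2</s>$3}gs;

    return $html;
}

print DiffToHTML('<span class="diff-span-deleted">old</span> '
               . '<span class="diff-span-added">new</span>'), "\n";
```

The CSS classes are left untouched, so stylesheets keep working for users who do not enable the extra markup.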


Remove the sub ColorDiff entirely; this patch makes it obsolete.


Patch the sub UpdateDiffs as follows:

if ($UseDiffLog) {
  my $editDiff = Diff::diffClassic($old, $new);  # add this line
  &WriteDiff($id, $editTime, $editDiff);
}

Post-Patch Instructions

After you have applied this patch, the Wiki database still contains the cached old diffs, which will now produce garbled output. To update the diff caches in your Wiki database, run the Perl script wiki-update-diff.cgi from your web browser.

You might have to modify the $wiki = 'wiki.cgi'; statement near the top of the script to accommodate the actual file name of your UseModWiki script.

For security reasons, the script will only execute when a global Wiki lock is in place. It's strongly advised that you make a full backup of your Wiki data before executing this script. Use at your own risk. It worked fine for me and I created it to the best of my knowledge, but I can't guarantee that it'll work just as well with your Wiki and I'd prefer not to be held responsible in the (unlikely) case of data loss. Make a backup, just to be on the safe side.

Since the script has to get hold of the actual previous document versions to create the respective diffs, it can't create those diffs that relate to expired old versions. Hence, some old diffs may be lost during the process. (The script tells you how many and which ones.)

Diff Display CSS

The new diff output is best formatted using CSS. (Some browsers, namely Netscape 4.x, have severe issues with CSS. It's probably a good idea to supplement the CSS-only <span> formatting with old-school <font color> or <u>/<s> tags.)

Here are the CSS classes needed to format the diff output:

.diff-header       { font-weight: bold }  /* headers displaying paragraph indices */
.diff-para-ident   { }                    /* unchanged complete paragraphs */
.diff-span-ident   { }                    /* unchanged words in changed paragraphs */
.diff-para-changed { }                    /* changed paragraphs */
.diff-para-added   { background: green }  /* added complete paragraphs */
.diff-span-added   { background: green }  /* added words in changed paragraphs */
.diff-para-deleted { background: red }    /* deleted complete paragraphs */
.diff-span-deleted { background: red }    /* deleted words in changed paragraphs */

See also the /GlobalCSS patch.


I hope we see this in the NextRelease --SimonDavis

CliffordAdams sent me an email and said that he'd like to include this code in the NextRelease, provided it's not "too long." I don't know if this patch qualifies under those circumstances, but I do hope so. --MichaelBuschbeck

I tried to apply this patch to a clean UseMod 0.92, but I get a Perl complaint about the use of a slash in " pack 'L/a*' " at various places in Diff.pm. I run Perl 5.005_03; is a newer version of Perl required for this patch to work? --DirkJanssen

The statement "pack 'L/a*'" creates a string prefixed with a length field; I don't recall which Perl version introduced that feature. The following statements are equivalent (and the second one should work with older Perl versions too): --MichaelBuschbeck

  pack 'L/a*', $t
  pack 'La*', length $t, $t
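
For anyone who wants to check the equivalence on their own installation, here is a small test sketch. The first pack call needs a Perl that understands the "/" template character; the variable names are only for illustration.

```perl
use strict;
use warnings;

my $t = 'some paragraph text';

# Length-prefixed string, written with the '/' template character.
my $withSlash = pack 'L/a*', $t;

# The same thing spelled out: a 32-bit length field, then the string.
my $withoutSlash = pack 'La*', length($t), $t;

print $withSlash eq $withoutSlash ? "identical\n" : "different\n";

# Reading it back without '/': 'L' occupies 4 bytes, the string follows.
my $len = unpack 'L', $withoutSlash;
my $str = substr($withoutSlash, 4, $len);
print "$len: $str\n";
```

The parentheses after length are not required, but they make it obvious that the second $t is a separate argument to pack.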


With Perl 5.005_02 I also have another problem. Perl says: 'syntax error at Diff.pm line 228, near "+="'.
This is
  map $countLines += ($refCountNewlines->[$_] or 1),
    $indexPara .. $indexPara + $countPara - 1;

OK, my Perl knowledge is at the "Perl for stupids" level. Would it be right to write

  $countLines += (($refCountNewlines->[$_]) or (1)), $indexPara .. $indexPara + $countPara - 1;

?? --MartinEbert mx300@gmx.net 2003-09-14
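
Without the map, the addition is carried out only once instead of once per paragraph, so the two forms are not equivalent. An explicit loop should give the same result as the original map statement. The following is a sketch with stand-in values for the variables (in Diff.pm they come from the surrounding code), and it has not been tested on Perl 5.005:

```perl
use strict;
use warnings;

# Stand-in values, for illustration only.
my $refCountNewlines = [2, undef, 3];   # line counts per paragraph
my ($indexPara, $countPara) = (0, 3);
my $countLines = 0;

# One addition per paragraph index; a missing or zero count adds 1,
# just like the "or 1" in the original statement.
foreach my $index ($indexPara .. $indexPara + $countPara - 1) {
    $countLines += ($refCountNewlines->[$index] or 1);
}

print "$countLines\n";
```

Since the original map is used only for its side effect on $countLines, replacing it with a loop changes nothing else.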


Last edited September 29, 2007 4:47 pm by MarkusLude