Contact: MeatBall:AlexSchroeder
Which output do you prefer? On a braille terminal, this patched format works better because colors and bold are not available there. Note that a page has to be edited before its old (cached) diff is replaced by one in the new format.
The improved diff mechanism (version 2) takes the original diff and, for each chunk, puts every word on a line of its own, diffs again, and uses the result to identify the words that actually changed. These words are then highlighted in the original diff by surrounding them with bold tags and an asterisk.
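A minimal sketch of the word-level pass in plain Perl. `mark_changed_words` is a hypothetical helper, not part of the patch. The patch writes one word per line to temp files and runs diff(1) on them; this sketch swaps that second external diff for an in-process longest-common-subsequence (LCS) table. It also marks each changed word separately and rejoins words with single spaces, whereas the patch wraps a whole run of changed words and keeps the original whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two chunks word by word and wrap every
# word that is not part of their LCS in <b>*...*</b>.
sub mark_changed_words {
    my ($old, $new) = @_;
    my @old_w = split ' ', $old;    # split on runs of whitespace
    my @new_w = split ' ', $new;

    # $len[$i][$j] = LCS length of @old_w[$i..$#old_w] and @new_w[$j..$#new_w]
    my @len;
    for my $i (reverse 0 .. scalar @old_w) {
        for my $j (reverse 0 .. scalar @new_w) {
            if ($i == @old_w || $j == @new_w) {
                $len[$i][$j] = 0;
            } elsif ($old_w[$i] eq $new_w[$j]) {
                $len[$i][$j] = 1 + $len[$i + 1][$j + 1];
            } else {
                my ($down, $right) = ($len[$i + 1][$j], $len[$i][$j + 1]);
                $len[$i][$j] = $down > $right ? $down : $right;
            }
        }
    }

    # Walk the table: shared words pass through, the rest get marked.
    my (@old_out, @new_out);
    my ($i, $j) = (0, 0);
    while ($i < @old_w || $j < @new_w) {
        if ($i < @old_w && $j < @new_w && $old_w[$i] eq $new_w[$j]) {
            push @old_out, $old_w[$i];
            push @new_out, $new_w[$j];
            $i++;
            $j++;
        } elsif ($j < @new_w
                 && ($i == @old_w || $len[$i][$j + 1] >= $len[$i + 1][$j])) {
            push @new_out, "<b>*$new_w[$j]*</b>";    # word only in the new chunk
            $j++;
        } else {
            push @old_out, "<b>*$old_w[$i]*</b>";    # word only in the old chunk
            $i++;
        }
    }
    return (join(' ', @old_out), join(' ', @new_out));
}

my ($old, $new) = mark_changed_words("I can't do that", "I can do that");
print "< $old\n> $new\n";
# prints:
# < I <b>*can't*</b> do that
# > I <b>*can*</b> do that
```

A one-word change like "can't" to "can" ends up marked on both sides, which is exactly the case a line-based diff hides inside a long paragraph.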
Talking about version 1 of this patch:
I think this new diff is a large improvement over the previous diff as it solves that pesky problem of finding the one word differences. I always wondered what would happen if someone inverted the meaning of a large paragraph by changing a small amount of text (say can't to can). It's hard to detect that. On the other hand, it's more cluttered in its current incarnation and thus more confusing. But suppose we used another channel to demonstrate the difference? Say we turned the changed words red? Of course, that would not be accessible to text-only browsers. We could underline the words too, but that makes them look like hyperlinks. We could make them blink. ;) -- SunirShah
Have you seen wdiff? WikiSuggestions/ContextDiff.
I'm thinking of tackling the formatting problem by using spans: <span class=added> and <span class=removed>. This way just how it is formatted is up to the stylesheet, which on my wiki at least is also in the user's hands. The diffs could even be incorporated into the main page this way: background-highlight inserted text, and strikethru elided text. Much more readable. -- EricScheid
Yes, style sheets would be good. At the moment my main problem is making the diff accessible to blind users on a braille terminal. -- AlexSchroeder
Hmmm ... do browsers that support CSS also support the new <DEL> and <INS> tags? That would be the more "correct" syntax. MSIE5/mac does. -- EricScheid
I thought about that. But would that not make changes "invisible" when rendered correctly? Plus I need a solution for blind users *now* -- CSS and audio stylesheets (aural?) are cool, but what about lynx and w3m users? I use w3m at home, just to force myself to take these things into account. Oh and btw I installed the new patch on the Emacs Wiki. Let's see how it goes. -- as
As it turns out, the default rendering of DEL and INS by MSIE5.x/mac is to strike-thru the deleted, and underline the inserted. A style sheet could instead use different background colours or whatever (e.g. whisper the deleted). -- es
Actually, all graphical browsers I tested (mozilla, firefox, galeon, dillo, konqueror, internet explorer) used strikeout / underline for <del> and <ins> respectively; text browsers (lynx, elinks, w3m) seem to write [:DEL foo ] and [:INS foo ]. -- TrentBuck
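The DEL/INS idea amounts to emitting `<del>`/`<ins>` instead of `<b>*...*</b>`, which per the observation above would still leave visible markers in lynx, elinks and w3m. A minimal sketch, assuming a word-level edit script is already available (for example from a word diff); `ops_to_html` and its op format are made up for illustration and words are assumed to be HTML-escaped already. Switching to `<span class=...>` markup would only touch the `push` line.

```perl
use strict;
use warnings;

# Hypothetical helper: render a word-level edit script with <del>/<ins>.
# Each op is [' ', word] (unchanged), ['-', word] (removed) or
# ['+', word] (added). Consecutive ops of the same kind are merged into
# one element so a changed phrase gets a single tag pair.
sub ops_to_html {
    my @ops = @_;
    my %tag = ('-' => 'del', '+' => 'ins');
    my (@out, @run);
    my $kind;
    my $flush = sub {
        return unless @run;
        my $text = join ' ', @run;
        my $t = $tag{$kind};                 # undef for unchanged words
        push @out, defined $t ? "<$t>$text</$t>" : $text;
        @run = ();
    };
    for my $op (@ops) {
        my ($k, $word) = @$op;
        $flush->() if defined $kind && $k ne $kind;
        $kind = $k;
        push @run, $word;
    }
    $flush->();
    return join ' ', @out;
}

print ops_to_html([' ', 'I'], ['-', "can't"], ['+', 'can'], [' ', 'do'], [' ', 'that']), "\n";
# prints: I <del>can't</del> <ins>can</ins> do that
```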
The idea of adding bold tags around changes breaks down when changes are large and include paragraph-level markup such as horizontal lines. So I'm still looking for a good alternative. -- as
Version 3 of the patch translates [\n\r]+ into \n before diffing. Plus, it handles changes that span several lines.
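The version 3 normalization is a single substitution; the patch writes the class as `[\r\n]+`, which is equivalent. Here it is wrapped in a hypothetical `normalize_newlines` helper. Because any run of CR and LF characters collapses into one newline, blank lines between paragraphs disappear as well, so a revision that only adds or removes an empty line produces no diff at all.

```perl
use strict;
use warnings;

# Same substitution as the version 3 patch: CRLF, lone CR, and runs of
# blank lines all collapse into a single "\n".
sub normalize_newlines {
    my $text = shift;
    $text =~ s/[\r\n]+/\n/g;
    return $text;
}

my $one = normalize_newlines("first\r\n\r\nsecond\r\n");
my $two = normalize_newlines("first\nsecond\n");
print $one eq $two ? "same\n" : "different\n";    # prints: same
```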
Version 4 fixes a bug which broke the improved diffs when more than three lines, or lines with special wiki prefixes such as "*" or ":", were involved.
```diff
cd /home/alex/WWW/emacswiki/cgi-bin/
diff -c /home/alex/src/usemod092/wiki.pl /home/alex/WWW/emacswiki/cgi-bin/wiki-adiff-4
*** /home/alex/src/usemod092/wiki.pl	Sun Apr 22 02:44:10 2001
--- /home/alex/WWW/emacswiki/cgi-bin/wiki-adiff-4	Sat Mar 16 23:42:34 2002
***************
*** 1,5 ****
--- 1,6 ----
  #!/usr/bin/perl
  # UseModWiki version 0.92 (April 21, 2001)
+ # Copyright (C) 2001-2002 Alex Schröder (recent visitors, improved diffs)
  # Copyright (C) 2000-2001 Clifford A. Adams
  #    <caadams@frontiernet.net> or <usemod@usemod.com>
  # Based on the GPLed AtisWiki 0.3 (C) 1998 Markus Denker
***************
*** 1633,1639 ****
  sub GetDiff {
    my ($old, $new, $lock) = @_;
    my ($diff_out, $oldName, $newName);
! 
    &CreateDir($TempDir);
    $oldName = "$TempDir/old_diff";
    $newName = "$TempDir/new_diff";
--- 1634,1641 ----
  sub GetDiff {
    my ($old, $new, $lock) = @_;
    my ($diff_out, $oldName, $newName);
!   $old =~ s/[\r\n]+/\n/g;
!   $new =~ s/[\r\n]+/\n/g;
    &CreateDir($TempDir);
    $oldName = "$TempDir/old_diff";
    $newName = "$TempDir/new_diff";
***************
*** 1645,1655 ****
    &WriteStringToFile($oldName, $old);
    &WriteStringToFile($newName, $new);
    $diff_out = `diff $oldName $newName`;
-   &ReleaseDiffLock() if ($lock);
    $diff_out =~ s/\\ No newline.*\n//g;   # Get rid of common complaint.
    # No need to unlink temp files--next diff will just overwrite.
    return $diff_out;
  }
  
  sub DiffToHTML {
    my ($html) = @_;
--- 1647,1783 ----
    &WriteStringToFile($oldName, $old);
    &WriteStringToFile($newName, $new);
    $diff_out = `diff $oldName $newName`;
    $diff_out =~ s/\\ No newline.*\n//g;   # Get rid of common complaint.
+   $diff_out = &improve_diff($diff_out); ### IMPROVE DIFF, before lock is released
+   &ReleaseDiffLock() if ($lock);
    # No need to unlink temp files--next diff will just overwrite.
    return $diff_out;
  }
+ 
+ ### start of IMPROVE DIFF
+ 
+ sub improve_diff_strip_prefix
+ {
+   my $str = shift;
+   $str =~ s/^[<>] //gm;
+   return $str;
+ }
+ 
+ 
+ sub improve_diff_add_prefix
+ {
+   my $str =shift;
+   my $prefix = shift;
+   my $result = "";
+   for my $line (split(/\n/,$str))
+     {
+       $result .= $prefix . $line . "\n";
+     }
+   return $result;
+ }
+ 
+ sub improve_diff
+ {
+   my $diff = shift;
+   $diff =~ tr/\r//d;
+   my @hunks = split (/^(\d+,?\d*[adc]\d+,?\d*\n)/m, $diff);
+   my $result = shift (@hunks); # intro
+   while ($#hunks > 0)# at least one header and a real hunk
+     {
+       my $header = shift (@hunks);
+       $result .= $header;
+       my $chunk = shift (@hunks);
+       my ($old, $new) = split (/^---\n/m, $chunk, 2);
+       if ($old and $new)
+         {
+           ($old, $new) = improve_diff_add_detail($old, $new);
+           $result .= $old . "---\n" . $new;
+         }
+       else
+         {
+           $result .= $chunk;
+         }
+     }
+   $result = &improve_diff_add_html($result);
+   return $result;
+ }
+ 
+ sub improve_diff_add_detail
+ {
+   my $old = &improve_diff_strip_prefix(shift);
+   my $new = &improve_diff_strip_prefix(shift);
+   my $oldwords = join("\n",split(/\s+/,$old));
+   my $newwords = join("\n",split(/\s+/,$new));
+   open(A,">$TempDir/a");
+   open(B,">$TempDir/b");
+   print A $oldwords;
+   print B $newwords;
+   close(A);
+   close(B);
+   my $diff = `diff $TempDir/a $TempDir/b`;
+   while ($diff =~ /^(\d+),?(\d*)([adc])(\d+),?(\d*)$/mg)
+     {
+       my ($start1,$end1,$type,$start2,$end2) = ($1,$2,$3,$4,$5);
+       # changes are like additons + deletions
+       if ($type eq "d" or $type eq "c")
+         {
+           $end1 = $start1 unless $end1;
+           $old = &improve_diff_mark_words($old,$start1,$end1);
+         }
+       if ($type eq "a" or $type eq "c")
+         {
+           $end2 = $start2 unless $end2;
+           $new = &improve_diff_mark_words($new,$start2,$end2);
+         }
+     }
+   return (&improve_diff_add_prefix($old, "< "),
+           &improve_diff_add_prefix($new, "> "));
+ }
+ 
+ sub improve_diff_mark_words
+ {
+   # Use $FS2 and $FS3 to mark beginning and ending of changes
+   my ($str,$start,$end) = @_;
+   my $first = $start - 1;
+   my $words = 1 + $end - $start;
+   $str =~ s|^((\S+\s*){$first})((\S+\s*?){$words})|$1$FS2$3$FS3|;
+   return $str;
+ }
+ 
+ sub improve_diff_add_html
+ {
+   # translate $FS2 and $FS3 to HTML, taking newlines into account
+   my $arg = shift;
+   my @splits = split(/($FS2|$FS3|\n[<>]?[:* \t]*)/, $arg);
+   my $state = 0;
+   my $result = "";
+   for my $str (@splits)
+     {
+       if ($str eq $FS2)
+         {
+           $state = 1;
+           $result .= "<b>*";
+         }
+       elsif ($str eq $FS3)
+         {
+           $state = 0;
+           $result .= "*</b>";
+         }
+       elsif ($state == 1 and substr($str,0,1) eq "\n")
+         {
+           $result .= "</b>" . $str . "<b>";
+         }
+       else
+         {
+           $result .= $str;
+         }
+     }
+   return $result;
+ }
+ 
+ 
+ 
+ ### end of IMPROVE DIFF
  
  sub DiffToHTML {
    my ($html) = @_;

Diff finished at Sat Mar 16 23:42:52
```
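To see what the version 4 handling of line prefixes does, the marking and HTML steps of the version 4 patch above can be run on their own. The three subs below are copied unchanged from that patch; the separator values and the sample chunk are assumptions made for this demo. As far as the code shows, a changed run that crosses a line break is closed with `</b>` before the newline and reopened after the diff prefix and any wiki prefix (`< `, `* `, `:`), so each line's markup stays outside the bold tags.

```perl
use strict;
use warnings;

# Stand-in separators (an assumption for this demo): wiki.pl defines $FS2
# and $FS3 itself; any strings that never occur in page text will do.
our $FS  = "\xb3";
our $FS2 = $FS . "2";
our $FS3 = $FS . "3";

# The next three subs are copied from the version 4 patch.
sub improve_diff_add_prefix
{
  my $str =shift;
  my $prefix = shift;
  my $result = "";
  for my $line (split(/\n/,$str))
    {
      $result .= $prefix . $line . "\n";
    }
  return $result;
}

sub improve_diff_mark_words
{
  # Use $FS2 and $FS3 to mark beginning and ending of changes
  my ($str,$start,$end) = @_;
  my $first = $start - 1;
  my $words = 1 + $end - $start;
  $str =~ s|^((\S+\s*){$first})((\S+\s*?){$words})|$1$FS2$3$FS3|;
  return $str;
}

sub improve_diff_add_html
{
  # translate $FS2 and $FS3 to HTML, taking newlines into account
  my $arg = shift;
  my @splits = split(/($FS2|$FS3|\n[<>]?[:* \t]*)/, $arg);
  my $state = 0;
  my $result = "";
  for my $str (@splits)
    {
      if ($str eq $FS2)
        {
          $state = 1;
          $result .= "<b>*";
        }
      elsif ($str eq $FS3)
        {
          $state = 0;
          $result .= "*</b>";
        }
      elsif ($state == 1 and substr($str,0,1) eq "\n")
        {
          $result .= "</b>" . $str . "<b>";
        }
      else
        {
          $result .= $str;
        }
    }
  return $result;
}

# Words 3..5 of this old chunk ("beta", "*", "gamma") changed. The list
# marker "*" counts as a word because the chunk is split on whitespace.
my $old = improve_diff_mark_words("* alpha beta\n* gamma\n", 3, 5);
print improve_diff_add_html(improve_diff_add_prefix($old, "< "));
# prints:
# < * alpha <b>*beta</b>
# < * <b>gamma*</b>
```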
Earlier versions are no longer of interest, since later versions only improved on them without introducing any new ideas.
Here is the older patch. It splits revisions such that each word is on a line of its own, and then uses a unified 10 line context diff. This gives you every 10 words before and after each change. Each change is labeled as an addition or a deletion. This works fine but breaks wiki markup. Thus it looks ugly when changes affect lists or headings, etc.
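The word-splitting step of version 1 is the `s/\s+/\n/g` substitution in the patch. Applied to a small list (hypothetical page text, wrapped in a hypothetical `one_word_per_line` helper), it shows why wiki markup breaks: the newlines that carry the line structure become ordinary word boundaries, and the `*` that starts each list item ends up on a line of its own, separated from the item's text.

```perl
use strict;
use warnings;

# Version 1 with $DiffWords == 1: every run of whitespace, including the
# newlines that carry wiki line structure, becomes a line break.
sub one_word_per_line {
    my $text = shift;
    $text =~ s/\s+/\n/g;
    return $text;
}

print one_word_per_line("* first item\n* second item\n");
# prints "*", "first", "item", "*", "second", "item", one per line
```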
```diff
cd /home/alex/WWW/emacswiki/cgi-bin/
diff -c /home/alex/src/usemod092/wiki.pl /home/alex/WWW/emacswiki/cgi-bin/wiki-adiff-1.pl
*** /home/alex/src/usemod092/wiki.pl	Sun Apr 22 02:44:10 2001
--- /home/alex/WWW/emacswiki/cgi-bin/wiki-adiff-1.pl	Fri Mar 15 19:32:46 2002
***************
*** 45,51 ****
    $UrlProtocols $UrlPattern $ImageExtensions $RFCPattern $ISBNPattern
    $FS $FS1 $FS2 $FS3 $CookieName $SiteBase $StyleSheet $NotFoundPg
    $FooterNote $EditNote $MaxPost $NewText $NotifyDefault $HttpCharset
!   $UserGotoBar);
  # Note: $NotifyDefault is kept because it was a config variable in 0.90
  # Other global variables:
  use vars qw(%Page %Section %Text %InterSite %SaveUrl %SaveNumUrl
--- 45,51 ----
    $UrlProtocols $UrlPattern $ImageExtensions $RFCPattern $ISBNPattern
    $FS $FS1 $FS2 $FS3 $CookieName $SiteBase $StyleSheet $NotFoundPg
    $FooterNote $EditNote $MaxPost $NewText $NotifyDefault $HttpCharset
!   $UserGotoBar $DiffCommand $DiffWords);
  # Note: $NotifyDefault is kept because it was a config variable in 0.90
  # Other global variables:
  use vars qw(%Page %Section %Text %InterSite %SaveUrl %SaveNumUrl
***************
*** 84,89 ****
--- 84,91 ----
  $NewText     = "";    # New page text ("" for default message)
  $HttpCharset = "";    # Charset for pages, like "iso-8859-2"
  $UserGotoBar = "";    # HTML added to end of goto bar
+ $DiffCommand = "diff -u -10"; # What to use for diffing
+ $DiffWords   = 1;     # 0 = diff lines, 1 = diff words
  
  # Major options:
  $UseSubpage  = 1;     # 1 = use subpages, 0 = do not use subpages
***************
*** 1642,1650 ****
      $oldName .= "_locked";
      $newName .= "_locked";
    }
    &WriteStringToFile($oldName, $old);
    &WriteStringToFile($newName, $new);
!   $diff_out = `diff $oldName $newName`;
    &ReleaseDiffLock() if ($lock);
    $diff_out =~ s/\\ No newline.*\n//g;   # Get rid of common complaint.
    # No need to unlink temp files--next diff will just overwrite.
--- 1644,1656 ----
      $oldName .= "_locked";
      $newName .= "_locked";
    }
+   if ($DiffWords == 1) {
+     $old =~ s/\s+/\n/g;
+     $new =~ s/\s+/\n/g;
+   }
    &WriteStringToFile($oldName, $old);
    &WriteStringToFile($newName, $new);
!   $diff_out = `$DiffCommand $oldName $newName`;
    &ReleaseDiffLock() if ($lock);
    $diff_out =~ s/\\ No newline.*\n//g;   # Get rid of common complaint.
    # No need to unlink temp files--next diff will just overwrite.
***************
*** 1660,1677 ****
    $tAdded = T('Added:');
    $html =~ s/\n--+//g;
    # Note: Need spaces before <br> to be different from diff section.
    $html =~ s/(^|\n)(\d+.*c.*)/$1 <br><strong>$tChanged $2<\/strong><br>/g;
    $html =~ s/(^|\n)(\d+.*d.*)/$1 <br><strong>$tRemoved $2<\/strong><br>/g;
    $html =~ s/(^|\n)(\d+.*a.*)/$1 <br><strong>$tAdded $2<\/strong><br>/g;
    $html =~ s/\n((<.*\n)+)/&ColorDiff($1,"ffffaf")/ge;
    $html =~ s/\n((>.*\n)+)/&ColorDiff($1,"cfffcf")/ge;
    return $html;
  }
  
  sub ColorDiff {
    my ($diff, $color) = @_;
  
!   $diff =~ s/(^|\n)[<>]/$1/g;
    $diff = &QuoteHtml($diff);
    # Do some of the Wiki markup rules:
    %SaveUrl = ();
--- 1666,1691 ----
    $tAdded = T('Added:');
    $html =~ s/\n--+//g;
    # Note: Need spaces before <br> to be different from diff section.
+   # plain diff
    $html =~ s/(^|\n)(\d+.*c.*)/$1 <br><strong>$tChanged $2<\/strong><br>/g;
    $html =~ s/(^|\n)(\d+.*d.*)/$1 <br><strong>$tRemoved $2<\/strong><br>/g;
    $html =~ s/(^|\n)(\d+.*a.*)/$1 <br><strong>$tAdded $2<\/strong><br>/g;
+   # unified diff
+   $html =~ s/(^|\n)(---|\+\+\+) .*//g;
+   $html =~ s/(^|\n)@@ (.*) @@/$1 <br><strong>$tChanged $2<\/strong><br>/g;
+   # plain diff
    $html =~ s/\n((<.*\n)+)/&ColorDiff($1,"ffffaf")/ge;
    $html =~ s/\n((>.*\n)+)/&ColorDiff($1,"cfffcf")/ge;
+   # unified diff
+   $html =~ s/\n((-.*\n)+)/&ColorDiff("$tRemoved\n$1", "ffffaf")/ge;
+   $html =~ s/\n((\+.*\n)+)/&ColorDiff("$tAdded\n$1", "cfffcf")/ge;
    return $html;
  }
  
  sub ColorDiff {
    my ($diff, $color) = @_;
  
!   $diff =~ s/(^|\n)[<>+-]/$1/g;
    $diff = &QuoteHtml($diff);
    # Do some of the Wiki markup rules:
    %SaveUrl = ();
***************
*** 1682,1688 ****
    $diff = &CommonMarkup($diff, 0, 1);   # No images, all patterns
    $diff =~ s/$FS(\d+)$FS/$SaveUrl{$1}/ge;   # Restore saved text
    $diff =~ s/$FS(\d+)$FS/$SaveUrl{$1}/ge;   # Restore nested saved text
!   $diff =~ s/\r?\n/<br>/g;
    return "<table width=\"95\%\" bgcolor=#$color><tr><td>\n" . $diff
           . "</td></tr></table>\n";
  }
--- 1696,1704 ----
    $diff = &CommonMarkup($diff, 0, 1);   # No images, all patterns
    $diff =~ s/$FS(\d+)$FS/$SaveUrl{$1}/ge;   # Restore saved text
    $diff =~ s/$FS(\d+)$FS/$SaveUrl{$1}/ge;   # Restore nested saved text
!   if ($DiffWords == 0) {
!     $diff =~ s/\r?\n/<br>/g;
!   }
    return "<table width=\"95\%\" bgcolor=#$color><tr><td>\n" . $diff
           . "</td></tr></table>\n";
  }

Diff finished at Fri Mar 15 19:33:33
```