WikiBugs/Utf8Editing

UseModWiki | WikiBugs | RecentChanges | Preferences

Fixed in 1.0, provided $NewFS is set to true.
UseMod doesn't edit UTF8 characters very well.

- CrisFitch

I don't think I can do anything about this. (I don't know very much about UTF8.) In this case it appears that your browser is nicely editing UTF8 text, but then sending it to UseMod as numeric character references such as &#1204;. (UseMod itself never translates characters to &# references.) I would be interested in any suggestions or workarounds. --CliffordAdams

If the page is not served marked as UTF-8, that's typical behavior of current Mozilla and Internet Explorer. The submitted characters can't be encoded in the present document's encoding, so the browser tries to do something more useful than dumping question marks.

If you set the $HttpCharset to "utf-8", this shouldn't happen. --BrionVibber
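A rough sketch of the behavior described above, in Python rather than in a browser. The function name encode_form_value is made up for illustration, and Python's "xmlcharrefreplace" error handler only stands in for what Mozilla and Internet Explorer are reported to do. It is not their actual code.

```python
def encode_form_value(text: str, page_charset: str) -> bytes:
    """Encode submitted text in the page's charset.

    Characters that the charset cannot represent are written as decimal
    numeric character references (&#NNNN;) instead of being dropped.
    """
    return text.encode(page_charset, errors="xmlcharrefreplace")


sample = "\u04b4"  # code point 1204 (U+04B4), not representable in ISO-8859-1

# Page served as ISO-8859-1: the character cannot be encoded, so it is escaped.
as_latin1 = encode_form_value(sample, "iso-8859-1")

# Page served as UTF-8: the character is sent as plain UTF-8 bytes.
as_utf8 = encode_form_value(sample, "utf-8")

print(as_latin1)
print(as_utf8)
```

With the page charset set to UTF-8 there is nothing left to escape, which is why setting $HttpCharset should make the &# sequences go away.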

Funny, Brion, that you happened to come here the same day as I did :) But, anyway, what Brion said is correct: if the page is not served as UTF-8 then sending UTF-8 input is illegal. Mozilla (illegally!) tries to convert it to character references (&#), while Opera marks illegal chars with question marks.

On the other hand, something is still buggy. If I set UseMod to UTF-8 I can almost use UTF-8: I cannot write LATIN SMALL LETTER O WITH ACUTE (UTF-8: 0xC3 0xB3); it gets replaced with an illegal character. Probably other UTF-8 chars get garbled as well.

See garbled: http://narya.grin.hu/cgi-bin/wiki.pl?TesztLap (it wasn't preformatted, so the lines are not nice, but the point is the illegal utf8 characters, and not the beauty of the page) and the original at http://hu.wikipedia.org/w/wiki.phtml?title=Wikip%C3%A9dia:UTF-8_demo

This is surely not a browser bug but a UseMod problem, as Wikipedia obviously handles it very well. -- PeterGervai

With the default UseModWiki settings you cannot use the 0xB3 byte anywhere in the wiki text (it is used internally as a file separator). However, as of 1.0, there is a $NewFS option which allows 0xB3 bytes (it uses a long separator which is not a legal sequence in UTF-8). (This new setting requires old databases to be converted, so it is not the default.) --CliffordAdams
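To make the separator collision concrete, here is a small Python sketch. It is not UseMod's Perl code. The helper names are invented, and NEW_SEP is only an illustrative multi-byte separator containing 0xFF (a byte that never occurs in valid UTF-8); the real value used by $NewFS is an assumption here, not something stated on this page.

```python
# Single-byte separator, as described for the default settings.
OLD_SEP = b"\xb3"
# Illustrative long separator (assumed value); 0xFF and 0xFE never appear in valid UTF-8.
NEW_SEP = b"\x1e\xff\xfe\x1e"


def join_fields(fields, sep):
    """Store text fields as one byte record, UTF-8 encoded, joined by sep."""
    return sep.join(field.encode("utf-8") for field in fields)


def split_fields(record, sep):
    """Split a stored record back into text fields.

    Broken UTF-8 is decoded with replacement characters so the damage is visible.
    """
    return [part.decode("utf-8", errors="replace") for part in record.split(sep)]


fields = ["summary", "Wikipédia: ó"]  # "ó" is 0xC3 0xB3 in UTF-8

# The 0xB3 inside "ó" is mistaken for a separator: an extra field appears
# and the orphaned 0xC3 byte is no longer valid UTF-8.
old_result = split_fields(join_fields(fields, OLD_SEP), OLD_SEP)

# The long separator cannot occur inside UTF-8 text, so the fields survive.
new_result = split_fields(join_fields(fields, NEW_SEP), NEW_SEP)

print(old_result)
print(new_result)
```

Only characters whose UTF-8 encoding contains the 0xB3 byte are hit by the short separator, which matches the report that UTF-8 "almost" works.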

Yep, this is from Debian's package (v0.92), not the newest. Probably the change you described solves all these problems. -- PeterGervai

This is just to confirm that 1.00 handles Unicode correctly. Hopefully it'll be upgraded in Debian soon. -- PeterGervai

-------- Original Message --------
Subject: Re: Wiki UTF-8 support
Date: Thu, 2 Oct 2003 12:16:40 -0700
From: "Phlip" <plumlee@systransoft.com>
To: "Cris Fitch" <fitch@systransoft.com>

I found this in the headers:

IHTMLFormElement::Putencoding

It implies there is a 
attribute.

Maybe UseMod simply misses this?

(That would solve the problem - how to transmit UTF-8 in raw HTML without
high bits...)


----- Original Message -----
From: "Cris Fitch"
To: "Phlip"
Sent: Thursday, October 02, 2003 11:41 AM
Subject: Re: Wiki UTF-8 support


! Ack. Our current version of Wiki is better behaved than this, at least for
! the characters that work. What to do, what to do. Any other ideas,
! assuming we don't want to become Wiki code maintainers?
!
! - Cris
!
! Phlip wrote:
!
! ! It looks like our Wiki is an older fork of a very successful and widely
! ! supported wiki called "UseMod". When I tried the Farsi sample with a recent
! ! UseMod installation it did not show the UTF8 bug.
! !
! ! http://redsquirrel.com/cgi-bin/facman?Utf8Test


Last edited October 14, 2007 10:55 pm by MarkusLude