[wp-hackers] 8bit ASCII characters in mail-headers

Ryan Boren ryan at boren.nu
Wed Sep 8 21:26:47 UTC 2004


On Wed, 2004-09-08 at 16:18 +0200, Sebastian Herp wrote:
> drDave wrote:
> 
> >
> > On Sep 8, 2004, at 9:25 PM, Sebastian Herp wrote:
> >
> >> Hello again,
> >>
> >> i played around with the comment notification function and as drDave 
> >> suspected, wordpress doesn't respect RFCs :-)
> >
> >
> > damn. I didn't do anything... not me sir...
> 
> I am sorry, i am not a native english speaker :-) "suspected" was the 
> wrong word ... you "asked" if wordpress respects the RFCs :-)
> 
> >> So I wrote a little function that converts Subject and From headers 
> >> correctly into the "Q" encoding described in RFC1522 
> >> (http://rfc.net/rfc1522.html). I have tested it with Wordpress 1.2 
> >> and it works perfectly :-) Every ÖÄÜß is encoded ... hurray!!!! Is 
> >> there a large enough interest from the developer-side to implement 
> >> this in wordpress
> >
> >
> > I say there definitely should be one...
> > however, one important question: you are encoding these headers using 
> > UTF-8, right?
> > If we want WP to be truly usable by non-English speakers, supporting 
> > non-ASCII titles is vital, even more so for non-Latin character sets 
> > (kanji, Arabic, Hebrew, etc.) where the whole sequence would be absolutely 
> > unreadable. So such a function must make sure it supports all the 
> > encodings it could get fed, convert to UTF-8 (if mbstring is 
> > available) and then encode according to RFC1522.
> >
> Encoding is a strong word ... i am not really encoding anything and if 
> you look at the "Q" encoding thing, we don't have to :-) I am only 
> converting all chars which have the 8th bit set into hexcode (e.g. 
> "=F6") and say that it is whatever encoding the wordpress admin has set 
> in his options. That should work for everyone ...
> 
> Example:
> "Ich liebe Umlaute: äöüß~whatever trala!πक" becomes
> =?UTF-8?Q?Ich liebe Umlaute: =C3=A4=C3=B6=C3=BC=C3=9F~whatever trala!=CF?=
> =?UTF-8?Q?=80=E0=A4=95?=
> 
> >> (had a bad experience with the wp-calendar which _still_ does not 
> >> know that there are nations not using sunday as the first day of the 
> >> week)? If yes, i'll get it to work with the CVS version and post the 
> >> diffs ...
> >
> >
> > that sounds to me like something which should be implemented. there 
> > are definitely a lot of countries (actually, pretty much anywhere else 
> > beside the US) that start the week on Monday and I can imagine the 
> > US-style of wp-calendar makes it mostly useless for them.
> >
> > Hope this gets implemented in the core one way or another (support for 
> > UTF-8 needs to be more consistent, imho)...
> 
> Calendar: yes
> UTF-8: this header thing has nothing to do with UTF-8 ... it doesn't 
> matter what charset is used, only the fact that it IS "encoded" ...

We have a number of mail encoding bugs:

http://mosquito.wordpress.org/bug_view_page.php?bug_id=0000209
http://mosquito.wordpress.org/bug_view_page.php?bug_id=0000263
http://mosquito.wordpress.org/bug_view_page.php?bug_id=0000186

Please add your comments and patches to those bugs.  Also, provide test
cases and examples demonstrating UTF-8 friendliness.  Use characters
beyond the C3 - C6 blocks.  Your example demonstrates the Greek small
letter Pi, 0xCF 0x80, which is good to see.
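
To illustrate the kind of test case meant here, the following is a minimal sketch in Python (not the PHP a real patch would use). The function name q_encode_word and the sample strings are made up for illustration. It encodes spaces as "_", as the "Q" encoding specifies, hex-encodes everything that is not an ASCII letter or digit, and ignores the 75-character limit on encoded-words, so it is only suitable for short strings.

```python
from email.header import decode_header


def q_encode_word(text, charset="utf-8"):
    """Return text as ONE "Q" encoded-word (hypothetical helper, no line folding)."""
    out = []
    for byte in text.encode(charset):
        ch = chr(byte)
        if ch == " ":
            out.append("_")            # "Q" encoding represents a space as "_"
        elif byte < 128 and ch.isalnum():
            out.append(ch)             # ASCII letters and digits pass through
        else:
            out.append("=%02X" % byte)  # everything else becomes =XX
    return "=?%s?Q?%s?=" % (charset.upper(), "".join(out))


# Samples that go beyond two-byte Latin characters: Greek, Devanagari, kanji.
samples = ["äöüß", "π", "क", "日本語"]
for s in samples:
    word = q_encode_word(s)
    raw, cs = decode_header(word)[0]   # round-trip through the stdlib decoder
    assert raw.decode(cs) == s
    print(word)
```

A round-trip check like this, run over characters that need two, three, and more bytes in UTF-8, is the sort of evidence that would be useful attached to the bugs above.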

Also, we need to look at the work done by WordPress Japan on WordPress
ME.

http://wordpress.xwd.jp/
http://wordpress.xwd.jp/dl/

There's a Changelog explaining some of the places where they use
mb_send_mail().

http://cvs.sourceforge.jp/cgi-bin/viewcvs.cgi/wordpress/wordpress/change_log.txt?rev=1.2&content-type=text/vnd.viewcvs-markup

We need to pull all of this together and make a robust mail patch.

Further, we need to audit all places where we are not UTF-8 friendly.
htmlentities(), for example, stomps all over UTF-8 because it defaults
to ISO-8859-1.
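
The failure mode can be sketched as follows. This is a Python stand-in, not PHP's htmlentities() itself; the helper latin1_entities is hypothetical and simply treats every byte as one ISO-8859-1 character before looking up a named entity, which is the assumption that causes the damage.

```python
from html.entities import codepoint2name


def latin1_entities(data):
    """Entity-encode bytes, assuming one byte == one ISO-8859-1 character."""
    out = []
    for byte in data:
        name = codepoint2name.get(byte)
        out.append("&%s;" % name if name else chr(byte))
    return "".join(out)


# "ö" stored as ISO-8859-1 is the single byte 0xF6 and encodes as expected.
print(latin1_entities("ö".encode("iso-8859-1")))  # &ouml;

# "ö" stored as UTF-8 is the two bytes 0xC3 0xB6, which are misread as
# two separate Latin-1 characters.
print(latin1_entities("ö".encode("utf-8")))       # &Atilde;&para;
```

Any audit should look for places where multi-byte input can reach a byte-oriented function like this.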

I'll be looking into the calendar start-of-week soon.  There are some UI
decisions that need to be made if we make this configurable, which
always slows things down.

i18n and l10n are very important to us.  That's why we've put so many
hours of work into it.

Ryan




