[wp-hackers] 8bit ASCII characters in mail-headers

Sebastian Herp newsletter at scytheman.net
Wed Sep 8 23:18:01 UTC 2004


Affirmative! :-)

Ryan Boren wrote:

> We have a number of mail encoding bugs:
>
>http://mosquito.wordpress.org/bug_view_page.php?bug_id=0000209
>http://mosquito.wordpress.org/bug_view_page.php?bug_id=0000263
>http://mosquito.wordpress.org/bug_view_page.php?bug_id=0000186
>
>  
>
I am now monitoring these bugs.

>Please add your comments and patches to those bugs.  Also, provide test
>cases and examples demonstrating UTF-8 friendliness.  Use characters
>beyond the C3 - C6 blocks.  Your example demonstrates the Greek small
>letter Pi, 0xCF 0x80, which is good to see.
>
>  
>
I tested it with several characters from your earlier posting about the 
"sanitize titles" thing. As far as I can tell, there should be no 
problem with UTF-8 characters (not even with 3- and 4-byte sequences). As I 
wrote earlier, it just converts the headers to quoted-printable (like 
the patch in bug 209, only with some extras the author didn't consider).
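
To illustrate the kind of header conversion meant here, a minimal sketch in Python (not the PHP patch itself) of RFC 2047 "Q" encoding follows. The helper name encode_header_q is hypothetical, and the sketch assumes a short header value: it does not split results longer than the 75-character limit for an encoded-word.

```python
# Illustrative sketch only: RFC 2047 "Q" encoding of a mail header value.
# encode_header_q is a hypothetical helper, not the function from the patch.

# Characters that may appear unencoded inside a "Q" encoded-word.
SAFE = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789!*+-/"
)


def encode_header_q(text, charset="UTF-8"):
    """Return text unchanged if it is printable ASCII, else one encoded-word."""
    if all(0x20 <= ord(ch) <= 0x7E for ch in text):
        return text
    out = []
    for byte in text.encode(charset):
        ch = chr(byte)
        if ch in SAFE:
            out.append(ch)              # safe ASCII passes through
        elif ch == " ":
            out.append("_")             # space is written as underscore
        else:
            out.append("=%02X" % byte)  # everything else becomes =XX
    return "=?%s?Q?%s?=" % (charset, "".join(out))


print(encode_header_q("π"))  # =?UTF-8?Q?=CF=80?=
```

Because the encoding works on bytes, 3- and 4-byte UTF-8 sequences need no special handling: each byte simply becomes its own =XX group.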

>Also, we need to look at the work done by WordPress Japan on WordPress
>ME.
>
>http://wordpress.xwd.jp/
>http://wordpress.xwd.jp/dl/
>
>There's a Changelog explaining some of the places where they use
>mb_send_mail().
>
>http://cvs.sourceforge.jp/cgi-bin/viewcvs.cgi/wordpress/wordpress/change_log.txt?rev=1.2&content-type=text/vnd.viewcvs-markup
>
>  
>
Is mb_send_mail() available on a standard PHP installation?

>We need to pull all of this together and make a robust mail patch.
>  
>
Agreed. The best solution might be a wrapper around the mail() function. 
That way, changes would be easier to make if we fail in our mission :-)
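
As a rough sketch of the wrapper idea, in Python rather than PHP and with every name hypothetical (send_mail, the transport callable), all outgoing mail could pass through one function that encodes the subject and declares the charset before handing off to the real sender. The header set shown is an assumption for illustration, not a proposal for the actual patch.

```python
from email.header import Header, decode_header


def send_mail(to, subject, body, transport, charset="utf-8"):
    """Hypothetical wrapper: the single place where mail encoding is handled.

    transport is any callable taking (headers_dict, body_bytes); in a real
    system it would be the actual mail-sending call.
    """
    headers = {
        "To": to,
        # Header.encode() yields an ASCII-only RFC 2047 representation.
        "Subject": Header(subject, charset).encode(),
        "MIME-Version": "1.0",
        "Content-Type": "text/plain; charset=%s" % charset,
        "Content-Transfer-Encoding": "8bit",
    }
    return transport(headers, body.encode(charset))


# Usage with a fake transport that only records what it was given.
outbox = []
send_mail("someone@example.com", "Grüße π", "Hallo Welt",
          lambda headers, data: outbox.append((headers, data)))
```

If the encoding strategy turns out to be wrong, only the body of send_mail has to change; the callers stay as they are.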

>Further, we need to audit all places where we are not UTF-8 friendly.
>htmlentities(), for example, stomps all over UTF-8 because it defaults
>to ISO-8859-1.
>  
>
That might be a bigger problem: http://de2.php.net/htmlentities (the 
commenters there all resort to their own conversion tables for different 
charsets). The only thing that could be useful here is utf8_decode(), 
but that function only handles UTF-8, and whoever uses a different 
charset is doomed :-(
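
The ISO-8859-1 default problem can be reproduced in a few lines. The following Python sketch is only an analogy to what htmlentities() does: entities_assuming is a hypothetical helper, and it emits numeric entities for simplicity, whereas PHP would emit named ones.

```python
def entities_assuming(raw_bytes, assumed_charset):
    """Decode bytes with the assumed charset, then replace every
    non-ASCII character with a numeric HTML entity."""
    text = raw_bytes.decode(assumed_charset)
    return "".join(
        ch if ord(ch) < 128 else "&#%d;" % ord(ch)
        for ch in text
    )


pi_utf8 = "π".encode("utf-8")  # the two bytes 0xCF 0x80

# Wrong assumption: each byte is treated as its own character.
print(entities_assuming(pi_utf8, "iso-8859-1"))  # &#207;&#128;

# Right assumption: the two bytes form one character, U+03C0.
print(entities_assuming(pi_utf8, "utf-8"))       # &#960;
```

The first result is two unrelated entities instead of one pi, which is the "stomping" described above: the bytes are fine, but the conversion has to be told which charset they are in.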

>I'll be looking into the calendar start-of-week soon.  There are some UI
>decisions that need to be made if we make this provisionable, which
>always slows things down.
>
>  
>
Thank you.

>i18n and l10n are very important to us.  That's why we've put so many
>hours of work into it.
>
>Ryan
>
Frankly, to me it is only important that my German blog works. But 
fixing it so that it works for everyone can't be too hard either :-) 
We'll see what I can contribute to this "project" ...

Sebastian




