[wp-hackers] 8bit ASCII characters in mail-headers

Andrew Shearer ashearer at shearersoftware.com
Thu Sep 9 03:57:19 UTC 2004


On Sep 8, 2004, at 7:18 PM, Sebastian Herp wrote:
> Agreed. The best solution might be a wrapper for the mail-function. 
> That way changes would be easier to make if we have failed on our 
> mission :-)
>
>> Further, we need to audit all places where we are not UTF-8 friendly.
>> htmlentities(), for example, stomps all over UTF-8 because it defaults
>> to ISO-8859-1.
>>
> That might be a bigger problem: http://de2.php.net/htmlentities (the 
> comments there always use their own conversion lists for different 
> charsets). The only thing that could be useful here is utf8_decode(), 
> but this function only exists for utf8, and whoever uses a different 
> charset is doomed :-(

Why do we even need htmlentities()? The database fields and post args 
are already in UTF-8 (or hopefully another superset of 7-bit ASCII), 
and htmlspecialchars() is all we need to convert that into well-formed 
HTML. It only escapes &, <, > and quotes, so high-bit and UTF-8 
multibyte characters just pass through it unmolested. We've told the 
browser which encoding to expect, so entities aren't needed to 
represent those characters.
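
To make the difference concrete, here is a minimal sketch (the sample 
string is made up, and I pass the charset explicitly rather than rely 
on whatever default a given PHP version uses):

```php
<?php
// "é" written as its two UTF-8 bytes, plus characters that matter in markup.
$text = "caf\xC3\xA9 <b> & \"tea\"";

// htmlspecialchars() rewrites only & < > and quotes; the UTF-8 bytes survive.
$safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// htmlentities() in ISO-8859-1 mode treats each byte as its own character,
// so the two bytes of "é" become two unrelated entities.
$mangled = htmlentities($text, ENT_QUOTES, 'ISO-8859-1');

echo $safe, "\n";     // caf\xC3\xA9 kept intact, markup escaped
echo $mangled, "\n";  // caf&Atilde;&copy; ..., i.e. the familiar mojibake
```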

As for specifying the encoding to the browser, I've found the following 
statement (near the top of my template) works well:
ini_set('default_charset', get_settings('blog_charset'));

This tells PHP to send the right Content-Type header whenever output is 
started later on. Unlike an unconditional 
header('Content-Type: text/html;charset=UTF-8') at the top of the file, 
it won't break redirects. And unlike the same header() statement moved 
lower in the file, it can't be tripped up by debugging output that 
happens to be sent first. (To apply this to both regular and admin 
pages, the line could go into wp-header.php and wp-admin.php.)
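
A slightly more defensive version of that one-liner might look like 
the sketch below. The helper name apply_blog_charset() and the UTF-8 
fallback are my own assumptions, not anything in WordPress; in a real 
template the argument would come from get_settings('blog_charset').

```php
<?php
// Hypothetical helper, not part of WordPress. Pass it the value of
// get_settings('blog_charset') when calling it from a template.
function apply_blog_charset($charset)
{
    // Assumption: fall back to UTF-8 if the setting is missing or blank.
    if (!is_string($charset) || trim($charset) === '') {
        $charset = 'UTF-8';
    }
    // PHP adds "; charset=..." to its default Content-Type header when
    // output starts, so no explicit header() call is needed here.
    ini_set('default_charset', $charset);
    return $charset;
}

apply_blog_charset('UTF-8');
```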

Just having the charset specified by <meta http-equiv> isn't enough. I 
did a lot of testing with different browsers, and many don't respect 
that meta tag when posting forms. They require the form's source page 
to specify the desired charset in the HTTP headers themselves.

> Frankly, to me it is only important that my German blog works. But 
> fixing it so that it works for everyone can't be too hard either :-) 
> We'll see what I can contribute to this "project" ...

The quoted-printable mail subject encoding would help even my site, all 
in English. Currently, email subjects have a burst of line noise where 
there's supposed to be a curly UTF-8 apostrophe in the blog title.
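
For what it's worth, the subject encoding itself (the "Q" form of RFC 
2047 encoded-words) is small enough to sketch. The function name 
encode_subject_q() is hypothetical, and this version emits a single 
encoded-word, so it ignores the 75-character limit per encoded-word 
and is only suitable for short subjects:

```php
<?php
// Hypothetical helper: wrap a subject in one RFC 2047 "Q" encoded-word.
// Assumption: the subject is short; long subjects would need to be split
// into several encoded-words without cutting a multibyte character.
function encode_subject_q($subject, $charset = 'UTF-8')
{
    // Plain printable ASCII needs no encoding at all.
    if (!preg_match('/[^\x20-\x7E]/', $subject)) {
        return $subject;
    }
    $encoded = '';
    $length = strlen($subject);
    for ($i = 0; $i < $length; $i++) {
        $byte = $subject[$i];
        if (ctype_alnum($byte)) {
            $encoded .= $byte;                      // letters and digits pass through
        } elseif ($byte === ' ') {
            $encoded .= '_';                        // space is written as underscore
        } else {
            $encoded .= sprintf('=%02X', ord($byte)); // everything else as =XX
        }
    }
    return '=?' . $charset . '?Q?' . $encoded . '?=';
}

// A curly apostrophe (bytes E2 80 99) in an otherwise English title.
echo encode_subject_q("Andrew\xE2\x80\x99s Blog"), "\n";
```

The result would go into the subject argument of mail(), or of the 
wrapper function suggested earlier in the thread.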

Andrew



