[wp-hackers] 8bit ASCII characters in mail-headers
Andrew Shearer
ashearerw at shearersoftware.com
Thu Sep 9 03:57:19 UTC 2004
On Sep 8, 2004, at 7:18 PM, Sebastian Herp wrote:
> Agreed. The best solution might be a wrapper for the mail-function.
> That way changes would be easier to make if we have failed on our
> mission :-)
>
>> Further, we need to audit all places where we are not UTF-8 friendly.
>> htmlentities(), for example, stomps all over UTF-8 because it defaults
>> to ISO-8859-1.
>>
> That might be a bigger problem: http://de2.php.net/htmlentities (the
> comments there always use their own conversion lists for different
> charsets). The only thing that could be useful here is utf8_decode(),
> but this function only exists for utf8, and whoever uses a different
> charset is doomed :-(
Why do we even need htmlentities()? The database fields and POST args
are already in UTF-8 (or, hopefully, some other superset of 7-bit ASCII),
and htmlspecialchars() is all we need to turn that into well-formed
HTML. It only escapes the markup characters (&, <, > and quotes), so
8-bit and UTF-8 multibyte characters pass through it untouched. We've
told the browser which encoding to expect, so entities aren't needed to
represent those characters.
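
To make the difference concrete, here is a small sketch comparing the two functions on a UTF-8 string. The sample string is made up for illustration, and the charset arguments are passed explicitly: 'ISO-8859-1' reproduces the htmlentities() default mentioned earlier in the thread, which newer PHP versions may no longer use.

```php
<?php
// "café" as UTF-8 bytes: the é is the two-byte sequence 0xC3 0xA9.
// (Sample input, assumed for illustration.)
$title = "<b>caf\xC3\xA9</b>";

// Escapes only the markup characters; the UTF-8 bytes are left alone.
$escaped = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

// Treats each byte as an ISO-8859-1 character, so the two bytes of é
// become two unrelated entities.
$mangled = htmlentities($title, ENT_QUOTES, 'ISO-8859-1');

echo $escaped, "\n"; // &lt;b&gt;café&lt;/b&gt;
echo $mangled, "\n"; // &lt;b&gt;caf&Atilde;&copy;&lt;/b&gt;
```

The second line of output is what "stomps all over UTF-8" looks like in practice: a browser told to expect UTF-8 would render it as "cafÃ©".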
As for specifying the encoding to the browser, I've found the following
statement (near the top of my template) works well:
    ini_set('default_charset', get_settings('blog_charset'));
This tells PHP to send the right Content-Type header whenever output
starts later on. Unlike an unconditional
header('Content-Type: text/html;charset=UTF-8') at the top of the file,
it won't break redirects. And unlike that same header() statement moved
lower in the file, it can't be disrupted by debugging output that
happens to be printed first. (To apply this to both regular and admin
pages, the line could go into wp-header.php and wp-admin.php.)
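
A minimal sketch of that setup, runnable outside WordPress. The function example_blog_charset() is a hypothetical stand-in for get_settings('blog_charset'), and its return value of 'UTF-8' is assumed for illustration.

```php
<?php
// Hypothetical stand-in for WordPress's get_settings('blog_charset');
// the value 'UTF-8' is an assumption for this example.
function example_blog_charset()
{
    return 'UTF-8';
}

// Set the charset PHP appends to its default Content-Type header.
// Nothing is sent at this point; the header goes out with the first output.
ini_set('default_charset', example_blog_charset());

// Read the setting back to confirm it took effect.
$effective = ini_get('default_charset');
echo "default_charset is now ", $effective, "\n";
```

Because this only changes a setting rather than queuing a header, it can sit near the top of a template without regard to what the rest of the page does.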
Just having the charset specified by <meta http-equiv> isn't enough. I
did a lot of testing with different browsers, and many don't respect
that meta tag when posting forms. They require the form's source page
to specify the desired charset in the HTTP headers themselves.
> Frankly, to me it is only important that my German blog works. But
> fixing it so that it works for everyone can't be too hard either :-)
> We'll see what I can contribute to this "project" ...
The quoted-printable mail subject encoding would help even my
all-English site. Currently, email subjects show a burst of line noise
where there's supposed to be a curly UTF-8 apostrophe in the blog title.
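
For reference, a rough sketch of what that subject encoding could look like, using the "Q" encoded-word form from RFC 2047. The function name encode_subject_q() and the sample blog title are hypothetical, and this is not the code WordPress uses. It also emits a single encoded-word, so it does not handle the RFC's 75-character limit for long subjects.

```php
<?php
// Hypothetical helper: encode a mail subject as one RFC 2047 "Q" encoded-word.
function encode_subject_q($subject, $charset = 'UTF-8')
{
    // Subjects made only of printable 7-bit ASCII need no encoding.
    if (!preg_match('/[^\x20-\x7E]/', $subject)) {
        return $subject;
    }

    $encoded = '';
    $length = strlen($subject);
    for ($i = 0; $i < $length; $i++) {
        $byte = $subject[$i];
        if (preg_match('/[A-Za-z0-9!*+\/-]/', $byte)) {
            // Characters that are safe to leave as-is inside an encoded-word.
            $encoded .= $byte;
        } elseif ($byte === ' ') {
            // Q encoding represents a space as an underscore.
            $encoded .= '_';
        } else {
            // Everything else becomes =XX with uppercase hex digits.
            $encoded .= sprintf('=%02X', ord($byte));
        }
    }

    return '=?' . $charset . '?Q?' . $encoded . '?=';
}

// Sample title with a curly apostrophe (U+2019, bytes E2 80 99 in UTF-8).
$subject = encode_subject_q("Andrew\xE2\x80\x99s Blog");
echo $subject, "\n"; // =?UTF-8?Q?Andrew=E2=80=99s_Blog?=
```

A mail() wrapper like the one suggested above would be a natural place to call something like this, so every outgoing subject goes through the same encoding step.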
Andrew