[wp-hackers] 8bit ASCII characters in mail-headers

John Hewitt johnblade at gmail.com
Fri Sep 10 06:38:43 UTC 2004


My WordPress wp-mail hack uses UTF-8 to decode the body, as there have
been quite a few issues with posting.
It uses html_entities to decode the subject line.

I would love to have my WordPress hack in the core after a bit more bug-testing.

I agree that htmlentities() isn't necessary. It all depends, though, on
what the user wants as the end result:
Do they want to send in HTML format?
Do they want to send in enriched text format?
Do they want to send in UBB text format? And so on.
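
As a rough illustration of the htmlentities() point raised in the thread:
the sample string and variable names below are made up for this sketch
and are not taken from the hack. The charset is passed explicitly so the
result does not depend on the PHP version's default.

```php
<?php
// Hypothetical sample: UTF-8 text containing "é" (bytes C3 A9),
// an ampersand, and double quotes.
$title = "Caf\xC3\xA9 & \"friends\"";

// htmlspecialchars() only escapes the HTML metacharacters; the
// multibyte UTF-8 sequence passes through untouched.
$escaped = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
// $escaped is "Caf\xC3\xA9 &amp; &quot;friends&quot;"

// htmlentities() told to treat the input as ISO-8859-1 reads each byte
// of "é" as a separate Latin-1 character and turns it into an entity.
$mangled = htmlentities($title, ENT_QUOTES, 'ISO-8859-1');
// $mangled is "Caf&Atilde;&copy; &amp; &quot;friends&quot;"
```

The second result is the "stomping" described in the quoted thread: the
two bytes of one UTF-8 character come out as two unrelated entities.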

Anyway, my hack is at http://blade.lansmash.com/index.php?p=93

On Wed, 8 Sep 2004 23:57:19 -0400, Andrew Shearer
<ashearerw at shearersoftware.com> wrote:
> On Sep 8, 2004, at 7:18 PM, Sebastian Herp wrote:
> > Agreed. The best solution might be a wrapper for the mail-function.
> > That way changes would be easier to make if we have failed on our
> > mission :-)
> >
> >> Further, we need to audit all places where we are not UTF-8 friendly.
> >> htmlentities(), for example, stomps all over UTF-8 because it defaults
> >> to ISO-8859-1.
> >>
> > That might be a bigger problem: http://de2.php.net/htmlentities (the
> > comments there always use their own conversion lists for different
> > charsets). The only thing that could be useful here is utf8_decode(),
> > but this function only exists for utf8, and whoever uses a different
> > charset is doomed :-(
> 
> Why do we even need htmlentities()? The database fields and post args
> are already in UTF-8 (or hopefully another superset of 7-bit ASCII),
> and htmlspecialchars() is all we need to convert that into well-formed
> HTML. High-ASCII and UTF-8 multibyte characters will just pass through
> it unmolested. We've told the browser which encoding to expect, so
> entities aren't needed to represent those characters.
> 
> As for specifying the encoding to the browser, I've found the following
> statement (near the top of my template) works well:
> ini_set('default_charset', get_settings('blog_charset'));
> 
> This tells PHP to send the right Content-Type header whenever output is
> started later on, so it won't break redirects like an unconditional
> header('Content-Type: text/html;charset=UTF-8') at the top of the file
> would, and debugging output won't interfere with it, which could happen
> to the same header() statement if moved lower in the file. (To apply
> this to both regular and admin pages, the line could go into
> wp-header.php and wp-admin.php.)
> 
> Just having the charset specified by <meta http-equiv> isn't enough. I
> did a lot of testing with different browsers, and many don't respect
> that meta tag when posting forms. They require the form's source page
> to specify the desired charset in the HTTP headers themselves.
> 
> > Frankly, to me it is only important that my German blog works. But
> > fixing it so that it works for everyone can't be too hard either :-)
> > We'll see what I can contribute to this "project" ...
> 
> The quoted-printable mail subject encoding would help even my site, all
> in English. Currently, email subjects have a burst of line noise where
> there's supposed to be a curly UTF-8 apostrophe in the blog title.
> 
> Andrew
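
A rough sketch of the kind of subject encoding Andrew describes, using
the RFC 2047 "Q" encoded-word form. The function name encode_subject_q()
is hypothetical and is not part of WordPress or of the hack above. It is
deliberately minimal: it does not split long subjects into several
encoded words, although RFC 2047 limits each one to 75 characters.

```php
<?php
// Hypothetical helper: encode a mail Subject as one RFC 2047 Q encoded-word.
function encode_subject_q($text, $charset = 'UTF-8')
{
    // Pure printable ASCII needs no encoding at all.
    if (!preg_match('/[^\x20-\x7E]/', $text)) {
        return $text;
    }

    // Without the /u modifier the pattern works byte by byte, so every
    // byte of a multibyte character becomes its own =XX escape. "=", "?"
    // and "_" fall outside the safe set and are escaped as well.
    $encoded = preg_replace_callback(
        '/[^A-Za-z0-9!*+\-\/ ]/',
        function ($m) {
            return sprintf('=%02X', ord($m[0]));
        },
        $text
    );

    // In Q encoding a space is written as an underscore.
    $encoded = str_replace(' ', '_', $encoded);

    return '=?' . $charset . '?Q?' . $encoded . '?=';
}

// A curly apostrophe (U+2019, bytes E2 80 99) in a blog title:
$subject = encode_subject_q("Andrew\xE2\x80\x99s Blog");
// $subject is "=?UTF-8?Q?Andrew=E2=80=99s_Blog?="
```

A mail client that understands encoded words decodes this back to the
original title instead of showing the raw bytes as line noise.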



-- 
Regards,
      JB Hewitt
Business: http://www.stcpl.com.au
Blog: http://blade.lansmash.com


