[wp-hackers] Need internationalization-issue help, UTF-8 RSSfeeds...

David Chait davebytes at comcast.net
Tue Aug 10 15:14:05 UTC 2004


Well, setting the parser to UTF-8 was definitely key, but not everything.
So, figuring other people might run into a similar situation, here's what I
found.

I have a half dozen routines that I run on RSS feed titles, etc., to 'clean'
them up for output -- we can debate at a later time whether I should be
cleaning them at all. ;)

Turns out that htmlentities garbles multibyte/Unicode characters (unless told
otherwise, it treats the input as single-byte), whereas htmlspecialchars only
encodes a small set of characters (&, <, > and quotes) and leaves everything
else alone.  Once I switched to htmlspecialchars, CG-FeedRead started reading
Danish/Nordic feeds like a pro!
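
To make the difference concrete, here is a minimal sketch, not the actual
CG-FeedRead code.  The sample title is made up, and the charset arguments are
passed explicitly as an assumption: 'ISO-8859-1' stands in for the single-byte
default that older PHP versions used for htmlentities.

```php
<?php
// Made-up Danish title; "ø" is written as its two UTF-8 bytes (0xC3 0xB8)
// so the example does not depend on the encoding of this file.
$title = "R\xC3\xB8dgr\xC3\xB8d & fl\xC3\xB8de";

// Assumption: treat the bytes as ISO-8859-1.  Every byte above 0x7F that has
// a named entity in that charset is converted on its own, so one "ø" becomes
// two unrelated entities.
$garbled = htmlentities($title, ENT_COMPAT, 'ISO-8859-1');

// htmlspecialchars rewrites only &, <, > and (under ENT_COMPAT) the double
// quote, so the multibyte sequences pass through byte for byte.
$safe = htmlspecialchars($title, ENT_COMPAT, 'UTF-8');

echo $garbled, "\n";
echo $safe, "\n";
```

Passing 'UTF-8' as the third argument to htmlentities is the other way out, if
named entities are really wanted in the output.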

Just thought I'd pass along the knowledge to the WP dev community.

-d

----- Original Message ----- 
From: "David Chait" <davebytes at comcast.net>
Sent: Monday, August 09, 2004 9:27 PM
> DUUUUUUH.
>
> Thanks Ryan... apparently I'm just WAAAY overworked (and unemployed) and my
> brain is really starting to go on me...  I'll look at the xml_parser options
> further, test out forcing UTF-8 mode.
>
> -d
>
> From: "Ryan Boren" <ryan at boren.nu>
> Sent: Monday, August 09, 2004 2:39 PM
> >
> > > So... How programmatically do I keep from stomping the UTF-8 chars?  Even
> > > when debugging through the feed processing, it looks like it is too late
> > > and the UTF-8 to ascii (or something) 'stomp' has already occurred.  I
> > > wouldn't be surprised to find that it is in fact certain PHP XML library
> > > calls that I am making to convert the XML into structured arrays that is
> > > part of the problem (and would, painfully, write my own XML converter if
> > > need be).  I do understand that some of the other string functions I am
> > > using will completely bork UTF-8 strings at this point
> > > (trim/strip/substring functions, for example).
> >
> > The expat parser allows setting the source and target encodings to UTF-8.
> > I believe it uses UTF-8 for its internal representation.  Do you use
> > expat?  Does xml_parser_create("UTF-8") not help?
> >
> > Ryan
> >
>
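
For anyone landing on this thread later, the parser setup Ryan describes above
could look roughly like the sketch below.  collect_titles() and the sample feed
are hypothetical, not taken from CG-FeedRead, and the closures assume PHP 5.3
or later.

```php
<?php
// Hypothetical helper: return the text of every <title> element in a feed.
function collect_titles($xml) {
    $titles = array();
    $in_title = false;

    // Source encoding UTF-8, as suggested in the quoted message.
    $parser = xml_parser_create('UTF-8');
    // Ask for UTF-8 on the way out as well, and keep tag names as written.
    xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$in_title, &$titles) {
            if ($name === 'title') { $in_title = true; $titles[] = ''; }
        },
        function ($p, $name) use (&$in_title) {
            if ($name === 'title') { $in_title = false; }
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$in_title, &$titles) {
            // Text can arrive in several chunks, so append rather than assign.
            if ($in_title) { $titles[count($titles) - 1] .= $data; }
        }
    );

    xml_parse($parser, $xml, true);
    return $titles;
}

// Made-up feed fragment; the title is "Blåbærgrød" spelled out as UTF-8 bytes.
$feed = "<rss><channel><item><title>Bl\xC3\xA5b\xC3\xA6rgr\xC3\xB8d</title>"
      . "</item></channel></rss>";
$titles = collect_titles($feed);
echo implode("\n", $titles), "\n";
```

The titles come back as raw UTF-8 bytes, so anything applied to them afterwards
(entity encoding, trimming, truncation) has to be UTF-8 safe too.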




