[wp-hackers] Plugin: Sanitize i18n (UTF-8) titles

Matt Mullenweg m at mullenweg.com
Wed Sep 8 01:23:19 UTC 2004


Ryan Boren wrote:

> Hmmm, if we incorporate this into the core, what should the WordPress
> default be?   Escape accented Latin characters, or remove the accents?
> If we remove the accents, we lose information.  Most browsers display
> the escaped characters properly when you mouse over a link, so you are
> insulated a bit from the %HH ugliness.

I think for Latin characters we should transliterate them. Remember, part 
of the reason we sanitize the way we do is that the URIs are part of 
the UI, hence using hyphens instead of underscores and making everything 
lowercase. Percent-escaped Unicode in the address is really most 
applicable to users of non-Latin alphabets.
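
To make that concrete, here is a rough sketch in Python of the behaviour I have in mind. It is not the plugin's code or anything in the core; the function names are made up for illustration, and it assumes the title arrives as a Unicode string. Accented Latin letters lose their accents, everything is lowercased, spaces and underscores become hyphens, and whatever non-Latin text is left gets percent-escaped as UTF-8.

```python
import re
import unicodedata
from urllib.parse import quote


def transliterate_latin(text):
    """Strip accents from Latin letters; leave other scripts untouched.

    Hypothetical helper, for illustration only.
    """
    # Compose first so "e" + combining accent is treated like "é".
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        decomposed = unicodedata.normalize("NFKD", ch)
        base, marks = decomposed[0], decomposed[1:]
        # Only replace when the character is an ASCII letter plus accents.
        if base.isascii() and all(unicodedata.combining(m) for m in marks):
            out.append(base)
        else:
            out.append(ch)
    return "".join(out)


def sanitize_title(title):
    """Turn a post title into a URI-friendly slug (illustrative sketch)."""
    slug = transliterate_latin(title).lower()
    # Hyphens instead of spaces or underscores.
    slug = re.sub(r"[\s_]+", "-", slug)
    # Drop punctuation, keeping letters and digits from any script.
    slug = re.sub(r"[^\w-]", "", slug)
    # Collapse repeated hyphens and trim them from the ends.
    slug = re.sub(r"-{2,}", "-", slug).strip("-")
    # Percent-escape whatever is still non-ASCII, as UTF-8.
    return quote(slug, safe="-")


print(sanitize_title("Crème Brûlée!"))  # creme-brulee
```

Letters with no accent decomposition (ß, for example) are not transliterated by this sketch and end up percent-escaped, so a real implementation would probably want a lookup table for those.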

I also think the post slug form should translate the slug back from its 
percent-escaped form, to allow for easy editing either way.
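
Something like the following, again only a sketch in Python with made-up function names, assuming slugs are stored percent-escaped as UTF-8. The form shows the decoded text, and saving accepts either the readable or the escaped spelling.

```python
from urllib.parse import quote, unquote


def slug_for_editing(stored_slug):
    """Decode a percent-escaped slug for display in the edit form."""
    return unquote(stored_slug)


def slug_for_storage(edited_slug):
    """Re-escape an edited slug before saving it.

    Decoding first means input that is already escaped is not escaped twice,
    so the user can type either form.
    """
    return quote(unquote(edited_slug), safe="-")


print(slug_for_editing("%E6%97%A5"))  # 日
print(slug_for_storage("日"))          # %E6%97%A5
```

One side effect of decoding first is that a literal "%" followed by two hex digits in a slug would be read as an escape, which seems acceptable for slugs.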

-- 
Matt Mullenweg
  http://photomatt.net | http://wordpress.org
http://pingomatic.com | more soon...



