[wp-hackers] Plugin: Sanitize i18n (UTF-8) titles

Ryan Boren ryan at boren.nu
Tue Sep 7 04:02:06 UTC 2004


Attached is a plugin, in case anyone is interested.  It needs the
latest nightly to work.

Description: Escapes UTF-8 post, category, and author titles so that
they are suitable for use in URIs.  ASCII characters are preserved
as-is, while other characters are encoded as a sequence of octets
represented in the form %HH, where HH is the hexadecimal representation
of the octet.
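
As a rough illustration of the escaping rule described above, here is a
minimal Python sketch.  It is not the attached plugin's code (the plugin
is PHP); the function name escape_utf8_title is hypothetical, and the
sketch assumes that every ASCII octet passes through untouched, which
may be more permissive than what the plugin actually does.

```python
def escape_utf8_title(title):
    """Percent-encode each non-ASCII octet of the UTF-8 form of `title`.

    Hypothetical helper for illustration only: ASCII octets are kept
    as-is, every other octet becomes %hh (lowercase hex).
    """
    pieces = []
    for octet in title.encode("utf-8"):
        if octet < 0x80:
            pieces.append(chr(octet))             # ASCII: preserved as-is
        else:
            pieces.append("%{:02x}".format(octet))  # non-ASCII: %hh
    return "".join(pieces)


print(escape_utf8_title("caf\u00e9"))  # prints caf%c3%a9
```

The single character U+00E9 is two octets in UTF-8 (c3 a9), so it
becomes two %HH groups.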

For example:

جحخدذ

becomes:

%d8%ac%d8%ad%d8%ae%d8%af%d8%b0

That breaks down to:

U+062C ARABIC LETTER JEEM
U+062D ARABIC LETTER HAH
U+062E ARABIC LETTER KHAH
U+062F ARABIC LETTER DAL
U+0630 ARABIC LETTER THAL

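Notice that it breaks down right to left: the rightmost letter on
screen, JEEM, is the first character in the string and the first to be
encoded.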
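
The breakdown above can be checked by reversing the escaping and
looking up each character's Unicode name.  This is a small Python
sketch using only the standard library; the function name
describe_escaped is hypothetical and not part of the plugin.

```python
import unicodedata
from urllib.parse import unquote


def describe_escaped(escaped):
    """Decode a %HH-escaped UTF-8 string and name each code point.

    Hypothetical helper for illustration. Characters are listed in
    logical (storage) order, which for Arabic is the reverse of the
    left-to-right order of the glyphs on screen.
    """
    text = unquote(escaped, encoding="utf-8")
    return ["U+{:04X} {}".format(ord(ch), unicodedata.name(ch)) for ch in text]


for line in describe_escaped("%d8%ac%d8%ad%d8%ae%d8%af%d8%b0"):
    print(line)
```

The first line printed is U+062C ARABIC LETTER JEEM, which matches the
first octet pair, %d8%ac.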

Here it is in permalink context:

/archives/2004/09/02/%d8%ac%d8%ad%d8%ae%d8%af%d8%b0/

Here's Greek:

µΩπ
%c2%b5%cf%89%cf%80

Which is left to right.
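
One detail in the Greek example: %cf%89 is the encoding of lowercase
omega (U+03C9), while capital omega (U+03A9) would be %ce%a9.  So the
title appears to have been lowercased before escaping.  That is an
inference from the output shown, not something stated here.  The Python
sketch below compares the two; the helper name escape_utf8_title is
hypothetical.

```python
def escape_utf8_title(title):
    # Hypothetical helper: keep ASCII octets, write the rest as %hh.
    return "".join(
        chr(octet) if octet < 0x80 else "%{:02x}".format(octet)
        for octet in title.encode("utf-8")
    )


# U+00B5 MICRO SIGN, U+03A9 CAPITAL OMEGA, U+03C0 SMALL PI
title = "\u00b5\u03a9\u03c0"

as_typed = escape_utf8_title(title)
lowercased = escape_utf8_title(title.lower())  # assumed lowercasing step

print(as_typed)    # prints %c2%b5%ce%a9%cf%80
print(lowercased)  # prints %c2%b5%cf%89%cf%80
```

Only the lowercased form matches the escaped string shown above.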

Ryan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sanitize_i18n_titles.php
Type: application/x-php
Size: 1767 bytes
Desc: not available
Url : /pipermail/hackers_wordpress.org/attachments/20040906/c9fc7547/sanitize_i18n_titles.bin

