[wp-trac] [WordPress Trac] #6077: UTF-8 strings are sometimes cut
in the middle of a character
WordPress Trac
wp-trac at lists.automattic.com
Mon Mar 3 16:08:42 GMT 2008
#6077: UTF-8 strings are sometimes cut in the middle of a character
------------------------+---------------------------------------------------
Reporter: nbachiyski | Owner: anonymous
Type: defect | Status: new
Priority: normal | Milestone: 2.5
Component: General | Version:
Severity: normal | Keywords: unicode utf-8 excerpt
------------------------+---------------------------------------------------
Using {{{substr}}} on UTF-8 strings can cut a character in the middle,
because {{{substr}}} counts bytes, while a UTF-8 character can occupy
more than one byte.
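To make the problem concrete, here is a minimal sketch (not part of the patch). The sample string and variable names are made up for illustration, and the {{{preg_match}}} call with the {{{/u}}} modifier is used only as a simple way to check whether a string is still valid UTF-8.

```php
<?php
// "é" is two bytes in UTF-8 (0xC3 0xA9), so this string has
// 4 characters but 5 bytes.
$s = "caf\xC3\xA9";

$byte_length = strlen($s);   // strlen counts bytes

// Asking substr for 4 "characters" really asks for 4 bytes:
// "caf" plus the lead byte 0xC3, without its continuation byte.
$cut = substr($s, 0, 4);

// preg_match with the /u modifier fails on malformed UTF-8,
// so it works as a validity check here.
$orig_is_valid = preg_match('//u', $s) === 1;
$cut_is_valid  = preg_match('//u', $cut) === 1;
```

Here the original string passes the check, while the cut string does not, because it ends with an orphaned lead byte.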
Here is a patch, which:
* Defines {{{mb_strcut}}} in {{{compat.php}}} for users who don't
have the {{{mbstring}}} extension.
* Introduces a new {{{wp_html_excerpt}}} function, which uses
{{{mb_strcut}}} and works well with HTML strings: it counts an entity as
one character ({{{&amp;}}} is one character, not five) and strips tags.
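The two ideas above can be sketched roughly as follows. This is not the code from the attached patch: {{{sketch_utf8_cut}}} and {{{sketch_html_excerpt}}} are hypothetical names, the cut always starts at offset 0, and the length budget is in bytes (as with {{{mb_strcut}}}), so it only assumes that the result must never end inside a character.

```php
<?php
// Hypothetical fallback, NOT the implementation from the patch.
// Keeps at most $length bytes from the start of a UTF-8 string and
// backs up so the result never ends inside a multi-byte character.
function sketch_utf8_cut($str, $length) {
    if ($length <= 0) {
        return '';
    }
    if (strlen($str) <= $length) {
        return $str;
    }
    $end = $length;
    // UTF-8 continuation bytes look like 10xxxxxx. If the first dropped
    // byte is one of them, the cut falls inside a character, so move
    // back to that character's lead byte and drop it as a whole.
    while ($end > 0 && (ord($str[$end]) & 0xC0) === 0x80) {
        $end--;
    }
    return substr($str, 0, $end);
}

// Hypothetical excerpt helper, NOT the patch's wp_html_excerpt.
function sketch_html_excerpt($html, $count) {
    $text = strip_tags($html);
    // Decode entities first so "&amp;" costs one byte of the budget
    // instead of five.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Prefer the mbstring function when it is available.
    $text = function_exists('mb_strcut')
        ? mb_strcut($text, 0, $count, 'UTF-8')
        : sketch_utf8_cut($text, $count);
    // Re-encode special characters so the result is safe in HTML again.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}
```

For example, {{{sketch_html_excerpt('<b>a &amp; b</b>', 3)}}} strips the tags, treats the entity as a single character, and returns {{{a &amp;}}}.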
There are some tests for the two functions:
* [http://svn.automattic.com/wordpress-tests/wp-testcase/test_includes_compat.php _mb_strcut]
* [http://svn.automattic.com/wordpress-tests/wp-testcase/test_includes_formatting.php wp_html_excerpt] (at the end of the file)
--
Ticket URL: <http://trac.wordpress.org/ticket/6077>
WordPress Trac <http://trac.wordpress.org/>
WordPress blogging software