[wp-trac] [WordPress Trac] #7254: WP Diff shouldn't split words in the middle of UTF-8 characters
WordPress Trac
wp-trac at lists.automattic.com
Sun Jul 6 20:57:46 GMT 2008
------------------------+---------------------------------------------------
Reporter: nbachiyski | Owner: anonymous
Type: defect | Status: new
Priority: high | Milestone: 2.6
Component: General | Version:
Severity: normal | Keywords: has-patch
------------------------+---------------------------------------------------
Expected:
When we compare Грещките with Грешките, which differ only in the fourth
letter (щ vs. ш), we should get the following HTML code for the deleted
part:
{{{
Гре<del>щ</del>ките
}}}
However, we get:
{{{
Гре�<del>�</del>ките
}}}
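{{{WP_Text_Diff_Renderer_inline::_splitOnWords()}}} splits words with the
regular expression {{{/([^\w])/}}}. Here {{{\w}}} matches only
{{{[a-zA-Z0-9_]}}}, and everything else counts as outside a word. Without
the /u modifier the pattern also works byte by byte, so each byte of a
multi-byte UTF-8 character becomes a separate token. This is a poor
definition of a word, and it lets a token boundary fall in the middle of a
UTF-8 character, which is what happens above: щ (U+0449) is encoded as the
bytes D1 89 and ш (U+0448) as D1 88, so the diff keeps the shared lead byte
D1 as unchanged and marks only the trailing byte as deleted.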
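As a rough illustration, the following sketch mimics the byte-oriented behaviour in Python. It is an analog, not the WordPress PHP code: Python's {{{re}}} applied to a {{{bytes}}} pattern treats {{{\w}}} as {{{[a-zA-Z0-9_]}}}, much like the PCRE pattern without /u, and {{{difflib.SequenceMatcher}}} stands in for the Text_Diff engine (an assumption; the two diff algorithms are not identical). The helper names {{{split_bytes}}} and {{{deleted_tokens}}} are hypothetical.

```python
import difflib
import re


def split_bytes(text: str) -> list[bytes]:
    # Hypothetical helper: byte-oriented split, analogous to PCRE without /u.
    # For a bytes pattern, \w is ASCII-only, so every byte of a multi-byte
    # UTF-8 character is a "non-word" delimiter and becomes its own token.
    # Empty strings produced between adjacent delimiters are dropped.
    return [t for t in re.split(rb"([^\w])", text.encode("utf-8")) if t]


def deleted_tokens(old: list[bytes], new: list[bytes]) -> list[bytes]:
    # Hypothetical helper: collect the tokens a token-level diff removes.
    # difflib is only a stand-in for the real Text_Diff engine.
    sm = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    removed = []
    for tag, i1, i2, _j1, _j2 in sm.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(old[i1:i2])
    return removed


old = split_bytes("Грещките")
new = split_bytes("Грешките")
print(deleted_tokens(old, new))  # [b'\x89']: only the trailing byte of щ
```

Only the trailing byte 0x89 of щ is reported as deleted; the shared lead byte 0xD1 stays in the unchanged run, which is the stray half-character visible in the broken output.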
The solution is to add the /u modifier (available from PHP 4.1.0), so that
the regular expression treats the subject as a UTF-8 string and matches
whole characters instead of single bytes.
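A matching sketch of the fix, again a Python analog under stated assumptions rather than the attached patch: splitting the decoded string instead of its bytes plays the role of /u. The {{{re.ASCII}}} flag keeps {{{\w}}} limited to {{{[a-zA-Z0-9_]}}}, on the assumption that /u here only switches PCRE into UTF-8 mode without Unicode character properties, so each Cyrillic letter becomes one whole-character token. {{{split_chars}}} and {{{render_deleted}}} are hypothetical helpers, and {{{difflib}}} again stands in for Text_Diff.

```python
import difflib
import re


def split_chars(text: str) -> list[str]:
    # Hypothetical helper: character-oriented split, analogous to adding /u.
    # The pattern now sees code points, not bytes; re.ASCII keeps \w as
    # [a-zA-Z0-9_], so each non-ASCII letter is a whole-character token.
    return [t for t in re.split(r"([^\w])", text, flags=re.ASCII) if t]


def render_deleted(old: str, new: str) -> str:
    # Hypothetical helper: rebuild the old text, wrapping removed tokens
    # in <del>...</del> the way an inline diff renderer would.
    a, b = split_chars(old), split_chars(new)
    sm = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, _j1, _j2 in sm.get_opcodes():
        chunk = "".join(a[i1:i2])
        if tag == "equal":
            out.append(chunk)
        elif tag in ("delete", "replace"):
            out.append(f"<del>{chunk}</del>")
        # "insert" opcodes add nothing from the old text
    return "".join(out)


print(render_deleted("Грещките", "Грешките"))  # Гре<del>щ</del>ките
```

With whole characters as tokens, the diff can no longer stop inside щ, and the rendered result matches the expected output. In this sketch {{{\w}}} still excludes Cyrillic letters, so each letter remains its own token: the change repairs the broken characters rather than the definition of a word.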
Patch attached.
--
Ticket URL: <http://trac.wordpress.org/ticket/7254>
WordPress Trac <http://trac.wordpress.org/>
WordPress blogging software