[wp-trac] [WordPress Trac] #11738: sanitize_text_field() issue with UTF-8 characters

WordPress Trac wp-trac at lists.automattic.com
Wed Jan 6 12:59:10 UTC 2010


#11738: sanitize_text_field() issue with UTF-8 characters
--------------------------+-------------------------------------------------
 Reporter:  hakre         |       Owner:             
     Type:  defect (bug)  |      Status:  new        
 Priority:  normal        |   Milestone:  2.9.2      
Component:  General       |     Version:  2.9.1      
 Severity:  normal        |    Keywords:  needs-patch
--------------------------+-------------------------------------------------

Comment(by hakre):

 Replying to [comment:1 azaozz]:
 > PCRE UTF-8 (the "u" modifier) is not supported everywhere. For reference
 > see wp_check_invalid_utf8().
 > We probably could set a global instead of the static there and use that
 > for the three functions since they are usually called multiple times.

 ''wp_check_invalid_utf8()'' has its own problems, I know. Please see the
 PHP documentation I linked in this ticket's description, and the one Denis
 linked in a comment on this ticket, which state clearly since which PHP
 version the u-modifier is available. That version matches WordPress's
 current system requirements, so that function could benefit from a
 refactoring anyway.

 Setting a static and/or global does not help, since the input may have a
 different encoding on each function call. We have functions that work
 independently of PHP extensions, ''seems_utf8()'' for example. In another
 patch I offer a fallback-safe implementation, ''is_valid_utf8()'', that
 does the job in every case, even if the preg functions do not support the
 u-modifier at all; the current code lacks such a fallback. Please
 [http://core.trac.wordpress.org/attachment/ticket/5998/5998.2.patch#L275
 see that code, look for is_valid_utf8()]. You can find additional
 documentation on [http://codex.wordpress.org/User:Hakre/UTF8 my codex page
 regarding utf8 and php].
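 The general idea behind a regex-free fallback validator is to walk the
 input byte by byte and check each multi-byte sequence against the table of
 well-formed UTF-8 sequences (RFC 3629, Unicode Table 3-7), so no PCRE
 u-modifier is needed. The sketch below is a minimal Python illustration of
 that technique only; it is NOT the ''is_valid_utf8()'' code from
 5998.2.patch (which is not reproduced here), and the name
 is_valid_utf8_sketch is hypothetical.

```python
def is_valid_utf8_sketch(data: bytes) -> bool:
    """Hypothetical helper: True if `data` is well-formed UTF-8 (RFC 3629).

    Uses only byte comparisons, no regular expressions.
    """
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            i += 1  # plain ASCII byte
            continue
        # Pick the sequence length and the allowed range of the SECOND byte.
        if 0xC2 <= b <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif b == 0xE0:
            length, lo, hi = 3, 0xA0, 0xBF  # rejects overlong 3-byte forms
        elif 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif b == 0xED:
            length, lo, hi = 3, 0x80, 0x9F  # rejects UTF-16 surrogates
        elif b == 0xF0:
            length, lo, hi = 4, 0x90, 0xBF  # rejects overlong 4-byte forms
        elif 0xF1 <= b <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif b == 0xF4:
            length, lo, hi = 4, 0x80, 0x8F  # rejects code points > U+10FFFF
        else:
            return False  # 0x80-0xC1 and 0xF5-0xFF never start a sequence
        if i + length > n:
            return False  # truncated sequence at end of input
        if not lo <= data[i + 1] <= hi:
            return False
        # Remaining bytes must all be plain continuation bytes.
        for j in range(i + 2, i + length):
            if not 0x80 <= data[j] <= 0xBF:
                return False
        i += length
    return True


print(is_valid_utf8_sketch("Привет".encode("utf-8")))  # Russian test text
print(is_valid_utf8_sketch(b"\xc0\xaf"))               # overlong "/"
```

 Because the check depends only on the bytes passed in, it gives the same
 answer regardless of which regex features the host provides, which is the
 property the fallback is meant to guarantee.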

 >
 > In any case this will need testing in an affected locale/installation.
 As far as I know, the current implementation fails with shift-spaces. The
 other ticket contains the test case this function needs to cope with: those
 Russian letters in UTF-8. Before the last patch was committed, that was the
 only thing it was "tested" against; there was no further review of the
 patch and no further testing.

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/11738#comment:5>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

