[wp-trac] [WordPress Trac] #26094: sanitize_file_name() breaks some UTF-8 strings

WordPress Trac noreply at wordpress.org
Sun Nov 17 21:25:14 UTC 2013


#26094: sanitize_file_name() breaks some UTF-8 strings
--------------------------+-----------------------------
 Reporter:  p_enrique     |      Owner:
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  Awaiting Review
Component:  Formatting    |    Version:
 Severity:  normal        |   Keywords:
--------------------------+-----------------------------
 I've been testing sanitize_file_name( 'X.jpg' ) where X is a Unicode
 character that is a number or a letter (matching regex `/[\p{L}\p{N}]/u`).
 Alarmingly, many rather common characters cause a malformed, broken
 string to be returned:
 {{{
 (U+00E0) : à Latin small letter a with grave
 (U+0160) : Š Latin capital letter s with caron
 (U+03A0) : Π Greek capital letter pi
 (U+0420) : Р Cyrillic capital letter er
 }}}
 The problem seems to be caused by a `preg_replace` call whose pattern
 lacks the Unicode modifier (`/u`).
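
 One thing the four characters above have in common is that their UTF-8
 encodings all end in the byte 0xA0, which is the no-break space in
 ISO-8859-1. The sketch below is a Python simulation, not the WordPress
 code: it assumes that a byte-oriented whitespace class treats 0xA0 as
 whitespace, and the pattern and helper names are hypothetical. Whether
 PHP's `\s` actually matches 0xA0 without `/u` may depend on the locale
 and the PCRE build.

```python
import re

# The four characters reported in this ticket.
CHARS = ["\u00e0", "\u0160", "\u03a0", "\u0420"]

# ASSUMPTION: stand-in for a byte-oriented pattern whose whitespace class
# also matches 0xA0 (no-break space in ISO-8859-1). Hypothetical; this is
# not the pattern copied from WordPress.
BYTE_PATTERN = re.compile(rb"[\s\xa0-]+")

# Character-aware equivalent: matching operates on code points, not bytes.
TEXT_PATTERN = re.compile(r"[\s-]+")


def sanitize_bytes(name: str) -> bytes:
    """Replace runs of 'whitespace' and dashes byte by byte."""
    return BYTE_PATTERN.sub(b"-", name.encode("utf-8"))


def sanitize_text(name: str) -> str:
    """Replace runs of whitespace and dashes character by character."""
    return TEXT_PATTERN.sub("-", name)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for ch in CHARS:
        name = ch + ".jpg"
        broken = sanitize_bytes(name)
        print(
            f"U+{ord(ch):04X}",
            "utf-8:", ch.encode("utf-8").hex(),
            "byte-wise valid:", is_valid_utf8(broken),
            "char-wise unchanged:", sanitize_text(name) == name,
        )
```

 Under that assumption the byte-wise replacement swaps the trailing 0xA0
 for a dash and leaves an orphaned lead byte, while the character-aware
 pattern leaves the letter intact.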

--
Ticket URL: <http://core.trac.wordpress.org/ticket/26094>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

