[wp-trac] [WordPress Trac] #22363: Accents in attachment filenames should be sanitized

WordPress Trac noreply at wordpress.org
Mon Nov 18 20:08:38 UTC 2013


#22363: Accents in attachment filenames should be sanitized
----------------------------------------+------------------
 Reporter:  tar.gz                      |       Owner:
     Type:  defect (bug)                |      Status:  new
 Priority:  normal                      |   Milestone:  3.8
Component:  Upload                      |     Version:  3.4
 Severity:  normal                      |  Resolution:
 Keywords:  has-patch needs-unit-tests  |
----------------------------------------+------------------

Comment (by p_enrique):

 My recent patch uses `wp_strip_all_tags`, adds the `remove_accents` filter,
 and relies on `preg_replace` (possibly with the UTF-8 modifier) to remove
 other characters, with support for the `sanitize_file_name_chars` filter.
 It also converts the filename to lower case if the `mbstring` extension is
 available.
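
 A minimal standalone sketch of that pipeline, for illustration only. It is
 not the attached patch: `sketch_sanitize_file_name` is a hypothetical name,
 `strip_tags` and a tiny `strtr` table stand in for `wp_strip_all_tags` and
 `remove_accents`, the character class is an assumed subset, and the filter
 hooks are left out. Returning an empty string when `preg_replace` yields
 null on invalid UTF-8 is likewise an assumption about how such input could
 be handled.

```php
<?php
// Hypothetical stand-in for illustration; NOT the patch on the ticket.
function sketch_sanitize_file_name( $filename ) {
	// 1. Drop HTML tags (core would call wp_strip_all_tags() here).
	$filename = strip_tags( $filename );

	// 2. Fold a few accented letters to ASCII. This table is an assumed,
	//    tiny subset; core's remove_accents() covers far more characters.
	$filename = strtr( $filename, array(
		'ł' => 'l', 'Ł' => 'L', 'ó' => 'o',
		'ź' => 'z', 'ą' => 'a', 'é' => 'e',
	) );

	// 3. Replace whitespace and some punctuation with a hyphen. The "u"
	//    modifier makes PCRE treat the subject as UTF-8; if the subject is
	//    not valid UTF-8, preg_replace() returns null.
	$filename = preg_replace( '/[\s()<>!?%&=\/]+/u', '-', $filename );
	if ( null === $filename ) {
		return '';
	}

	// 4. Collapse hyphen runs, drop a hyphen left before a dot, and trim
	//    leading/trailing hyphens and dots.
	$filename = preg_replace( '/-+/', '-', $filename );
	$filename = str_replace( '-.', '.', $filename );
	$filename = trim( $filename, '-.' );

	// 5. Lowercase only when mbstring is loaded, so multibyte letters are
	//    handled correctly; otherwise the case is left untouched.
	if ( function_exists( 'mb_strtolower' ) ) {
		$filename = mb_strtolower( $filename, 'UTF-8' );
	}

	return $filename;
}
```

 With mbstring loaded, passing "Posyłają\tSzczegóły\r\n(Październik).jpg" to
 this sketch yields "posylaja-szczegoly-pazdziernik.jpg", and a string
 containing the stray byte \xff yields an empty string.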

 Here are some test results with my recent patch:
 {{{
 sanitize_file_name( "Posyłają\tSzczegóły\r\n(Październik).jpg" );
 string(34) "posylaja-szczegoly-pazdziernik.jpg"

 sanitize_file_name( "<strong>Mọi người đều có quyền <foo> tự do tham
 gia vào đời sống văn hoá của cộng đồn</strong>" );
 string(97) "mọi-nguòi-dèu-có-quyèn-tụ-do-tham-gia-vào-dòi-sóng-
 van-hoá-của-cọng-dòn"

 sanitize_file_name( "-ÉŁè Šíæ 20 % —40° “über/Þøß” βαΣιΛειος à © Ñœijç.õs¿
 & Kö.yr.ä = 10 €-.gif.gif" );
 string(75) "ele-siae-20-40-uber-thos-βασιλειος-a-noeijc.os-
 ko.yr_.a-10.gif.gif"

 sanitize_file_name( "---Все люди 1) рождаются свободными и 2) равными в
 своем 2A) достоинстве и 2B) ПРАВАХ!!!---.jpg" );
 string(140) "все-люди-1-рождаются-свободными-и-2-равными-в-своем-2a-
 достоинстве-и-2b-правах.jpg"

 sanitize_file_name( "ณ ยามที่โลกต้องการเอ่ยถ้อยคำใดๆ โลกจะใช้เพียง Unicode
 เราจึงขอเชิญชวนท่านรีบลงทะเบียนงาน .jpg" );
 string(246) "ณ-ยามที่โลกต้องการเอ่ยถ้อยคำใดๆ-โลกจะใช้เพียง-unicode-
 เราจึงขอเชิญชวนท่านรีบลงทะเบียนงาน.jpg"

 sanitize_file_name( "Invalid Unicode string: abc-\xff-def" );
 string(0) ""
 }}}

 (There seem to be some Vietnamese characters not captured by
 `remove_accents`.)

--
Ticket URL: <http://core.trac.wordpress.org/ticket/22363#comment:38>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

