[wp-trac] [WordPress Trac] #47763: Uploaded files that meet certain conditions do not hit in media search

WordPress Trac noreply at wordpress.org
Tue Jul 23 08:00:56 UTC 2019


#47763: Uploaded files that meet certain conditions do not hit in media search
----------------------------+-----------------------------
 Reporter:  dxd5001         |      Owner:  (none)
     Type:  defect (bug)    |     Status:  new
 Priority:  normal          |  Milestone:  Awaiting Review
Component:  Media           |    Version:  5.2.2
 Severity:  normal          |   Keywords:
  Focuses:  administration  |
----------------------------+-----------------------------
 I uploaded a media file, but it does not appear on the Media page when I
 search for it by its filename.

 In my observation, it only happens when all of the following conditions are met:

 1. The file was created on macOS
 2. The file has a Japanese name, such as “ワードプレス.pdf”
 3. The name includes a voiced sound mark (dakuten) and/or a semi-voiced sound mark (handakuten, the “P-sound” mark)
 4. The file is uploaded from a web browser other than Safari

 I think this is caused by Unicode normalization combined with APFS and/or
 HFS+ (both are macOS file systems).

 The file system uses Normalization Form D (decomposition) for file names,
 but when I type the name into the search box in a browser other than
 Safari, the input is in Normalization Form C (composition), so the
 characters don't match.

 プ - The character with P‐sound consonant mark added in a filename

 {{{
 KATAKANA LETTER HU + COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
 Unicode: U+30D5 U+309A, UTF-8: E3 83 95 E3 82 9A
 }}}

 プ - The character with P‐sound consonant mark typed in the search window
 from a browser (Chrome)

 {{{
 KATAKANA LETTER PU
 Unicode: U+30D7, UTF-8: E3 83 97
 }}}


 These characters look the same, but they are not the same.
 You can check this easily by copying and pasting the characters above into
 the macOS Character Viewer.
 Right-click a character and copy its character info to get the details.
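
 As a quick check of the two code points above, the following PHP sketch
 builds both forms of the character from the listed code points, shows that
 they differ byte for byte, and shows that they compare equal once the
 decomposed form is normalized to NFC. It assumes the intl extension is
 loaded (it provides the Normalizer class); the variable names are
 illustrative only.

```php
<?php
// Sketch only: assumes PHP 7+ with the intl extension (Normalizer class).

// Decomposed (NFD), as stored in the filename:
// U+30D5 KATAKANA LETTER HU + U+309A COMBINING SEMI-VOICED SOUND MARK.
$from_filename = "\u{30D5}\u{309A}";

// Precomposed (NFC), as typed in the search box: U+30D7 KATAKANA LETTER PU.
$from_search = "\u{30D7}";

// A byte-wise comparison treats them as different strings.
var_dump( $from_filename === $from_search ); // bool(false)
echo bin2hex( $from_filename ), "\n";        // e38395e3829a
echo bin2hex( $from_search ), "\n";          // e38397

// After normalizing the decomposed form to NFC, the two strings are equal.
$nfc = Normalizer::normalize( $from_filename, Normalizer::FORM_C );
var_dump( $nfc === $from_search );           // bool(true)
```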

 Fortunately, PHP has a Normalizer class, provided by the intl extension
 (https://www.php.net/manual/en/class.normalizer.php).

 So I tried using this class in the function wp_unique_filename()
 (wp-includes/functions.php), and the results are good.

 I added this code in wp-includes/functions.php line 2257:
 {{{#!php
 <?php
         // Unicode normalization: convert Form D (decomposition) to Form C (composition).
         // Normalizer comes from the intl extension, which may not be installed.
         if ( class_exists( 'Normalizer' ) && Normalizer::isNormalized( $filename, Normalizer::FORM_D ) ) {
                 $filename = Normalizer::normalize( $filename, Normalizer::FORM_C );
         }
 }}}

 With this change, the file appears in the search results on the Media page.
 Also, a page with the file attached in its content area is found by a text
 search from the front-end search box.

 Although this problem can be worked around with the “wp_unique_filename”
 filter and the class above, I think it is better to handle it in core.
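
 For reference, the filter-based workaround could look roughly like the
 minimal sketch below. It assumes the intl extension is available; the
 function name hypothetical_nfc_unique_filename is made up for illustration
 and is not part of WordPress. The add_filter() call only runs inside
 WordPress, so the callback can also be exercised on its own.

```php
<?php
/**
 * Hypothetical workaround sketch: normalize the final upload filename to NFC
 * through the 'wp_unique_filename' filter.
 */
function hypothetical_nfc_unique_filename( $filename ) {
	// Normalizer is only available when the intl extension is loaded.
	if ( ! class_exists( 'Normalizer' ) ) {
		return $filename;
	}

	$normalized = Normalizer::normalize( $filename, Normalizer::FORM_C );

	// normalize() returns false on failure (e.g. invalid UTF-8); keep the original name then.
	return ( false === $normalized ) ? $filename : $normalized;
}

// Register the callback only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'wp_unique_filename', 'hypothetical_nfc_unique_filename' );
}
```

 One caveat: in WordPress 5.2 this filter appears to be applied to the name
 that wp_unique_filename() has already checked for uniqueness, so the
 normalized name itself is not re-checked against existing files. That is
 one reason normalizing inside core, before the uniqueness check, is
 preferable.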

 —
 Test Environment:

 WordPress 5.2.2
 PHP 7.2.17
 MySQL 5.7.16
 macOS 10.14.5 (MacBook Air)
 File system: Apple File System (APFS)
 Chrome 75.0.3770.142
 Safari 12.1.1 (14607.2.6.1.1)
 Firefox 68.0.1

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/47763>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

