[wp-trac] [WordPress Trac] #47763: Uploaded files that meet certain conditions do not hit in media search
WordPress Trac
noreply at wordpress.org
Tue Jul 23 08:00:56 UTC 2019
#47763: Uploaded files that meet certain conditions do not hit in media search
----------------------------+-----------------------------
Reporter: dxd5001 | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Media | Version: 5.2.2
Severity: normal | Keywords:
Focuses: administration |
----------------------------+-----------------------------
When I upload a media file, it does not appear on the media page when I
search for it by filename.
As far as I can tell, it only happens when all of the following conditions
are met:
1. The file was created on macOS
2. The file has a Japanese name, such as “ワードプレス.pdf”
3. The name includes a voiced sound mark (dakuten) and/or a semi-voiced
sound mark (handakuten, the "P-sound" consonant mark)
4. The file is uploaded from a web browser other than Safari
I think this is caused by Unicode normalization and APFS and/or HFS+ (both
are macOS file systems).
The file system uses Normalization Form D (decomposition) for file names,
but when I type the name into the search box in a browser other than
Safari, the input is in Normalization Form C (composition), so the
characters do not match.
プ - The character with P‐sound consonant mark added in a filename
{{{
KATAKANA LETTER HU + COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
Unicode: U+30D5 U+309A, UTF-8: E3 83 95 E3 82 9A
}}}
プ - The character with P‐sound consonant mark typed in the search window
from a browser (Chrome)
{{{
KATAKANA LETTER PU
Unicode: U+30D7, UTF-8: E3 83 97
}}}
These characters look the same but are not the same.
You can check this easily by copying and pasting the characters above into
the macOS Character Viewer.
Right-click the character and copy its information to see the details.
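The difference can also be checked in PHP itself. The following is a minimal sketch, assuming PHP 7.0 or later and, for the last step, the intl extension that provides the Normalizer class. The variable names are illustrative only.

```php
<?php
// Decomposed form, as it appears in the filename: U+30D5 followed by U+309A.
$decomposed = "\u{30D5}\u{309A}";
// Precomposed form, as typed into the search box in Chrome: U+30D7.
$precomposed = "\u{30D7}";

// The two strings render identically but differ byte for byte.
echo strtoupper( bin2hex( $decomposed ) ), "\n";  // E38395E3829A
echo strtoupper( bin2hex( $precomposed ) ), "\n"; // E38397
var_dump( $decomposed === $precomposed );         // bool(false)

// After normalizing to Form C, the strings compare equal.
if ( class_exists( 'Normalizer' ) ) {
    $composed = Normalizer::normalize( $decomposed, Normalizer::FORM_C );
    var_dump( $composed === $precomposed );       // bool(true)
}
```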
Fortunately, PHP has a Normalizer class
(https://www.php.net/manual/en/class.normalizer.php).
I tried using this class in the function wp_unique_filename()
(wp-includes/functions.php), and the results are good.
I added this code at line 2257 of wp-includes/functions.php:
{{{#!php
<?php
// Unicode Normalization: Normalize Form D (decomposition) to Form C (composition).
if ( Normalizer::isNormalized( $filename, Normalizer::FORM_D ) ) {
    $filename = Normalizer::normalize( $filename, Normalizer::FORM_C );
}
}}}
With this change, the file appears in search results on the media page. A
page with the file attached in its content area also turns up in text
searches from the front-end search box.
Although this problem can be worked around with the “wp_unique_filename”
filter and the class above, I think it is better to handle it in core.
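For reference, the filter-based workaround could look roughly like the sketch below. The function name myprefix_normalize_filename_nfc is hypothetical and not part of WordPress. The sketch assumes the intl extension may be missing, so it falls back to returning the name unchanged. Unlike the snippet above, it tests against Form C rather than Form D. That is a variation on the ticket's code, not a tested patch, and is meant to also convert names that are only partly decomposed.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress core): convert a filename to
 * Normalization Form C when the intl extension is available.
 */
function myprefix_normalize_filename_nfc( $filename ) {
    // Without intl there is no Normalizer class; leave the name untouched.
    if ( ! class_exists( 'Normalizer' ) ) {
        return $filename;
    }
    // Already composed (this includes plain ASCII names): nothing to do.
    if ( Normalizer::isNormalized( $filename, Normalizer::FORM_C ) ) {
        return $filename;
    }
    $normalized = Normalizer::normalize( $filename, Normalizer::FORM_C );
    // normalize() returns false on failure; keep the original name in that case.
    return false === $normalized ? $filename : $normalized;
}

// Register on the wp_unique_filename filter only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wp_unique_filename', 'myprefix_normalize_filename_nfc' );
}
```

Placed in a plugin or a theme's functions.php, this would only affect files uploaded after it is installed. Names already stored in decomposed form would still need to be converted separately.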
—
Test Environment:
WordPress 5.2.2
PHP 7.2.17
MySQL 5.7.16
macOS 10.14.5 (MacBook Air)
File system: Apple File System (APFS)
Chrome 75.0.3770.142
Safari 12.1.1 (14607.2.6.1.1)
Firefox 68.0.1
--
Ticket URL: <https://core.trac.wordpress.org/ticket/47763>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform