[wp-trac] [WordPress Trac] #22363: Accents in attachment filenames should be sanitized
WordPress Trac
noreply at wordpress.org
Wed Jan 23 09:37:22 UTC 2013
#22363: Accents in attachment filenames should be sanitized
--------------------------+------------------
 Reporter:  tar.gz        |       Owner:
     Type:  defect (bug)  |      Status:  new
 Priority:  normal        |   Milestone:  3.6
Component:  Upload        |     Version:  3.4
 Severity:  normal        |  Resolution:
 Keywords:  has-patch     |
--------------------------+------------------
Comment (by tar.gz):
Indeed, the results differ between Linux, Windows, and OS X.
Here is the current status of my testing:
'''Viewing''' of images with accents ("moiré.jpg") is broken in:
* Safari 5.1.7 on OSX (wasn't able to test Safari 6 yet).
* Safari on iOS 5.1.1 (test device: iPod)
* Safari on iOS 6 (test device: iPad)
With the patch you provided, when '''uploading''' a file named
"Forêt.jpg":
* Windows Vista / IE7 : works, saves as "Foret.jpg".
* Ubuntu 11.04 / Firefox 18 : works, saves as "Foret.jpg".
* Mac OSX 10.6 / Firefox 18 : doesn't work, saves as "Forêt.jpg".
Then I tested with some other characters, and found out that it's even
more complicated:
Uploading on Ubuntu/Firefox: ê, ç, ä get converted (to e, c, a) but the
"ö" remains as it is. A file named "höhö.jpg" does not get renamed.
Uploading a file named "møiré pättern.png" under OSX/Firefox:
* On an unpatched WP 3.5, only the blank space is converted into a hyphen;
the file is saved as: "møiré-pättern.png".
* On WP 3.5 ''with your patch applied'', the file is saved as
"moiré-pättern.png" - the nordic "ø" has been converted into "o".
So ''something'' is working, it's just that some accented characters
aren't correctly recognized! I hope this brings us on the right track.
One thing that comes to mind is that there are two different ways to
encode those accented characters: precomposed, or "combined". For
instance, there is a "single glyph" version of "é" whose UTF-8 bytes a hex
editor displays as "C3 A9", and a combined (base + diacritic) version that
displays as "65 CC 81".
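
To make the two encodings concrete, here is a minimal Python sketch (Python
is used only for illustration; it is not part of the patch under
discussion). The helper name hex_bytes is hypothetical. It prints the UTF-8
bytes of both forms of "é" and shows that Unicode normalization to NFC maps
the combined form onto the single-glyph one.

```python
import unicodedata

# The two spellings of "é", written with escapes so that the source file's
# own encoding cannot silently change them.
precomposed = "\u00e9"   # single code point: LATIN SMALL LETTER E WITH ACUTE
combined = "e\u0301"     # base letter "e" followed by COMBINING ACUTE ACCENT


def hex_bytes(text):
    """Return the UTF-8 bytes of text as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))


print(hex_bytes(precomposed))   # C3 A9
print(hex_bytes(combined))      # 65 CC 81

# The strings look identical on screen but do not compare equal...
print(precomposed == combined)  # False

# ...until both are brought to the same normalization form.
print(unicodedata.normalize("NFC", combined) == precomposed)  # True
```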
And indeed, if I paste the filename of "moiré-pättern.png" into a text
file and open it with some hex editor, I see that the ø is a single glyph
(= gets converted correctly), while the é and ä are combined characters.
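
The same check can be done without a hex editor. The following is a small
Python sketch, assuming the filename contains the mix described above (a
single-glyph ø, combined é and ä); the function name find_combining_marks
is hypothetical. It reports every combining mark together with the base
letter it follows.

```python
import unicodedata


def find_combining_marks(text):
    """Return (index, base character, mark name) for each combining mark."""
    hits = []
    for i, ch in enumerate(text):
        # unicodedata.combining() returns a non-zero class for combining marks.
        if unicodedata.combining(ch):
            base = text[i - 1] if i > 0 else ""
            hits.append((i, base, unicodedata.name(ch)))
    return hits


# Assumed reconstruction of the filename: single-glyph ø, combined é and ä.
name = "m\u00f8ire\u0301-pa\u0308ttern.png"

for index, base, mark in find_combining_marks(name):
    print(index, base, mark)
# 5 e COMBINING ACUTE ACCENT
# 9 a COMBINING DIAERESIS
```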
I imagine that this could be the source of the inconsistencies? So the
result actually depends upon the OS, and perhaps even the type of
keyboard, on which the filenames have been typed.
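
If that is the cause, one possible remedy is to normalize the filename to
the single-glyph form (NFC) before replacing accented characters. The
Python sketch below is only an illustration of that idea, not WordPress's
actual code: ACCENT_MAP is a deliberately tiny, hypothetical lookup table,
and both function names are made up for this example.

```python
import unicodedata

# Hypothetical lookup table, keyed on single-glyph (precomposed) characters.
ACCENT_MAP = {
    "\u00e9": "e",  # é
    "\u00ea": "e",  # ê
    "\u00e4": "a",  # ä
    "\u00f6": "o",  # ö
    "\u00e7": "c",  # ç
    "\u00f8": "o",  # ø
}


def strip_accents_naive(name):
    """Replace characters found in ACCENT_MAP; leave everything else alone."""
    return "".join(ACCENT_MAP.get(ch, ch) for ch in name)


def strip_accents_normalized(name):
    """Compose combined characters first (NFC), then apply the same table."""
    return strip_accents_naive(unicodedata.normalize("NFC", name))


# Assumed input: single-glyph ø, combined é and ä, as described above.
mixed = "m\u00f8ire\u0301 pa\u0308ttern.png"

# The naive version converts ø but leaves the combining marks in place,
# so the result still displays with accents on "e" and "a".
print(strip_accents_naive(mixed) == "moire\u0301 pa\u0308ttern.png")  # True

# After NFC normalization every accented letter matches the table.
print(strip_accents_normalized(mixed))  # moire pattern.png
```

The naive result reproduces the pattern seen in the upload test: only the
character that was already a single glyph gets converted.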
One more test. If I copy-paste that "møiré pättern" string from the
filename into the title field of a new post, WP generates the following
permalink: "moire-pättern". That's interesting: the combined-character é
has been fixed by WP, but the combined-character ä hasn't.
FYI, my test server is running PHP 5.3.10.
And by the way, congrats on your Bug Gardener nomination :)
--
Ticket URL: <http://core.trac.wordpress.org/ticket/22363#comment:13>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software