[wp-trac] [WordPress Trac] #22363: Accents in attachment filenames should be sanitized

WordPress Trac noreply at wordpress.org
Wed Jan 23 09:37:22 UTC 2013


#22363: Accents in attachment filenames should be sanitized
--------------------------+------------------
 Reporter:  tar.gz        |       Owner:
     Type:  defect (bug)  |      Status:  new
 Priority:  normal        |   Milestone:  3.6
Component:  Upload        |     Version:  3.4
 Severity:  normal        |  Resolution:
 Keywords:  has-patch     |
--------------------------+------------------

Comment (by tar.gz):

 Indeed, the results differ between Linux, Windows, and OSX.

 Here is the current status of my testing:

 '''Viewing''' of images with accents ("moiré.jpg") is broken in:
 * Safari 5.1.7 on OSX (wasn't able to test Safari 6 yet).
 * Safari on iOS 5.1.1 (test device: iPod)
 * Safari on iOS 6 (test device: iPad)

 With the patch you provided, when '''uploading''' a file named
 "Forêt.jpg":

 * Windows Vista / IE7 : works, saves as "Foret.jpg".
 * Ubuntu 11.04 / Firefox 18 : works, saves as "Foret.jpg".
 * Mac OSX 10.6 / Firefox 18 : doesn't work, saves as "Forêt.jpg".

 Then I tested with some other characters, and found out that it's even
 more complicated:

 Uploading on Ubuntu/Firefox: ê, ç, ä get converted (to e, c, a) but the
 "ö" remains as it is. A file named "höhö.jpg" does not get renamed.

 Uploading a file named "møiré pättern.png" under OSX/Firefox:

 * On an unpatched WP 3.5, only the blank space is converted into hyphen,
 the file is saved as: "møiré-pättern.png".
 * On WP 3.5 ''with your patch applied'', the file is saved as
 "moiré-pättern.png": the Nordic "ø" has been converted into "o", but the
 "é" and "ä" are left untouched.

 So ''something'' is working, it's just that some accented characters
 aren't correctly recognized! I hope this brings us on the right track.

 One thing that comes to mind is that there are two different ways to
 encode those accented characters in Unicode. For instance, there is a
 precomposed ("single glyph") version of "é", U+00E9, that a hex editor
 displays as the UTF-8 bytes "C3 A9", and a decomposed or "combined"
 version (base letter "e" plus the combining acute accent U+0301) that
 displays as "65 CC 81".
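
 To make the two forms concrete, here is a minimal Python sketch (not
 WordPress code; the helper name utf8_hex is mine) that prints the UTF-8
 bytes of both versions of "é" and shows that Unicode NFC normalization
 maps the combined form onto the single-glyph one:

```python
import unicodedata

precomposed = "\u00e9"   # "é" as a single code point (U+00E9)
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT (U+0301)


def utf8_hex(text):
    """Return the UTF-8 bytes of text as space-separated uppercase hex."""
    return " ".join(f"{byte:02X}" for byte in text.encode("utf-8"))


print(utf8_hex(precomposed))   # C3 A9
print(utf8_hex(decomposed))    # 65 CC 81

# The two strings look identical on screen but do not compare equal...
print(precomposed == decomposed)   # False

# ...until the decomposed one is normalized to NFC (composed form).
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
```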

 And indeed, if I paste the filename of "moiré-pättern.png" into a text
 file and open it with a hex editor, I see that the ø is a single glyph
 (= gets converted correctly), while the é and ä are combined characters.
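
 The same check can be done without a hex editor. The sketch below is
 only an illustration: the function name decomposed_sequences is made up,
 and the test string is my reconstruction of the filename as described
 above (ø precomposed, é and ä combined). It lists every base letter that
 is followed by one or more combining marks:

```python
import unicodedata


def decomposed_sequences(name):
    """Return each base character that is followed by combining marks,
    together with those marks."""
    found = []
    current = ""
    for ch in name:
        if unicodedata.combining(ch):
            # Non-zero combining class: attach the mark to the previous char.
            current += ch
        else:
            if len(current) > 1:
                found.append(current)
            current = ch
    if len(current) > 1:
        found.append(current)
    return found


# Assumed reconstruction of the filename saved by the patched WP 3.5,
# with the original "ø" kept for comparison.
filename = "m\u00f8ire\u0301-pa\u0308ttern.png"

for seq in decomposed_sequences(filename):
    print([f"U+{ord(c):04X}" for c in seq])

# "ø" (U+00F8) has no canonical decomposition in Unicode, so it can only
# ever appear as a single code point.
print(unicodedata.decomposition("\u00f8") == "")
```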

 I imagine that this could be the source of the inconsistencies? So the
 result actually depends upon the OS, and perhaps even the type of
 keyboard, on which the filenames have been typed.
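
 Here is a rough sketch of how that could produce exactly the results
 above. It assumes the accent removal is a lookup table keyed on
 single-glyph characters; ACCENT_MAP and both function names are
 hypothetical and are not the actual WordPress code. As far as I know,
 the HFS+ file system on OSX stores filenames in a decomposed form, which
 would fit the OSX results.

```python
import unicodedata

# Hypothetical replacement table, keyed on precomposed characters only.
ACCENT_MAP = {
    "\u00e9": "e",  # é
    "\u00ea": "e",  # ê
    "\u00e4": "a",  # ä
    "\u00e7": "c",  # ç
    "\u00f6": "o",  # ö
    "\u00f8": "o",  # ø
}


def strip_accents_naive(name):
    """Replace characters found in ACCENT_MAP; leave everything else as-is."""
    return "".join(ACCENT_MAP.get(ch, ch) for ch in name)


def strip_accents_normalized(name):
    """Compose the string to NFC first, then apply the same table."""
    return strip_accents_naive(unicodedata.normalize("NFC", name))


# Filename with a precomposed "ø" and decomposed "é" and "ä".
osx_style = "m\u00f8ire\u0301-pa\u0308ttern.png"

# Only the "ø" is replaced; the combining marks are not in the table.
print(strip_accents_naive(osx_style))

# After NFC normalization all three characters match the table.
print(strip_accents_normalized(osx_style))   # moire-pattern.png
```

 If this is what is happening, normalizing the filename to the composed
 form before the accent replacement would be one way to make the result
 independent of the uploading OS.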

 One more test. If I copy-paste that "møiré pättern" string from the
 filename into the title field of a new post, WP generates the following
 permalink: "moire-pättern". That's interesting: the combined-character é
 has been fixed by WP, but the combined-character ä hasn't.

 FYI, my test server is running PHP 5.3.10.

 And by the way, congrats on your Bug Gardener nomination :)

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/22363#comment:13>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software