[wp-trac] [WordPress Trac] #22363: Accents in attachment filenames should be sanitized

WordPress Trac noreply at wordpress.org
Sun Oct 2 12:46:30 UTC 2016


#22363: Accents in attachment filenames should be sanitized
-------------------------------------+---------------------------
 Reporter:  tar.gz                   |       Owner:  mikeschroder
     Type:  defect (bug)             |      Status:  assigned
 Priority:  normal                   |   Milestone:  4.7
Component:  Media                    |     Version:  3.4
 Severity:  critical                 |  Resolution:
 Keywords:  needs-testing has-patch  |     Focuses:
-------------------------------------+---------------------------

Comment (by gitlost):

 I think the problem here is that the patch has drifted away from the
 original bug report, which is really a Mac-specific issue, the same one
 behind various other reports such as #35951.

 The issue is that Safari always normalizes filenames to NFC. This causes
 problems if files are uploaded using other browsers (which don't
 normalize) and then viewed in Safari. The "correct" fix would be to
 likewise normalize filenames to NFC in `sanitize_file_name()`. As core
 doesn't have this facility (yet!), the workaround of using
 `remove_accents()` as a sort of poor man's normalizer seems good.
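
 To illustrate the normalization point, here is a minimal Python sketch
 (not WordPress code). It shows why an NFC and an NFD spelling of the same
 filename fail to match, and how either normalizing to NFC or stripping
 accents makes them agree. The helper names `normalize_filename_nfc` and
 `strip_accents` are hypothetical, and `strip_accents` works by
 decomposition, which is an assumption for illustration and not
 necessarily how `remove_accents()` is implemented.

```python
import unicodedata


def normalize_filename_nfc(name: str) -> str:
    """Return the filename with its characters composed into NFC form."""
    return unicodedata.normalize("NFC", name)


def strip_accents(name: str) -> str:
    """Crude accent removal: decompose to NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The same visible name, spelled two ways:
# a precomposed "é" (NFC) versus "e" plus a combining acute accent (NFD).
composed = "caf\u00e9.jpg"
decomposed = "cafe\u0301.jpg"

print(composed == decomposed)                          # False
print(normalize_filename_nfc(decomposed) == composed)  # True
print(strip_accents(composed), strip_accents(decomposed))
```

 Either approach gives both spellings a single stored form; stripping
 accents simply goes further and loses the accent altogether.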

 Re the encoding of filenames in something other than UTF-8, I found the
 only way I could get current browsers to do this was by specifying the
 `accept-charset` attribute in the `<form>`, which is pretty unnatural to
 say the least (and even here Safari ignored it). So I think non-UTF-8
 filenames should be treated cursorily and just reduced to ASCII.
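
 As a rough sketch of what "just reduced to ASCII" could mean (Python,
 not the patch itself): valid UTF-8 names pass through untouched, and
 anything else has its non-ASCII bytes dropped. The function name
 `reduce_non_utf8_to_ascii` and its `replacement` parameter are
 hypothetical, and dropping the bytes rather than substituting something
 is an assumption.

```python
def reduce_non_utf8_to_ascii(raw: bytes, replacement: str = "") -> str:
    """Decode a filename; if it is not valid UTF-8, keep only ASCII bytes."""
    try:
        # Valid UTF-8 is left alone (accents and all).
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8: keep bytes below 0x80, swap the rest for `replacement`.
        return "".join(chr(b) if b < 0x80 else replacement for b in raw)


latin1_name = "caf\u00e9.jpg".encode("latin-1")  # b"caf\xe9.jpg", invalid UTF-8
utf8_name = "caf\u00e9.jpg".encode("utf-8")

print(reduce_non_utf8_to_ascii(latin1_name))  # caf.jpg
print(reduce_non_utf8_to_ascii(utf8_name) == "caf\u00e9.jpg")  # True
```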

 Re the other changes, they don't make much sense to me as given, without
 going to a full-blown ASCII transliteration, as has been suggested in
 various places (e.g. above and in #15955). That would have major
 portability and interoperability advantages, but would obviously be a
 major change and would presumably deserve its own ticket.

 FWIW, I'll upload a patch which, apart from the reduce-non-UTF-8-to-ASCII
 change, just replaces the `U+00A0` `preg_replace` with a straight
 `str_replace` (it actually lumps it in with the other substitutions to
 '-', but it amounts to the same), adjusting the unit tests accordingly.
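
 The idea of folding `U+00A0` into a plain literal substitution can be
 sketched like this (Python for illustration only). The list of
 characters in `DASH_SUBSTITUTIONS` and the name `replace_with_dashes` are
 assumptions, not the actual list used in `sanitize_file_name()`.

```python
# Hypothetical list of literal substrings that all become "-".
# U+00A0 (no-break space) sits alongside the others, so no regex is needed.
DASH_SUBSTITUTIONS = ("%20", "+", "\u00a0")


def replace_with_dashes(name: str) -> str:
    """Replace each listed substring with "-" using plain string replacement."""
    for needle in DASH_SUBSTITUTIONS:
        name = name.replace(needle, "-")
    return name


print(replace_with_dashes("my\u00a0file+v2.jpg"))  # my-file-v2.jpg
```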

--
Ticket URL: <https://core.trac.wordpress.org/ticket/22363#comment:88>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

