[wp-trac] [WordPress Trac] #62995: Uploading Mac screenshots results in broken images, due to question marks inserted in filenames

WordPress Trac noreply at wordpress.org
Thu Jun 26 18:22:27 UTC 2025


#62995: Uploading Mac screenshots results in broken images, due to question marks
inserted in filenames
-------------------------------+------------------------------
 Reporter:  room34             |       Owner:  (none)
     Type:  defect (bug)       |      Status:  new
 Priority:  normal             |   Milestone:  Awaiting Review
Component:  Media              |     Version:
 Severity:  normal             |  Resolution:
 Keywords:  reporter-feedback  |     Focuses:  administration
-------------------------------+------------------------------

Comment (by dmsnell):

 Briefly confirmed:
  - `sanitize_file_name()` does in fact already remove the `?` character
 from a filename.
  - The U+202F is replaced on save in the database for @room34 because the
 table’s collation does not have a way to represent that character and thus
 uses the replacement character `?` instead.
   - This particular ticket is highlighting a //symptom// of a much broader
 problem, which is that WordPress is agnostic to text encodings. (see
 #62172)
   - This is //not// solved via `blog_charset` and solutions interacting
 with `blog_charset` risk introducing bigger issues
 - Many places perform HTML-escaping on the URL but do not attempt to
 perform URL-escaping on the URL.
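
 To illustrate the difference between the two kinds of escaping, here is a
 rough sketch in plain PHP. `htmlspecialchars()` and `rawurlencode()` are
 stand-ins for whichever Core helpers are actually involved, and the filename
 is made up for the example:

 {{{#!php
 <?php

 // Hypothetical filename in the style of a macOS screenshot. "\u{202F}" is
 // the NARROW NO-BREAK SPACE that appears before "PM".
 $filename = "Screenshot at 6.22.27\u{202F}PM.png";

 // HTML-escaping only rewrites &, <, > and quotes, so the U+202F bytes
 // pass through untouched.
 $html_escaped = htmlspecialchars( $filename, ENT_QUOTES, 'UTF-8' );

 // URL-escaping (RFC 3986, via rawurlencode) turns every byte outside the
 // unreserved set into %XX, including the three UTF-8 bytes of U+202F.
 $url_escaped = rawurlencode( $filename );

 var_dump( $html_escaped === $filename ); // bool(true)
 echo $url_escaped, "\n"; // Screenshot%20at%206.22.27%E2%80%AFPM.png
 }}}

 Only the URL-escaped form is free of non-ASCII bytes.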

 It’s possible we could circumvent all sorts of issues simply by creating
 proper URLs for these attachments. The filenames are not causing the
 problems, and if we percent-escaped non-ASCII characters in a URL then we
 should avoid all sorts of issues for database tables with non-UTF-8
 collations as well, because all more-or-less supported character encodings
 for WordPress are ASCII-compatible (with exceptions, of course).
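
 A minimal sketch of why the percent-escaped form should survive a lossy
 store. It assumes the `mbstring` extension is available, and it uses a
 conversion to ISO-8859-1 purely as a stand-in for a non-UTF-8 table. This is
 not how MySQL performs the conversion; it only mimics the substitution:

 {{{#!php
 <?php

 // Hypothetical filename fragment containing U+202F (NARROW NO-BREAK SPACE).
 $raw     = "at 6.22.27\u{202F}PM.png";
 $encoded = rawurlencode( $raw ); // pure ASCII: at%206.22.27%E2%80%AFPM.png

 // ISO-8859-1 has no U+202F, so the raw form loses that character, while
 // the ASCII-only encoded form is converted byte for byte.
 $stored_raw     = mb_convert_encoding( $raw, 'ISO-8859-1', 'UTF-8' );
 $stored_encoded = mb_convert_encoding( $encoded, 'ISO-8859-1', 'UTF-8' );

 var_dump( $stored_raw === $raw );                     // bool(false)
 var_dump( $stored_encoded === $encoded );             // bool(true)
 var_dump( rawurldecode( $stored_encoded ) === $raw ); // bool(true)
 }}}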

 In fact if we performed this computation we could even relax the
 transformation on the filename since we only need to try and avoid
 problems when sharing the files with other services which apply their own
 restrictions (see [https://core.trac.wordpress.org/ticket/62995#comment:10
 the second issue in my comment above]).

 `sanitize_file_name()` currently replaces a percent sign with a dash, but
 I don’t know if any system has problems with filenames including a percent
 sign. Perhaps we could drop all of the complex logic in there and use
 something like this instead?

 {{{#!php
 <?php

 function url_encoded_filename( $raw_filename_bytes ) {
         // Bytes that pass through unescaped; every other byte becomes %XX.
         $allowed = '-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~';
         $encoded = '';
         $length  = strlen( $raw_filename_bytes );
         $i       = 0;

         while ( $i < $length ) {
                 // Copy the longest run of allowed bytes starting at $i.
                 $allowable_length = strspn( $raw_filename_bytes, $allowed, $i );
                 if ( $allowable_length > 0 ) {
                         $encoded .= substr( $raw_filename_bytes, $i, $allowable_length );
                         $i       += $allowable_length;
                         continue;
                 }

                 // Percent-encode a single byte as two uppercase hex digits.
                 $encoded .= sprintf( '%%%02X', ord( $raw_filename_bytes[ $i ] ) );
                 $i++;
         }

         return $encoded;
 }
 }}}

 Of course this is just some doodling. A real proposal should at a minimum
 adhere more closely to the WHATWG URL spec on what needs to be
 percent-encoded. But still, the output of this function will be:

  - ASCII safe
  - fully visible (no hidden characters)
  - URL safe (though the URLs, if properly encoded, will double-escape the
 `%` signs)
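
 The double-escaping mentioned in the last point can be sketched as follows,
 using `rawurlencode()` as a stand-in for proper URL encoding and a made-up
 stored filename:

 {{{#!php
 <?php

 // Hypothetical filename that was already percent-encoded when it was saved.
 $stored_filename = 'Screenshot%E2%80%AFPM';

 // Encoding it again as a URL path segment escapes each "%" as "%25".
 $url_segment = rawurlencode( $stored_filename );
 echo $url_segment, "\n"; // Screenshot%25E2%2580%25AFPM

 // One round of decoding (what a web server does) yields the stored name.
 var_dump( rawurldecode( $url_segment ) === $stored_filename ); // bool(true)
 }}}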

 More thought is warranted here, but I hope this communicates that we may
 have an opportunity to use a comprehensive solution that eliminates the
 problem altogether instead of relying on repeated bug reports and
 individual appearances of failures from the same root problem.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/62995#comment:12>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

