[wp-trac] [WordPress Trac] #62995: Uploading Mac screenshots results in broken images, due to question marks inserted in filenames
WordPress Trac
noreply at wordpress.org
Thu Jun 26 18:22:27 UTC 2025
#62995: Uploading Mac screenshots results in broken images, due to question marks
inserted in filenames
-------------------------------+------------------------------
Reporter: room34 | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Media | Version:
Severity: normal | Resolution:
Keywords: reporter-feedback | Focuses: administration
-------------------------------+------------------------------
Comment (by dmsnell):
Briefly confirmed:
- `sanitize_file_name()` does in fact already remove the `?` character
from a filename.
- The U+202F is replaced on save in the database for @room34 because the
table’s collation does not have a way to represent that character and thus
uses the replacement character `?` instead.
- This particular ticket is highlighting a //symptom// of a much broader
problem, which is that WordPress is agnostic to text encodings. (see
#62172)
- This is //not// solved via `blog_charset` and solutions interacting
with `blog_charset` risk introducing bigger issues
- Many places perform HTML-escaping on the URL but do not attempt to
perform URL-escaping on the URL.
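To illustrate the last point, here is a minimal sketch in plain PHP, outside of WordPress. The filename is a made-up example of the kind macOS produces, and `htmlspecialchars()` stands in for WordPress' HTML-escaping helpers, which is an assumption about how they behave for this input.

```php
<?php
// Hypothetical filename: U+202F (NARROW NO-BREAK SPACE) sits before "PM".
$filename = "Screenshot 6.22.27\u{202F}PM.png";

// HTML-escaping only rewrites &, <, >, and quotes, so the U+202F bytes pass through.
$html_escaped = htmlspecialchars( $filename, ENT_QUOTES, 'UTF-8' );

// URL-escaping percent-encodes every byte outside of A-Z a-z 0-9 - _ . ~
$url_escaped = rawurlencode( $filename );

echo $html_escaped === $filename ? "HTML-escaped: unchanged\n" : "HTML-escaped: changed\n";
echo "URL-escaped:  {$url_escaped}\n";
```

Only the URL-escaped form is pure ASCII; the HTML-escaped form still carries the raw non-ASCII bytes into whatever stores or serves the URL.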
It’s possible we could circumvent all sorts of issues simply by creating
proper URLs for these attachments. The filenames are not causing the
problems, and if we percent-escaped non-ASCII characters in a URL then we
should avoid all sorts of issues for database tables with non-UTF-8
collations as well, because all more-or-less supported character encodings
for WordPress are ASCII-compatible (with exceptions, of course).
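A rough sketch of why a percent-escaped URL should survive such a table. `simulate_lossy_storage()` is a hypothetical stand-in, not a WordPress or MySQL API: it replaces each non-ASCII character with `?`, which approximates the behavior described above. The `example.com` URL is likewise made up.

```php
<?php
// ASSUMPTION: crude model of a table that cannot represent non-ASCII text.
// Each non-ASCII character is replaced with a single "?".
function simulate_lossy_storage( string $text ): string {
    return preg_replace( '/[^\x00-\x7F]/u', '?', $text );
}

$basename    = "6.22.27\u{202F}PM.png";
$raw_url     = 'https://example.com/uploads/' . $basename;
$escaped_url = 'https://example.com/uploads/' . rawurlencode( $basename );

// The raw URL comes back with "?" where the U+202F was.
echo simulate_lossy_storage( $raw_url ), "\n";

// The escaped URL is already ASCII, so it comes back unchanged.
echo simulate_lossy_storage( $escaped_url ), "\n";
```

In the first case the inserted `?` also begins a query string, so the path that reaches the server no longer names the file.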
In fact if we performed this computation we could even relax the
transformation on the filename since we only need to try and avoid
problems when sharing the files with other services which apply their own
restrictions (see [https://core.trac.wordpress.org/ticket/62995#comment:10
the second issue in my comment above]).
`sanitize_file_name()` currently replaces a percent sign with a dash, but
I don’t know if any system has problems with filenames that include a
percent sign. Perhaps we could drop all of the complex logic in there and
use something like this instead?
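{{{#!php
<?php
function url_encoded_filename( $raw_filename_bytes ) {
	// Bytes copied through unchanged; every other byte is percent-encoded.
	$allowed = '-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~';
	$encoded = '';
	$length  = strlen( $raw_filename_bytes );
	for ( $i = 0; $i < $length; $i++ ) {
		$allowable_length = strspn( $raw_filename_bytes, $allowed, $i );
		if ( $allowable_length > 0 ) {
			$encoded .= substr( $raw_filename_bytes, $i, $allowable_length );
			// Stop on the last allowed byte; the loop's $i++ steps past it.
			$i += $allowable_length - 1;
			continue;
		}
		// Always two uppercase hex digits per byte, e.g. "%0A" rather than "%a".
		$encoded .= sprintf( '%%%02X', ord( $raw_filename_bytes[ $i ] ) );
	}
	return $encoded;
}
}}}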
Of course this is just some doodling. A real proposal should at a minimum
adhere more closely to the WHATWG URL spec on what needs to be
percent-encoded. Still, the output of this function will be:
- ASCII safe
- fully visible (no hidden characters)
- URL safe (though the URLs, if properly encoded, will double-escape the
`%` signs)
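The double-escaping mentioned in the last point can be seen with PHP's built-in `rawurlencode()`. The stored filename below is a hypothetical example of what such a function would produce.

```php
<?php
// Hypothetical on-disk filename, already percent-encoded when it was saved.
$stored_filename = 'Screenshot%E2%80%AFPM';

// A correct URL builder escapes each literal "%" as "%25".
$path_segment = rawurlencode( $stored_filename );

// Decoding once, as a web server would, yields the on-disk name again.
$decoded = rawurldecode( $path_segment );

echo $path_segment, "\n";
echo $decoded, "\n";
```

The URL looks noisier, but one round of decoding on the server side maps it back to exactly the name on disk.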
More thought is warranted here, but I hope this communicates that we may
have an opportunity to use a comprehensive solution that eliminates the
problem altogether and doesn’t have to rely on repeated bug reports and
individual appearances of failures from the same root problem.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/62995#comment:12>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform