[wp-trac] [WordPress Trac] #50855: sanitize_file_name function not working as expected if there is '%20' in filename
WordPress Trac
noreply at wordpress.org
Wed Mar 30 15:16:08 UTC 2022
#50855: sanitize_file_name function not working as expected if there is '%20' in
filename
-------------------------------------------------+-------------------------
Reporter: dishitpala | Owner: audrasjb
Type: defect (bug) | Status: accepted
Priority: normal | Milestone: 6.0
Component: Formatting | Version: 5.4.2
Severity: normal | Resolution:
Keywords: has-patch has-screenshots has-unit- | Focuses:
tests dev-feedback |
-------------------------------------------------+-------------------------
Comment (by craigfrancis):
I'm wondering if
[https://core.trac.wordpress.org/attachment/ticket/50855/50855.2.diff
50855.2.diff] is the correct approach, by simply moving the current
`array( '%20', '+' )` up 1 line, so both ways of URL encoding a space are
handled before `$special_chars`.
As to the failing `wp_unique_filename()` test, that is:
{{{
wp_unique_filename( $testdir, '12%af34567890#~!@#$..%^&*()|_+qwerty
fgh`jkl zx<>?:"{}[]="\'/?.png' )
-'12af34567890 at ..^_qwerty-fghjkl-zx.png'
+'12af34567890 at ..^_-qwerty-fghjkl-zx.png'
}}}
The extra `'-'` is coming from the `'+'` before `qwerty`, so maybe just
update the expected test result?
Tonya, I agree this could be a breaking change (e.g. while this usually
has other issues as well, a developer could save files using
`sanitize_file_name()` and store the original file names in their
database)... but there has kinda been a breaking change already, where the
feature that replaces `'%20'` and `'+'` so they became `'-'` was broken (I
presume WordPress wanted to visually separate words in a file name, even
when the name was URL encoded).
An alternative would be to simply remove that line, and don't handle the
URL encoding of spaces.
---
Tangent stuff...
Tonya, thanks for finding [https://core.trac.wordpress.org/changeset/35122
Changeset 35122], the addition of `'%'` and `'+'` in `$special_chars` via
[https://core.trac.wordpress.org/changeset/35122 Changeset 35122] did
break this feature (oddly `test_replace_spaces()` explicitly tests `'multi
%20 +space.png' => 'multi-20-space.png'`).
For the [https://github.com/WordPress/wordpress-develop/pull/1873 patch
from Tonya], the `'%20'` replacement happens before `$special_chars` on
[https://github.com/WordPress/wordpress-
develop/blob/7120c9cda9c9994e54d3432b967a85bc2c383025/src/wp-
includes/formatting.php#L2020 line 2020]; but keeps the `'+'` replacement
on [https://github.com/WordPress/wordpress-
develop/blob/7120c9cda9c9994e54d3432b967a85bc2c383025/src/wp-
includes/formatting.php#L2022 line 2022], I where don't think that will do
anything (Peter has come to the same conclusion).
I'm going to trust the `$special_chars` is complete (I'm not sure if there
are other problematic characters)... but either way, they can be replaced
by a hyphen or be removed. Both replacement methods are safe, and I'm
pretty sure it's just personal taste on which one to use (e.g.
"MyFile(1).jpg", should that be "MyFile1.jpg" or "MyFile-1.jpg"?). For now
I'd prefer to stay with `'%20'` and `'+'`, with a separate discussion on
what other characters could be replaced by a hyphen.
To answer your questions, I think the `'+'` should be replaced with a
`'-'`, because that seems to be the intended and original behaviour (why
the `array( '%20', '+' )` was added in the first place). Secondly, I don't
think leaving `'20'` was by design (a half encoded value), and I cannot
think of any issue with replacing `'%20'` with a `'-'` in a function that
intentionally removes bad characters from a file name.
With PHP, the `urlencode()` function follows RFC 1738 and its use of URI
defined in RFC 1630 (ref
[https://datatracker.ietf.org/doc/html/rfc1630#page-7 page 7], which says
"Within the query string, the plus sign is reserved as shorthand notation
for a space."), this is why a space is often encoded a space to `'+'`
([https://github.com/php/php-
src/blob/48e07076209c6163ecb4a756f73f511f1f09200e/ext/standard/url.c#L536
source]), whereas `rawurlencode()` follows RFC 3986 and encodes a space to
`'%20'` (i.e. it drops the special case). In both cases we are assuming
the file name has been URL encoded.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/50855#comment:14>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform
More information about the wp-trac
mailing list