[wp-trac] [WordPress Trac] #39791: sanitize_file_name() optimizations
WordPress Trac
noreply at wordpress.org
Sun Feb 5 23:24:34 UTC 2017
#39791: sanitize_file_name() optimizations
-------------------------+-----------------------------
Reporter: mgutt | Owner:
Type: enhancement | Status: new
Priority: normal | Milestone: Awaiting Review
Component: General | Version: trunk
Severity: normal | Keywords:
Focuses: |
-------------------------+-----------------------------
This changeset:
https://core.trac.wordpress.org/changeset/29290
added this line:
{{{#!php
$filename = str_replace( array( '%20', '+' ), '-', $filename );
}}}
But since the following changeset it can be removed, because those chars
are no longer present at that point:
https://core.trac.wordpress.org/changeset/35122
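The redundancy can be illustrated with a minimal sketch. It assumes, as
described above, that '%' and '+' are already stripped before the
str_replace() call is reached. The two-element $special_chars array and
the function name demo_strip_special_chars() are hypothetical stand-ins,
not the actual core code:
{{{#!php
<?php
// Hypothetical, shortened stand-in for the $special_chars list in core.
// Assumption: '%' and '+' are part of that list, as the ticket describes.
function demo_strip_special_chars( $filename ) {
    $special_chars = array( '%', '+' );
    return str_replace( $special_chars, '', $filename );
}

$filename = demo_strip_special_chars( 'my%20summer+photo.jpg' );

// Neither '%20' nor '+' can occur anymore, so this call changes nothing.
$after = str_replace( array( '%20', '+' ), '-', $filename );
}}}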
'''Additional proposals'''
1.) Over the years, new special characters have been added step by step
to sanitize_file_name(). By now it covers almost all characters on the
reserved file system, reserved URI and unsafe URL character lists,
except for:
the reserved file system chars
(https://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words)
{{{
chr(0), ..., chr(32)
}}}
the reserved URI char (https://tools.ietf.org/html/rfc3986#section-2.2):
{{{
@
}}}
the unsafe URL char (https://www.ietf.org/rfc/rfc1738.txt):
{{{
^
}}}
non-printing DEL:
{{{
chr(127)
}}}
Finally you should add all these chars to avoid future bug reports:
{{{#!php
$special_chars = array(
    // file system reserved https://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words
    '<', '>', ':', '"', '/', '\\', '|', '?', '*',
    // control characters http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx
    // note: \t, \n and \r are chr(9), chr(10) and chr(13)
    chr(0), chr(1), chr(2), chr(3), chr(4), chr(5), chr(6), chr(7),
    chr(8), chr(9), chr(10), chr(11), chr(12), chr(13), chr(14), chr(15),
    chr(16), chr(17), chr(18), chr(19), chr(20), chr(21), chr(22), chr(23),
    chr(24), chr(25), chr(26), chr(27), chr(28), chr(29), chr(30), chr(31),
    // non-printing character <DEL>
    chr(127),
    // non-breaking space
    chr(160),
    // URI reserved https://tools.ietf.org/html/rfc3986#section-2.2
    '#', '[', ']', '@', '!', '$', '&', "'", '(', ')', '+', ',', ';', '=',
    // URL unsafe characters https://www.ietf.org/rfc/rfc1738.txt
    '{', '}', '^', '~', '`'
);
}}}
If you do that, do not forget to change this line:
{{{#!php
$filename = preg_replace( '/[\r\n\t -]+/', '-', $filename );
}}}
to this (because \r, \n and \t have already been removed by then):
{{{#!php
$filename = preg_replace( '/[ -]+/', '-', $filename );
}}}
and remove this line because we cover it already through chr(160):
{{{#!php
$filename = preg_replace( "#\x{00a0}#siu", ' ', $filename );
}}}
Source: https://en.wikipedia.org/wiki/Whitespace_character#Unicode
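Taken together, the changes proposed in 1.) could look roughly like the
following sketch. The function name demo_sanitize() is hypothetical and
the list is abbreviated to the control characters, DEL and a few of the
characters above; it is not the core implementation. One caveat should be
checked before adopting the chr(160) entry: it matches only the single
byte 0xA0, whereas in a UTF-8 string the non-breaking space is the
two-byte sequence 0xC2 0xA0. The sketch therefore uses "\xC2\xA0",
assuming UTF-8 input:
{{{#!php
<?php
// Hypothetical helper; an abbreviated sketch of the proposal, not core code.
function demo_sanitize( $filename ) {
    // Control characters chr(0) ... chr(31) plus DEL, chr(127).
    $special_chars   = array_map( 'chr', range( 0, 31 ) );
    $special_chars[] = chr( 127 );
    // Assumption: UTF-8 input, where the non-breaking space is 0xC2 0xA0.
    $special_chars[] = "\xC2\xA0";
    // A few of the reserved and unsafe characters from the list above.
    $special_chars = array_merge( $special_chars, array( '@', '^', '+', '#' ) );

    $filename = str_replace( $special_chars, '', $filename );

    // \r, \n and \t are already gone; only spaces and dashes are collapsed.
    return preg_replace( '/[ -]+/', '-', $filename );
}
}}}
Note that stripping tabs, line breaks and the non-breaking space joins
the neighbouring words, whereas the lines being replaced turned them into
a dash.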
2.) mb_strtolower() could be used to improve Windows/Unix
interoperability (for example when downloading FTP backups or moving to
another host), because the two differ in how they handle case
sensitivity.
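A minimal sketch of such a lowercasing step, assuming UTF-8 filenames and
that the mbstring extension may or may not be loaded. The helper name
demo_lowercase_filename() is hypothetical:
{{{#!php
<?php
// Hypothetical helper, not part of core.
function demo_lowercase_filename( $filename ) {
    // mb_strtolower() needs the mbstring extension; fall back to the
    // byte-based strtolower() when it is not available.
    if ( function_exists( 'mb_strtolower' ) ) {
        return mb_strtolower( $filename, 'UTF-8' );
    }
    return strtolower( $filename );
}

$lower = demo_lowercase_filename( 'Holiday-Photo.JPG' );
}}}
Lowercasing at upload time means that stored names never differ only in
case, which is the situation that collides on a case-insensitive file
system.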
--
Ticket URL: <https://core.trac.wordpress.org/ticket/39791>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform