[wp-trac] [WordPress Trac] #39963: MIME Alias Handling
WordPress Trac
noreply at wordpress.org
Fri Mar 17 17:11:14 UTC 2017
#39963: MIME Alias Handling
-------------------------+------------------------------
Reporter: blobfolio | Owner:
Type: enhancement | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Media | Version:
Severity: normal | Resolution:
Keywords: | Focuses:
-------------------------+------------------------------
Comment (by blobfolio):
I put together a proof of concept patch:
`wp-includes/functions.php` now contains:
-- `wp_check_real_filetype()` (function and filter)
-- `wp_check_mime_alias()` (function and filter)
-- `wp_check_application_octet_stream` (filter)
-- updated `wp_check_filetype_and_ext()` (see below)
`wp-includes/media-mimes.php`
-- `wp_get_mime_aliases()` (function and filter)
`tests/phpunit/tests/functions.php`
-- updated `big5.txt` test
This demonstrates the benefits of MIME alias handling: more robust type
matching, stricter upload validation (ALL files are subject to type
evaluation when possible), and some UX improvements (ALL incorrectly named
files, if otherwise valid, are renamed with the correct extension).
`wp_check_real_filetype()` begins with a name-based approach (i.e.
`wp_check_filetype()`). If that fails, the failure is passed on. If it
succeeds, it attempts to evaluate the "real" type using EXIF (not yet
implemented, waiting on #40017), or that failing, FILEINFO. If either
evaluation succeeds, the "real" type is compared against the known aliases
for the file extension. If the alias is good, the name-based type is
returned (i.e. WordPress' hardcoded definitions take priority). If the
real MIME does not match the extension, but it ''is'' whitelisted, that
MIME and the ''correct'' extension are returned. If the "real" MIME is not
whitelisted, `false` is returned. If no content-based evaluation can be
performed, the name-based results are returned.
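A rough sketch of that decision flow, for illustration only. This is not the code from the patch: `sketch_check_real_filetype()`, its parameters, and the sample arrays are hypothetical, the content-detected MIME is passed in rather than read via EXIF/FILEINFO, and the whitelist lookup is simplified to an exact match against the allowed types.

```php
<?php
/**
 * Hypothetical sketch of the decision flow; not the patch's code.
 *
 * @param string      $ext       Extension taken from the file name.
 * @param string|null $real_mime Content-detected MIME, or null if unavailable.
 * @param array       $allowed   Whitelist: extension => MIME.
 * @param array       $aliases   Known aliases: extension => list of MIMEs.
 * @return array|false Array with 'ext' and 'type' keys, or false.
 */
function sketch_check_real_filetype( $ext, $real_mime, array $allowed, array $aliases ) {
	$ext = strtolower( $ext );

	// Name-based check first. A failure here is passed straight on.
	if ( ! isset( $allowed[ $ext ] ) ) {
		return false;
	}
	$named = array( 'ext' => $ext, 'type' => $allowed[ $ext ] );

	// No content-based evaluation possible: return the name-based result.
	if ( null === $real_mime ) {
		return $named;
	}
	$real_mime = strtolower( $real_mime );

	// The real type is a known alias for this extension: the hardcoded
	// name-based type takes priority.
	$known = isset( $aliases[ $ext ] ) ? $aliases[ $ext ] : array();
	if ( $real_mime === $named['type'] || in_array( $real_mime, $known, true ) ) {
		return $named;
	}

	// Mismatch, but the real type is whitelisted under another extension:
	// return that MIME with the correct extension.
	$other = array_search( $real_mime, $allowed, true );
	if ( false !== $other ) {
		return array( 'ext' => $other, 'type' => $real_mime );
	}

	// The real type is not whitelisted at all.
	return false;
}
```

With a whitelist containing `mp3 => audio/mpeg` and `ogg => audio/ogg`, calling the sketch with extension `mp3` and a detected type of `audio/ogg` returns the `ogg` extension and `audio/ogg` type, while a detected type that appears nowhere in the whitelist returns `false`.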
`wp_check_mime_alias()` has automatic handling for temporary
`x-subtype`/`subtype` variations (e.g. `application/font-woff` and
`application/x-font-woff` are considered equivalent). By default it also
soft-matches `application/octet-stream` against any extension, as that
tends to be the response returned by a server when it doesn't know what a
file is. That behavior can be overridden using the
`wp_check_application_octet_stream` filter. All checks are
case-insensitive and strip out invalid characters.
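The matching rules above could look roughly like the following. This is a sketch under assumptions, not the patch itself: both function names are hypothetical, the `$soft_octet` argument merely stands in for whatever the `wp_check_application_octet_stream` filter would return, and the set of characters treated as "valid" is my own guess.

```php
<?php
/**
 * Hypothetical helper: normalize a MIME string for comparison.
 */
function sketch_normalize_mime( $mime ) {
	// Lowercase, then strip anything outside an assumed set of valid characters.
	$mime = strtolower( trim( $mime ) );
	$mime = preg_replace( '/[^a-z0-9.+\/_-]/', '', $mime );

	// Treat "type/x-subtype" and "type/subtype" as equivalent.
	return preg_replace( '#^([^/]+)/x-#', '$1/', $mime );
}

/**
 * Hypothetical sketch: is $mime an acceptable alias for $ext?
 *
 * @param string $ext        File extension.
 * @param string $mime       MIME type to test.
 * @param array  $aliases    Known aliases: extension => list of MIMEs.
 * @param bool   $soft_octet Stand-in for the octet-stream filter result.
 * @return bool
 */
function sketch_check_mime_alias( $ext, $mime, array $aliases, $soft_octet = true ) {
	$ext  = strtolower( $ext );
	$mime = sketch_normalize_mime( $mime );

	// Soft match: servers often report octet-stream for unknown files.
	if ( $soft_octet && 'application/octet-stream' === $mime ) {
		return true;
	}

	if ( ! isset( $aliases[ $ext ] ) ) {
		return false;
	}

	// Compare against the normalized alias list for this extension.
	$known = array_map( 'sketch_normalize_mime', $aliases[ $ext ] );
	return in_array( $mime, $known, true );
}
```

Given an alias list of `woff => application/font-woff`, both `application/x-font-woff` and `Application/Font-WOFF` match, and `application/octet-stream` matches only while `$soft_octet` is true.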
`wp_check_filetype_and_ext()` is updated to call
`wp_check_real_filetype()` instead of `wp_check_filetype()`. This covers
whitelist checks and type-based evaluation. Renaming is now applied to all
files, not just the small subset of image types in the original version.
`image/*` and `application/*` checks are removed as unnecessary (all
content is evaluated now). The ultimate determinations are still
filterable as before.
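To illustrate the renaming step only: once the "real" type yields a corrected extension, the file name can be adjusted along these lines. `sketch_fix_extension()` is a hypothetical helper, not a function from the patch, and it assumes a bare file name with no directory component.

```php
<?php
/**
 * Hypothetical helper: swap a file name's extension for the detected one.
 *
 * @param string $filename    Bare file name (no directory part).
 * @param string $correct_ext Extension matching the detected type.
 * @return string
 */
function sketch_fix_extension( $filename, $correct_ext ) {
	$current = strtolower( pathinfo( $filename, PATHINFO_EXTENSION ) );

	// Already correct (ignoring case): leave the name alone.
	if ( $current === strtolower( $correct_ext ) ) {
		return $filename;
	}

	// Keep the base name and append the correct extension.
	return pathinfo( $filename, PATHINFO_FILENAME ) . '.' . $correct_ext;
}
```

For example, `sketch_fix_extension( 'song.mp3', 'ogg' )` returns `song.ogg`, while a name that already carries the right extension is returned unchanged.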
All original PHPUnit tests pass, with the exception of the test that passes
`big5.txt` as a JPEG; because of the improvements, the file is now
correctly identified and accepted as a `text/plain` file. ;) This patch
updates the test accordingly.
{{{
$ phpunit tests/phpunit/tests/functions.php
Installing...
Running as single site... To run multisite, use -c tests/phpunit/multisite.xml
Not running ajax tests. To execute these, use --group ajax.
Not running ms-files tests. To execute these, use --group ms-files.
Not running external-http tests. To execute these, use --group external-http.
PHPUnit 5.4.6 by Sebastian Bergmann and contributors.

................................................................. 65 / 83 ( 78%)
..................                                                83 / 83 (100%)

Time: 918 ms, Memory: 24.00Mb

OK (83 tests, 565 assertions)
}}}
It is a lot to digest, I know. But the benefits are numerous. It mitigates
the issues in #40175, but happens to do so in a way that ''increases''
upload security, and file integrity more generally (for example, users
won't accidentally hear a screeching "MP3" because that MP3 is really an
OGG).
The MIME database (`media-mimes.php`) will need to be maintained
indefinitely, as MIME data is always changing. That data, however, is
rebuilt automatically and regularly, independently of WordPress. I propose
we aim to update the data once per major release, a task I am more than
happy to take on.
Speaking of which, the MIME data in this patch stands at roughly 1750
entries. While that data will never be 100% complete, as-is it already
improves WordPress' chances of correctly identifying a file by over 2000%.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/39963#comment:14>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform