[wp-trac] [WordPress Trac] #39550: Some Non-image files fail to upload after 4.7.1
WordPress Trac
noreply at wordpress.org
Sat Feb 4 10:40:38 UTC 2017
#39550: Some Non-image files fail to upload after 4.7.1
------------------------------------+------------------------
 Reporter:  greatislander           |       Owner:  joemcgill
     Type:  defect (bug)            |      Status:  assigned
 Priority:  normal                  |   Milestone:  4.7.3
Component:  Upload                  |     Version:  4.7.1
 Severity:  critical                |  Resolution:
 Keywords:  has-patch dev-feedback  |     Focuses:
------------------------------------+------------------------
Comment (by mdgl):
Why on earth are we pretending that `finfo_file()` returns the "real" MIME
type of a file?
Doesn't this function just examine the first few bytes of a file, compare
it against various patterns defined in the "magic" file and take a guess
at what the MIME type might be?
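
To illustrate the kind of guessing I mean, here is a minimal sketch in Python (not the actual libmagic or `finfo_file()` implementation). The `MAGIC_RULES` table and the `guess_mime()` helper are hypothetical and heavily simplified. They only show how matching leading bytes against known patterns can produce a plausible but wrong answer.

```python
# Hypothetical, heavily simplified "magic" table: (offset, byte pattern, guessed type).
# Real magic databases hold many more rules, with far more detail.
MAGIC_RULES = [
    (0, b"%PDF-", "application/pdf"),
    (0, b"\x89PNG\r\n\x1a\n", "image/png"),
    (0, b"PK\x03\x04", "application/zip"),
    (0, b"<!DOCTYPE html", "text/html"),
]


def guess_mime(data: bytes) -> str:
    """Return the first type whose pattern matches; this is a guess, not a fact."""
    for offset, pattern, mime in MAGIC_RULES:
        if data[offset:offset + len(pattern)] == pattern:
            return mime
    # No rule matched: fall back on a crude text/binary split.
    try:
        data.decode("utf-8")
        return "text/plain"
    except UnicodeDecodeError:
        return "application/octet-stream"


html_fragment = b"<p>Hello</p>"        # HTML, but no doctype to match on
not_a_pdf = b"%PDF- just a text note"  # plain text that happens to start like a PDF

print(guess_mime(html_fragment))  # text/plain
print(guess_mime(not_a_pdf))      # application/pdf
```

Both answers are defensible from the bytes alone, and both may differ from what the uploader intended.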
Most file systems don't store the MIME type of a file separately, so this
kind of guess is the best that you can do. But it's often going to be
wrong! As others have pointed out, more than one type can be valid for
the same file. A file containing HTML is described perfectly correctly by
the MIME type `text/plain` as well as by `text/html` or even
`application/octet-stream`. It all depends on what you want to do with
the data!
We can declare the MIME type of a file when it is being transferred by
HTTP, because the protocol provides headers for this: `Content-Type`
states how the data being sent should be interpreted, and `Accept` states
which types the client would like to receive. But once the file is stored
on the system, this information is lost and it's up to other applications
to determine how they want to process the data. For example, a Web browser
might want to treat the file as `text/html` whereas a backup program might
want to consider it as `application/octet-stream`.
The only real way to determine whether a particular file contains data
that conforms to a certain MIME type is to parse it *fully* and see
whether it complies with the specification. Even then, multiple values
are possible. You need to know first what MIME type you want to test the
file against! For example, many files produced by the current Microsoft
Office suite can be considered valid ZIP archives as well as Word,
PowerPoint and so on. It is not in general possible to determine a single
"real" MIME type of a file just by examining the data.
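
The Office case can be sketched in Python using only the standard `zipfile` module. The archive built here is a placeholder that merely mimics the usual `.docx` part names, and `candidate_types()` is a hypothetical helper, so treat this as an illustration of the ambiguity and not as a real validator.

```python
import io
import zipfile

# Build a tiny in-memory archive laid out like a .docx package.
# The part names follow the usual Word layout; the contents are placeholders,
# so this is not a document Word would open.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as package:
    package.writestr("[Content_Types].xml", "<Types/>")
    package.writestr("word/document.xml", "<document/>")

data = buf.getvalue()


def candidate_types(blob: bytes) -> list:
    """List every type the blob could reasonably be described as."""
    types = ["application/octet-stream"]  # true of any sequence of bytes
    stream = io.BytesIO(blob)
    if zipfile.is_zipfile(stream):
        types.append("application/zip")
        with zipfile.ZipFile(stream) as archive:
            # Presence of this part only hints at a Word document.
            if "word/document.xml" in archive.namelist():
                types.append(
                    "application/vnd.openxmlformats-officedocument"
                    ".wordprocessingml.document"
                )
    return types


print(data[:4])  # b'PK\x03\x04', the same leading bytes as any ZIP file
print(candidate_types(data))
```

The leading bytes say "ZIP", and every entry in the returned list is a defensible description of the same data. Which one is "right" depends on what you intend to do with the file.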
I can understand the security concerns here and why WordPress would want
to "batten down the hatches" with respect to uploaded files, but I do
wonder whether we are barking up the wrong tree with the current approach.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/39550#comment:89>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform