[wp-trac] [WordPress Trac] #39550: Some Non-image files fail to upload after 4.7.1

WordPress Trac noreply at wordpress.org
Sat Feb 4 10:40:38 UTC 2017


#39550: Some Non-image files fail to upload after 4.7.1
------------------------------------+------------------------
 Reporter:  greatislander           |       Owner:  joemcgill
     Type:  defect (bug)            |      Status:  assigned
 Priority:  normal                  |   Milestone:  4.7.3
Component:  Upload                  |     Version:  4.7.1
 Severity:  critical                |  Resolution:
 Keywords:  has-patch dev-feedback  |     Focuses:
------------------------------------+------------------------

Comment (by mdgl):

 Why on earth are we pretending that `finfo_file()` returns the "real" MIME
 type of a file?

 Doesn't this function just examine the first few bytes of a file, compare
 it against various patterns defined in the "magic" file and take a guess
 at what the MIME type might be?
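
 As a rough illustration of that kind of guessing, here is a minimal
 Python sketch of prefix-based sniffing.  It is not how `finfo_file()` or
 libmagic is implemented; the `SIGNATURES` table and the `guess_mime()`
 helper are hypothetical and far smaller than a real "magic" database.

```python
# Hypothetical, deliberately tiny signature table. A real "magic" database
# holds thousands of patterns, with offsets and more elaborate tests.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"PK\x03\x04", "application/zip"),
    (b"<!DOCTYPE html", "text/html"),
    (b"<html", "text/html"),
]


def guess_mime(data: bytes) -> str:
    """Guess a MIME type from the leading bytes only."""
    head = data[:32]
    for prefix, mime in SIGNATURES:
        if head.startswith(prefix):
            return mime
    # No signature matched: fall back on whether the start looks like ASCII.
    try:
        head.decode("ascii")
    except UnicodeDecodeError:
        return "application/octet-stream"
    return "text/plain"
```

 With this sketch an HTML fragment that does not begin with `<html` or a
 doctype is reported as `text/plain`, which is a defensible answer but not
 the only one.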

 Most file systems don't store the MIME type of a file separately, so this
 kind of guess is the best that you can do.  But it's often going to be
 wrong!  As others have pointed out, more than one value can also be valid
 for the same file.  A file containing HTML can be described perfectly
 correctly by the MIME type `text/plain` as well as `text/html` or even
 `application/octet-stream`.  It all depends on what you want to do with
 the data!

 We can declare the MIME type of a file when it is being transferred by
 HTTP, because the protocol provides headers for this: `Content-Type`
 declares how the data should be interpreted, and `Accept` states which
 types the client is willing to receive.  But once the file is stored on
 the system, this information is lost and it's up to each application to
 decide how it wants to process the data.  For example, a Web browser
 might want to treat the file as `text/html` whereas a backup program
 might want to consider it as `application/octet-stream`.

 The only real way to determine whether a particular file contains data
 that conforms to a certain MIME type is to parse it *fully* and see
 whether it complies with the specification.  Even then, multiple values
 are possible.  You need to know first what MIME type you want to test the
 file against!  For example, many files produced by the current Microsoft
 Office suite are valid ZIP archives as well as valid Word, PowerPoint and
 other Office documents.  In general, it is not possible to determine a
 single "real" MIME type of a file just by examining the data.
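
 The Office case can be sketched in Python as follows.  This is only an
 illustration under stated assumptions: `candidate_types()` and
 `make_fake_docx()` are hypothetical helpers, and checking for two member
 names is a shallow heuristic, not the full parse against the
 specification described above.

```python
import io
import zipfile

DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def candidate_types(data: bytes) -> list:
    """Return every MIME type in this small set that the bytes could carry."""
    # Any byte sequence at all is a valid application/octet-stream.
    types = ["application/octet-stream"]
    if zipfile.is_zipfile(io.BytesIO(data)):
        types.append("application/zip")
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = set(zf.namelist())
        # Shallow heuristic (an assumption of this sketch): these two
        # members are treated as the mark of a Word document.
        if {"[Content_Types].xml", "word/document.xml"} <= names:
            types.append(DOCX)
    return types


def make_fake_docx() -> bytes:
    """Build a tiny ZIP that merely has the member names of a .docx file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<document/>")
    return buf.getvalue()
```

 For the bytes from `make_fake_docx()` the function returns all three
 types at once, so which one counts as "real" depends on what the caller
 wants to do with the file.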

 I can understand the security concerns here and why WordPress would want
 to "batten down the hatches" with respect to uploaded files, but I do
 wonder if we are barking up the wrong tree with the current approach.

--
Ticket URL: <https://core.trac.wordpress.org/ticket/39550#comment:89>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

