[wp-trac] [WordPress Trac] #46800: protect against bad characters in media attachment metadata

WordPress Trac noreply at wordpress.org
Thu Apr 4 18:05:36 UTC 2019


#46800: protect against bad characters in media attachment metadata
--------------------------+-----------------------------
 Reporter:  donpark       |      Owner:  (none)
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  Awaiting Review
Component:  Media         |    Version:  trunk
 Severity:  normal        |   Keywords:
  Focuses:                |
--------------------------+-----------------------------
 Media files with bad characters in embedded metadata are commonplace.
 However, the current version of the getID3 library does not fully sanitize
 extracted metadata, causing uploads, updates, and retrieval to fail
 mysteriously because database calls fail silently.
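
 As a minimal sketch of this kind of silent failure, assuming plain PHP with
 the mbstring extension and no WordPress code: the value is a made-up
 example (not taken from the attached MP3), and `json_encode` merely stands
 in for any consumer that expects valid UTF-8.

 {{{#!php
 <?php

 // Made-up example value: "Café" with a single ISO-8859-1 byte (0xE9) for
 // the "é", which is not a valid UTF-8 sequence.
 $artist = "Caf\xE9";

 // The string is not valid UTF-8.
 $is_valid_utf8 = mb_check_encoding( $artist, 'UTF-8' );

 // Without extra flags, json_encode() returns false here instead of throwing,
 // so the failure is easy to miss unless the return value is checked.
 $json       = json_encode( array( 'artist' => $artist ) );
 $json_error = json_last_error();
 }}}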

 The code snippet below is a non-Core workaround I was going to deploy
 before being advised to open a Core ticket instead. It should only be used
 to understand the problem, because the fix at the Core level should be
 closer to getID3, in the places where getID3 is called to extract metadata.

 I also attached a ZIP file containing an MP3 with bad `composer` and
 `artist` values.

 {{{#!php
 <?php

 // The current version (1.9.14-201706111222) of the getID3 library returns
 // bad string values that cause DB query/update/insert,
 // serialize/unserialize, upload, and API endpoints to fail.
 // This function is used to sanitize values using `array_map`.
 // The goal of this implementation is to prevent failures, not to guarantee
 // correctness.
 function utf8_encode_attachment_metadata( $value ) {
         if ( empty( $value ) || ! is_string( $value ) ) {
                 return $value;
         }
         $encoding = mb_detect_encoding( $value, 'UTF-8, ISO-8859-1, UCS-2' );
         if ( 'UTF-8' === $encoding ) {
                 return $value;
         }
         if ( 'ISO-8859-1' === $encoding ) {
                 return utf8_encode( $value );
         }
         if ( 'UCS-2' === $encoding ) {
                 return mb_convert_encoding( $value, 'UTF-8', 'UCS-2' );
         }
         return utf8_encode( $value );
 }

 // Filter out bad characters in post content (`post_title` is not touched
 // here).
 // For media attachments, `post_title` and `post_content` are built from
 // ID3 tags.
 function repair_media_attachment_data( $data, $postarr ) {
         $content = $data['post_content'];
         if ( empty( $content ) ) {
                 return $data;
         }
         $mime_type = $postarr['post_mime_type'];
         if ( empty( $mime_type ) ) {
                 return $data;
         }
         if ( ! preg_match( '#^(audio|video)/#', $mime_type ) ) {
                 return $data;
         }
         $content = wp_check_invalid_utf8( $content, true );
         $data['post_content'] = $content;
         return $data;
 }

 // Filter out bad characters in media attachment metadata.
 function repair_media_attachment_metadata( $metadata ) {
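         // `mime_type` may be absent (e.g. for images), and the retrieval
         // filter can receive a non-array value.
         if ( ! is_array( $metadata ) || empty( $metadata['mime_type'] ) ) {
                 return $metadata;
         }
         if ( ! preg_match( '#^(audio|video)/#', $metadata['mime_type'] ) ) {
                 return $metadata;
         }
         return array_map( 'utf8_encode_attachment_metadata', $metadata );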
 }

 // Sanitize media attachments with content built using media metadata.
 add_filter( 'wp_insert_attachment_data', 'repair_media_attachment_data', 10, 2 );

 // Sanitize extracted media attachment metadata.
 add_filter( 'wp_generate_attachment_metadata', 'repair_media_attachment_metadata', 10, 1 );
 // Sanitize media attachment metadata updates to prevent bad characters from
 // being saved to the DB.
 add_filter( 'wp_update_attachment_metadata', 'repair_media_attachment_metadata', 10, 1 );
 // Sanitize media attachment metadata on retrieval to protect against bad
 // characters already in the DB.
 add_filter( 'wp_get_attachment_metadata', 'repair_media_attachment_metadata', 10, 1 );
 }}}
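
 Note that `array_map` as used above only visits top-level values, so
 strings inside nested arrays pass through unchanged. A rough sketch of a
 recursive variant follows, assuming the mbstring extension is available.
 `sketch_utf8_clean_recursive` is a hypothetical helper name (not part of
 WordPress or getID3), the sample array is made up, and treating every
 invalid string as ISO-8859-1 is a guess aimed at preventing failures rather
 than at correctness.

 {{{#!php
 <?php

 // Hypothetical helper, not part of WordPress or getID3.
 // Recursively converts string values that are not valid UTF-8, assuming
 // (as a guess) that such values are ISO-8859-1.
 function sketch_utf8_clean_recursive( $value ) {
         if ( is_array( $value ) ) {
                 // array_map() preserves keys when given a single array.
                 return array_map( 'sketch_utf8_clean_recursive', $value );
         }
         if ( ! is_string( $value ) || mb_check_encoding( $value, 'UTF-8' ) ) {
                 // Non-strings and valid UTF-8 strings are returned as-is.
                 return $value;
         }
         return mb_convert_encoding( $value, 'UTF-8', 'ISO-8859-1' );
 }

 // Made-up sample data with one top-level and one nested bad value.
 $metadata = array(
         'mime_type' => 'audio/mpeg',
         'artist'    => "Caf\xE9",
         'image'     => array( 'note' => "na\xEFve" ),
         'length'    => 123,
 );
 $clean = sketch_utf8_clean_recursive( $metadata );
 }}}

 In a Core-level fix the same idea would presumably be applied where the
 getID3 output is first read, rather than in filters like the ones above.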

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/46800>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

