[wp-trac] [WordPress Trac] #46800: protect against bad characters in media attachment metadata
WordPress Trac
noreply at wordpress.org
Thu Apr 4 18:05:36 UTC 2019
#46800: protect against bad characters in media attachment metadata
--------------------------+-----------------------------
Reporter: donpark | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Media | Version: trunk
Severity: normal | Keywords:
Focuses: |
--------------------------+-----------------------------
Media files with bad characters in their embedded metadata are
commonplace, but the current version of the getID3 library does not fully
sanitize the metadata it extracts. As a result, uploads, updates, and
retrieval fail mysteriously because the underlying database calls fail
silently.
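To illustrate the kind of value involved, here is a minimal sketch. The
byte string is a made-up stand-in for a tag value, not taken from the
attached MP3, and `json_encode` is used only as an easily reproduced
example of a call that fails without raising an error on such input.
{{{#!php
<?php
// Hypothetical tag value: "José" encoded as ISO-8859-1, so the "é" is
// the single byte 0xE9 rather than the two-byte UTF-8 sequence.
$artist = "Jos\xE9";

// The byte sequence is not valid UTF-8.
var_dump( mb_check_encoding( $artist, 'UTF-8' ) ); // bool(false)

// json_encode() rejects invalid UTF-8 and, by default, returns false
// instead of throwing, so the failure is easy to miss.
var_dump( json_encode( array( 'artist' => $artist ) ) ); // bool(false)
var_dump( json_last_error() === JSON_ERROR_UTF8 );       // bool(true)
}}}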
The code snippet below is a non-Core workaround I was going to deploy
before being advised to open a Core ticket instead. It should only be used
to understand the problem, because a Core-level fix should sit closer to
getID3, in the places where getID3 is called to extract metadata.
I also attached a ZIP file containing an MP3 with bad `composer` and
`artist` values.
{{{#!php
<?php
// The current version (1.9.14-201706111222) of the getID3 library returns
// bad string values that cause DB query/update/insert,
// serialize/unserialize, upload, and API endpoints to fail.
// This function is used to sanitize values using `array_map`.
// The implementation's goal is to prevent failures, not to guarantee
// correctness.
function utf8_encode_attachment_metadata( $value ) {
	if ( empty( $value ) || ! is_string( $value ) ) {
		return $value;
	}
	$encoding = mb_detect_encoding( $value, 'UTF-8, ISO-8859-1, UCS-2' );
	if ( 'UTF-8' === $encoding ) {
		return $value;
	}
	if ( 'ISO-8859-1' === $encoding ) {
		return utf8_encode( $value );
	}
	if ( 'UCS-2' === $encoding ) {
		return mb_convert_encoding( $value, 'UTF-8', 'UCS-2' );
	}
	return utf8_encode( $value );
}
// Filter out bad characters in post content (`post_title` is not touched
// here). For media attachments, `post_title` and `post_content` are built
// from ID3 tags.
function repair_media_attachment_data( $data, $postarr ) {
	if ( empty( $data['post_content'] ) ) {
		return $data;
	}
	if ( empty( $postarr['post_mime_type'] ) ) {
		return $data;
	}
	if ( ! preg_match( '#^(audio|video)/#', $postarr['post_mime_type'] ) ) {
		return $data;
	}
	// Strip byte sequences that are not valid UTF-8.
	$data['post_content'] = wp_check_invalid_utf8( $data['post_content'], true );
	return $data;
}
// Filter out bad characters in media attachment metadata.
function repair_media_attachment_metadata( $metadata ) {
	// Metadata may be missing or lack a MIME type; leave it untouched.
	if ( ! is_array( $metadata ) || empty( $metadata['mime_type'] ) ) {
		return $metadata;
	}
	if ( ! preg_match( '#^(audio|video)/#', $metadata['mime_type'] ) ) {
		return $metadata;
	}
	return array_map( 'utf8_encode_attachment_metadata', $metadata );
}
// Sanitize media attachments with content built using media metadata.
add_filter( 'wp_insert_attachment_data', 'repair_media_attachment_data', 10, 2 );
// Sanitize extracted media attachment metadata.
add_filter( 'wp_generate_attachment_metadata', 'repair_media_attachment_metadata', 10, 1 );
// Sanitize media attachment metadata updates to prevent bad characters
// from being saved to the DB.
add_filter( 'wp_update_attachment_metadata', 'repair_media_attachment_metadata', 10, 1 );
// Sanitize media attachment metadata on retrieval to protect against bad
// characters already in the DB.
add_filter( 'wp_get_attachment_metadata', 'repair_media_attachment_metadata', 10, 1 );
}}}
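One limitation of the snippet above: `array_map` only visits top-level
values, and the sanitizer returns anything that is not a string unchanged,
so strings inside nested arrays are not repaired. A rough sketch of a
recursive variant follows. The function name `sanitize_metadata_recursive`
and the sample array are hypothetical, and treating every invalid string
as ISO-8859-1 is an assumption that keeps the value storable rather than
guaranteeing the original text.
{{{#!php
<?php
// Hypothetical helper (not part of WordPress or getID3): walks a nested
// array and converts every string that is not valid UTF-8.
// Assumption: invalid strings are ISO-8859-1. This is a guess, so the
// result is safe to store but not necessarily the intended text.
function sanitize_metadata_recursive( $value ) {
	if ( is_array( $value ) ) {
		// array_map() preserves keys when given a single array.
		return array_map( 'sanitize_metadata_recursive', $value );
	}
	if ( ! is_string( $value ) || mb_check_encoding( $value, 'UTF-8' ) ) {
		return $value;
	}
	return mb_convert_encoding( $value, 'UTF-8', 'ISO-8859-1' );
}

// Made-up metadata with one bad top-level value and one bad nested value.
$metadata = array(
	'mime_type' => 'audio/mpeg',
	'length'    => 215,
	'artist'    => "Jos\xE9",
	'nested'    => array( 'composer' => "Andr\xE9" ),
);

$clean = sanitize_metadata_recursive( $metadata );

var_dump( mb_check_encoding( $clean['artist'], 'UTF-8' ) );             // bool(true)
var_dump( mb_check_encoding( $clean['nested']['composer'], 'UTF-8' ) ); // bool(true)
}}}
The sketch uses `mb_convert_encoding` rather than `utf8_encode` because
`utf8_encode` is deprecated as of PHP 8.2; both perform the same
ISO-8859-1 to UTF-8 conversion.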
--
Ticket URL: <https://core.trac.wordpress.org/ticket/46800>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform