[wp-trac] [WordPress Trac] #31373: Revamp Press This
WordPress Trac
noreply at wordpress.org
Tue Mar 10 14:54:01 UTC 2015
#31373: Revamp Press This
-----------------------------+-----------------------
Reporter: michael-arestad | Owner: azaozz
Type: task (blessed) | Status: assigned
Priority: normal | Milestone: 4.2
Component: Press This | Version: trunk
Severity: normal | Resolution:
Keywords: | Focuses:
-----------------------------+-----------------------
Comment (by stephdau):
We're having a problem with URLs that contain percent-encoded UTF-8
characters: `WP_Press_This::_limit_url()` calls
`WP_Press_This::_limit_string()`, which HTML-entity-decodes the value (as
UTF-8) and then runs `sanitize_text_field()` on it, and that call strips
the encoded characters.
So the URL
http://tekartist.org/2015/03/10/%e2%99%ab-phantogram-when-im-small-live-at-kexp/
becomes
http://tekartist.org/2015/03/10/-phantogram-when-im-small-live-at-kexp/
after sanitization, which leads to a 404. The bookmarklet appears to work
on that URL because it passes some valid metadata and does not fetch the
source, but the attribution link is bad. If you use the same URL in the
"Scan URL" form, the fetch fails altogether, for the same reason.
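To make the failure easy to reproduce outside WordPress, here is a minimal sketch. `pt_sketch_strip_octets()` is a hypothetical stand-in, not the real `sanitize_text_field()`: it assumes the relevant behaviour is the removal of every `%xx` sequence and leaves out everything else that function does.

```php
<?php
// Hypothetical stand-in for the octet-stripping step of sanitize_text_field().
// Assumption: every "%xx" sequence is removed, repeatedly, until none is left.
// Tag stripping and whitespace handling are deliberately not modelled.
function pt_sketch_strip_octets( $text ) {
	while ( preg_match( '/%[a-f0-9]{2}/i', $text, $match ) ) {
		$text = str_replace( $match[0], '', $text );
	}
	return $text;
}

$pt_sketch_url      = 'http://tekartist.org/2015/03/10/%e2%99%ab-phantogram-when-im-small-live-at-kexp/';
$pt_sketch_stripped = pt_sketch_strip_octets( $pt_sketch_url );

// The three octets %e2 %99 %ab are gone, so the path no longer matches the post.
echo $pt_sketch_stripped, "\n";
```

The output is the same broken URL quoted above, with the path segment starting at `-phantogram`.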
Should we consider not running `sanitize_text_field()` on strings that
look like URLs?
{{{
Index: src/wp-admin/includes/class-wp-press-this.php
===================================================================
--- src/wp-admin/includes/class-wp-press-this.php (revision 31694)
+++ src/wp-admin/includes/class-wp-press-this.php (working copy)
@@ -330,8 +330,10 @@
$return = $value;
}
- $return = html_entity_decode( $return, ENT_QUOTES, 'UTF-8' );
- $return = sanitize_text_field( trim( $return ) );
+ $return = trim( html_entity_decode( $return, ENT_QUOTES, 'UTF-8' ) );
+ if ( ! preg_match( '/^https?:/', $return ) ) {
+ $return = sanitize_text_field( $return );
+ }
}
return $return;
}}}
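The patched branch can be tried in isolation with the sketch below. Both function names are hypothetical, and `pt_sketch_sanitize()` only models the `%xx` stripping, which is an assumption about the part of `sanitize_text_field()` that matters here.

```php
<?php
// Hypothetical stand-in for sanitize_text_field(): only the "%xx" octet
// stripping is modelled, which is an assumption made for this sketch.
function pt_sketch_sanitize( $text ) {
	while ( preg_match( '/%[a-f0-9]{2}/i', $text, $match ) ) {
		$text = str_replace( $match[0], '', $text );
	}
	return trim( $text );
}

// Mirrors the patched branch: decode entities, trim, then sanitize only
// when the value does not start with "http:" or "https:".
function pt_sketch_limit_string( $value ) {
	$return = trim( html_entity_decode( $value, ENT_QUOTES, 'UTF-8' ) );
	if ( ! preg_match( '/^https?:/', $return ) ) {
		$return = pt_sketch_sanitize( $return );
	}
	return $return;
}

// A URL keeps its percent-encoded octets.
$pt_sketch_kept = pt_sketch_limit_string( 'http://tekartist.org/2015/03/10/%e2%99%ab-phantogram-when-im-small-live-at-kexp/' );

// Any other string is still decoded and sanitized as before.
$pt_sketch_clean = pt_sketch_limit_string( ' 100%25 &amp; more ' );
```

Note that the `/^https?:/` test is case-sensitive and does not match protocol-relative URLs such as `//example.com/`. Anything that does match skips sanitization entirely, so it would presumably still need URL-specific escaping before use.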
Or simply stop `WP_Press_This::_limit_url()` from calling
`WP_Press_This::_limit_string()`, since `_limit_url()` already starts with
its own `is_string()` test?
{{{
Index: src/wp-admin/includes/class-wp-press-this.php
===================================================================
--- src/wp-admin/includes/class-wp-press-this.php (revision 31694)
+++ src/wp-admin/includes/class-wp-press-this.php (working copy)
@@ -342,8 +342,6 @@
return '';
}
- $url = $this->_limit_string( $url );
-
// HTTP 1.1 allows 8000 chars but the "de-facto" standard supported in all current browsers is 2048.
if ( mb_strlen( $url ) > 2048 ) {
return ''; // Return empty rather than a trunacted/invalid URL
}}}
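For comparison, a rough sketch of what the URL limiter reduces to under this second option. `pt_sketch_limit_url()` is a hypothetical stand-alone function; it covers only the checks visible in the diff context and ignores whatever else the real method does afterwards.

```php
<?php
// Hypothetical stand-alone version of the URL limiter after the second
// patch: no call into the string limiter, only the type and length checks.
function pt_sketch_limit_url( $url ) {
	if ( ! is_string( $url ) ) {
		return '';
	}

	// Fall back to strlen() if the mbstring extension is unavailable.
	$length = function_exists( 'mb_strlen' ) ? mb_strlen( $url ) : strlen( $url );
	if ( $length > 2048 ) {
		return ''; // Empty rather than a truncated, invalid URL.
	}

	return $url;
}

// The percent-encoded octets pass through untouched.
$pt_sketch_url_out = pt_sketch_limit_url( 'http://tekartist.org/2015/03/10/%e2%99%ab-phantogram-when-im-small-live-at-kexp/' );
```

One side effect of this approach: the URL is no longer entity-decoded or trimmed either, since both of those steps happen inside `_limit_string()`.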
--
Ticket URL: <https://core.trac.wordpress.org/ticket/31373#comment:53>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform