[wp-trac] [WordPress Trac] #31373: Revamp Press This

WordPress Trac noreply at wordpress.org
Tue Mar 10 14:54:01 UTC 2015


#31373: Revamp Press This
-----------------------------+-----------------------
 Reporter:  michael-arestad  |       Owner:  azaozz
     Type:  task (blessed)   |      Status:  assigned
 Priority:  normal           |   Milestone:  4.2
Component:  Press This       |     Version:  trunk
 Severity:  normal           |  Resolution:
 Keywords:                   |     Focuses:
-----------------------------+-----------------------

Comment (by stephdau):

 We're having a problem with URLs containing percent-encoded UTF-8
 characters: `WP_Press_This::_limit_url()` calls
 `WP_Press_This::_limit_string()`, which runs `sanitize_text_field()` right
 after HTML-entity-decoding the value (with UTF-8 compatibility), and
 `sanitize_text_field()` strips the percent-encoded octets.

 So the URL
 http://tekartist.org/2015/03/10/%e2%99%ab-phantogram-when-im-small-live-at-kexp/
 becomes
 http://tekartist.org/2015/03/10/-phantogram-when-im-small-live-at-kexp/
 after sanitization, which leads to a 404. The
 bookmarklet appears to work on that URL because it passes some valid meta
 data and does not fetch the source, but the attribution link is bad. If
 you use the same URL in the "Scan URL" form, the fetch fails altogether,
 for the same reason.
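
 To illustrate, here is a minimal standalone sketch of the stripping step.
 `strip_octets()` is a hypothetical stand-in that only approximates what
 `sanitize_text_field()` does to percent-encoded characters; it is not the
 core function, and the real one does more than this.

```php
<?php
// Hypothetical stand-in for the octet-stripping step of sanitize_text_field().
// Assumption: every "%xx" sequence is removed; the real function does more.
function strip_octets( $str ) {
	return preg_replace( '/%[a-f0-9]{2}/i', '', $str );
}

$url = 'http://tekartist.org/2015/03/10/%e2%99%ab-phantogram-when-im-small-live-at-kexp/';

// The three percent-encoded bytes (%e2%99%ab) are dropped, changing the path.
echo strip_octets( $url ), "\n";
```

 Run on its own, this prints the second URL above, the one that 404s.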

 Should we consider not running `sanitize_text_field()` on strings that
 look like URLs?
 {{{
 Index: src/wp-admin/includes/class-wp-press-this.php
 ===================================================================
 --- src/wp-admin/includes/class-wp-press-this.php       (revision 31694)
 +++ src/wp-admin/includes/class-wp-press-this.php       (working copy)
 @@ -330,8 +330,10 @@
                                 $return = $value;
                         }

 -                       $return = html_entity_decode( $return, ENT_QUOTES, 'UTF-8' );
 -                       $return = sanitize_text_field( trim( $return ) );
 +                       $return = trim( html_entity_decode( $return, ENT_QUOTES, 'UTF-8' ) );
 +                       if ( ! preg_match( '/^https?:/', $return ) ) {
 +                               $return = sanitize_text_field( $return );
 +                       }
                 }

                 return $return;
 }}}
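
 As a rough way to try the logic of that patch outside WordPress, the
 sketch below reproduces the patched branch on its own. Both function names
 are hypothetical, and `fake_sanitize()` models only the octet stripping, not
 the rest of `sanitize_text_field()`.

```php
<?php
// Hypothetical stand-in for sanitize_text_field(); only octet stripping is modelled.
function fake_sanitize( $str ) {
	return preg_replace( '/%[a-f0-9]{2}/i', '', $str );
}

// Sketch of the patched branch: always decode and trim, sanitize only non-URLs.
function limit_string_sketch( $value ) {
	$return = trim( html_entity_decode( $value, ENT_QUOTES, 'UTF-8' ) );
	if ( ! preg_match( '/^https?:/', $return ) ) {
		$return = fake_sanitize( $return );
	}
	return $return;
}

// A URL keeps its percent-encoded characters (only the whitespace is trimmed).
echo limit_string_sketch( ' http://example.org/%e2%99%ab-song/ ' ), "\n";

// Any other string is still sanitized after decoding.
echo limit_string_sketch( '100%25 &amp; more' ), "\n";
```

 Note that the pattern is case-sensitive and only checks the scheme, so a
 value starting with `HTTP://` would still be sanitized, and anything that
 starts with `http:` or `https:` skips sanitization entirely.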

 Or simply stop `WP_Press_This::_limit_url()` from calling
 `WP_Press_This::_limit_string()` at all, since `_limit_url()` already starts
 with its own `is_string()` test?

 {{{
 Index: src/wp-admin/includes/class-wp-press-this.php
 ===================================================================
 --- src/wp-admin/includes/class-wp-press-this.php       (revision 31694)
 +++ src/wp-admin/includes/class-wp-press-this.php       (working copy)
 @@ -342,8 +342,6 @@
                         return '';
                 }

 -               $url = $this->_limit_string( $url );
 -
                 // HTTP 1.1 allows 8000 chars but the "de-facto" standard supported in all current browsers is 2048.
                 if ( mb_strlen( $url ) > 2048 ) {
                         return ''; // Return empty rather than a trunacted/invalid URL
 }}}

--
Ticket URL: <https://core.trac.wordpress.org/ticket/31373#comment:53>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

