[wp-trac] [WordPress Trac] #41304: Bad protocol sensitization in KSES for URLs NOT RFC 3986 compliant
WordPress Trac
noreply at wordpress.org
Thu Jul 13 10:41:31 UTC 2017
#41304: Bad protocol sensitization in KSES for URLs NOT RFC 3986 compliant
--------------------------+-----------------------------
Reporter: bogdanpreda | Owner:
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: General | Version: 4.8
Severity: normal | Keywords:
Focuses: |
--------------------------+-----------------------------
This affects URLs that are passed through the kses sanitizer.
RFC 3986, Section 3.3 specifies:
The path component contains data, usually organized in hierarchical
form, that, along with data in the non-hierarchical query component
(Section 3.4), serves to identify a resource within the scope of the URI's
scheme and naming authority (if any). The path is terminated by the first
question mark ("?") or number sign ("#") character, or by the end of the
URI.
If a URI contains an authority component, then the path component must
either be empty or begin with a slash ("/") character. If a URI does not
contain an authority component, then the path cannot begin with two slash
characters ("//"). In addition, a URI reference (Section 4.1) may be a
relative-path reference, in which case the first path segment cannot
contain a colon (":") character. The ABNF requires five separate rules to
disambiguate these cases, only one of which will match the path substring
within a given URI reference. We use the generic term "path component" to
describe the URI substring matched by the parser to one of these rules.
So a colon (':') is allowed inside URLs. The problem arises when the URL is
split like this:
{{{#!php
<?php
function wp_kses_bad_protocol_once( $string, $allowed_protocols, $count = 1 ) {
$string2 = preg_split( '/:|&#0*58;|&#x0*3a;/i', $string, 2 );
...
}}}
For URLs that do not specify a scheme and contain a colon (':'), this
breaks: only the part of the URL after the colon is returned. E.g.:
//t0.gstatic.com/images?q=tbn:ANd9GcSxT2q6fV-59s5hq5a03fpgsFYzVtL014iARzGRG7S_3CUjYpIGNlQx0ruGtVl5KCAEOxAtb_ZQ
will return:
ANd9GcSxT2q6fV-59s5hq5a03fpgsFYzVtL014iARzGRG7S_3CUjYpIGNlQx0ruGtVl5KCAEOxAtb_ZQ
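The behaviour can be reproduced outside WordPress. The following is a minimal sketch that runs only the split step in isolation, with the same pattern as in the function above. The variable names are illustrative, and the rest of the sanitizer is not called.

```php
<?php
// Sketch only: the split step of wp_kses_bad_protocol_once(), run on its own.
$url = '//t0.gstatic.com/images?q=tbn:ANd9GcSxT2q6fV-59s5hq5a03fpgsFYzVtL014iARzGRG7S_3CUjYpIGNlQx0ruGtVl5KCAEOxAtb_ZQ';

// Split once on a literal colon or on its decimal/hex numeric entity.
$parts = preg_split( '/:|&#0*58;|&#x0*3a;/i', $url, 2 );

// Everything before the first colon ends up where a scheme is expected.
echo $parts[0], "\n"; // //t0.gstatic.com/images?q=tbn
echo $parts[1], "\n"; // ANd9GcSxT2q6fV-59s5hq5a03fpgsFYzVtL014iARzGRG7S_3CUjYpIGNlQx0ruGtVl5KCAEOxAtb_ZQ
```

The first element is the host, path and start of the query rather than a scheme, which is consistent with only the second element being returned.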
Also, a “network-path reference” (a URL beginning with "//") should be
supported; the current code assumes that a scheme always precedes the first
colon.
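For comparison, the component-splitting regular expression given in RFC 3986, Appendix B treats the scheme as optional. The sketch below applies it to the example URL. It is only an illustration of the RFC's generic parsing, not something the kses code uses, and the variable names are made up for the example.

```php
<?php
// Regular expression from RFC 3986, Appendix B, wrapped in "~" delimiters.
// Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
$rfc3986 = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';

$url = '//t0.gstatic.com/images?q=tbn:ANd9GcSxT2q6fV-59s5hq5a03fpgsFYzVtL014iARzGRG7S_3CUjYpIGNlQx0ruGtVl5KCAEOxAtb_ZQ';

preg_match( $rfc3986, $url, $m );

var_dump( $m[2] ); // string(0) "": no scheme in a network-path reference
var_dump( $m[4] ); // "t0.gstatic.com"
var_dump( $m[5] ); // "/images"
var_dump( $m[7] ); // "q=tbn:ANd9..." (the colon stays inside the query)
```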
Changing the split to:
{{{#!php
<?php
...
$string2 = preg_split( '/(:\/\/)|&#0*58;|&#x0*3a;/i', $string, 2 );
if ( isset($string2[1]) && ! preg_match('%/\?%', $string2[0]) ) {
...
$string = $protocol . '//' . $string;
}
...
}}}
fixes this issue and is more compliant without breaking sanitization.
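As a rough check of the proposed pattern, the sketch below runs only the modified split on the example URL and on an absolute URL. The helper split_on_proposed_pattern() is hypothetical and exists only for this illustration; the rest of the sanitizer is not exercised.

```php
<?php
// Hypothetical helper, not part of WordPress: applies only the proposed split.
function split_on_proposed_pattern( $string ) {
	// Split once on "://" or on a numeric entity for a colon.
	return preg_split( '/(:\/\/)|&#0*58;|&#x0*3a;/i', $string, 2 );
}

$network_path = '//t0.gstatic.com/images?q=tbn:ANd9GcSxT2q6fV-59s5hq5a03fpgsFYzVtL014iARzGRG7S_3CUjYpIGNlQx0ruGtVl5KCAEOxAtb_ZQ';
$absolute     = 'http://example.com/images?q=tbn:abc';

$a = split_on_proposed_pattern( $network_path );
$b = split_on_proposed_pattern( $absolute );

var_dump( count( $a ) ); // int(1): no "://", so the URL is left whole
var_dump( $b[0] );       // "http"
var_dump( $b[1] );       // "example.com/images?q=tbn:abc"
```

Note that a value with no "//" after the scheme, such as mailto:user@example.com, is also left unsplit by this pattern, so that case would need separate checking.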
--
Ticket URL: <https://core.trac.wordpress.org/ticket/41304>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform