[wp-trac] [WordPress Trac] #45315: Canonical redirection should compare URLs taking encoding into account

WordPress Trac noreply at wordpress.org
Fri Nov 9 00:44:48 UTC 2018


#45315: Canonical redirection should compare URLs taking encoding into account
--------------------------+-----------------------------
 Reporter:  straussd      |      Owner:  (none)
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  Awaiting Review
Component:  Canonical     |    Version:  trunk
 Severity:  major         |   Keywords:
  Focuses:                |
--------------------------+-----------------------------
 We're encountering redirect loops because the URL comparison logic does
 not take encoding fully into account.

 If a layer of the stack (in our case, Varnish) re-encodes query parameters
 either in the "Location" header of a redirect or in a URL prior to it
 reaching WordPress, WordPress will issue a redirect to its preferred
 encoding, causing a loop as it attempts to "fix" the encoding choice that
 another layer repeatedly makes. Such URL rewrites (with re-encoding) may
 happen if the proxy is configured to sort incoming query parameters (to
 improve cache hit rates) or otherwise manipulate them. These are fairly
 common uses for a reverse proxy.

 As noted in the source code, hex encodings are not case-sensitive. The RFC
 also states that encoding ":" and "@" is optional in query string
 elements. (PHP always encodes them given the lack of context in
 rawurlencode.) A proxy may make different choices than WordPress, but it
 should not cause WordPress to start a redirect loop.

 One possible solution is to decode entities before comparing the requested
 URL with the redirect URL.

 For example, in wp-includes/canonical.php around line 490, this change
 fixes the issue:

 {{{#!php
 <?php
 if ( ! $redirect_url || rawurldecode($redirect_url) ==
 rawurldecode($requested_url) ) {
     return;
 }
 }}}

 Technically, this change could cause some false positives where URLs match
 when one contains a reserved character in the same position as an encoded
 one in the other URL. This is an unlikely problem for canonical URL
 redirects, but I wanted to acknowledge the limitation.

 If such false positives aren't acceptable, then I would suggest comparing
 the individual query string parts one by one, decoding each for its own
 comparison.

 Presumably, similar bugs exist for encodings of the path, but I haven't
 investigated.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/45315>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform


More information about the wp-trac mailing list