[wp-trac] [WordPress Trac] #45315: Canonical redirection should compare URLs taking encoding into account
WordPress Trac
noreply at wordpress.org
Fri Nov 9 00:44:48 UTC 2018
#45315: Canonical redirection should compare URLs taking encoding into account
--------------------------+-----------------------------
Reporter: straussd | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Canonical | Version: trunk
Severity: major | Keywords:
Focuses: |
--------------------------+-----------------------------
We're encountering redirect loops because the URL comparison logic does
not take encoding fully into account.
If a layer of the stack (in our case, Varnish) re-encodes query parameters
either in the "Location" header of a redirect or in a URL prior to it
reaching WordPress, WordPress will issue a redirect to its preferred
encoding, causing a loop as it attempts to "fix" the encoding choice that
another layer repeatedly makes. Such URL rewrites (with re-encoding) may
happen if the proxy is configured to sort incoming query parameters (to
improve cache hit rates) or otherwise manipulate them. These are fairly
common uses for a reverse proxy.
As noted in the source code, hex encodings are not case-sensitive. The RFC
also states that encoding ":" and "@" is optional in query string
elements. (PHP always encodes them given the lack of context in
rawurlencode.) A proxy may make different choices than WordPress, but it
should not cause WordPress to start a redirect loop.
One possible solution is to decode entities before comparing the requested
URL with the redirect URL.
For example, in wp-includes/canonical.php around line 490, this change
fixes the issue:
{{{#!php
<?php
if ( ! $redirect_url || rawurldecode($redirect_url) ==
rawurldecode($requested_url) ) {
return;
}
}}}
Technically, this change could cause some false positives where URLs match
when one contains a reserved character in the same position as an encoded
one in the other URL. This is an unlikely problem for canonical URL
redirects, but I wanted to acknowledge the limitation.
If such false positives aren't acceptable, then I would suggest comparing
the individual query string parts one by one, decoding each for its own
comparison.
Presumably, similar bugs exist for encodings of the path, but I haven't
investigated.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/45315>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform
More information about the wp-trac
mailing list