[wp-trac] [WordPress Trac] #20383: Strip trailing punctuation with canonical URLs
WordPress Trac
noreply at wordpress.org
Sat Mar 11 16:37:03 UTC 2017
#20383: Strip trailing punctuation with canonical URLs
---------------------------------------------+-----------------------------
Reporter: nacin | Owner: SergeyBiryukov
Type: defect (bug) | Status: reopened
Priority: normal | Milestone: 4.8
Component: Canonical | Version:
Severity: normal | Resolution:
Keywords: has-patch has-unit-tests commit | Focuses:
---------------------------------------------+-----------------------------
Changes (by ocean90):
* status: closed => reopened
* resolution: fixed =>
Comment:
I can't get the curly quote tests from [40256] passing:
{{{
1) Tests_Canonical_NoRewrite::test with data set #25 ('/?p=358“',
array('/?p=358', array('358')), 20383)
Ticket #20383
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'/?p=358'
+'/?p=358�__'
}}}
I'm using OS X and already tried several PHP versions: pre-installed PHP
5.6.28 or via homebrew 5.6.29, 7.0.15, 7.1.2 and 7.2.0-dev.
It seems like `parse_url()` is (sometimes?) not multibyte-aware:
* https://bugs.php.net/bug.php?id=52923
* http://bluebones.net/2013/04/parse_url-is-not-utf-8-safe/
* https://github.com/fguillot/picoFeed/issues/167
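To check whether a given PHP build is affected, something like the following could be run locally. This is only a diagnostic sketch: `parse_url_keeps_bytes()` is a hypothetical helper name, not part of WordPress core or of [40256], and the result may differ between platforms, so no expected output is given for the curly-quote case.

```php
<?php
// Hypothetical helper for local diagnosis; not part of WordPress core.
// Returns true when parse_url() hands back the component byte-for-byte.
function parse_url_keeps_bytes( $url, $component, $expected ) {
	$actual = parse_url( $url, $component );
	return $actual === $expected;
}

// U+201D (right curly quote) is the three bytes E2 80 9D in UTF-8.
$url   = "http://example.org/?p=358\xE2\x80\x9D";
$query = parse_url( $url, PHP_URL_QUERY );

// Dump the bytes as hex so replaced characters are visible
// regardless of the terminal encoding.
echo bin2hex( (string) $query ), "\n";

echo parse_url_keeps_bytes( $url, PHP_URL_QUERY, "p=358\xE2\x80\x9D" )
	? "intact\n"
	: "mangled\n";
```

On an affected setup the hex dump should show which bytes of the quote were replaced, which is easier to read than the garbled characters in the PHPUnit diff.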
Using PHP's built-in server I get the following output for
`http://localhost:8000/test.php?p=358”`
{{{
$_GET
array(1) {
["p"]=>
string(6) "358”"
}
$_SERVER['REQUEST_URI']
string(24) "/test.php?p=358%E2%80%9D"
parse_url( $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'] )
array(4) {
["host"]=>
string(9) "localhost"
["port"]=>
int(8000)
["path"]=>
string(9) "/test.php"
["query"]=>
string(14) "p=358%E2%80%9D"
}
// What the tests are doing:
parse_url( "http://example.org/?p=358”" )
array(4) {
["scheme"]=>
string(4) "http"
["host"]=>
string(11) "example.org"
["path"]=>
string(1) "/"
["query"]=>
string(8) "p=358�__"
}
// Example from https://github.com/fguillot/picoFeed/issues/167
parse_url( "https://ru.wikipedia.org/wiki/Преступление_и_наказание" )
array(3) {
["scheme"]=>
string(5) "https"
["host"]=>
string(16) "ru.wikipedia.org"
["path"]=>
string(52) "/wiki/�_�_е�_�_�_пление_и_наказание"
}
}}}
`$_SERVER['REQUEST_URI']` contains the percent-encoded URI, which is also
what `redirect_canonical()` uses. The encoding is performed client-side,
for example by your browser or by `wget`. `curl` sends the raw URL, which
produced an "Invalid request (Malformed HTTP request)" error for me.
I'm wondering if we really have to test the decoded strings here.
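As a rough illustration of what testing the encoded form could look like, the sketch below percent-encodes the non-ASCII bytes before calling `parse_url()`, which approximates what a browser does before sending the request. `encode_non_ascii()` is a hypothetical helper written for this comment; it is not an existing WordPress function and not what [40256] does.

```php
<?php
// Hypothetical helper, not part of WordPress core: percent-encode every
// run of non-ASCII bytes and leave the rest of the URL untouched.
function encode_non_ascii( $url ) {
	return preg_replace_callback(
		'/[\x80-\xFF]+/', // No /u flag, so this matches raw bytes.
		function ( $matches ) {
			return rawurlencode( $matches[0] );
		},
		$url
	);
}

// The URL the tests use, with U+201D written as explicit UTF-8 bytes.
$raw     = "http://example.org/?p=358\xE2\x80\x9D";
$encoded = encode_non_ascii( $raw );

// The encoded URL is plain ASCII, so parse_url() has no multibyte
// characters to deal with.
$parts = parse_url( $encoded );

echo $encoded, "\n";        // http://example.org/?p=358%E2%80%9D
echo $parts['query'], "\n"; // p=358%E2%80%9D
```

The query string matches the `string(14) "p=358%E2%80%9D"` seen with the built-in server above, so test data in this form would be closer to what `redirect_canonical()` actually receives.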
--
Ticket URL: <https://core.trac.wordpress.org/ticket/20383#comment:10>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform