<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[38726] trunk: HTTP API: Simplify `wp_parse_url()` to ensure consistent results.</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta" style="font-size: 105%">
<dt style="float: left; width: 6em; font-weight: bold">Revision</dt> <dd><a style="font-weight: bold" href="https://core.trac.wordpress.org/changeset/38726">38726</a><script type="application/ld+json">{"@context":"http://schema.org","@type":"EmailMessage","description":"Review this Commit","action":{"@type":"ViewAction","url":"https://core.trac.wordpress.org/changeset/38726","name":"Review Commit"}}</script></dd>
<dt style="float: left; width: 6em; font-weight: bold">Author</dt> <dd>peterwilsoncc</dd>
<dt style="float: left; width: 6em; font-weight: bold">Date</dt> <dd>2016-10-04 20:32:40 +0000 (Tue, 04 Oct 2016)</dd>
</dl>

<pre style='padding-left: 1em; margin: 2em 0; border-left: 2px solid #ccc; line-height: 1.25; font-size: 105%; font-family: sans-serif'>HTTP API: Simplify `wp_parse_url()` to ensure consistent results.

<a href="https://core.trac.wordpress.org/changeset/38694">[38694]</a> revealed some URL formats were being parsed incorrectly, including those used by Google Fonts. This change simplifies the function to use placeholder values which cause PHP's parsing to behave consistently.

Props jrf, peterwilsoncc.
Fixes <a href="https://core.trac.wordpress.org/ticket/36356">#36356</a>.</pre>
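
<p>The placeholder approach described in the log message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the committed code: the function name <code>sketch_parse_url()</code> is hypothetical, it always returns the full array of parts (no <code>$component</code> handling), and it leaves out the error suppression used in the commit.</p>

<pre>&lt;?php
// Hypothetical helper for illustration only; not the WordPress implementation.
function sketch_parse_url( $url ) {
	$to_unset = array();
	$url      = strval( $url );

	if ( '//' === substr( $url, 0, 2 ) ) {
		// Schemeless URL: borrow a scheme so parse_url() sees an absolute URL.
		$to_unset[] = 'scheme';
		$url        = 'placeholder:' . $url;
	} elseif ( '/' === substr( $url, 0, 1 ) ) {
		// Root-relative path: borrow both a scheme and a host.
		$to_unset[] = 'scheme';
		$to_unset[] = 'host';
		$url        = 'placeholder://placeholder' . $url;
	}

	$parts = parse_url( $url );
	if ( false === $parts ) {
		// Parsing failure.
		return false;
	}

	// Drop the borrowed components before returning.
	foreach ( $to_unset as $key ) {
		unset( $parts[ $key ] );
	}

	return $parts;
}

// Prints the parsed parts of a schemeless Google Fonts URL.
var_export( sketch_parse_url( '//fonts.googleapis.com/css?family=Open+Sans:400' ) );
</pre>

<p>For this Google Fonts URL, the test data in this changeset expects the keys host, path, and query, with no scheme. Because the placeholders are added before parsing and removed afterwards, the same code path serves every PHP version instead of branching on <code>PHP_VERSION</code>.</p>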

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunksrcwpincludeshttpphp">trunk/src/wp-includes/http.php</a></li>
<li><a href="#trunktestsphpunittestshttphttpphp">trunk/tests/phpunit/tests/http/http.php</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunksrcwpincludeshttpphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-includes/http.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-includes/http.php    2016-10-04 20:26:09 UTC (rev 38725)
+++ trunk/src/wp-includes/http.php      2016-10-04 20:32:40 UTC (rev 38726)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -623,12 +623,17 @@
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * A wrapper for PHP's parse_url() function that handles edgecases in < PHP 5.4.7
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * A wrapper for PHP's parse_url() function that handles consistency in the return
+ * values across PHP versions.
</ins><span class="cx" style="display: block; padding: 0 10px">  *
</span><span class="cx" style="display: block; padding: 0 10px">  * PHP 5.4.7 expanded parse_url()'s ability to handle non-absolute url's, including
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * schemeless and relative url's with :// in the path, this works around those
- * limitations providing a standard output on PHP 5.2~5.4+.
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * schemeless and relative url's with :// in the path. This function works around
+ * those limitations providing a standard output on PHP 5.2~5.4+.
</ins><span class="cx" style="display: block; padding: 0 10px">  *
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * Secondly, across various PHP versions, schemeless URLs containing a ":"
+ * in the query are being handled inconsistently. This function works around those
+ * differences as well.
+ *
</ins><span class="cx" style="display: block; padding: 0 10px">  * Error suppression is used as prior to PHP 5.3.3, an E_WARNING would be generated
</span><span class="cx" style="display: block; padding: 0 10px">  * when URL parsing failed.
</span><span class="cx" style="display: block; padding: 0 10px">  *
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -640,63 +645,96 @@
</span><span class="cx" style="display: block; padding: 0 10px">  *                          predefined constants to specify which one.
</span><span class="cx" style="display: block; padding: 0 10px">  *                          Defaults to -1 (= return all parts as an array).
</span><span class="cx" style="display: block; padding: 0 10px">  *                          @see http://php.net/manual/en/function.parse-url.php
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * @return mixed False on failure; Array of URL components on success;
- *               When a specific component has been requested: null if the component doesn't
- *               exist in the given URL; a sting or - in the case of PHP_URL_PORT - integer
- *               when it does; See parse_url()'s return values.
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * @return mixed False on parse failure; Array of URL components on success;
+ *               When a specific component has been requested: null if the component
+ *               doesn't exist in the given URL; a string or - in the case of
+ *               PHP_URL_PORT - integer when it does. See parse_url()'s return values.
</ins><span class="cx" style="display: block; padding: 0 10px">  */
</span><span class="cx" style="display: block; padding: 0 10px"> function wp_parse_url( $url, $component = -1 ) {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-        $parts = @parse_url( $url, $component );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $to_unset = array();
+       $url = strval( $url );
</ins><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-        if ( version_compare( PHP_VERSION, '5.4.7', '>=' ) ) {
-               return $parts;
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( '//' === substr( $url, 0, 2 ) ) {
+               $to_unset[] = 'scheme';
+               $url = 'placeholder:' . $url;
+       } elseif ( '/' === substr( $url, 0, 1 ) ) {
+               $to_unset[] = 'scheme';
+               $to_unset[] = 'host';
+               $url = 'placeholder://placeholder' . $url;
</ins><span class="cx" style="display: block; padding: 0 10px">         }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+        $parts = @parse_url( $url );
+
</ins><span class="cx" style="display: block; padding: 0 10px">         if ( false === $parts ) {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                // < PHP 5.4.7 compat, trouble with relative paths including a scheme break in the path.
-               if ( '/' == $url[0] && false !== strpos( $url, '://' ) ) {
-                       if ( in_array( $component, array( PHP_URL_SCHEME, PHP_URL_HOST ), true ) ) {
-                               return null;
-                       }
-                       // Since we know it's a relative path, prefix with a scheme/host placeholder and try again.
-                       if ( ! $parts = @parse_url( 'placeholder://placeholder' . $url, $component ) ) {
-                               return $parts;
-                       }
-                       // Remove the placeholder values.
-                       if ( -1 === $component ) {
-                               unset( $parts['scheme'], $parts['host'] );
-                       }
-               } else {
-                       return $parts;
-               }
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         // Parsing failure.
+               return $parts;
</ins><span class="cx" style="display: block; padding: 0 10px">         }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-        // < PHP 5.4.7 compat, doesn't detect a schemeless URL's host field.
-       if ( '//' == substr( $url, 0, 2 ) ) {
-               if ( -1 === $component && ! isset( $parts['host'] ) ) {
-                       $path_parts = explode( '/', substr( $parts['path'], 2 ), 2 );
-                       $parts['host'] = $path_parts[0];
-                       if ( isset( $path_parts[1] ) ) {
-                               $parts['path'] = '/' . $path_parts[1];
-                       } else {
-                               unset( $parts['path'] );
-                       }
-               } elseif ( PHP_URL_HOST === $component || PHP_URL_PATH === $component ) {
-                       $all_parts = @parse_url( $url );
-                       if ( ! isset( $all_parts['host'] ) ) {
-                               $path_parts = explode( '/', substr( $all_parts['path'], 2 ), 2 );
-                               if ( PHP_URL_PATH === $component ) {
-                                       if ( isset( $path_parts[1] ) ) {
-                                               $parts = '/' . $path_parts[1];
-                                       } else {
-                                               $parts = null;
-                                       }
-                               } elseif ( PHP_URL_HOST === $component ) {
-                                       $parts = $path_parts[0];
-                               }
-                       }
-               }
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ // Remove the placeholder values.
+       foreach ( $to_unset as $key ) {
+               unset( $parts[ $key ] );
</ins><span class="cx" style="display: block; padding: 0 10px">         }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-        return $parts;
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ return _get_component_from_parsed_url_array( $parts, $component );
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+
+/**
+ * Retrieve a specific component from a parsed URL array.
+ *
+ * @internal
+ *
+ * @since 4.7.0
+ *
+ * @param array|false $url_parts The parsed URL. Can be false if the URL failed to parse.
+ * @param int    $component The specific component to retrieve. Use one of the PHP
+ *                          predefined constants to specify which one.
+ *                          Defaults to -1 (= return all parts as an array).
+ *                          @see http://php.net/manual/en/function.parse-url.php
+ * @return mixed False on parse failure; Array of URL components on success;
+ *               When a specific component has been requested: null if the component
+ *               doesn't exist in the given URL; a string or - in the case of
+ *               PHP_URL_PORT - integer when it does. See parse_url()'s return values.
+ */
+function _get_component_from_parsed_url_array( $url_parts, $component = -1 ) {
+       if ( -1 === $component ) {
+               return $url_parts;
+       }
+
+       $key = _wp_translate_php_url_constant_to_key( $component );
+       if ( false !== $key && is_array( $url_parts ) && isset( $url_parts[ $key ] ) ) {
+               return $url_parts[ $key ];
+       } else {
+               return null;
+       }
+}
+
+/**
+ * Translate a PHP_URL_* constant to the named array keys PHP uses.
+ *
+ * @internal
+ *
+ * @since 4.7.0
+ *
+ * @see   http://php.net/manual/en/url.constants.php
+ *
+ * @param int $constant PHP_URL_* constant.
+ * @return string|bool The named key or false.
+ */
+function _wp_translate_php_url_constant_to_key( $constant ) {
+       $translation = array(
+               PHP_URL_SCHEME   => 'scheme',
+               PHP_URL_HOST     => 'host',
+               PHP_URL_PORT     => 'port',
+               PHP_URL_USER     => 'user',
+               PHP_URL_PASS     => 'pass',
+               PHP_URL_PATH     => 'path',
+               PHP_URL_QUERY    => 'query',
+               PHP_URL_FRAGMENT => 'fragment',
+       );
+
+       if ( isset( $translation[ $constant ] ) ) {
+               return $translation[ $constant ];
+       } else {
+               return false;
+       }
+}
</ins></span></pre></div>
<a id="trunktestsphpunittestshttphttpphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/tests/phpunit/tests/http/http.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/tests/phpunit/tests/http/http.php   2016-10-04 20:26:09 UTC (rev 38725)
+++ trunk/tests/phpunit/tests/http/http.php     2016-10-04 20:32:40 UTC (rev 38726)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -107,6 +107,29 @@
</span><span class="cx" style="display: block; padding: 0 10px">                        // PHP's parse_url() calls this an invalid url, we handle it as a path
</span><span class="cx" style="display: block; padding: 0 10px">                        array( '/://example.com/', array( 'path' => '/://example.com/' ) ),
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">                        // Schemeless URLs containing colons cause parse errors in PHP 7+.
+                       array(
+                               '//fonts.googleapis.com/css?family=Open+Sans:400&subset=latin',
+                               array(
+                                       'host'  => 'fonts.googleapis.com',
+                                       'path'  => '/css',
+                                       'query' => 'family=Open+Sans:400&subset=latin',
+                               ),
+                       ),
+                       array(
+                               '//fonts.googleapis.com/css?family=Open+Sans:400',
+                               array(
+                                       'host'  => 'fonts.googleapis.com',
+                                       'path'  => '/css',
+                                       'query' => 'family=Open+Sans:400',
+                               ),
+                       ),
+
+                       array( 'filenamefound', array( 'path' => 'filenamefound' ) ),
+
+                       // Empty string or non-string passed in.
+                       array( '', array( 'path' => '' ) ),
+                       array( 123, array( 'path' => '123' ) ),
</ins><span class="cx" style="display: block; padding: 0 10px">                 );
</span><span class="cx" style="display: block; padding: 0 10px">                /*
</span><span class="cx" style="display: block; padding: 0 10px">                Untestable edge cases in various PHP:
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -117,7 +140,7 @@
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">        /**
</span><span class="cx" style="display: block; padding: 0 10px">         * @ticket 36356
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-     */
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+  */
</ins><span class="cx" style="display: block; padding: 0 10px">         function test_wp_parse_url_with_default_component() {
</span><span class="cx" style="display: block; padding: 0 10px">                $actual = wp_parse_url( self::FULL_TEST_URL, -1 );
</span><span class="cx" style="display: block; padding: 0 10px">                $this->assertEquals( array(
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -175,6 +198,21 @@
</span><span class="cx" style="display: block; padding: 0 10px">                        // PHP's parse_url() calls this an invalid URL, we handle it as a path.
</span><span class="cx" style="display: block; padding: 0 10px">                        array( '/://example.com/', PHP_URL_PATH, '/://example.com/' ),
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">                        // Schemeless URLs containing colons cause parse errors in PHP 7+.
+                       array( '//fonts.googleapis.com/css?family=Open+Sans:400&subset=latin', PHP_URL_HOST, 'fonts.googleapis.com' ),
+                       array( '//fonts.googleapis.com/css?family=Open+Sans:400&subset=latin', PHP_URL_PORT, null ),
+                       array( '//fonts.googleapis.com/css?family=Open+Sans:400&subset=latin', PHP_URL_PATH, '/css' ),
+                       array( '//fonts.googleapis.com/css?family=Open+Sans:400&subset=latin', PHP_URL_QUERY, 'family=Open+Sans:400&subset=latin' ),
+                       array( '//fonts.googleapis.com/css?family=Open+Sans:400', PHP_URL_HOST, 'fonts.googleapis.com' ), // 25
+                       array( '//fonts.googleapis.com/css?family=Open+Sans:400', PHP_URL_PORT, null ),
+                       array( '//fonts.googleapis.com/css?family=Open+Sans:400', PHP_URL_PATH, '/css' ), //27
+                       array( '//fonts.googleapis.com/css?family=Open+Sans:400', PHP_URL_QUERY, 'family=Open+Sans:400' ), //28
+
+                       // Empty string or non-string passed in.
+                       array( '', PHP_URL_PATH, '' ),
+                       array( '', PHP_URL_QUERY, null ),
+                       array( 123, PHP_URL_PORT, null ),
+                       array( 123, PHP_URL_PATH, '123' ),
</ins><span class="cx" style="display: block; padding: 0 10px">                 );
</span><span class="cx" style="display: block; padding: 0 10px">        }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -224,4 +262,56 @@
</span><span class="cx" style="display: block; padding: 0 10px">                        }
</span><span class="cx" style="display: block; padding: 0 10px">                }
</span><span class="cx" style="display: block; padding: 0 10px">        }
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+
+       /**
+        * @ticket 36356
+        *
+        * @dataProvider get_component_from_parsed_url_array_testcases
+        */
+       function test_get_component_from_parsed_url_array( $url, $component, $expected ) {
+               $parts  = wp_parse_url( $url );
+               $actual = _get_component_from_parsed_url_array( $parts, $component );
+               $this->assertSame( $expected, $actual );
+       }
+
+       function get_component_from_parsed_url_array_testcases() {
+               // 0: A URL, 1: PHP URL constant, 2: The expected result.
+               return array(
+                       array( 'http://example.com/', -1, array( 'scheme' => 'http', 'host' => 'example.com', 'path' => '/' ) ),
+                       array( 'http://example.com/', -1, array( 'scheme' => 'http', 'host' => 'example.com', 'path' => '/' ) ),
+                       array( 'http://example.com/', PHP_URL_HOST, 'example.com' ),
+                       array( 'http://example.com/', PHP_URL_USER, null ),
+                       array( 'http:///example.com', -1, false ), // Malformed.
+                       array( 'http:///example.com', PHP_URL_HOST, null ), // Malformed.
+               );
+       }
+
+       /**
+        * @ticket 36356
+        *
+        * @dataProvider wp_translate_php_url_constant_to_key_testcases
+        */
+       function test_wp_translate_php_url_constant_to_key( $input, $expected ) {
+               $actual = _wp_translate_php_url_constant_to_key( $input );
+               $this->assertSame( $expected, $actual );
+       }
+
+       function wp_translate_php_url_constant_to_key_testcases() {
+               // 0: PHP URL constant, 1: The expected result.
+               return array(
+                       array( PHP_URL_SCHEME, 'scheme' ),
+                       array( PHP_URL_HOST, 'host' ),
+                       array( PHP_URL_PORT, 'port' ),
+                       array( PHP_URL_USER, 'user' ),
+                       array( PHP_URL_PASS, 'pass' ),
+                       array( PHP_URL_PATH, 'path' ),
+                       array( PHP_URL_QUERY, 'query' ),
+                       array( PHP_URL_FRAGMENT, 'fragment' ),
+
+                       // Test with non-PHP_URL_CONSTANT parameter.
+                       array( 'something', false ),
+                       array( ABSPATH, false ),
+               );
+       }
+
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span></span></pre>
</div>
</div>

</body>
</html>