<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"

"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />

<title>[60969] trunk: Charset: Rely on new UTF-8 pipeline for mb_substr() fallback.</title>

</head>

<body>

<style type="text/css"><!--

#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }

#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }

#msg dt:after { content:':';}

#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }

#msg dl a { font-weight: bold}

#msg dl a:link    { color:#fc3; }

#msg dl a:active  { color:#ff0; }

#msg dl a:visited { color:#cc6; }

h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }

#msg pre { white-space: pre-line; overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }

#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }

#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }

#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }

#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }

#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }

#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }

#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }

#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }

#logmsg pre { background: #eee; padding: 1em; }

#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}

#logmsg dl { margin: 0; }

#logmsg dt { font-weight: bold; }

#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }

#logmsg dd:before { content:'\00bb';}

#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }

#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }

#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }

#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }

#logmsg table th.Corner { text-align: left; }

#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }

#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }

#patch { width: 100%; }

#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}

#patch .propset h4, #patch .binary h4 {margin:0;}

#patch pre {padding:0;line-height:1.2em;margin:0;}

#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}

#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}

#patch span {display:block;padding:0 10px;}

#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}

#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}

#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}

#patch .lines, .info {color:#888;background:#fff;}

--></style>

<div id="msg">

<dl class="meta" style="font-size: 105%">

<dt style="float: left; width: 6em; font-weight: bold">Revision</dt> <dd><a style="font-weight: bold" href="https://core.trac.wordpress.org/changeset/60969">60969</a></dd>

<dt style="float: left; width: 6em; font-weight: bold">Author</dt> <dd>dmsnell</dd>

<dt style="float: left; width: 6em; font-weight: bold">Date</dt> <dd>2025-10-18 04:34:02 +0000 (Sat, 18 Oct 2025)</dd>

</dl>

<pre style='padding-left: 1em; margin: 2em 0; border-left: 2px solid #ccc; line-height: 1.25; font-size: 105%; font-family: sans-serif'>Charset: Rely on new UTF-8 pipeline for mb_substr() fallback.

The existing polyfill for `mb_substr()` has several deficiencies: it depends on PCRE's Unicode support, assumes input strings are valid UTF-8, splits input strings into an array of characters (1,000 at a time, iterating until complete), and re-joins the selected characters at the end.

This patch provides an updated polyfill that reliably parses UTF-8 strings, even when they contain invalid bytes. It computes the byte boundaries of the substring without building intermediate arrays and then returns the result of a single `substr()` call.
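
As an illustration only, the boundary search can be sketched in Python (the patch itself is PHP). The names `sequence_length()` and `codepoint_span()` are hypothetical, and the byte ranges follow the well-formed UTF-8 table in the Unicode Standard rather than being a port of `_wp_scan_utf8()`. Each invalid span is treated as one code point under the maximal subpart rule, and only byte offsets are tracked:

```python
from typing import Tuple


def sequence_length(data: bytes, i: int) -> Tuple[int, bool]:
    """Return (length, is_valid) for the code point or maximal subpart at data[i]."""
    b = data[i]
    if b <= 0x7F:
        return 1, True
    # Expected sequence length and the allowed range of the *second* byte.
    if 0xC2 <= b <= 0xDF:
        total, low, high = 2, 0x80, 0xBF
    elif b == 0xE0:
        total, low, high = 3, 0xA0, 0xBF
    elif b == 0xED:
        total, low, high = 3, 0x80, 0x9F
    elif 0xE1 <= b <= 0xEF:
        total, low, high = 3, 0x80, 0xBF
    elif b == 0xF0:
        total, low, high = 4, 0x90, 0xBF
    elif 0xF1 <= b <= 0xF3:
        total, low, high = 4, 0x80, 0xBF
    elif b == 0xF4:
        total, low, high = 4, 0x80, 0x8F
    else:
        # Continuation byte 0x80..0xBF, overlong lead 0xC0/0xC1, or 0xF5..0xFF.
        return 1, False
    length = 1
    while length < total:
        j = i + length
        if j >= len(data) or not (low <= data[j] <= high):
            # The maximal subpart ends right before the missing or offending byte.
            return length, False
        low, high = 0x80, 0xBF  # only the second byte has a special range
        length += 1
    return length, True


def codepoint_span(data: bytes, byte_offset: int, max_code_points: int) -> int:
    """Count bytes covered by up to max_code_points code points from byte_offset.

    Each maximal subpart of invalid bytes counts as one code point, and no
    per-character array is built; only the running byte offset is tracked.
    """
    at, found = byte_offset, 0
    while at < len(data) and found < max_code_points:
        at += sequence_length(data, at)[0]
        found += 1
    return at - byte_offset
```

For example, `codepoint_span(b"\xe2\x82Xyz", 0, 2)` returns 3: the truncated sequence `E2 82` counts as one code point and `X` as the second, so the substring is one slice of the first three bytes.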

This change improves the reliability of UTF-8 string handling and removes behavior that varied with the runtime environment, such as whether PCRE was built with Unicode support.

Developed in https://github.com/WordPress/wordpress-develop/pull/9829

Discussed in https://core.trac.wordpress.org/ticket/63863

See <a href="https://core.trac.wordpress.org/ticket/63863">#63863</a>.</pre>

<h3>Modified Paths</h3>

<ul>

<li><a href="#trunksrcwpincludescompatutf8php">trunk/src/wp-includes/compat-utf8.php</a></li>

<li><a href="#trunksrcwpincludescompatphp">trunk/src/wp-includes/compat.php</a></li>

<li><a href="#trunktestsphpunittestscompatmbSubstrphp">trunk/tests/phpunit/tests/compat/mbSubstr.php</a></li>

</ul>

</div>

<div id="patch">

<h3>Diff</h3>

<a id="trunksrcwpincludescompatutf8php"></a>

<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-includes/compat-utf8.php</h4>

<pre class="diff"><span>

<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-includes/compat-utf8.php     2025-10-17 23:52:41 UTC (rev 60968)

+++ trunk/src/wp-includes/compat-utf8.php       2025-10-18 04:34:02 UTC (rev 60969)

</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -339,6 +339,48 @@

</span><span class="cx" style="display: block; padding: 0 10px"> }

</span><span class="cx" style="display: block; padding: 0 10px"> 

</span><span class="cx" style="display: block; padding: 0 10px"> /**

</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * Given a starting offset within a string and a maximum number of code points,

+ * return how many bytes are occupied by the span of characters.

+ *

+ * Invalid spans of bytes count as a single code point according to the maximal

+ * subpart rule. This function is a fallback method for calling

+ * `strlen( mb_substr( substr( $text, $at ), 0, $max_code_points ) )`.

+ *

+ * @since 6.9.0

+ * @access private

+ *

+ * @param string $text              Count bytes of span in this text.

+ * @param int    $byte_offset       Start counting at this byte offset.

+ * @param int    $max_code_points   Stop counting after this many code points have been seen,

+ *                                  or at the end of the string.

+ * @param ?int   $found_code_points Optional. Will be set to number of found code points in

+ *                                  span, as this might be smaller than the maximum count if

+ *                                  the string is not long enough.

+ * @return int Number of bytes spanned by the code points.

+ */

+function _wp_utf8_codepoint_span( string $text, int $byte_offset, int $max_code_points, ?int &$found_code_points = 0 ): int {

+       $was_at            = $byte_offset;

+       $invalid_length    = 0;

+       $end               = strlen( $text );

+       $found_code_points = 0;

+

+       while ( $byte_offset < $end && $found_code_points < $max_code_points ) {

+               $needed      = $max_code_points - $found_code_points;

+               $chunk_count = _wp_scan_utf8( $text, $byte_offset, $invalid_length, null, $needed );

+

+               $found_code_points += $chunk_count;

+

+               // Invalid spans only convey one code point count regardless of how long they are.

+               if ( 0 !== $invalid_length && $found_code_points < $max_code_points ) {

+                       ++$found_code_points;

+                       $byte_offset += $invalid_length;

+               }

+       }

+

+       return $byte_offset - $was_at;

+}

+

+/**

</ins><span class="cx" style="display: block; padding: 0 10px">  * Converts a string from ISO-8859-1 to UTF-8, maintaining backwards compatibility

<span class="cx" style="display: block; padding: 0 10px">  * with the deprecated function from the PHP standard library.

</span><span class="cx" style="display: block; padding: 0 10px">  *

</span></span></pre></div>

<a id="trunksrcwpincludescompatphp"></a>

<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-includes/compat.php</h4>

<pre class="diff"><span>

<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-includes/compat.php  2025-10-17 23:52:41 UTC (rev 60968)

+++ trunk/src/wp-includes/compat.php    2025-10-18 04:34:02 UTC (rev 60969)

</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -33,45 +33,43 @@

</span><span class="cx" style="display: block; padding: 0 10px">  *

</span><span class="cx" style="display: block; padding: 0 10px">  * @ignore

</span><span class="cx" style="display: block; padding: 0 10px">  * @since 4.2.2

</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * @since 6.9.0 Deprecated the `$set` argument.

</ins><span class="cx" style="display: block; padding: 0 10px">  * @access private

</span><span class="cx" style="display: block; padding: 0 10px">  *

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * @param bool $set - Used for testing only

- *             null   : default - get PCRE/u capability

- *             false  : Used for testing - return false for future calls to this function

- *             'reset': Used for testing - restore default behavior of this function

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * @param bool $set Deprecated. This argument is no longer used for testing purposes.

</ins><span class="cx" style="display: block; padding: 0 10px">  */

</span><span class="cx" style="display: block; padding: 0 10px"> function _wp_can_use_pcre_u( $set = null ) {

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-        static $utf8_pcre = 'reset';

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ static $utf8_pcre = null;

</ins><span class="cx" style="display: block; padding: 0 10px"> 

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-        if ( null !== $set ) {

-               $utf8_pcre = $set;

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( isset( $set ) ) {

+               _deprecated_argument( __FUNCTION__, '6.9.0' );

</ins><span class="cx" style="display: block; padding: 0 10px">         }

</span><span class="cx" style="display: block; padding: 0 10px"> 

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-        if ( 'reset' === $utf8_pcre ) {

-               $utf8_pcre = true;

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( isset( $utf8_pcre ) ) {

+               return $utf8_pcre;

+       }

</ins><span class="cx" style="display: block; padding: 0 10px"> 

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                set_error_handler(

-                       function ( $errno, $errstr ) use ( &$utf8_pcre ) {

-                               if ( str_starts_with( $errstr, 'preg_match():' ) ) {

-                                       $utf8_pcre = false;

-                                       return true;

-                               }

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $utf8_pcre = true;

+       set_error_handler(

+               function ( $errno, $errstr ) use ( &$utf8_pcre ) {

+                       if ( str_starts_with( $errstr, 'preg_match():' ) ) {

+                               $utf8_pcre = false;

+                               return true;

+                       }

</ins><span class="cx" style="display: block; padding: 0 10px"> 

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                                return false;

-                       },

-                       E_WARNING

-               );

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                 return false;

+               },

+               E_WARNING

+       );

</ins><span class="cx" style="display: block; padding: 0 10px"> 

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                /*

-                * Attempt to compile a PCRE pattern with the PCRE_UTF8 flag. For

-                * systems lacking Unicode support this will trigger a warning

-                * during compilation, which the error handler will intercept.

-                */

-               preg_match( '//u', '' );

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ /*

+        * Attempt to compile a PCRE pattern with the PCRE_UTF8 flag. For

+        * systems lacking Unicode support this will trigger a warning

+        * during compilation, which the error handler will intercept.

+        */

+       preg_match( '//u', '' );

+       restore_error_handler();

</ins><span class="cx" style="display: block; padding: 0 10px"> 

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                restore_error_handler();

-       }

-

</del><span class="cx" style="display: block; padding: 0 10px">         return $utf8_pcre;

</span><span class="cx" style="display: block; padding: 0 10px"> }

</span><span class="cx" style="display: block; padding: 0 10px"> 

</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -136,15 +134,15 @@

</span><span class="cx" style="display: block; padding: 0 10px"> /**

<span class="cx" style="display: block; padding: 0 10px">  * Internal compat function to mimic mb_substr().

</span><span class="cx" style="display: block; padding: 0 10px">  *

<del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * Only understands UTF-8 and 8bit. All other character sets will be treated as 8bit.

- * For `$encoding === UTF-8`, the `$str` input is expected to be a valid UTF-8 byte

- * sequence. The behavior of this function for invalid inputs is undefined.

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * Only supports UTF-8 and non-shifting single-byte encodings. For all other encodings

+ * expect the substrings to be misaligned. When the given encoding (or the `blog_charset`

+ * if none is provided) isn’t UTF-8 then the function returns the output of {@see \substr()}.

</ins><span class="cx" style="display: block; padding: 0 10px">  *

</span><span class="cx" style="display: block; padding: 0 10px">  * @ignore

</span><span class="cx" style="display: block; padding: 0 10px">  * @since 3.2.0

</span><span class="cx" style="display: block; padding: 0 10px">  *

<span class="cx" style="display: block; padding: 0 10px">  * @param string      $str      The string to extract the substring from.

<del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * @param int         $start    Position to being extraction from in `$str`.

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * @param int         $start    Character offset at which to start the substring extraction.

</ins><span class="cx" style="display: block; padding: 0 10px">  * @param int|null    $length   Optional. Maximum number of characters to extract from `$str`.

<span class="cx" style="display: block; padding: 0 10px">  *                              Default null.

<span class="cx" style="display: block; padding: 0 10px">  * @param string|null $encoding Optional. Character encoding to use. Default null.

</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -155,56 +153,39 @@

</span><span class="cx" style="display: block; padding: 0 10px">                return '';

</span><span class="cx" style="display: block; padding: 0 10px">        }

</span><span class="cx" style="display: block; padding: 0 10px"> 

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-        if ( null === $encoding ) {

-               $encoding = get_option( 'blog_charset' );

-       }

-

-       /*

-        * The solution below works only for UTF-8, so in case of a different

-        * charset just use built-in substr().

-        */

-       if ( ! _is_utf8_charset( $encoding ) ) {

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ // The solution below works only for UTF-8; treat all other encodings as byte streams.

+       if ( ! _is_utf8_charset( $encoding ?? get_option( 'blog_charset' ) ) ) {

</ins><span class="cx" style="display: block; padding: 0 10px">                 return is_null( $length ) ? substr( $str, $start ) : substr( $str, $start, $length );

</span><span class="cx" style="display: block; padding: 0 10px">        }

</span><span class="cx" style="display: block; padding: 0 10px"> 

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-        if ( _wp_can_use_pcre_u() ) {

-               // Use the regex unicode support to separate the UTF-8 characters into an array.

-               preg_match_all( '/./us', $str, $match );

-               $chars = is_null( $length ) ? array_slice( $match[0], $start ) : array_slice( $match[0], $start, $length );

-               return implode( '', $chars );

-       }

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $total_length = ( $start < 0 || $length < 0 )

+               ? _wp_utf8_codepoint_count( $str )

+               : 0;

</ins><span class="cx" style="display: block; padding: 0 10px"> 

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-        $regex = '/(

-               [\x00-\x7F]                  # single-byte sequences   0xxxxxxx

-               | [\xC2-\xDF][\x80-\xBF]       # double-byte sequences   110xxxxx 10xxxxxx

-               | \xE0[\xA0-\xBF][\x80-\xBF]   # triple-byte sequences   1110xxxx 10xxxxxx * 2

-               | [\xE1-\xEC][\x80-\xBF]{2}

-               | \xED[\x80-\x9F][\x80-\xBF]

-               | [\xEE-\xEF][\x80-\xBF]{2}

-               | \xF0[\x90-\xBF][\x80-\xBF]{2} # four-byte sequences   11110xxx 10xxxxxx * 3

-               | [\xF1-\xF3][\x80-\xBF]{3}

-               | \xF4[\x80-\x8F][\x80-\xBF]{2}

-       )/x';

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $normalized_start = $start < 0

+               ? max( 0, $total_length + $start )

+               : $start;

</ins><span class="cx" style="display: block; padding: 0 10px"> 

<del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-        // Start with 1 element instead of 0 since the first thing we do is pop.

-       $chars = array( '' );

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ /*

+        * The starting offset is provided as characters, which means this needs to

+        * find how many bytes that many characters occupies at the start of the string.

+        */

+       $starting_byte_offset = _wp_utf8_codepoint_span( $str, 0, $normalized_start );

</ins><span class="cx" style="display: block; padding: 0 10px"> 

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-        do {

-               // We had some string left over from the last round, but we counted it in that last round.

-               array_pop( $chars );

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $normalized_length = $length < 0

+               ? max( 0, $total_length - $normalized_start + $length )

+               : $length;

</ins><span class="cx" style="display: block; padding: 0 10px"> 

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                /*

-                * Split by UTF-8 character, limit to 1000 characters (last array element will contain

-                * the rest of the string).

-                */

-               $pieces = preg_split( $regex, $str, 1000, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY );

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ /*

+        * This is the main step. It finds how many bytes the given length of code points

+        * occupies in the input, starting at the byte offset calculated above.

+        */

+       $byte_length = isset( $normalized_length )

+               ? _wp_utf8_codepoint_span( $str, $starting_byte_offset, $normalized_length )

+               : ( strlen( $str ) - $starting_byte_offset );

</ins><span class="cx" style="display: block; padding: 0 10px"> 

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $chars = array_merge( $chars, $pieces );

-

-               // If there's anything left over, repeat the loop.

-       } while ( count( $pieces ) > 1 && $str = array_pop( $pieces ) );

-

-       return implode( '', array_slice( $chars, $start, $length ) );

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ // The result is a normal byte-level substring using the computed ranges.

+       return substr( $str, $starting_byte_offset, $byte_length );

</ins><span class="cx" style="display: block; padding: 0 10px"> }

</span><span class="cx" style="display: block; padding: 0 10px"> 

</span><span class="cx" style="display: block; padding: 0 10px"> if ( ! function_exists( 'mb_strlen' ) ) :

</span></span></pre></div>
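<p>A sketch of how <code>_mb_substr()</code> above normalizes negative <code>$start</code> and <code>$length</code> values before computing byte offsets. It is written in Python for illustration; <code>char_span()</code> and <code>utf8_substr()</code> are hypothetical names, and unlike <code>_wp_utf8_codepoint_span()</code> this sketch assumes valid UTF-8 input, so the total character count can come from a plain decode.</p>
<pre>```python
from typing import Optional


def char_span(data: bytes, byte_offset: int, max_chars: int) -> int:
    """Bytes spanned by up to max_chars characters, assuming valid UTF-8."""
    at, found = byte_offset, 0
    while at < len(data) and found < max_chars:
        lead = data[at]
        # Sequence length from the lead byte: 0xxxxxxx, 110xxxxx, 1110xxxx, 11110xxx.
        at += 1 if lead < 0x80 else 2 if lead < 0xE0 else 3 if lead < 0xF0 else 4
        found += 1
    return min(at, len(data)) - byte_offset


def utf8_substr(data: bytes, start: int, length: Optional[int] = None) -> bytes:
    """Character-based substring: normalize offsets, find two byte spans, slice once."""
    # The total length is only needed when counting from the end of the string.
    needs_total = start < 0 or (length is not None and length < 0)
    total = len(data.decode("utf-8")) if needs_total else 0

    normalized_start = max(0, total + start) if start < 0 else start
    starting_byte_offset = char_span(data, 0, normalized_start)

    if length is None:
        return data[starting_byte_offset:]

    normalized_length = max(0, total - normalized_start + length) if length < 0 else length
    byte_length = char_span(data, starting_byte_offset, normalized_length)
    return data[starting_byte_offset:starting_byte_offset + byte_length]
```</pre>
<p>As in the diff, the character count is only computed when an offset is negative. For example, <code>utf8_substr("баба".encode(), -3, 2)</code> resolves to start 1 and length 2 and returns the bytes of <code>аб</code>, while out-of-range requests such as <code>(15, -30)</code> collapse to an empty slice instead of failing.</p>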

<a id="trunktestsphpunittestscompatmbSubstrphp"></a>

<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/tests/phpunit/tests/compat/mbSubstr.php</h4>

<pre class="diff"><span>

<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/tests/phpunit/tests/compat/mbSubstr.php     2025-10-17 23:52:41 UTC (rev 60968)

+++ trunk/tests/phpunit/tests/compat/mbSubstr.php       2025-10-18 04:34:02 UTC (rev 60969)

</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -13,88 +13,51 @@

<span class="cx" style="display: block; padding: 0 10px">         * Test that mb_substr() is always available (either from PHP or WP).

</span><span class="cx" style="display: block; padding: 0 10px">         */

</span><span class="cx" style="display: block; padding: 0 10px">        public function test_mb_substr_availability() {

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $this->assertTrue( function_exists( 'mb_substr' ) );

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         $this->assertTrue(

+                       in_array( 'mb_substr', get_defined_functions()['internal'], true ),

+                       'Test runner should have `mbstring` extension active but doesn’t.'

+               );

</ins><span class="cx" style="display: block; padding: 0 10px">         }

</span><span class="cx" style="display: block; padding: 0 10px"> 

</span><span class="cx" style="display: block; padding: 0 10px">        /**

</span><span class="cx" style="display: block; padding: 0 10px">         * @dataProvider data_utf8_substrings

</span><span class="cx" style="display: block; padding: 0 10px">         */

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-        public function test_mb_substr( $input_string, $start, $length, $expected_character_substring ) {

-               $this->assertSame( $expected_character_substring, _mb_substr( $input_string, $start, $length, 'UTF-8' ) );

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ public function test_mb_substr( $input_string, $start, $length ) {

+               $this->assertSame(

+                       mb_substr( $input_string, $start, $length, 'UTF-8' ),

+                       _mb_substr( $input_string, $start, $length, 'UTF-8' )

+               );

</ins><span class="cx" style="display: block; padding: 0 10px">         }

</span><span class="cx" style="display: block; padding: 0 10px"> 

</span><span class="cx" style="display: block; padding: 0 10px">        /**

</span><span class="cx" style="display: block; padding: 0 10px">         * @dataProvider data_utf8_substrings

</span><span class="cx" style="display: block; padding: 0 10px">         */

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-        public function test_mb_substr_via_regex( $input_string, $start, $length, $expected_character_substring ) {

-               _wp_can_use_pcre_u( false );

-               $this->assertSame( $expected_character_substring, _mb_substr( $input_string, $start, $length, 'UTF-8' ) );

-               _wp_can_use_pcre_u( 'reset' );

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ public function test_8bit_mb_substr( $input_string, $start, $length ) {

+               $this->assertSame(

+                       mb_substr( $input_string, $start, $length, '8bit' ),

+                       _mb_substr( $input_string, $start, $length, '8bit' )

+               );

</ins><span class="cx" style="display: block; padding: 0 10px">         }

</span><span class="cx" style="display: block; padding: 0 10px"> 

</span><span class="cx" style="display: block; padding: 0 10px">        /**

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-         * @dataProvider data_utf8_substrings

-        */

-       public function test_8bit_mb_substr( $input_string, $start, $length, $expected_character_substring, $expected_byte_substring ) {

-               $this->assertSame( $expected_byte_substring, _mb_substr( $input_string, $start, $length, '8bit' ) );

-       }

-

-       /**

</del><span class="cx" style="display: block; padding: 0 10px">          * Data provider.

</span><span class="cx" style="display: block; padding: 0 10px">         *

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-         * @return array

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+  * @return array[]

</ins><span class="cx" style="display: block; padding: 0 10px">          */

</span><span class="cx" style="display: block; padding: 0 10px">        public function data_utf8_substrings() {

</span><span class="cx" style="display: block; padding: 0 10px">                return array(

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                        array(

-                               'input_string'                 => 'баба',

-                               'start'                        => 0,

-                               'length'                       => 3,

-                               'expected_character_substring' => 'баб',

-                               'expected_byte_substring'      => "б\xD0",

-                       ),

-                       array(

-                               'input_string'                 => 'баба',

-                               'start'                        => 0,

-                               'length'                       => -1,

-                               'expected_character_substring' => 'баб',

-                               'expected_byte_substring'      => "баб\xD0",

-                       ),

-                       array(

-                               'input_string'                 => 'баба',

-                               'start'                        => 1,

-                               'length'                       => null,

-                               'expected_character_substring' => 'аба',

-                               'expected_byte_substring'      => "\xB1аба",

-                       ),

-                       array(

-                               'input_string'                 => 'баба',

-                               'start'                        => -3,

-                               'length'                       => null,

-                               'expected_character_substring' => 'аба',

-                               'expected_byte_substring'      => "\xB1а",

-                       ),

-                       array(

-                               'input_string'                 => 'баба',

-                               'start'                        => -3,

-                               'length'                       => 2,

-                               'expected_character_substring' => 'аб',

-                               'expected_byte_substring'      => "\xB1\xD0",

-                       ),

-                       array(

-                               'input_string'                 => 'баба',

-                               'start'                        => -1,

-                               'length'                       => 2,

-                               'expected_character_substring' => 'а',

-                               'expected_byte_substring'      => "\xB0",

-                       ),

-                       array(

-                               'input_string'                 => 'I am your баба',

-                               'start'                        => 0,

-                               'length'                       => 11,

-                               'expected_character_substring' => 'I am your б',

-                               'expected_byte_substring'      => "I am your \xD0",

-                       ),

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                 'баба'           => array( 'баба', 0, 3 ),

+                       'баба'           => array( 'баба', 0, -1 ),

+                       'баба'           => array( 'баба', 1, null ),

+                       'баба'           => array( 'баба', -3, null ),

+                       'баба'           => array( 'баба', -3, 2 ),

+                       'баба'           => array( 'баба', -2, 1 ),

+                       'баба'           => array( 'баба', 30, 1 ),

+                       'баба'           => array( 'баба', 15, -30 ),

+                       'баба'           => array( 'баба', -5, -5 ),

+                       'баба'           => array( 'баба', 5, -3 ),

+                       'баба'           => array( 'баба', -3, 5 ),

+                       'I am your баба' => array( 'I am your баба', 0, 11 ),

</ins><span class="cx" style="display: block; padding: 0 10px">                 );

</span><span class="cx" style="display: block; padding: 0 10px">        }

</span><span class="cx" style="display: block; padding: 0 10px"> 

</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -103,7 +66,7 @@

</span><span class="cx" style="display: block; padding: 0 10px">         */

</span><span class="cx" style="display: block; padding: 0 10px">        public function test_mb_substr_phpcore_basic() {

</span><span class="cx" style="display: block; padding: 0 10px">                $string_ascii = 'ABCDEF';

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $string_mb    = base64_decode( '5pel5pys6Kqe44OG44Kt44K544OI44Gn44GZ44CCMDEyMzTvvJXvvJbvvJfvvJjvvJnjgII=' );

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         $string_mb    = '日本語テキストです。01234５６７８９。';

</ins><span class="cx" style="display: block; padding: 0 10px"> 

</span><span class="cx" style="display: block; padding: 0 10px">                $this->assertSame(

</span><span class="cx" style="display: block; padding: 0 10px">                        'DEF',

</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -118,13 +81,13 @@

</span><span class="cx" style="display: block; padding: 0 10px"> 

<span class="cx" style="display: block; padding: 0 10px">                // Specific latin-1 as that is the default the core PHP test operates under.

</span><span class="cx" style="display: block; padding: 0 10px">                $this->assertSame(

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                        'peacrOiqng==',

-                       base64_encode( _mb_substr( $string_mb, 2, 7, 'latin-1' ) ),

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                 "\xA5本語",

+                       _mb_substr( $string_mb, 2, 7, 'latin-1' ),

</ins><span class="cx" style="display: block; padding: 0 10px">                         'Substring does not match expected for offset 2, length 7, with latin-1 charset'

</span><span class="cx" style="display: block; padding: 0 10px">                );

</span><span class="cx" style="display: block; padding: 0 10px">                $this->assertSame(

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                        '6Kqe44OG44Kt44K544OI44Gn44GZ',

-                       base64_encode( _mb_substr( $string_mb, 2, 7, 'utf-8' ) ),

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                 '語テキストです',

+                       _mb_substr( $string_mb, 2, 7, 'utf-8' ),

</ins><span class="cx" style="display: block; padding: 0 10px">                         'Substring does not match expected for offset 2, length 7, with utf-8 charset'

</span><span class="cx" style="display: block; padding: 0 10px">                );

</span><span class="cx" style="display: block; padding: 0 10px">        }

</span></span></pre>

</div>

</div>

</body>

</html>