<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[60969] trunk: Charset: Rely on new UTF-8 pipeline for mb_substr() fallback.</title>
</head>
<body>
<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; }
#msg dl a { font-weight: bold}
#msg dl a:link { color:#fc3; }
#msg dl a:active { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { white-space: pre-line; overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta" style="font-size: 105%">
<dt style="float: left; width: 6em; font-weight: bold">Revision</dt> <dd><a style="font-weight: bold" href="https://core.trac.wordpress.org/changeset/60969">60969</a></dd>
<dt style="float: left; width: 6em; font-weight: bold">Author</dt> <dd>dmsnell</dd>
<dt style="float: left; width: 6em; font-weight: bold">Date</dt> <dd>2025-10-18 04:34:02 +0000 (Sat, 18 Oct 2025)</dd>
</dl>
<pre style='padding-left: 1em; margin: 2em 0; border-left: 2px solid #ccc; line-height: 1.25; font-size: 105%; font-family: sans-serif'>Charset: Rely on new UTF-8 pipeline for mb_substr() fallback.
The existing polyfill for `mb_substr()` has several deficiencies: it relies on Unicode PCRE support, assumes input strings are valid UTF-8, splits input strings into an array of characters (1,000 at a time, iterating until complete), and re-joins them at the end.
This patch provides an updated polyfill which reliably parses UTF-8 strings even in the presence of invalid bytes. It computes the byte boundaries of the substring with zero allocations and then returns the result of a single `substr()` call.
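As a rough illustration of that boundary computation (a minimal sketch, not the committed code): the hypothetical helper `sketch_codepoint_span()` below assumes valid UTF-8 and a non-negative count, whereas the real polyfill also handles invalid bytes. It walks the bytes to measure a span, and the substring is then taken with one `substr()` call.

```php
// Hypothetical helper for illustration only; it is not part of WordPress.
// Assumes $text is valid UTF-8 and $max_code_points is not negative.
function sketch_codepoint_span( string $text, int $byte_offset, int $max_code_points ): int {
	$was_at = $byte_offset;
	$end    = strlen( $text );
	$found  = 0;

	while ( $end > $byte_offset ) {
		if ( $found === $max_code_points ) {
			break;
		}

		// Step over the lead byte, then over any continuation bytes (0x80 to 0xBF).
		++$byte_offset;
		while ( $end > $byte_offset ) {
			$byte = ord( $text[ $byte_offset ] );
			if ( 0x80 > $byte || $byte > 0xBF ) {
				break;
			}
			++$byte_offset;
		}

		++$found;
	}

	// Only integers are tracked; no arrays or intermediate strings are built.
	return $byte_offset - $was_at;
}

$text  = 'баба';
$start = sketch_codepoint_span( $text, 0, 1 );      // Bytes occupied by the first character.
$bytes = sketch_codepoint_span( $text, $start, 2 ); // Bytes occupied by the next two characters.
$piece = substr( $text, $start, $bytes );           // Single byte-level extraction.
```

Here `$piece` corresponds to what `mb_substr( 'баба', 1, 2, 'UTF-8' )` would produce for this valid input.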
This change improves the reliability of UTF-8 string handling and removes behavioral variability based on the runtime system.
Developed in https://github.com/WordPress/wordpress-develop/pull/9829
Discussed in https://core.trac.wordpress.org/ticket/63863
See <a href="https://core.trac.wordpress.org/ticket/63863">#63863</a>.</pre>
<h3>Modified Paths</h3>
<ul>
<li><a href="#trunksrcwpincludescompatutf8php">trunk/src/wp-includes/compat-utf8.php</a></li>
<li><a href="#trunksrcwpincludescompatphp">trunk/src/wp-includes/compat.php</a></li>
<li><a href="#trunktestsphpunittestscompatmbSubstrphp">trunk/tests/phpunit/tests/compat/mbSubstr.php</a></li>
</ul>
</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunksrcwpincludescompatutf8php"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-includes/compat-utf8.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-includes/compat-utf8.php 2025-10-17 23:52:41 UTC (rev 60968)
+++ trunk/src/wp-includes/compat-utf8.php 2025-10-18 04:34:02 UTC (rev 60969)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -339,6 +339,48 @@
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * Given a starting offset within a string and a maximum number of code points,
+ * return how many bytes are occupied by the span of characters.
+ *
+ * Invalid spans of bytes count as a single code point according to the maximal
+ * subpart rule. This function is a fallback method for calling
+ * `strlen( mb_substr( substr( $text, $at ), 0, $max_code_points ) )`.
+ *
+ * @since 6.9.0
+ * @access private
+ *
+ * @param string $text Count bytes of span in this text.
+ * @param int $byte_offset Start counting at this byte offset.
+ * @param int $max_code_points Stop counting after this many code points have been seen,
+ * or at the end of the string.
+ * @param ?int $found_code_points Optional. Will be set to number of found code points in
+ * span, as this might be smaller than the maximum count if
+ * the string is not long enough.
+ * @return int Number of bytes spanned by the code points.
+ */
+function _wp_utf8_codepoint_span( string $text, int $byte_offset, int $max_code_points, ?int &$found_code_points = 0 ): int {
+ $was_at = $byte_offset;
+ $invalid_length = 0;
+ $end = strlen( $text );
+ $found_code_points = 0;
+
+ while ( $byte_offset < $end && $found_code_points < $max_code_points ) {
+ $needed = $max_code_points - $found_code_points;
+ $chunk_count = _wp_scan_utf8( $text, $byte_offset, $invalid_length, null, $needed );
+
+ $found_code_points += $chunk_count;
+
+ // Invalid spans only convey one code point count regardless of how long they are.
+ if ( 0 !== $invalid_length && $found_code_points < $max_code_points ) {
+ ++$found_code_points;
+ $byte_offset += $invalid_length;
+ }
+ }
+
+ return $byte_offset - $was_at;
+}
+
+/**
</ins><span class="cx" style="display: block; padding: 0 10px"> * Converts a string from ISO-8859-1 to UTF-8, maintaining backwards compatibility
</span><span class="cx" style="display: block; padding: 0 10px"> * with the deprecated function from the PHP standard library.
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span></span></pre></div>
<a id="trunksrcwpincludescompatphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-includes/compat.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-includes/compat.php 2025-10-17 23:52:41 UTC (rev 60968)
+++ trunk/src/wp-includes/compat.php 2025-10-18 04:34:02 UTC (rev 60969)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -33,45 +33,43 @@
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * @ignore
</span><span class="cx" style="display: block; padding: 0 10px"> * @since 4.2.2
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * @since 6.9.0 Deprecated the `$set` argument.
</ins><span class="cx" style="display: block; padding: 0 10px"> * @access private
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * @param bool $set - Used for testing only
- * null : default - get PCRE/u capability
- * false : Used for testing - return false for future calls to this function
- * 'reset': Used for testing - restore default behavior of this function
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * @param bool $set Deprecated. This argument is no longer used for testing purposes.
</ins><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> function _wp_can_use_pcre_u( $set = null ) {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- static $utf8_pcre = 'reset';
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ static $utf8_pcre = null;
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( null !== $set ) {
- $utf8_pcre = $set;
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( isset( $set ) ) {
+ _deprecated_argument( __FUNCTION__, '6.9.0' );
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( 'reset' === $utf8_pcre ) {
- $utf8_pcre = true;
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( isset( $utf8_pcre ) ) {
+ return $utf8_pcre;
+ }
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- set_error_handler(
- function ( $errno, $errstr ) use ( &$utf8_pcre ) {
- if ( str_starts_with( $errstr, 'preg_match():' ) ) {
- $utf8_pcre = false;
- return true;
- }
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $utf8_pcre = true;
+ set_error_handler(
+ function ( $errno, $errstr ) use ( &$utf8_pcre ) {
+ if ( str_starts_with( $errstr, 'preg_match():' ) ) {
+ $utf8_pcre = false;
+ return true;
+ }
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- return false;
- },
- E_WARNING
- );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ return false;
+ },
+ E_WARNING
+ );
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- /*
- * Attempt to compile a PCRE pattern with the PCRE_UTF8 flag. For
- * systems lacking Unicode support this will trigger a warning
- * during compilation, which the error handler will intercept.
- */
- preg_match( '//u', '' );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ /*
+ * Attempt to compile a PCRE pattern with the PCRE_UTF8 flag. For
+ * systems lacking Unicode support this will trigger a warning
+ * during compilation, which the error handler will intercept.
+ */
+ preg_match( '//u', '' );
+ restore_error_handler();
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- restore_error_handler();
- }
-
</del><span class="cx" style="display: block; padding: 0 10px"> return $utf8_pcre;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -136,15 +134,15 @@
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><span class="cx" style="display: block; padding: 0 10px"> * Internal compat function to mimic mb_substr().
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * Only understands UTF-8 and 8bit. All other character sets will be treated as 8bit.
- * For `$encoding === UTF-8`, the `$str` input is expected to be a valid UTF-8 byte
- * sequence. The behavior of this function for invalid inputs is undefined.
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * Only supports UTF-8 and non-shifting single-byte encodings. For all other encodings
+ * expect the substrings to be misaligned. When the given encoding (or the `blog_charset`
+ * if none is provided) isn’t UTF-8 then the function returns the output of {@see \substr()}.
</ins><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * @ignore
</span><span class="cx" style="display: block; padding: 0 10px"> * @since 3.2.0
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * @param string $str The string to extract the substring from.
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * @param int $start Position to being extraction from in `$str`.
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * @param int $start Character offset at which to start the substring extraction.
</ins><span class="cx" style="display: block; padding: 0 10px"> * @param int|null $length Optional. Maximum number of characters to extract from `$str`.
</span><span class="cx" style="display: block; padding: 0 10px"> * Default null.
</span><span class="cx" style="display: block; padding: 0 10px"> * @param string|null $encoding Optional. Character encoding to use. Default null.
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -155,56 +153,39 @@
</span><span class="cx" style="display: block; padding: 0 10px"> return '';
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( null === $encoding ) {
- $encoding = get_option( 'blog_charset' );
- }
-
- /*
- * The solution below works only for UTF-8, so in case of a different
- * charset just use built-in substr().
- */
- if ( ! _is_utf8_charset( $encoding ) ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ // The solution below works only for UTF-8; treat all other encodings as byte streams.
+ if ( ! _is_utf8_charset( $encoding ?? get_option( 'blog_charset' ) ) ) {
</ins><span class="cx" style="display: block; padding: 0 10px"> return is_null( $length ) ? substr( $str, $start ) : substr( $str, $start, $length );
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( _wp_can_use_pcre_u() ) {
- // Use the regex unicode support to separate the UTF-8 characters into an array.
- preg_match_all( '/./us', $str, $match );
- $chars = is_null( $length ) ? array_slice( $match[0], $start ) : array_slice( $match[0], $start, $length );
- return implode( '', $chars );
- }
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $total_length = ( $start < 0 || $length < 0 )
+ ? _wp_utf8_codepoint_count( $str )
+ : 0;
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $regex = '/(
- [\x00-\x7F] # single-byte sequences 0xxxxxxx
- | [\xC2-\xDF][\x80-\xBF] # double-byte sequences 110xxxxx 10xxxxxx
- | \xE0[\xA0-\xBF][\x80-\xBF] # triple-byte sequences 1110xxxx 10xxxxxx * 2
- | [\xE1-\xEC][\x80-\xBF]{2}
- | \xED[\x80-\x9F][\x80-\xBF]
- | [\xEE-\xEF][\x80-\xBF]{2}
- | \xF0[\x90-\xBF][\x80-\xBF]{2} # four-byte sequences 11110xxx 10xxxxxx * 3
- | [\xF1-\xF3][\x80-\xBF]{3}
- | \xF4[\x80-\x8F][\x80-\xBF]{2}
- )/x';
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $normalized_start = $start < 0
+ ? max( 0, $total_length + $start )
+ : $start;
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- // Start with 1 element instead of 0 since the first thing we do is pop.
- $chars = array( '' );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ /*
+ * The starting offset is provided as characters, which means this needs to
+ * find how many bytes that many characters occupies at the start of the string.
+ */
+ $starting_byte_offset = _wp_utf8_codepoint_span( $str, 0, $normalized_start );
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- do {
- // We had some string left over from the last round, but we counted it in that last round.
- array_pop( $chars );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $normalized_length = $length < 0
+ ? max( 0, $total_length - $normalized_start + $length )
+ : $length;
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- /*
- * Split by UTF-8 character, limit to 1000 characters (last array element will contain
- * the rest of the string).
- */
- $pieces = preg_split( $regex, $str, 1000, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ /*
+ * This is the main step. It finds how many bytes the given length of code points
+ * occupies in the input, starting at the byte offset calculated above.
+ */
+ $byte_length = isset( $normalized_length )
+ ? _wp_utf8_codepoint_span( $str, $starting_byte_offset, $normalized_length )
+ : ( strlen( $str ) - $starting_byte_offset );
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $chars = array_merge( $chars, $pieces );
-
- // If there's anything left over, repeat the loop.
- } while ( count( $pieces ) > 1 && $str = array_pop( $pieces ) );
-
- return implode( '', array_slice( $chars, $start, $length ) );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ // The result is a normal byte-level substring using the computed ranges.
+ return substr( $str, $starting_byte_offset, $byte_length );
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> if ( ! function_exists( 'mb_strlen' ) ) :
</span></span></pre></div>
<a id="trunktestsphpunittestscompatmbSubstrphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/tests/phpunit/tests/compat/mbSubstr.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/tests/phpunit/tests/compat/mbSubstr.php 2025-10-17 23:52:41 UTC (rev 60968)
+++ trunk/tests/phpunit/tests/compat/mbSubstr.php 2025-10-18 04:34:02 UTC (rev 60969)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -13,88 +13,51 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * Test that mb_substr() is always available (either from PHP or WP).
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> public function test_mb_substr_availability() {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $this->assertTrue( function_exists( 'mb_substr' ) );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $this->assertTrue(
+ in_array( 'mb_substr', get_defined_functions()['internal'], true ),
+ 'Test runner should have `mbstring` extension active but doesn’t.'
+ );
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><span class="cx" style="display: block; padding: 0 10px"> * @dataProvider data_utf8_substrings
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- public function test_mb_substr( $input_string, $start, $length, $expected_character_substring ) {
- $this->assertSame( $expected_character_substring, _mb_substr( $input_string, $start, $length, 'UTF-8' ) );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ public function test_mb_substr( $input_string, $start, $length ) {
+ $this->assertSame(
+ mb_substr( $input_string, $start, $length, 'UTF-8' ),
+ _mb_substr( $input_string, $start, $length, 'UTF-8' )
+ );
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><span class="cx" style="display: block; padding: 0 10px"> * @dataProvider data_utf8_substrings
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- public function test_mb_substr_via_regex( $input_string, $start, $length, $expected_character_substring ) {
- _wp_can_use_pcre_u( false );
- $this->assertSame( $expected_character_substring, _mb_substr( $input_string, $start, $length, 'UTF-8' ) );
- _wp_can_use_pcre_u( 'reset' );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ public function test_8bit_mb_substr( $input_string, $start, $length ) {
+ $this->assertSame(
+ mb_substr( $input_string, $start, $length, '8bit' ),
+ _mb_substr( $input_string, $start, $length, '8bit' )
+ );
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * @dataProvider data_utf8_substrings
- */
- public function test_8bit_mb_substr( $input_string, $start, $length, $expected_character_substring, $expected_byte_substring ) {
- $this->assertSame( $expected_byte_substring, _mb_substr( $input_string, $start, $length, '8bit' ) );
- }
-
- /**
</del><span class="cx" style="display: block; padding: 0 10px"> * Data provider.
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * @return array
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * @return array[]
</ins><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> public function data_utf8_substrings() {
</span><span class="cx" style="display: block; padding: 0 10px"> return array(
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- array(
- 'input_string' => 'баба',
- 'start' => 0,
- 'length' => 3,
- 'expected_character_substring' => 'баб',
- 'expected_byte_substring' => "б\xD0",
- ),
- array(
- 'input_string' => 'баба',
- 'start' => 0,
- 'length' => -1,
- 'expected_character_substring' => 'баб',
- 'expected_byte_substring' => "баб\xD0",
- ),
- array(
- 'input_string' => 'баба',
- 'start' => 1,
- 'length' => null,
- 'expected_character_substring' => 'аба',
- 'expected_byte_substring' => "\xB1аба",
- ),
- array(
- 'input_string' => 'баба',
- 'start' => -3,
- 'length' => null,
- 'expected_character_substring' => 'аба',
- 'expected_byte_substring' => "\xB1а",
- ),
- array(
- 'input_string' => 'баба',
- 'start' => -3,
- 'length' => 2,
- 'expected_character_substring' => 'аб',
- 'expected_byte_substring' => "\xB1\xD0",
- ),
- array(
- 'input_string' => 'баба',
- 'start' => -1,
- 'length' => 2,
- 'expected_character_substring' => 'а',
- 'expected_byte_substring' => "\xB0",
- ),
- array(
- 'input_string' => 'I am your баба',
- 'start' => 0,
- 'length' => 11,
- 'expected_character_substring' => 'I am your б',
- 'expected_byte_substring' => "I am your \xD0",
- ),
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ 'баба' => array( 'баба', 0, 3 ),
+ 'баба' => array( 'баба', 0, -1 ),
+ 'баба' => array( 'баба', 1, null ),
+ 'баба' => array( 'баба', -3, null ),
+ 'баба' => array( 'баба', -3, 2 ),
+ 'баба' => array( 'баба', -2, 1 ),
+ 'баба' => array( 'баба', 30, 1 ),
+ 'баба' => array( 'баба', 15, -30 ),
+ 'баба' => array( 'баба', -5, -5 ),
+ 'баба' => array( 'баба', 5, -3 ),
+ 'баба' => array( 'баба', -3, 5 ),
+ 'I am your баба' => array( 'I am your баба', 0, 11 ),
</ins><span class="cx" style="display: block; padding: 0 10px"> );
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -103,7 +66,7 @@
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> public function test_mb_substr_phpcore_basic() {
</span><span class="cx" style="display: block; padding: 0 10px"> $string_ascii = 'ABCDEF';
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $string_mb = base64_decode( '5pel5pys6Kqe44OG44Kt44K544OI44Gn44GZ44CCMDEyMzTvvJXvvJbvvJfvvJjvvJnjgII=' );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $string_mb = '日本語テキストです。0123456789。';
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> $this->assertSame(
</span><span class="cx" style="display: block; padding: 0 10px"> 'DEF',
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -118,13 +81,13 @@
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> // Specific latin-1 as that is the default the core PHP test operates under.
</span><span class="cx" style="display: block; padding: 0 10px"> $this->assertSame(
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- 'peacrOiqng==',
- base64_encode( _mb_substr( $string_mb, 2, 7, 'latin-1' ) ),
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ "\xA5本語",
+ _mb_substr( $string_mb, 2, 7, 'latin-1' ),
</ins><span class="cx" style="display: block; padding: 0 10px"> 'Substring does not match expected for offset 2, length 7, with latin-1 charset'
</span><span class="cx" style="display: block; padding: 0 10px"> );
</span><span class="cx" style="display: block; padding: 0 10px"> $this->assertSame(
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- '6Kqe44OG44Kt44K544OI44Gn44GZ',
- base64_encode( _mb_substr( $string_mb, 2, 7, 'utf-8' ) ),
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ '語テキストです',
+ _mb_substr( $string_mb, 2, 7, 'utf-8' ),
</ins><span class="cx" style="display: block; padding: 0 10px"> 'Substring does not match expected for offset 2, length 7, with utf-8 charset'
</span><span class="cx" style="display: block; padding: 0 10px"> );
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span></span></pre>
</div>
</div>
</body>
</html>