<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"

"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />

<title>[60630] trunk: Add `wp_is_valid_utf8()` for normalizing UTF-8 checks.</title>

</head>

<body>

<style type="text/css"><!--

#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }

#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }

#msg dt:after { content:':';}

#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }

#msg dl a { font-weight: bold}

#msg dl a:link    { color:#fc3; }

#msg dl a:active  { color:#ff0; }

#msg dl a:visited { color:#cc6; }

h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }

#msg pre { white-space: pre-line; overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }

#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }

#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }

#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }

#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }

#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }

#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }

#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }

#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }

#logmsg pre { background: #eee; padding: 1em; }

#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}

#logmsg dl { margin: 0; }

#logmsg dt { font-weight: bold; }

#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }

#logmsg dd:before { content:'\00bb';}

#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }

#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }

#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }

#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }

#logmsg table th.Corner { text-align: left; }

#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }

#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }

#patch { width: 100%; }

#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}

#patch .propset h4, #patch .binary h4 {margin:0;}

#patch pre {padding:0;line-height:1.2em;margin:0;}

#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}

#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}

#patch span {display:block;padding:0 10px;}

#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}

#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}

#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}

#patch .lines, .info {color:#888;background:#fff;}

--></style>

<div id="msg">

<dl class="meta" style="font-size: 105%">

<dt style="float: left; width: 6em; font-weight: bold">Revision</dt> <dd><a style="font-weight: bold" href="https://core.trac.wordpress.org/changeset/60630">60630</a><script type="application/ld+json">{"@context":"http://schema.org","@type":"EmailMessage","description":"Review this Commit","action":{"@type":"ViewAction","url":"https://core.trac.wordpress.org/changeset/60630","name":"Review Commit"}}</script></dd>

<dt style="float: left; width: 6em; font-weight: bold">Author</dt> <dd>dmsnell</dd>

<dt style="float: left; width: 6em; font-weight: bold">Date</dt> <dd>2025-08-12 18:13:48 +0000 (Tue, 12 Aug 2025)</dd>

</dl>

<pre style='padding-left: 1em; margin: 2em 0; border-left: 2px solid #ccc; line-height: 1.25; font-size: 105%; font-family: sans-serif'>Add `wp_is_valid_utf8()` for normalizing UTF-8 checks.

There are several existing mechanisms in Core to determine if a given string contains valid UTF-8 bytes or not. These are spread out and depend on which extensions are installed on the running system and what is set for `blog_charset`. The `seems_utf8()` function is one of these mechanisms.

`seems_utf8()` does not properly validate UTF-8, unfortunately, and is slow, and the purpose of the function is veiled behind its name and historic legacy.

This patch deprecates `seems_utf()` and introduces `wp_is_valid_utf8()`; a new, spec-compliant, efficient, and focused UTF-8 validator. This new validator defers to `mb_check_encoding()` where present, otherwise validating with a pure-PHP implementation. This makes the spec-compliant validator available on all systems regardless of their runtime environment.

Developed in https://github.com/WordPress/wordpress-develop/pull/9317

Discussed in https://core.trac.wordpress.org/ticket/38044

Props dmsnell, jonsurrell, jorbin.

Fixes <a href="https://core.trac.wordpress.org/ticket/38044">#38044</a>.</pre>

<h3>Modified Paths</h3>

<ul>

<li><a href="#trunksrcwpadminincludesexportphp">trunk/src/wp-admin/includes/export.php</a></li>

<li><a href="#trunksrcwpadminincludesimagephp">trunk/src/wp-admin/includes/image.php</a></li>

<li><a href="#trunksrcwpincludesformattingphp">trunk/src/wp-includes/formatting.php</a></li>

<li><a href="#trunktestsphpunittestsformattingseemsUtf8php">trunk/tests/phpunit/tests/formatting/seemsUtf8.php</a></li>

</ul>

<h3>Added Paths</h3>

<ul>

<li>trunk/tests/phpunit/data/unicode/</li>

<li>trunk/tests/phpunit/data/unicode/utf8tests/</li>

<li><a href="#trunktestsphpunitdataunicodeutf8testsLICENSE">trunk/tests/phpunit/data/unicode/utf8tests/LICENSE</a></li>

<li><a href="#trunktestsphpunitdataunicodeutf8testsREADMEmd">trunk/tests/phpunit/data/unicode/utf8tests/README.md</a></li>

<li><a href="#trunktestsphpunitdataunicodeutf8testsutf8teststxt">trunk/tests/phpunit/data/unicode/utf8tests/utf8tests.txt</a></li>

<li>trunk/tests/phpunit/tests/unicode/</li>

<li><a href="#trunktestsphpunittestsunicodewpIsValidUtf8php">trunk/tests/phpunit/tests/unicode/wpIsValidUtf8.php</a></li>

</ul>

</div>

<div id="patch">

<h3>Diff</h3>

<a id="trunksrcwpadminincludesexportphp"></a>

<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-admin/includes/export.php</h4>

<pre class="diff"><span>

<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-admin/includes/export.php    2025-08-12 14:45:30 UTC (rev 60629)

+++ trunk/src/wp-admin/includes/export.php      2025-08-12 18:13:48 UTC (rev 60630)

</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -243,7 +243,7 @@

</span><span class="cx" style="display: block; padding: 0 10px">         * @return string

</span><span class="cx" style="display: block; padding: 0 10px">         */

</span><span class="cx" style="display: block; padding: 0 10px">        function wxr_cdata( $str ) {

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                if ( ! seems_utf8( $str ) ) {

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         if ( ! wp_is_valid_utf8( $str ) ) {

</ins><span class="cx" style="display: block; padding: 0 10px">                         $str = utf8_encode( $str );

</span><span class="cx" style="display: block; padding: 0 10px">                }

</span><span class="cx" style="display: block; padding: 0 10px">                // $str = ent2ncr(esc_html($str));

</span></span></pre></div>

<a id="trunksrcwpadminincludesimagephp"></a>

<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-admin/includes/image.php</h4>

<pre class="diff"><span>

<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-admin/includes/image.php     2025-08-12 14:45:30 UTC (rev 60629)

+++ trunk/src/wp-admin/includes/image.php       2025-08-12 18:13:48 UTC (rev 60630)

</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1039,13 +1039,13 @@

</span><span class="cx" style="display: block; padding: 0 10px">        }

</span><span class="cx" style="display: block; padding: 0 10px"> 

</span><span class="cx" style="display: block; padding: 0 10px">        foreach ( array( 'title', 'caption', 'credit', 'copyright', 'camera', 'iso' ) as $key ) {

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                if ( $meta[ $key ] && ! seems_utf8( $meta[ $key ] ) ) {

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         if ( $meta[ $key ] && ! wp_is_valid_utf8( $meta[ $key ] ) ) {

</ins><span class="cx" style="display: block; padding: 0 10px">                         $meta[ $key ] = utf8_encode( $meta[ $key ] );

</span><span class="cx" style="display: block; padding: 0 10px">                }

</span><span class="cx" style="display: block; padding: 0 10px">        }

</span><span class="cx" style="display: block; padding: 0 10px"> 

</span><span class="cx" style="display: block; padding: 0 10px">        foreach ( $meta['keywords'] as $key => $keyword ) {

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                if ( ! seems_utf8( $keyword ) ) {

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         if ( ! wp_is_valid_utf8( $keyword ) ) {

</ins><span class="cx" style="display: block; padding: 0 10px">                         $meta['keywords'][ $key ] = utf8_encode( $keyword );

</span><span class="cx" style="display: block; padding: 0 10px">                }

</span><span class="cx" style="display: block; padding: 0 10px">        }

</span></span></pre></div>

<a id="trunksrcwpincludesformattingphp"></a>

<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-includes/formatting.php</h4>

<pre class="diff"><span>

<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-includes/formatting.php      2025-08-12 14:45:30 UTC (rev 60629)

+++ trunk/src/wp-includes/formatting.php        2025-08-12 18:13:48 UTC (rev 60630)

</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -876,11 +876,14 @@

</span><span class="cx" style="display: block; padding: 0 10px">  *

</span><span class="cx" style="display: block; padding: 0 10px">  * @author bmorel at ssi dot fr (modified)

</span><span class="cx" style="display: block; padding: 0 10px">  * @since 1.2.1

<ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * @deprecated 6.9.0 Use {@see wp_is_valid_utf8()} instead.

</ins><span class="cx" style="display: block; padding: 0 10px">  *

<span class="cx" style="display: block; padding: 0 10px">  * @param string $str The string to be checked.

<span class="cx" style="display: block; padding: 0 10px">  * @return bool True if $str fits a UTF-8 model, false otherwise.

</span><span class="cx" style="display: block; padding: 0 10px">  */

</span><span class="cx" style="display: block; padding: 0 10px"> function seems_utf8( $str ) {

</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+        _deprecated_function( __FUNCTION__, '6.9.0', 'wp_is_valid_utf8()' );

+

</ins><span class="cx" style="display: block; padding: 0 10px">         mbstring_binary_safe_encoding();

</span><span class="cx" style="display: block; padding: 0 10px">        $length = strlen( $str );

</span><span class="cx" style="display: block; padding: 0 10px">        reset_mbstring_encoding();

</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -915,6 +918,177 @@

</span><span class="cx" style="display: block; padding: 0 10px"> }

</span><span class="cx" style="display: block; padding: 0 10px"> 

</span><span class="cx" style="display: block; padding: 0 10px"> /**

<ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * Determines if a given byte string represents a valid UTF-8 encoding.

+ *

+ * Note that it’s unlikely for non-UTF-8 data to validate as UTF-8, but

+ * it is still possible. Many texts are simultaneously valid UTF-8,

+ * valid US-ASCII, and valid ISO-8859-1 (`latin1`).

+ *

+ * Example:

+ *

+ *     true === wp_is_valid_utf8( '' );

+ *     true === wp_is_valid_utf8( 'just a test' );

+ *     true === wp_is_valid_utf8( "\xE2\x9C\x8F" );    // Pencil, U+270F.

+ *     true === wp_is_valid_utf8( "\u{270F}" );        // Pencil, U+270F.

+ *     true === wp_is_valid_utf8( '✏' );              // Pencil, U+270F.

+ *

+ *     false === wp_is_valid_utf8( "just \xC0 test" ); // Invalid bytes.

+ *     false === wp_is_valid_utf8( "\xE2\x9C" );       // Invalid/incomplete sequences.

+ *     false === wp_is_valid_utf8( "\xC1\xBF" );       // Overlong sequences.

+ *     false === wp_is_valid_utf8( "\xED\xB0\x80" );   // Surrogate halves.

+ *     false === wp_is_valid_utf8( "B\xFCch" );        // ISO-8859-1 high-bytes.

+ *                                                     // E.g. The “ü” in ISO-8859-1 is a single byte 0xFC,

+ *                                                     // but in UTF-8 is the two-byte sequence 0xC3 0xBC.

+ *

+ * @see _wp_is_valid_utf8_fallback

+ *

+ * @since 6.9.0

+ *

+ * @param string $bytes String which might contain text encoded as UTF-8.

+ * @return bool Whether the provided bytes can decode as valid UTF-8.

+ */

+function wp_is_valid_utf8( string $bytes ): bool {

+       /*

+        * Since PHP 8.3.0 the UTF-8 validity is cached internally

+        * on string objects, making this a direct property lookup.

+        *

+        * This is to be preferred exclusively once PHP 8.3.0 is

+        * the minimum supported version, because even when the

+        * status isn’t cached, it uses highly-optimized code to

+        * validate the byte stream.

+        */

+       return function_exists( 'mb_check_encoding' )

+               ? mb_check_encoding( $bytes, 'UTF-8' )

+               : _wp_is_valid_utf8_fallback( $bytes );

+}

+

+/**

+ * Fallback mechanism for safely validating UTF-8 bytes.

+ *

+ * By implementing a raw method here the code will behave in the same way on

+ * all installed systems, regardless of what extensions are installed.

+ *

+ * @see wp_is_valid_utf8

+ *

+ * @since 6.9.0

+ * @access private

+ *

+ * @param string $bytes String which might contain text encoded as UTF-8.

+ * @return bool Whether the provided bytes can decode as valid UTF-8.

+ */

+function _wp_is_valid_utf8_fallback( string $bytes ): bool {

+       $end = strlen( $bytes );

+

+       for ( $i = 0; $i < $end; $i++ ) {

+               /*

+                * Quickly skip past US-ASCII bytes, all of which are valid UTF-8.

+                *

+                * This optimization step improves the speed from 10x to 100x

+                * depending on whether the JIT has optimized the function.

+                */

+               $i += strspn(

+                       $bytes,

+                       "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" .

+                       "\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" .

+                       " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f",

+                       $i

+               );

+               if ( $i >= $end ) {

+                       break;

+               }

+

+               /**

+                * The above fast-track handled all single-byte UTF-8 characters. What

+                * follows MUST be a multibyte sequence otherwise there’s invalid UTF-8.

+                *

+                * Therefore everything past here is checking those multibyte sequences.

+                * Because it’s possible that there are truncated characters, the use of

+                * the null-coalescing operator with "\xC0" is a convenience for skipping

+                * length checks on every continuation bytes. This works because 0xC0 is

+                * always invalid in a UTF-8 string, meaning that if the string has been

+                * truncated, it will find 0xC0 and reject as invalid UTF-8.

+                *

+                *  > [The following table] lists all of the byte sequences that are well-formed

+                * > in UTF-8. A range of byte values such as A0..BF indicates that any byte

+                * > from A0 to BF (inclusive) is well-formed in that position. Any byte value

+                * > outside of the ranges listed is ill-formed.

+                *

+                * > Table 3-7. Well-Formed UTF-8 Byte Sequences

+                *  ╭─────────────────────┬────────────┬──────────────┬─────────────┬──────────────╮

+                *  │ Code Points         │ First Byte │ Second Byte  │ Third Byte  │ Fourth Byte  │

+                *  ├─────────────────────┼────────────┼──────────────┼─────────────┼──────────────┤

+                *  │ U+0000..U+007F      │ 00..7F     │              │             │              │

+                *  │ U+0080..U+07FF      │ C2..DF     │ 80..BF       │             │              │

+                *  │ U+0800..U+0FFF      │ E0         │ A0..BF       │ 80..BF      │              │

+                *  │ U+1000..U+CFFF      │ E1..EC     │ 80..BF       │ 80..BF      │              │

+                *  │ U+D000..U+D7FF      │ ED         │ 80..9F       │ 80..BF      │              │

+                *  │ U+E000..U+FFFF      │ EE..EF     │ 80..BF       │ 80..BF      │              │

+                *  │ U+10000..U+3FFFF    │ F0         │ 90..BF       │ 80..BF      │ 80..BF       │

+                *  │ U+40000..U+FFFFF    │ F1..F3     │ 80..BF       │ 80..BF      │ 80..BF       │

+                *  │ U+100000..U+10FFFF  │ F4         │ 80..8F       │ 80..BF      │ 80..BF       │

+                *  ╰─────────────────────┴────────────┴──────────────┴─────────────┴──────────────╯

+                *

+                * Notice that all valid third and forth bytes are in the range 80..BF. This

+                * validator takes advantage of that to only check the range of those bytes once.

+                *

+                * @see https://lemire.me/blog/2018/05/09/how-quickly-can-you-check-that-a-string-is-valid-unicode-utf-8/

+                * @see https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/#G27506

+                */

+

+               $b1 = ord( $bytes[ $i ] );

+               $b2 = ord( $bytes[ $i + 1 ] ?? "\xC0" );

+

+               // Valid two-byte code points.

+

+               if ( $b1 >= 0xC2 && $b1 <= 0xDF && $b2 >= 0x80 && $b2 <= 0xBF ) {

+                       $i++;

+                       continue;

+               }

+

+               $b3 = ord( $bytes[ $i + 2 ] ?? "\xC0" );

+

+               // Valid three-byte code points.

+

+               if ( $b3 < 0x80 || $b3 > 0xBF ) {

+                       return false;

+               }

+

+               if (

+                       ( 0xE0 === $b1 && $b2 >= 0xA0 && $b2 <= 0xBF ) ||

+                       ( $b1 >= 0xE1 && $b1 <= 0xEC && $b2 >= 0x80 && $b2 <= 0xBF ) ||

+                       ( 0xED === $b1 && $b2 >= 0x80 && $b2 <= 0x9F ) ||

+                       ( $b1 >= 0xEE && $b1 <= 0xEF && $b2 >= 0x80 && $b2 <= 0xBF )

+               ) {

+                       $i += 2;

+                       continue;

+               }

+

+               $b4 = ord( $bytes[ $i + 3 ] ?? "\xC0" );

+

+               // Valid four-byte code points.

+

+               if ( $b4 < 0x80 || $b4 > 0xBF ) {

+                       return false;

+               }

+

+               if (

+                       ( 0xF0 === $b1 && $b2 >= 0x90 && $b2 <= 0xBF ) ||

+                       ( $b1 >= 0xF1 && $b1 <= 0xF3 && $b2 >= 0x80 && $b2 <= 0xBF ) ||

+                       ( 0xF4 === $b1 && $b2 >= 0x80 && $b2 <= 0x8F )

+               ) {

+                       $i += 3;

+                       continue;

+               }

+

+               // Any other sequence is invalid.

+               return false;

+       }

+

+       // Reaching the end implies validating every byte.

+       return true;

+}

+

+/**

</ins><span class="cx" style="display: block; padding: 0 10px">  * Converts a number of special characters into their HTML entities.

</span><span class="cx" style="display: block; padding: 0 10px">  *

</span><span class="cx" style="display: block; padding: 0 10px">  * Specifically deals with: `&`, `<`, `>`, `"`, and `'`.

</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1597,7 +1771,7 @@

</span><span class="cx" style="display: block; padding: 0 10px">                return $text;

</span><span class="cx" style="display: block; padding: 0 10px">        }

</span><span class="cx" style="display: block; padding: 0 10px"> 

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-        if ( seems_utf8( $text ) ) {

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( wp_is_valid_utf8( $text ) ) {

</ins><span class="cx" style="display: block; padding: 0 10px"> 

</span><span class="cx" style="display: block; padding: 0 10px">                /*

</span><span class="cx" style="display: block; padding: 0 10px">                 * Unicode sequence normalization from NFD (Normalization Form Decomposed)

</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2028,7 +2202,7 @@

</span><span class="cx" style="display: block; padding: 0 10px">                $utf8_pcre = @preg_match( '/^./u', 'a' );

</span><span class="cx" style="display: block; padding: 0 10px">        }

</span><span class="cx" style="display: block; padding: 0 10px"> 

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-        if ( ! seems_utf8( $filename ) ) {

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( ! wp_is_valid_utf8( $filename ) ) {

</ins><span class="cx" style="display: block; padding: 0 10px">                 $_ext     = pathinfo( $filename, PATHINFO_EXTENSION );

</span><span class="cx" style="display: block; padding: 0 10px">                $_name    = pathinfo( $filename, PATHINFO_FILENAME );

</span><span class="cx" style="display: block; padding: 0 10px">                $filename = sanitize_title_with_dashes( $_name ) . '.' . $_ext;

</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2277,7 +2451,7 @@

<span class="cx" style="display: block; padding: 0 10px">        // Restore octets.

</span><span class="cx" style="display: block; padding: 0 10px">        $title = preg_replace( '|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title );

</span><span class="cx" style="display: block; padding: 0 10px"> 

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-        if ( seems_utf8( $title ) ) {

</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( wp_is_valid_utf8( $title ) ) {

</ins><span class="cx" style="display: block; padding: 0 10px">                 if ( function_exists( 'mb_strtolower' ) ) {

</span><span class="cx" style="display: block; padding: 0 10px">                        $title = mb_strtolower( $title, 'UTF-8' );

</span><span class="cx" style="display: block; padding: 0 10px">                }

</span></span></pre></div>

<a id="trunktestsphpunitdataunicodeutf8testsLICENSE"></a>

<div class="addfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Added: trunk/tests/phpunit/data/unicode/utf8tests/LICENSE</h4>

<pre class="diff"><span>

<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/tests/phpunit/data/unicode/utf8tests/LICENSE                                (rev 0)

+++ trunk/tests/phpunit/data/unicode/utf8tests/LICENSE  2025-08-12 18:13:48 UTC (rev 60630)

</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -0,0 +1,21 @@

</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+MIT License

+

+Copyright (c) 2021 flenniken

+

+Permission is hereby granted, free of charge, to any person obtaining a copy

+of this software and associated documentation files (the "Software"), to deal

+in the Software without restriction, including without limitation the rights

+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell

+copies of the Software, and to permit persons to whom the Software is

+furnished to do so, subject to the following conditions:

+

+The above copyright notice and this permission notice shall be included in all

+copies or substantial portions of the Software.

+

+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR

+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,

+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE

+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER

+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,

+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE

+SOFTWARE.

</ins></span></pre></div>

<a id="trunktestsphpunitdataunicodeutf8testsREADMEmd"></a>

<div class="addfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Added: trunk/tests/phpunit/data/unicode/utf8tests/README.md</h4>

<pre class="diff"><span>

<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/tests/phpunit/data/unicode/utf8tests/README.md                              (rev 0)

+++ trunk/tests/phpunit/data/unicode/utf8tests/README.md        2025-08-12 18:13:48 UTC (rev 60630)

</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -0,0 +1,23 @@

</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+# utf8tests

+

+This directory contains a third-party test suite used for testing UTF-8 functionality.

+It primarily provides a set of tests containing various valid and invalid UTF-8 byte sequences.

+

+`utf8tests` can be found on GitHub at [flenniken/utf8tests](https://github.com/flenniken/utf8tests/).

+

+The necessary files have been copied to this directory:

+

+- `LICENSE`

+- `utf8tests.txt`

+

+The version of these files was taken from the git commit with

+SHA [`52cbdf830f3603047036070b086a1e5196df94d1`](https://github.com/flenniken/utf8tests/blob/52cbdf830f3603047036070b086a1e5196df94d1).

+

+## Updating

+

+If there have been changes to the `utf8tests` repository, this test suite can be updated. In

+order to update:

+

+1. Check out the latest version of git repository mentioned above.

+1. Copy the files listed above into this directory.

+1. Update the SHA mentioned in this README file with the new `utf8tests` SHA.

</ins><span class="cx" style="display: block; padding: 0 10px">Property changes on: trunk/tests/phpunit/data/unicode/utf8tests/README.md

</span><span class="cx" style="display: block; padding: 0 10px">___________________________________________________________________

</span></span></pre></div>

<a id="svneolstyle"></a>

<div class="addfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Added: svn:eol-style</h4></div>

<ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+native

</ins><span class="cx" style="display: block; padding: 0 10px">\ No newline at end of property

</span><a id="trunktestsphpunitdataunicodeutf8testsutf8teststxt"></a>

<div class="addfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Added: trunk/tests/phpunit/data/unicode/utf8tests/utf8tests.txt</h4>

<pre class="diff"><span>

<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/tests/phpunit/data/unicode/utf8tests/utf8tests.txt                          (rev 0)

+++ trunk/tests/phpunit/data/unicode/utf8tests/utf8tests.txt    2025-08-12 18:13:48 UTC (rev 60630)

</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -0,0 +1,841 @@

</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+# == UTF-8 Test Cases ==

+

+# Updated on Sun Jan 16 18:52:08 UTC 2022

+

+# This file, utf8tests.txt, contains test cases for testing UTF-8

+# decoders and validators.  You "compile" the file to generate

+# the file utf8tests.bin and utf8tests.html used for testing.

+

+

+

+# == About the File Format ==

+

+# The compiler skips blank lines or lines that start with a #.

+

+# The other lines test valid or invalid UTF-8 byte sequences.  A

+# test line starts with a unique string. Tests lines may be added

+# moved or removed but never changed.

+

+# This file is an ASCII file without control characters.

+

+# * The replacment character is U+FFFD <EFBFBD>.

+# * nothing = ""

+

+# Line types in this file:

+

+# # <comment line>

+# <blank line>

+# num:valid:ASCII bytes

+# num:valid hex:hexString

+# num:invalid hex:hexString:hexString2:hexString3

+

+# * hexString2 is the expected value when skipping invalid bytes.

+# * hexString3 is the expected value when replacing invalid bytes with U+FFFD.

+

+

+

+# == utf8tests.bin ==

+#

+# The utf8tests.bin is a binary file. Line types in the utf8tests.bin file:

+#

+# num:valid:bytes

+# num:invalid:bytes

+

+

+

+# == Test Cases ==

+

+# This file groups the tests into these categories:

+

+# * Valid Characters

+# * Too Big Characters

+# * Overlong Characters

+# * Surrogate Characters

+# * Valid Noncharacters

+# * Miscellaneous Byte Sequences

+# * Visual Tests

+# * Null Characters

+

+

+

+# === Valid Characters ===

+

+1.0.1:valid hex:31

+1.1.0:valid:abc

+

+# Two byte character.

+2.1.0:valid hex:C2 A9

+

+# Three byte character.

+# U+2010, HYPHEN

+3.0:valid hex:E2 80 90

+

+# Four byte character.

+# U+1D49C

+4.0:valid hex:F0 9D 92 9C

+

+# first two byte sequence

+# U+00000080, <c2 80>

+5.1:valid hex:c2 80

+

+# first three byte sequence

+# U+00000800, <e0 a0 80>

+5.2:valid hex:e0 a0 80

+

+# first four byte sequence

+# U+00010000, <f0 90 80 80>

+5.3:valid hex:f0 90 80 80

+

+# U+0080, <c2 80>

+7.1:valid hex:c2 80

+7.2:valid hex:c2 81

+7.3:valid hex:c2 82

+

+# last one byte character U+0000007F, DELETE

+8.0:valid hex:7F

+

+# last two byte character U+000007FF, <DF BF>

+8.1:valid hex:DF BF

+

+# 1110xxxx 10xxxxxx 10xxxxxx

+# 11101111 10111111 10111111

+# EF       BF       BF

+# U+0000FFFF, <EF BF BF>

+8.2:valid hex:EF BF BF

+

+# U+0010FFFF, <F4 8F BF BF>

+8.3:valid hex:F4 8F BF BF

+

+# U+E000 <EE 80 80>

+10.1:valid hex:EE 80 80

+

+# U+FFFD, replacement character, <EFBFBD>

+10.2:valid hex:EFBFBD

+

+# U+10FFFF, biggest code point, <F4 8F BF BF>

+10.3:valid hex:F4 8F BF BF

+

+# U+002f, SOLIDUS

+22.0:valid:/

+

+# U+002f, SOLIDUS <2f>

+22.1:valid hex:2F

+

+# U+0800, <e0 a0 80>

+22.7:valid hex:e0 a0 80

+

+

+

+# === Too Big Characters ===

+

+# too big U+001FFFFF, <F7 BF BF BF>

+6.0:invalid hex:F7 BF BF BF:nothing:EFBFBD  EFBFBD  EFBFBD  EFBFBD

+

+# U+10FFFF is the biggest. 110000, <F4 90 80 80>

+6.0.1:invalid hex:F4 90 80 80:nothing:EFBFBD  EFBFBD  EFBFBD  EFBFBD

+

+# too big U+00200000, <f8 88 80 80 80>

+6.1:invalid hex:f8 88 80 80 80:nothing:EFBFBD  EFBFBD  EFBFBD  EFBFBD  EFBFBD

+

+# too big U+03FFFFFF, <F7 BF BF BF BF>

+6.2:invalid hex:F7 BF BF BF BF:nothing:EFBFBD  EFBFBD  EFBFBD  EFBFBD  EFBFBD

+

+# too big U+04000000, <fc 84 80 80 80 80>

+6.3:invalid hex:fc 84 80 80 80 80:nothing:EFBFBD  EFBFBD  EFBFBD  EFBFBD  EFBFBD  EFBFBD

+

+# too big U+7FFFFFFF, <F7 BF BF BF BF BF>

+6.4:invalid hex:F7 BF BF BF BF BF:nothing:EFBFBD  EFBFBD  EFBFBD  EFBFBD  EFBFBD  EFBFBD

+

+# too big, <F7 BF BF BF BF BF BF>

+6.5:invalid hex:F7 BF BF BF BF BF BF:nothing:EFBFBD  EFBFBD  EFBFBD  EFBFBD  EFBFBD  EFBFBD  EFBFBD

+

+# <F7 BF BF>

+9.0:invalid hex:F7 BF BF:nothing:EFBFBD  EFBFBD  EFBFBD

+

+# 2.3  Other boundary conditions

+

+

+

+# 3  Malformed sequences

+

+# 3.1  Unexpected continuation bytes

+

+# first continuation byte <80>

+11.0:invalid hex:80:nothing:EFBFBD

+

+# last continuation byte <bf>

+11.1:invalid hex:bf:nothing:EFBFBD

+

+# <80 bf>

+

+11.2:invalid hex:80 bf:nothing:EFBFBD EFBFBD

+

+# <80 bf 80>

+11.3:invalid hex:80 bf 80:nothing:EFBFBD EFBFBD EFBFBD

+

+# <80 bf 80 bf>

+11.4:invalid hex:80 bf 80 bf:nothing:EFBFBD EFBFBD EFBFBD EFBFBD

+

+# <80 bf 80 bf 80>

+11.5:invalid hex:80 bf 80 bf 80:nothing:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD

+

+# <80 bf 80 bf 80 bf>

+11.6:invalid hex:80 bf 80 bf 80 bf:nothing:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD

+

+# 3.1.9  Sequence of all 64 possible continuation bytes (0x80-0xbf)

+

+# <80 - 87>

+12.0:invalid hex:8081 8283 8485 8687:nothing:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD

+

+# <88 - 8f>

+12.1:invalid hex:8889 8a8b 8c8d 8e8f:nothing:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD

+

+# <90 - 97>

+12.2:invalid hex:9091 9293 9495 9697:nothing:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD

+

+# <98 - 9f>

+12.3:invalid hex:9899 9a9b 9c9d 9e9f:nothing:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD

+

+# <a0 - a7>

+12.4:invalid hex:a0a1 a2a3 a4a5 a6a7:nothing:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD

+

+# <a8 - af>

+12.5:invalid hex:a8a9 aaab acad aeaf:nothing:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD

+

+# <b0 - b7>

+12.6:invalid hex:b0b1 b2b3 b4b5 b6b7:nothing:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD

+

+# <b8 - bf>

+12.7:invalid hex:b8b9 babb bcbd bebf:nothing:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD

+

+# 3.2  Lonely start characters

+

+# 3.2.1  All 32 first bytes of 2-byte sequences (0xc0-0xdf),

+#        each followed by a space character.

+

+# <c0 - c3>

+13.0:invalid hex:c020 c120 c220 c320:20 20 20 20:EFBFBD 20 EFBFBD 20 EFBFBD 20 EFBFBD 20

+

+# <c4 - c7>

+13.1:invalid hex:c420 c520 c620 c720:20 20 20 20:EFBFBD 20 EFBFBD 20 EFBFBD 20 EFBFBD 20

+

+# <c8 - cb>

+13.2:invalid hex:c820 c920 ca20 cb20:20 20 20 20:EFBFBD 20 EFBFBD 20 EFBFBD 20 EFBFBD 20

+

+# <cc - cf>

+13.3:invalid hex:cc20 cd20 ce20 cf20:20 20 20 20:EFBFBD 20 EFBFBD 20 EFBFBD 20 EFBFBD 20

+

+# <d0 - d3>

+13.4:invalid hex:d020 d120 d220 d320:20 20 20 20:EFBFBD 20 EFBFBD 20 EFBFBD 20 EFBFBD 20

+

+# <d4 - d7>

+13.5:invalid hex:d420 d520 d620 d720:20 20 20 20:EFBFBD 20 EFBFBD 20 EFBFBD 20 EFBFBD 20

+

+# <d8 - db>

+13.6:invalid hex:d820 d920 da20 db20:20 20 20 20:EFBFBD 20 EFBFBD 20 EFBFBD 20 EFBFBD 20

+

+# <dc - df>

+13.7:invalid hex:dc20 dd20 de20 df20:20 20 20 20:EFBFBD 20 EFBFBD 20 EFBFBD 20 EFBFBD 20

+

+

+# 3.2.2  All 16 first bytes of 3-byte sequences (0xe0-0xef)

+#        each followed by a space character

+

+# <e0 - e3>

+14.0:invalid hex:e020 e120 e220 e320:20 20 20 20:EFBFBD 20 EFBFBD 20 EFBFBD 20 EFBFBD 20

+

+# <e4 - e7>

+14.1:invalid hex:e420 e520 e620 e720:20 20 20 20:EFBFBD 20 EFBFBD 20 EFBFBD 20 EFBFBD 20

+

+# <e8 - eb>

+14.2:invalid hex:e820 e920 ea20 eb20:20 20 20 20:EFBFBD 20 EFBFBD 20 EFBFBD 20 EFBFBD 20

+

+# <ec - ef>

+14.3:invalid hex:ec20 ed20 ee20 ef20:20 20 20 20:EFBFBD 20 EFBFBD 20 EFBFBD 20 EFBFBD 20

+

+# Table 3-8. U+FFFD for Non-Shortest Form Sequences

+14.4.0:invalid hex:C0 AF E0 80 BF F0 81 82 41:41:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD 41

+

+# Table 3-9. U+FFFD for Ill-Formed Sequences for Surrogates

+14.4.1:invalid hex:ED A0 80 ED BF BF ED AF 41:41:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD 41

+

+# Table 3-10. U+FFFD for Other Ill-Formed Sequences

+14.4.2:invalid hex:F4 91 92 93 FF 41 80 BF 42:41 42:efbfbd efbfbd efbfbd efbfbd efbfbd 41 efbfbd efbfbd 42

+

+# Table 3-11. U+FFFD for Truncated Sequences

+14.5.1:invalid hex:E1 80 E2 F0 91 92 F1 BF 41:41: EFBFBD EFBFBD EFBFBD EFBFBD 41

+

+# 3.2.3  All 8 first bytes of 4-byte sequences (0xf0-0xf7),

+#        each followed by a space character

+

+# <f0 - f1>

+15.0:invalid hex:f020 f120:20 20:EFBFBD 20 EFBFBD 20

+

+# <f2 - f3>

+15.1:invalid hex:f220 f320:20 20:EFBFBD 20 EFBFBD 20

+

+# <f4 - f5>

+15.2:invalid hex:f420 f520:20 20:EFBFBD 20 EFBFBD 20

+

+# <f6 - f7>

+15.3:invalid hex:f620 f720:20 20:EFBFBD 20 EFBFBD 20

+

+

+# 3.2.4  All 4 first bytes of 5-byte sequences (0xf8-0xfb),

+#        each followed by a space character

+

+# <f8>

+16.0:invalid hex:f820:20:EFBFBD 20

+

+# <f9>

+16.1:invalid hex:f920:20:EFBFBD 20

+

+# <fa>

+16.2:invalid hex:fa20:20:EFBFBD 20

+

+# <fb>

+16.3:invalid hex:fb20:20:EFBFBD 20

+

+

+# 3.2.5  All 2 first bytes of 6-byte sequences (0xfc-0xfd),

+#        each followed by a space character.

+

+# <fc>

+17.0:invalid hex:fc20:20:EFBFBD 20

+

+# <fd>

+17.1:invalid hex:fd20:20:EFBFBD 20

+

+# 3.3  Sequences with last continuation byte missing

+

+# <c0>

+18.0:invalid hex:c0:nothing:EFBFBD

+

+# <e0 80>

+18.1:invalid hex:e0 80:nothing:EFBFBD EFBFBD

+

+# <f0 80 80>

+18.2:invalid hex:f0 80 80:nothing:EFBFBD EFBFBD EFBFBD

+

+# <f8 80 80 80>

+18.3:invalid hex:f8 80 80 80:nothing:EFBFBD EFBFBD EFBFBD EFBFBD

+

+# <fc 80 80 80 80>

+18.4:invalid hex:fc 80 80 80 80:nothing:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD

+

+

+# 0 -> 2 -> 1, U+000007FF, <df>

+19.0:invalid hex:df:nothing:EFBFBD

+

+# 0 -> 3 -> 2 -> 1, U+0000FFFF, <ef bf>

+19.1:invalid hex:ef bf:nothing:EFBFBD

+

+# 0 -> 1, U+001FFFFF, <f7 bf bf>

+19.2:invalid hex:f7 bf bf:nothing:EFBFBD EFBFBD EFBFBD

+

+# 0->1, U+03FFFFFF, <fb bf bf bf>

+19.3:invalid hex:fb bf bf bf:nothing:EFBFBD EFBFBD EFBFBD EFBFBD

+

+# 0->1, U+7FFFFFFF, <fd bf bf bf bf>

+19.4:invalid hex:fd bf bf bf bf:nothing:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD

+

+# 0->3->2-1, 123 <ef 80>

+19.5:invalid hex:31 32 33 ef 80:31 32 33:31 32 33 EFBFBD

+

+# 123 <ef 80 f0>

+19.6:invalid hex:31 32 33 ef 80 f0:31 32 33:31 32 33 EFBFBD EFBFBD

+

+# <80>

+21.0:invalid hex:80:nothing:EFBFBD

+

+# <81>

+21.1:invalid hex:81:nothing:EFBFBD

+

+# <fe>

+21.2:invalid hex:fe:nothing:EFBFBD

+

+# <ff>

+21.3:invalid hex:ff:nothing:EFBFBD

+

+# 7 <ff>

+21.4:invalid hex:37 ff:37:37 EFBFBD

+

+# 7 8 <ff>

+21.5:invalid hex:37 38 fe:37 38:37 38 EFBFBD

+

+# 7 8 9 <fe>

+21.6:invalid hex:37 38 39 fe:37 38 39:37 38 39 EFBFBD

+

+

+

+# === Overlong Characters ===

+

+# Overlong solidus has been abused before and is a potential

+# security issue.

+

+# overlong solidus <c0 af>

+22.2:invalid hex:c0 af:nothing:EFBFBD EFBFBD

+

+# overlong solidus <e0 80 af>

+22.3:invalid hex:e0 80 af:nothing:EFBFBD EFBFBD EFBFBD

+

+# overlong solidus <f0 80 80 af>

+22.4:invalid hex:f0 80 80 af:nothing:EFBFBD EFBFBD EFBFBD EFBFBD

+

+# overlong solidus <f8 80 80 80 af>

+22.5:invalid hex:f8 80 80 80 af:nothing:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD

+

+# overlong solidus <fc 80 80 80 80 af>

+22.6:invalid hex:fc 80 80 80 80 af:nothing:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD

+

+# max two byte overlong U+0000007F <c1 bf>

+23.0:invalid hex:c1 bf:nothing:EFBFBD EFBFBD

+

+# max three byte overlong U+000007FF <e0 9f bf>

+23.1:invalid hex:e0 9f bf:nothing:EFBFBD EFBFBD EFBFBD

+

+# overlong U+0000FFFF <f0 8f bf bf>

+23.2:invalid hex:f0 8f bf bf:nothing:EFBFBD EFBFBD EFBFBD EFBFBD

+

+# overlong U+001FFFFF <f8 87 bf bf bf>

+23.3:invalid hex:f8 87 bf bf bf:nothing:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD

+

+

+

+# === Surrogate Characters ===

+

+# 1 surrogate U+D800, <ed a0 80>

+24.0:invalid hex:ed a0 80:nothing:EFBFBD EFBFBD EFBFBD

+

+# 1 surrogate U+D800, <ed a0 80> 5

+24.0.1:invalid hex:ed a0 80 35:35:EFBFBD EFBFBD EFBFBD 35

+

+# 1 surrogate U+D800, 123 <ed a0 80> 1

+24.0.2:invalid hex:31 32 33 ed a0 80 31:31 32 33 31:31 32 33 EFBFBD EFBFBD EFBFBD 31

+

+# 1 surrogate U+DB7F, <ed ad bf>

+24.2:invalid hex:ed ad bf:nothing:EFBFBD EFBFBD EFBFBD

+

+# 1 surrogate U+DB80, <ed ae 80>

+24.3:invalid hex:ed ae 80:nothing:EFBFBD EFBFBD EFBFBD

+

+# 1 surrogate U+DBFF, <ed af bf>

+24.4:invalid hex:ed af bf:nothing:EFBFBD EFBFBD EFBFBD

+

+# 1 surrogate U+DC00, <ed b0 80>

+24.5:invalid hex:ed b0 80:nothing:EFBFBD EFBFBD EFBFBD

+

+# 1 surrogate U+DF80, <ed be 80>

+24.6:invalid hex:ed be 80:nothing:EFBFBD EFBFBD EFBFBD

+

+# 1 surrogate U+DFFF, <ed bf bf>

+24.7:invalid hex:ed bf bf:nothing:EFBFBD EFBFBD EFBFBD

+

+# 2 surrogates U+D800 U+DC00, <eda080 edb080>

+25.0:invalid hex:ed a0 80 ed b0 80:nothing:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD

+

+# 2 surrogates U+D800 U+DFFF, <eda080 edbfbf>

+25.1:invalid hex:ed a0 80 ed bf bf:nothing:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD

+

+# 2 surrogates U+DB7F U+DC00, <edadbf edb080>

+25.2:invalid hex:ed ad bf ed b0 80:nothing:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD

+

+# 2 surrogates U+DB7F U+DFFF, <edadbf edbfbf>

+25.3:invalid hex:ed ad bf ed bf bf:nothing:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD

+

+# 2 surrogates U+DB80 U+DC00, <edae80 edb080>

+25.4:invalid hex:ed ae 80 ed b0 80:nothing:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD

+

+# 2 surrogates U+DB80 U+DFFF, <edae80 edbfbf>

+25.5:invalid hex:ed ae 80 ed bf bf:nothing:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD

+

+# 2 surrogates U+DBFF U+DC00, <edafbf edb080>

+25.6:invalid hex:ed af bf ed b0 80:nothing:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD

+

+# 2 surrogates U+DBFF U+DFFF, <edafbf edbfbf>

+25.7:invalid hex:ed af bf ed bf bf:nothing:EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD EFBFBD

+

+

+

+# === Valid Noncharacters ===

+

+# Page 90, section 3.4, Characters and Encoding:

+# "Normally a coded character sequence consists of a sequence of

+# encoded characters, but it may also include noncharacters or

+# reserved code points."

+#

+# Page 80, section 3.2,  Conformance Requirements:

+# Note that security problems can result if noncharacter code

+# points are removed from text received from external

+# sources. For more information, see Section 23.7, Noncharacters,

+# and Unicode Technical Report #36, "Unicode Security

+# Considerations."

+

+# U+FFFE, <EF BF BE>

+26.0:valid hex:EF BF BE

+

+# U+FFFF, <EF BF BF>

+26.1:valid hex:EF BF BF

+

+# U+FDD0, <EF B7 90>

+26.2:valid hex:EF B7 90

+

+# U+FDD1, <EF B7 91>

+26.3:valid hex:EF B7 91

+

+# U+FDD2, <EF B7 92>

+26.4:valid hex:EF B7 92

+

+# U+FDD3, <EF B7 93>

+26.5:valid hex:EF B7 93

+

+# U+FDD4, <EF B7 94>

+26.6:valid hex:EF B7 94

+

+# U+FDD5, <EF B7 95>

+26.7:valid hex:EF B7 95

+

+# U+FDD6, <EF B7 96>

+26.8:valid hex:EF B7 96

+

+# U+FDD7, <EF B7 97>

+26.9:valid hex:EF B7 97

+

+# U+FDD8, <EF B7 98>

+26.10:valid hex:EF B7 98

+

+# U+FDD9, <EF B7 99>

+26.11:valid hex:EF B7 99

+

+# U+FDDA, <EF B7 9a>

+26.12:valid hex:EF B7 9a

+

+# U+FDDB, <EF B7 9b>

+26.13:valid hex:EF B7 9b

+

+# U+FDDC, <EF B7 9c>

+26.14:valid hex:EF B7 9c

+

+# U+FDDD, <EF B7 9d>

+26.15:valid hex:EF B7 9d

+

+# U+FDDE, <EF B7 9e>

+26.16:valid hex:EF B7 9e

+

+# U+FDEF, <EF B7 9f>

+26.17:valid hex:EF B7 9f

+

+# U+1FFFE, <F0 9F BF BE>

+27.0:valid hex:F0 9F BF BE

+

+# U+2FFFE, <F0 AF BF BE>

+27.1:valid hex:F0 AF BF BE

+

+# U+3FFFE, <F0 BF BF BE>

+27.2:valid hex:F0 BF BF BE

+

+# U+4FFFE, <F1 8F BF BE>

+27.3:valid hex:F1 8F BF BE

+

+# U+5FFFE, <F1 9F BF BE>

+27.4:valid hex:F1 9F BF BE

+

+# U+6FFFE, <F1 AF BF BE>

+27.5:valid hex:F1 AF BF BE

+

+# U+7FFFE, <F1 BF BF BE>

+27.6:valid hex:F1 BF BF BE

+

+# U+8FFFE, <F2 8F BF BE>

+27.7:valid hex:F2 8F BF BE

+

+# U+9FFFE, <F2 9F BF BE>

+27.8:valid hex:F2 9F BF BE

+

+# U+AFFFE, <F2 AF BF BE>

+27.9:valid hex:F2 AF BF BE

+

+# U+BFFFE, <F2 BF BF BE>

+27.10:valid hex:F2 BF BF BE

+

+# U+CFFFE, <F3 8F BF BE>

+27.11:valid hex:F3 8F BF BE

+

+# U+DFFFE, <F3 9F BF BE>

+27.12:valid hex:F3 9F BF BE

+

+# U+EFFFE, <F3 AF BF BE>

+27.13:valid hex:F3 AF BF BE

+

+# U+FFFFE, <F3 BF BF BE>

+27.14:valid hex:F3 BF BF BE

+

+# U+10FFFE, <F4 8F BF BE>

+27.15:valid hex:F4 8F BF BE

+

+# U+1FFFF, <F0 9F BF BF>

+28.0:valid hex:F0 9F BF BF

+

+# U+2FFFF, <F0 AF BF BF>

+28.1:valid hex:F0 AF BF BF

+

+# U+3FFFF, <F0 BF BF BF>

+28.2:valid hex:F0 BF BF BF

+

+# U+4FFFF, <F1 8F BF BF>

+28.3:valid hex:F1 8F BF BF

+

+# U+5FFFF, <F1 9F BF BF>

+28.4:valid hex:F1 9F BF BF

+

+# U+6FFFF, <F1 AF BF BF>

+28.5:valid hex:F1 AF BF BF

+

+# U+7FFFF, <F1 BF BF BF>

+28.6:valid hex:F1 BF BF BF

+

+# U+8FFFF, <F2 8F BF BF>

+28.7:valid hex:F2 8F BF BF

+

+# U+9FFFF, <F2 9F BF BF>

+28.8:valid hex:F2 9F BF BF

+

+# U+AFFFF, <F2 AF BF BF>

+28.9:valid hex:F2 AF BF BF

+

+# U+BFFFF, <F2 BF BF BF>

+28.10:valid hex:F2 BF BF BF

+

+# U+CFFFF, <F3 8F BF BF>

+28.11:valid hex:F3 8F BF BF

+

+# U+DFFFF, <F3 9F BF BF>

+28.12:valid hex:F3 9F BF BF

+

+# U+EFFFF, <F3 AF BF BF>

+28.13:valid hex:F3 AF BF BF

+

+# U+FFFFF, <F3 BF BF BF>

+28.14:valid hex:F3 BF BF BF

+

+# U+10FFFF, <F4 8F BF BF>

+28.15:valid hex:F4 8F BF BF

+

+

+

+

+# === Miscellaneous Byte Sequences ===

+

+# The following tests come from looking at the UTF-8 finite state diagram

+# for cases that enter the error state. They cover each arrow

+# going into the error state.

+

+# The transition from the start state 0 to error state 1 can happen

+# with bytes 80-c1, f5-ff.

+# !(00-7f, c2-df, e0, e1-ec, ed, ee-ef, f0, f1-f3, f4)

+

+# <80>

+29.0:invalid hex:80:nothing:EFBFBD

+

+# <20 80>

+29.1:invalid hex:20 80:20:20 EFBFBD

+

+# <20 21 21 23 fe 20>

+29.2:invalid hex:20 21 21 23 fe 20:20 21 21 23 20:20 21 21 23 EFBFBD 20

+

+# <20 21 21 23 24 fe>

+29.3:invalid hex:20 21 21 23 24 fe:20 21 21 23 24:20 21 21 23 24 EFBFBD

+

+# <80 20>

+29.4:invalid hex:80 20:20:EFBFBD 20

+

+# <20 80 20>

+29.5:invalid hex:20 80 20:20 20:20 EFBFBD 20

+

+# <81 20>

+29.6:invalid hex:81 20:20:EFBFBD 20

+

+# <c1 20>

+29.7:invalid hex:c1 20:20:EFBFBD 20

+

+# <f5 20>

+29.8:invalid hex:f5 20:20:EFBFBD 20

+

+# <ff 20>

+29.9:invalid hex:ff 20:20:EFBFBD 20

+

+# The transition from the state 2 to state 1 can happen

+# with bytes 00-7f, c0-ff.

+

+# <c2 7f>, 7f = DELETE

+30.1:invalid hex:c2 7f:7f:EFBFBD 7f

+

+# <c2 c0>

+30.2:invalid hex:c2 c0:nothing:EFBFBD EFBFBD

+

+# <c2 ff>

+30.3:invalid hex:c2 ff:nothing:EFBFBD EFBFBD

+

+# <df 7f>

+30.5:invalid hex:df 7f:7f:EFBFBD 7f

+

+# <df c0>

+30.6:invalid hex:df c0:nothing:EFBFBD EFBFBD

+

+# <df ff>

+30.7:invalid hex:df ff:nothing:EFBFBD EFBFBD

+

+# <e0 a0 7f>

+31.1:invalid hex:e0 80 7f:7f:EFBFBD  EFBFBD  7f

+

+# <e0 a0 c0>

+31.2:invalid hex:e0 80 c0:nothing:EFBFBD EFBFBD EFBFBD

+

+# <e0 a0 ff>

+31.3:invalid hex:e0 80 ff:nothing:EFBFBD EFBFBD EFBFBD

+

+# <ed 80 7f>

+32.1:invalid hex:ed 80 7f:7f:EFBFBD 7f

+

+# <ed 80 c0>

+32.2:invalid hex:ed 80 c0:nothing:EFBFBD EFBFBD

+

+# <ed 80 ff>

+32.3:invalid hex:ed 80 ff:nothing:EFBFBD EFBFBD

+

+# <f0 90 80 7f>

+33.1:invalid hex:f0 90 80 7f:7f:EFBFBD 7f

+

+# <f0 90 80 c0>

+33.2:invalid hex:f0 90 80 c0:nothing:EFBFBD EFBFBD

+

+# <f0 90 80 ff>

+33.3:invalid hex:f0 90 80 ff:nothing:EFBFBD EFBFBD

+

+

+# <f1 80 80 7f>

+34.1:invalid hex:f1 80 80 7f:7f:EFBFBD 7f

+

+# <f1 80 80 c0>

+34.2:invalid hex:f1 80 80 c0:nothing:EFBFBD EFBFBD

+

+# <f1 80 80 ff>

+34.3:invalid hex:f1 80 80 ff:nothing: EFBFBD EFBFBD

+

+# <f4 80 80 7f>

+35.1:invalid hex:f4 80 80 7f:7f: EFBFBD 7f

+

+# <f4 80 80 c0>

+35.2:invalid hex:f4 80 80 c0:nothing: EFBFBD EFBFBD

+

+# <f4 80 80 ff>

+35.3:invalid hex:f4 80 80 ff:nothing: EFBFBD EFBFBD

+

+# Example in the Unicode spec. Constraints on Conversion Processes, pg 126, section 3.9.

+9.1:invalid hex:C2 41 42:41 42:EFBFBD 41 42

+

+

+

+# === Visual Tests ===

+

+# The 36 numbered tests are designed so you can manually validate

+# them by looking at the result.  The left and right sides should

+# be the same. The first line shows you what the replacement

+# character looks like. It should be a black diamond with a white

+# question mark in it.

+

+# The replacement character.

+# replacement character=EFBFBD 3d EFBFBD 2e

+36.1: valid hex: 7265706C6163656D656E74206368617261637465723D EFBFBD 3d EFBFBD 2e

+

+# Invalid ff byte.

+36.2: invalid hex: EFBFBD 3d ff 2e : EFBFBD 3d 2e : EFBFBD 3d EFBFBD 2e

+

+# Invalid two byte sequence <e0 80>.

+36.3: invalid hex: EFBFBD EFBFBD 3d e0 80 2e : EFBFBD EFBFBD 3d 2e : EFBFBD EFBFBD 3d EFBFBD EFBFBD 2e

+

+# Invalid three byte sequence <f0 80 80>.

+36.4: invalid hex: EFBFBD EFBFBD EFBFBD 3d f0 80 80 2e : EFBFBD EFBFBD EFBFBD 3d 2e : EFBFBD EFBFBD EFBFBD 3d EFBFBD EFBFBD EFBFBD 2e

+

+# Invalid three byte sequence followed by invalid bytes <f8 80 80 80>.

+36.5: invalid hex: EFBFBD EFBFBD EFBFBD EFBFBD 3d f0 80 80 80  2e : EFBFBD EFBFBD EFBFBD EFBFBD 3d 2e : EFBFBD EFBFBD EFBFBD EFBFBD 3d EFBFBD EFBFBD EFBFBD EFBFBD 2e

+

+# Invalid two byte sequence <e0 80> twice.

+36.6: invalid hex: EFBFBD EFBFBD EFBFBD EFBFBD 3d e0 80 e0 80 2e : EFBFBD EFBFBD EFBFBD EFBFBD 3d 2e : EFBFBD EFBFBD EFBFBD EFBFBD 3d EFBFBD EFBFBD EFBFBD EFBFBD 2e

+

+# too big U+001FFFFF, <F7 BF BF BF>

+36.7:invalid hex:EFBFBD EFBFBD EFBFBD EFBFBD 3d F7 BF BF BF 2e : EFBFBD EFBFBD EFBFBD EFBFBD 3d 2e : EFBFBD EFBFBD EFBFBD EFBFBD 3d EFBFBD EFBFBD EFBFBD EFBFBD 2e

+

+# 1 surrogate U+D800, <ed a0 80>

+36.8:invalid hex:EFBFBD EFBFBD EFBFBD 3d ed a0 80 2e : EFBFBD EFBFBD EFBFBD 3d 2e : EFBFBD EFBFBD EFBFBD 3d EFBFBD EFBFBD EFBFBD 2e

+

+# overlong solidus <e0 80 af>

+36.10:invalid hex:EFBFBD EFBFBD EFBFBD 3d e0 80 af 2e : EFBFBD EFBFBD EFBFBD 3d 2e : EFBFBD EFBFBD EFBFBD 3d EFBFBD EFBFBD EFBFBD 2e

+

+# valid noncharacter U+FFFF, <EF BF BF>

+36.9:valid hex:EF BF BF 3d EF BF BF 2e

+

+# valid noncharacter U+FFFE, <EF BF BE>

+36.9.1:valid hex:EF BF BE 3d EF BF BE 2e

+

+

+# === Null Characters ===

+

+

+# The null <00> byte is a valid ASCII and valid UTF-8 character.

+# It is often used to terminate a string in C type

+# languages. That is why these tests are at the end of the file.

+

+# U+00000000, null, <00>

+5.0:valid hex:00

+

+# state 0 -> 2 -> 1

+# <c2 00>

+30.0:invalid hex:c2 00:00:EFBFBD 00

+

+# <df 00>

+30.4:invalid hex:df 00:00:EFBFBD 00

+

+# state 0 -> 4 -> 2 -> 1

+# <e0 a0 00>

+31.0:invalid hex:e0 80 00:00:EFBFBD  EFBFBD  00

+

+# state 0 -> 5 -> 2 -> 1

+# <ed 80 00>

+32.0:invalid hex:ed 80 00:00:EFBFBD 00

+

+# state 0 -> 6 -> 3 -> 2 -> 1

+# <f0 90 80 00>

+33.0:invalid hex:f0 90 80 00:00:EFBFBD 00

+

+# state 0 -> 7 -> 3 -> 2 -> 1

+# <f1 80 80 00>

+34.0:invalid hex:f1 80 80 00:00:EFBFBD 00

+

+# state 0 -> 8 -> 3 -> 2 -> 1

+# <f4 80 80 00>

+35.0:invalid hex:f4 80 80 00:00: EFBFBD 00

+

+# 110xxxxx 10xxxxxx

+# 11000000 10000000

+# overlong zero <c0 80>

+37.0:invalid hex:c0 80:nothing:EFBFBD EFBFBD

+

+# 1110xxxx 10xxxxxx 10xxxxxx =

+# 11100000 10000000 10000000 =

+# overlong zero <E0 80 80>

+37.1:invalid hex:E0 80 80:nothing:EFBFBD EFBFBD EFBFBD

+

+# 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

+# 11110000 10000000 10000000 10000000

+# overlong zero <F0 80 80 80>

+37.2:invalid hex:F0 80 80 80:nothing:EFBFBD EFBFBD EFBFBD EFBFBD

+

+# 0 in the middle, 20 00 35

+37.2.1:valid hex:20 00 35

+

+# 0 in the middle, 20 00 20 <ff>

+37.3:invalid hex:20 00 20 ff:20 00 20:20 00 20 EFBFBD

+

+# 0 at the end, 20 00

+37.4:valid hex:20 00

</ins><span class="cx" style="display: block; padding: 0 10px">Property changes on: trunk/tests/phpunit/data/unicode/utf8tests/utf8tests.txt

</span><span class="cx" style="display: block; padding: 0 10px">___________________________________________________________________

</span></span></pre></div>

<a id="svneolstyle"></a>

<div class="addfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Added: svn:eol-style</h4></div>

<ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+native

</ins><span class="cx" style="display: block; padding: 0 10px">\ No newline at end of property

</span><a id="trunktestsphpunittestsformattingseemsUtf8php"></a>

<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/tests/phpunit/tests/formatting/seemsUtf8.php</h4>

<pre class="diff"><span>

<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/tests/phpunit/tests/formatting/seemsUtf8.php        2025-08-12 14:45:30 UTC (rev 60629)

+++ trunk/tests/phpunit/tests/formatting/seemsUtf8.php  2025-08-12 18:13:48 UTC (rev 60630)

</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1,45 +0,0 @@

</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-<?php

-

-/**

- * @group formatting

- *

- * @covers ::seems_utf8

- */

-class Tests_Formatting_SeemsUtf8 extends WP_UnitTestCase {

-

-       /**

-        * `seems_utf8` returns true for utf-8 strings, false otherwise.

-        *

-        * @dataProvider data_seems_utf8_returns_true_for_utf8_strings

-        */

-       public function test_seems_utf8_returns_true_for_utf8_strings( $utf8_string ) {

-               // From http://www.i18nguy.com/unicode-example.html

-               $this->assertTrue( seems_utf8( $utf8_string ) );

-       }

-

-       public function data_seems_utf8_returns_true_for_utf8_strings() {

-               $utf8_strings = file( DIR_TESTDATA . '/formatting/utf-8/utf-8.txt' );

-               foreach ( $utf8_strings as &$string ) {

-                       $string = (array) trim( $string );

-               }

-               unset( $string );

-               return $utf8_strings;

-       }

-

-       /**

-        * @dataProvider data_seems_utf8_returns_false_for_non_utf8_strings

-        */

-       public function test_seems_utf8_returns_false_for_non_utf8_strings( $big5_string ) {

-               $this->assertFalse( seems_utf8( $big5_string ) );

-       }

-

-       public function data_seems_utf8_returns_false_for_non_utf8_strings() {

-               // Get data from formatting/big5.txt.

-               $big5_strings = file( DIR_TESTDATA . '/formatting/big5.txt' );

-               foreach ( $big5_strings as &$string ) {

-                       $string = (array) trim( $string );

-               }

-               unset( $string );

-               return $big5_strings;

-       }

-}

</del></span></pre></div>

<a id="trunktestsphpunittestsunicodewpIsValidUtf8php"></a>

<div class="addfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Added: trunk/tests/phpunit/tests/unicode/wpIsValidUtf8.php</h4>

<pre class="diff"><span>

<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/tests/phpunit/tests/unicode/wpIsValidUtf8.php                               (rev 0)

+++ trunk/tests/phpunit/tests/unicode/wpIsValidUtf8.php 2025-08-12 18:13:48 UTC (rev 60630)

</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -0,0 +1,101 @@

</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+<?php

+/**

+ * Unit tests covering WordPress’ UTF-8 handling.

+ *

+ * @package WordPress

+ * @group unicode

+ */

+

+class Tests_WpIsValidUtf8TestCase extends WP_UnitTestCase {

+       /**

+        * Verifies that WordPress can properly detect valid and invalid UTF-8.

+        *

+        * @ticket 38044

+        *

+        * @dataProvider data_utf8_test_data

+        *

+        * @param string $bytes Bytes as a PHP string.

+        */

+       public function test_properly_validates_utf8( string $bytes ) {

+               $is_valid = mb_check_encoding( $bytes, 'UTF-8' );

+

+               $this->assertSame(

+                       $is_valid,

+                       wp_is_valid_utf8( $bytes ),

+                       $is_valid

+                               ? 'Should have identified the input as a valid UTF-8 string.'

+                               : 'Should have reject the invalid UTF-8 string.'

+               );

+       }

+

+       /**

+        * Verifies that WordPress can properly detect valid and invalid UTF-8;

+        * forces testing with the fallback mechanism in pure PHP code.

+        *

+        * @ticket 38044

+        *

+        * @dataProvider data_utf8_test_data

+        *

+        * @param string $bytes Bytes as a PHP string.

+        */

+       public function test_fallback_properly_validates_utf8( string $bytes ) {

+               $is_valid = mb_check_encoding( $bytes, 'UTF-8' );

+

+               $this->assertSame(

+                       $is_valid,

+                       _wp_is_valid_utf8_fallback( $bytes ),

+                       $is_valid

+                               ? 'Should have identified the input as a valid UTF-8 string.'

+                               : 'Should have reject the invalid UTF-8 string.'

+               );

+       }

+

+       /**

+        * Data provider.

+        *

+        * @throws Exception

+        *

+        * @return Generator

+        */

+       public static function data_utf8_test_data() {

+               $test_file        = fopen( __DIR__ . '/../../data/unicode/utf8tests/utf8tests.txt', 'r' );

+               $last_description = '';

+

+               while ( false !== ( $line = fgets( $test_file ) ) ) {

+                       if ( empty( trim( $line ) ) ) {

+                               continue;

+                       }

+

+                       if ( str_starts_with( $line, '#' ) ) {

+                               $last_description = trim( substr( $line, 1 ) );

+                               continue;

+                       }

+

+                       $test_parts = explode( ':', $line );

+                       if ( count( $test_parts ) < 3 ) {

+                               throw new Exception( 'Wrong test data: check utf8tests.txt' );

+                       }

+

+                       list( $reference, $classification, $test_data ) = $test_parts;

+

+                       $reference      = trim( $reference );

+                       $classification = trim( $classification );

+                       $test_data      = trim( $test_data );

+

+                       switch ( $classification ) {

+                               case 'valid':

+                                       yield "{$reference} {$last_description}" => array( $test_data );

+                                       break;

+

+                               case 'valid hex':

+                               case 'invalid hex':

+                                       $bytes = hex2bin( str_replace( ' ', '', $test_data ) );

+                                       yield "{$reference} {$last_description}" => array( $bytes );

+                                       break;

+

+                               default:

+                                       throw new Exception( "Test input file contains unrecognized input classification '{$classification}' (see utf8tests.txt): {$line}" );

+                       }

+               }

+       }

+}

</ins><span class="cx" style="display: block; padding: 0 10px">Property changes on: trunk/tests/phpunit/tests/unicode/wpIsValidUtf8.php

</span><span class="cx" style="display: block; padding: 0 10px">___________________________________________________________________

</span></span></pre></div>

<a id="svneolstyle"></a>

<div class="addfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Added: svn:eol-style</h4></div>

<ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+native

</ins><span class="cx" style="display: block; padding: 0 10px">\ No newline at end of property

</span></div>

</body>

</html>