<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[58159] trunk: Improve legibility of JSON-encoded Interactivity API store data.</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { white-space: pre-line; overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta" style="font-size: 105%">
<dt style="float: left; width: 6em; font-weight: bold">Revision</dt> <dd><a style="font-weight: bold" href="https://core.trac.wordpress.org/changeset/58159">58159</a><script type="application/ld+json">{"@context":"http://schema.org","@type":"EmailMessage","description":"Review this Commit","action":{"@type":"ViewAction","url":"https://core.trac.wordpress.org/changeset/58159","name":"Review Commit"}}</script></dd>
<dt style="float: left; width: 6em; font-weight: bold">Author</dt> <dd>dmsnell</dd>
<dt style="float: left; width: 6em; font-weight: bold">Date</dt> <dd>2024-05-15 17:40:44 +0000 (Wed, 15 May 2024)</dd>
</dl>

<pre style='padding-left: 1em; margin: 2em 0; border-left: 2px solid #ccc; line-height: 1.25; font-size: 105%; font-family: sans-serif'>Improve legibility of JSON-encoded Interactivity API store data.

The Interactivity API has been rendering client data in a SCRIPT element with the
type `application/json` so that it's not executed as a script, but is available
to one. The data runs through `wp_json_encode()` and is encoded with some flags
to ensure that potentially-dangerous characters are escaped.

However, this can lead to some challenges. Eagerly escaping when not necessary
can make the data difficult to comprehend when reading the output HTML. For example,
all non-ASCII Unicode characters are escaped with their code point equivalent.
This results in `\ud83c\udd70` instead of `🅰`.

In this patch, the flags for JSON encoding are refined to ensure what's necessary
while relaxing other rules (leaving in those Unicode characters if the blog charset
is UTF-8). This makes Interactivity API data quicker for a human reader
to decipher and diagnose.

In summary:

 - This data is JSON encoded and printed in a `<script type="application/json">` tag.

 - If we ensure that `<` is never printed inside the data, it should be impossible to
   break out of the script tag and the browser treats everything as the element's `textContent`.

 - All other escaping becomes unnecessary at that point, including Unicode escaping
   if the page uses the UTF-8 charset (the same encoding as JSON).
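
A rough sketch of the summary above, for illustration only. It calls PHP's
`json_encode()` directly rather than `wp_json_encode()`, and the function name
`sketch_encode_store_data()` and its `$is_utf8` parameter are hypothetical
stand-ins (the latter for the `is_utf8_charset()` check); neither appears in
this changeset. It assumes PHP 7.1 or later:

```php
<?php
// Standalone sketch, not the changeset's code: plain json_encode() is used,
// and $is_utf8 is a hypothetical parameter standing in for is_utf8_charset().
function sketch_encode_store_data( array $data, bool $is_utf8 ): string {
	// Always escape < and > so the payload cannot close its SCRIPT element.
	// Slashes stay unescaped because `/` alone cannot close the element.
	$flags = JSON_HEX_TAG | JSON_UNESCAPED_SLASHES;

	if ( $is_utf8 ) {
		// Literal Unicode is only safe when the page itself is UTF-8.
		$flags |= JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_LINE_TERMINATORS;
	}

	return json_encode( $data, $flags );
}

$data = array(
	'closer' => '</script>',
	'amp'    => '&',
	'letter' => "\u{1F170}",
);

// UTF-8 page: only the angle brackets are escaped.
echo sketch_encode_store_data( $data, true ), "\n";

// Non-UTF-8 page: non-ASCII characters are escaped as well.
echo sketch_encode_store_data( $data, false ), "\n";
```

Both calls print the closer as `\u003C/script\u003E` and leave `&` as-is; only
the second escapes the emoji, as `\ud83c\udd70`.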

See https://github.com/WordPress/wordpress-develop/pull/6433#pullrequestreview-2043218338

Developed in https://github.com/WordPress/wordpress-develop/pull/6520
Discussed in https://core.trac.wordpress.org/ticket/61170

Fixes: <a href="https://core.trac.wordpress.org/ticket/61170">#61170</a>
Follow-up to: <a href="https://core.trac.wordpress.org/changeset/57563">[57563]</a>.
Props: bjorsch, dmsnell, jonsurrell, sabernhardt, westonruter.</pre>

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunksrcwpincludesinteractivityapiclasswpinteractivityapiphp">trunk/src/wp-includes/interactivity-api/class-wp-interactivity-api.php</a></li>
<li><a href="#trunktestsphpunittestsinteractivityapiwpInteractivityAPIphp">trunk/tests/phpunit/tests/interactivity-api/wpInteractivityAPI.php</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunksrcwpincludesinteractivityapiclasswpinteractivityapiphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-includes/interactivity-api/class-wp-interactivity-api.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-includes/interactivity-api/class-wp-interactivity-api.php    2024-05-15 15:52:58 UTC (rev 58158)
+++ trunk/src/wp-includes/interactivity-api/class-wp-interactivity-api.php      2024-05-15 17:40:44 UTC (rev 58159)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -167,10 +167,41 @@
</span><span class="cx" style="display: block; padding: 0 10px">                }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                if ( ! empty( $interactivity_data ) ) {
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                        /*
+                        * This data will be printed as JSON inside a script tag like this:
+                        *   <script type="application/json"></script>
+                        *
+                        * A script tag must be closed by a sequence beginning with `</`. It's impossible to
+                        * close a script tag without using `<`. We ensure that `<` is escaped and `/` can
+                        * remain unescaped, so `</script>` will be printed as `\u003C/script\u003E`.
+                        *
+                        *   - JSON_HEX_TAG: All < and > are converted to \u003C and \u003E.
+                        *   - JSON_UNESCAPED_SLASHES: Don't escape /.
+                        *
+                        * If the page will use UTF-8 encoding, it's safe to print unescaped unicode:
+                        *
+                        *   - JSON_UNESCAPED_UNICODE: Encode multibyte Unicode characters literally (instead of as `\uXXXX`).
+                        *   - JSON_UNESCAPED_LINE_TERMINATORS: The line terminators are kept unescaped when
+                        *     JSON_UNESCAPED_UNICODE is supplied. It uses the same behaviour as it was
+                        *     before PHP 7.1 without this constant. Available as of PHP 7.1.0.
+                        *
+                        * The JSON specification requires encoding in UTF-8, so if the generated HTML page
+                        * is not encoded in UTF-8 then it's not safe to include those literals. They must
+                        * be escaped to avoid encoding issues.
+                        *
+                        * @see https://www.rfc-editor.org/rfc/rfc8259.html for details on encoding requirements.
+                        * @see https://www.php.net/manual/en/json.constants.php for details on these constants.
+                        * @see https://html.spec.whatwg.org/#script-data-state for details on script tag parsing.
+                        */
+                       $json_encode_flags = JSON_HEX_TAG | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_LINE_TERMINATORS;
+                       if ( ! is_utf8_charset() ) {
+                               $json_encode_flags = JSON_HEX_TAG | JSON_UNESCAPED_SLASHES;
+                       }
+
</ins><span class="cx" style="display: block; padding: 0 10px">                         wp_print_inline_script_tag(
</span><span class="cx" style="display: block; padding: 0 10px">                                wp_json_encode(
</span><span class="cx" style="display: block; padding: 0 10px">                                        $interactivity_data,
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                                        JSON_HEX_TAG | JSON_HEX_AMP
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                                 $json_encode_flags
</ins><span class="cx" style="display: block; padding: 0 10px">                                 ),
</span><span class="cx" style="display: block; padding: 0 10px">                                array(
</span><span class="cx" style="display: block; padding: 0 10px">                                        'type' => 'application/json',
</span></span></pre></div>
<a id="trunktestsphpunittestsinteractivityapiwpInteractivityAPIphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/tests/phpunit/tests/interactivity-api/wpInteractivityAPI.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/tests/phpunit/tests/interactivity-api/wpInteractivityAPI.php        2024-05-15 15:52:58 UTC (rev 58158)
+++ trunk/tests/phpunit/tests/interactivity-api/wpInteractivityAPI.php  2024-05-15 17:40:44 UTC (rev 58159)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -27,6 +27,10 @@
</span><span class="cx" style="display: block; padding: 0 10px">                $this->interactivity = new WP_Interactivity_API();
</span><span class="cx" style="display: block; padding: 0 10px">        }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+        public function charset_iso_8859_1() {
+               return 'iso-8859-1';
+       }
+
</ins><span class="cx" style="display: block; padding: 0 10px">         /**
</span><span class="cx" style="display: block; padding: 0 10px">         * Tests that the state and config methods return an empty array at the
</span><span class="cx" style="display: block; padding: 0 10px">         * beginning.
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -349,6 +353,7 @@
</span><span class="cx" style="display: block; padding: 0 10px">         * properly escaped.
</span><span class="cx" style="display: block; padding: 0 10px">         *
</span><span class="cx" style="display: block; padding: 0 10px">         * @ticket 60356
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         * @ticket 61170
</ins><span class="cx" style="display: block; padding: 0 10px">          *
</span><span class="cx" style="display: block; padding: 0 10px">         * @covers ::state
</span><span class="cx" style="display: block; padding: 0 10px">         * @covers ::config
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -355,16 +360,69 @@
</span><span class="cx" style="display: block; padding: 0 10px">         * @covers ::print_client_interactivity_data
</span><span class="cx" style="display: block; padding: 0 10px">         */
</span><span class="cx" style="display: block; padding: 0 10px">        public function test_state_and_config_escape_special_characters() {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $this->interactivity->state( 'myPlugin', array( 'amps' => 'http://site.test/?foo=1&baz=2' ) );
-               $this->interactivity->config( 'myPlugin', array( 'tags' => 'Tags: <!-- <script>' ) );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         $this->interactivity->state(
+                       'myPlugin',
+                       array(
+                               'ampersand'                              => '&',
+                               'less-than sign'                         => '<',
+                               'greater-than sign'                      => '>',
+                               'solidus'                                => '/',
+                               'line separator'                         => "\u{2028}",
+                               'paragraph separator'                    => "\u{2029}",
+                               'flag of england'                        => "\u{1F3F4}\u{E0067}\u{E0062}\u{E0065}\u{E006E}\u{E0067}\u{E007F}",
+                               'malicious script closer'                => '</script>',
+                               'entity-encoded malicious script closer' => '&lt;/script&gt;',
+                       )
+               );
+               $this->interactivity->config( 'myPlugin', array( 'chars' => '&<>/' ) );
</ins><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                $interactivity_data_markup = get_echo( array( $this->interactivity, 'print_client_interactivity_data' ) );
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                preg_match( '/<script type="application\/json" id="wp-interactivity-data">.*?(\{.*\}).*?<\/script>/s', $interactivity_data_markup, $interactivity_data_string );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         preg_match( '~<script type="application/json" id="wp-interactivity-data">\s*(\{.*\})\s*</script>~s', $interactivity_data_markup, $interactivity_data_string );
</ins><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $this->assertEquals(
-                       '{"config":{"myPlugin":{"tags":"Tags: \u003C!-- \u003Cscript\u003E"}},"state":{"myPlugin":{"amps":"http:\/\/site.test\/?foo=1\u0026baz=2"}}}',
-                       $interactivity_data_string[1]
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         $expected = <<<"JSON"
+{"config":{"myPlugin":{"chars":"&\\u003C\\u003E/"}},"state":{"myPlugin":{"ampersand":"&","less-than sign":"\\u003C","greater-than sign":"\\u003E","solidus":"/","line separator":"\u{2028}","paragraph separator":"\u{2029}","flag of england":"\u{1F3F4}\u{E0067}\u{E0062}\u{E0065}\u{E006E}\u{E0067}\u{E007F}","malicious script closer":"\\u003C/script\\u003E","entity-encoded malicious script closer":"&lt;/script&gt;"}}}
+JSON;
+               $this->assertEquals( $expected, $interactivity_data_string[1] );
+       }
+
+       /**
+        * Tests that special characters in the initial state and configuration are
+        * properly escaped when the blog_charset is not UTF-8 (unicode compatible).
+        *
+        * In this test, unicode and line terminators should be escaped to their
+        * JSON unicode sequences.
+        *
+        * @ticket 61170
+        *
+        * @covers ::state
+        * @covers ::config
+        * @covers ::print_client_interactivity_data
+        */
+       public function test_state_and_config_escape_special_characters_non_utf8() {
+               add_filter( 'pre_option_blog_charset', array( $this, 'charset_iso_8859_1' ) );
+               $this->interactivity->state(
+                       'myPlugin',
+                       array(
+                               'ampersand'                              => '&',
+                               'less-than sign'                         => '<',
+                               'greater-than sign'                      => '>',
+                               'solidus'                                => '/',
+                               'line separator'                         => "\u{2028}",
+                               'paragraph separator'                    => "\u{2029}",
+                               'flag of england'                        => "\u{1F3F4}\u{E0067}\u{E0062}\u{E0065}\u{E006E}\u{E0067}\u{E007F}",
+                               'malicious script closer'                => '</script>',
+                               'entity-encoded malicious script closer' => '&lt;/script&gt;',
+                       )
</ins><span class="cx" style="display: block; padding: 0 10px">                 );
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                $this->interactivity->config( 'myPlugin', array( 'chars' => '&<>/' ) );
+
+               $interactivity_data_markup = get_echo( array( $this->interactivity, 'print_client_interactivity_data' ) );
+               preg_match( '~<script type="application/json" id="wp-interactivity-data">\s*(\{.*\})\s*</script>~s', $interactivity_data_markup, $interactivity_data_string );
+
+               $expected = <<<"JSON"
+{"config":{"myPlugin":{"chars":"&\\u003C\\u003E/"}},"state":{"myPlugin":{"ampersand":"&","less-than sign":"\\u003C","greater-than sign":"\\u003E","solidus":"/","line separator":"\\u2028","paragraph separator":"\\u2029","flag of england":"\\ud83c\\udff4\\udb40\\udc67\\udb40\\udc62\\udb40\\udc65\\udb40\\udc6e\\udb40\\udc67\\udb40\\udc7f","malicious script closer":"\\u003C/script\\u003E","entity-encoded malicious script closer":"&lt;/script&gt;"}}}
+JSON;
+               $this->assertEquals( $expected, $interactivity_data_string[1] );
</ins><span class="cx" style="display: block; padding: 0 10px">         }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">        /**
</span></span></pre>
</div>
</div>

</body>
</html>