[wp-trac] [WordPress Trac] #51159: Let's expand our context specific escaping methods for wp_json_encode().
WordPress Trac
noreply at wordpress.org
Thu Aug 27 18:54:00 UTC 2020
#51159: Let's expand our context specific escaping methods for wp_json_encode().
-------------------------------------------------+-------------------------
 Reporter:  whyisjake                            |      Owner:  (none)
     Type:  enhancement                          |     Status:  new
 Priority:  normal                               |  Milestone:  Awaiting Review
Component:  Security                             |    Version:
 Severity:  normal                               |   Keywords:
  Focuses:  javascript, template,                |
            coding-standards                     |
-------------------------------------------------+-------------------------
This ticket is largely sourced from a document written by @mdawaffe.
Full credit to him for the research and thoughts put forward here. What
I'd like to do is turn this into some actionable functions and developer
best practices.
`wp_json_encode()` is a handy helper for serializing PHP values into
JavaScript, and it is widely used to serialize all kinds of variables.
Imagine this hypothetical scenario:
{{{
BAD:
<pre><?php echo json_encode( $_GET ); ?></pre>
}}}
`json_encode()` serializes data into a string that can be used as a
JavaScript literal* (e.g., `null`, `true`, `false`, `1234` (numbers),
`"strings"`, `[ "arrays" ]`, and `{ "objects": "oh my"}`.)
JSON serialization, though, has nothing to do with HTML, and so does not
treat characters that are special in HTML (`<`, `>`, `&`, `'`, `"`) in any
special way: essentially, the code above is as bad as `echo $_GET['foo']`.
Securing the above code is as simple as it always is in WordPress. We’re
echoing data inside an HTML text node, so we use `esc_html()`:
{{{
OK:
<pre><?php echo esc_html( json_encode( $_GET ) ); ?></pre>
}}}
Unfortunately, while secure 😀, this code is not actually correct ☹️. For
historical reasons, `esc_html()` will not touch HTML entities (`&amp;`):
|| Input || `htmlspecialchars()` || `esc_html()` ||
|| `&` || `&amp;` 😀 || `&amp;` 😀 ||
|| `&amp;` || `&amp;amp;` 😀 || `&amp;` ☹️ ||
In the example above, if there are any HTML entities in `$_GET`, they will
be echoed verbatim to the page. The browser then decodes them, so the
page’s visitor sees, for example, `&` where the data actually contained
`&amp;`.
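The difference can be reproduced outside WordPress. The sketch below uses plain `htmlspecialchars()` with `$double_encode` set to `false` as a rough stand-in for `esc_html()`'s entity handling (an assumption for illustration, not WordPress's actual implementation), and `html_entity_decode()` to simulate what the browser displays. All `sketch_*` names are hypothetical.

```php
<?php
// Hypothetical stand-in for esc_html(): existing entities are left alone.
function sketch_escape_keep_entities( string $text ): string {
    return htmlspecialchars( $text, ENT_QUOTES, 'UTF-8', false );
}

// Full escaping: every "&" is encoded, even inside an existing entity.
function sketch_escape_everything( string $text ): string {
    return htmlspecialchars( $text, ENT_QUOTES, 'UTF-8', true );
}

// Simulates the browser turning text-node markup back into visible text.
function sketch_browser_text( string $html ): string {
    return html_entity_decode( $html, ENT_QUOTES, 'UTF-8' );
}

// The data really contains the five characters "&amp;".
$json = json_encode( array( 'q' => 'AT&amp;T' ) );

$lossy    = sketch_browser_text( sketch_escape_keep_entities( $json ) );
$faithful = sketch_browser_text( sketch_escape_everything( $json ) );

echo $lossy, "\n";    // The entity has collapsed into a bare "&".
echo $faithful, "\n"; // Identical to the original JSON.
```

Only the second variant shows the visitor the same text that `json_encode()` produced.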
=== In an HTML Text Node ===
To faithfully represent the contents of a JSON blob in an HTML text node,
the following code must be used:
{{{
GOOD:
<pre><?php echo _wp_specialchars(
    wp_json_encode( $value ),
    ENT_NOQUOTES, // Don't need to HTML-escape quotes (output is for a text node).
    'UTF-8',      // json_encode() outputs UTF-8 (really just ASCII), not the blog's charset.
    true          // Do "re-escape" HTML entities: `&amp;` -> `&amp;amp;`
); ?></pre>
}}}
This code is only appropriate for outputting JSON in HTML text nodes.
There are several other contexts where we would like to output JSON, and
each of those different contexts requires different treatment.
=== As an HTML Attribute Node ===
Though the HTML5 `.dataset` API only accepts string values for `data-*`
attributes, jQuery will automatically parse `data-*` attribute values that
are JSON serializations. So, when using jQuery, the following pattern is
often handy:
{{{
BAD:
<div data-foo='<?php echo json_encode( $foo ); ?>'>
}}}
Handy but, as we should know by now, insecure ☹️. We need to HTML-escape
the output.
Like `esc_html()`, `esc_attr()` also leaves HTML entities untouched, so,
again, the solution is to “manually” use `_wp_specialchars()`:
{{{
GOOD:
<div data-foo='<?php echo _wp_specialchars(
    wp_json_encode( $foo ),
    ENT_QUOTES, // Must HTML-escape quotes (output is for an attribute node).
    'UTF-8',    // json_encode() outputs UTF-8 (really just ASCII), not the blog's charset.
    true        // Do "re-escape" HTML entities: `&amp;` -> `&amp;amp;`
); ?>'>
}}}
It’s important to note that this code snippet is suitable for whole HTML
attributes. It is not appropriate for use on part of an HTML attribute.
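To see why the attribute case needs `ENT_QUOTES` where the text-node case does not, here is a minimal sketch using plain `htmlspecialchars()` and `json_encode()` rather than the WordPress wrappers (an assumption so that it runs standalone). The sample data is made up.

```php
<?php
$foo  = array( 'title' => "It's <b>bold</b>" );
$json = json_encode( $foo ); // Contains a literal ' character.

// ENT_NOQUOTES is fine for a text node but leaves ' and " alone,
// so the value could close a quoted attribute early.
$text_node_safe = htmlspecialchars( $json, ENT_NOQUOTES, 'UTF-8', true );

// ENT_QUOTES also encodes both quote characters.
$attribute_safe = htmlspecialchars( $json, ENT_QUOTES, 'UTF-8', true );

echo "<div data-foo='", $attribute_safe, "'></div>\n";
```

Decoding the attribute value as HTML (which is what the browser does before handing it to `.dataset` or jQuery) yields the original JSON string again.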
In general, when we want to output a JSON blob as part of an HTML
attribute, it’s because we’re trying to use it as a JavaScript literal:
{{{
BAD:
<a href="#" onclick="doSomething( <?php echo json_encode( $click_data );
?> )">
}}}
We’ve seen that using `json_encode()` by itself in this context is not
secure, but neither is using the above HTML attribute code
(`_wp_specialchars( json_encode(), … )`). In the `data-foo` case above,
we’re outputting JSON. In the `onclick` case, we’re outputting a
JavaScript literal. `_wp_specialchars()` does enough to the JSON blob to
make it safe for use as an HTML attribute, but it does not do anything to
make it safe for use within JavaScript.
Despite claiming above that `json_encode()` outputs JavaScript literals,
it’s more complicated than that.
=== In a `<script>` Element ===
The problem is that, in a `<script>` element, for example, we have to
consider how the contents are interpreted as HTML first and then as
JavaScript second.
The following pattern seems helpful. Use `json_encode()` to output a PHP
string as a JavaScript string literal:
{{{
BAD:
<script>
var foo = <?php echo json_encode( (string) $foo ); ?>;
</script>
}}}
There are multiple ways in which this is insecure.
First, for some pages, HTML entities and the characters they represent are
one and the same in the `<script>` element context. If `$foo` has HTML
entities in it, problems will happen. For example, for some pages, the
following two scripts are the same:
{{{
WAT?
<script>
var foo = "Hello&quot;; alert(/LOL/); var foo=&quot;LOL";
</script>
<script>
var foo = "Hello"; alert(/LOL/); var foo="LOL";
</script>
}}}
Exactly how HTML entities are interpreted in `<script>` elements depends on
Content-Type, DOCTYPE, browser, etc., so we can’t depend on HTML-escaping
(`_wp_specialchars()`) to save us. Instead, we need a way to securely use
JSON in this context that makes sure certain strings are never output.
Luckily, there are a couple of other widely implemented transformations
that can help.
{{{
GOOD:
<script>
var foo = decodeURIComponent( '<?php echo rawurlencode( (string) $foo );
?>' );
</script>
}}}
If we’re outputting a string, we can URL-encode it in PHP and URL-decode
it in JavaScript. (We do have to make sure we use the right functions:
`rawurlencode()` is slightly better than `urlencode()` here, and
`decodeURIComponent()` is required over JavaScript’s deprecated
`unescape()`.)
The useful property of URL-encoding that we’re exploiting is that the
transformed string is guaranteed not to have any characters in it that are
regarded as special in the HTML context (`<`, `>`, `&`, `'`, `"`), so
there’s no way to output `&quot;` or an HTML comment opener (`<!--`).
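A quick way to convince yourself of that property is the sketch below. It checks that the output of `rawurlencode()` contains only unreserved characters and `%XX` escapes, and shows the space handling that makes `urlencode()` a poor match for `decodeURIComponent()`. `rawurldecode()` stands in for the JavaScript side here, and the sample string is made up.

```php
<?php
$foo = '</script><!-- "double" & \'single\' -->';

$encoded = rawurlencode( $foo );

// Only unreserved characters and %XX escapes should remain.
$only_safe_chars = ( 1 === preg_match( '/^[A-Za-z0-9_.~%-]*$/', $encoded ) );

// rawurldecode() plays the role of decodeURIComponent() here.
$round_trips = ( rawurldecode( $encoded ) === $foo );

// urlencode() turns a space into "+", which decodeURIComponent() would
// leave as a literal plus sign; rawurlencode() emits "%20" instead.
$with_urlencode    = urlencode( 'a b' );    // a+b
$with_rawurlencode = rawurlencode( 'a b' ); // a%20b

var_dump( $only_safe_chars, $round_trips );
```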
For non-scalar data, this can be extended via `json_encode()`:
{{{
GOOD:
<script>
var foo = JSON.parse( decodeURIComponent( '<?php
echo rawurlencode( wp_json_encode( $foo ) );
?>' ) );
</script>
}}}
Rather than using the output of `json_encode()` as a JavaScript literal,
the code above URL-encodes the whole serialization (braces, quotation
marks and all), and we URL-decode and parse the resulting string in
JavaScript to get back the structured data.
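One possible shape for an "actionable function" built on this pattern is sketched below. The name `sketch_json_for_script()` is hypothetical and nothing like it exists in core. It uses plain `json_encode()` instead of `wp_json_encode()` so that it runs standalone, and it falls back to `null` when encoding fails, which is a design choice of this sketch rather than established behavior.

```php
<?php
/**
 * Hypothetical helper: returns a JavaScript expression that evaluates to $value.
 */
function sketch_json_for_script( $value ): string {
    $json = json_encode( $value );
    if ( false === $json ) {
        $json = 'null'; // Fall back to null when the value cannot be encoded.
    }
    return "JSON.parse( decodeURIComponent( '" . rawurlencode( $json ) . "' ) )";
}

$data       = array( 'html' => '</script><script>alert(1)</script>' );
$expression = sketch_json_for_script( $data );

echo '<script>var foo = ', $expression, ';</script>', "\n";
```

Because the payload is URL-encoded, the emitted expression never contains `<`, quotes, or `&`, whatever `$data` holds.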
This URL-encoding is a bit tedious and results in some ugly looking
JavaScript. (Ugly JavaScript is better than vulnerable JavaScript!) We can
instead use JavaScript’s Unicode-escaping:
HTML Special Characters:
{{{
<script>
"<" === "\u003c" // true: < is U+3C
">" === "\u003e" // true: > is U+3E
"&" === "\u0026" // true: & is U+26
"'" === "\u0027" // true: ' is U+27
'"' === "\u0022" // true: " is U+22
</script>
}}}
PHP’s `json_encode()` has an `$options` parameter, which can be used to
always Unicode-escape these HTML special characters:
ALMOST (PHP 5.3+):
{{{
<script>
var foo = <?php echo wp_json_encode( $foo, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT ); ?>;
</script>
}}}
These constants are only available as of PHP 5.3.
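The effect of the four flags can be checked with a small standalone sketch, again using plain `json_encode()` and a made-up sample string. Note that the double quotes delimiting a JSON string are never escaped, so `"` is left out of the check on purpose.

```php
<?php
$foo   = '</script><a href="x">Tom & \'Jerry\'</a>';
$flags = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;

$plain = json_encode( $foo );
$hex   = json_encode( $foo, $flags );

// Count HTML-special characters that survive in the flagged output.
// The delimiting double quotes are excluded from this check (see above).
$leaked = preg_match( '/[<>&\']/', $hex );

echo $hex, "\n";

// Both serializations decode to the same PHP string.
var_dump( json_decode( $hex ) === json_decode( $plain ) );
```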
Also, just replacing those characters isn’t good enough. We also need to
Unicode-escape the backtick ({{{`}}}) and `$` because of their special
meanings in JavaScript template literals.
GOOD (PHP 5.3+):
{{{
<script>
var message = `hello, ${<?php echo str_replace(
array( '`', '$' ),
array( '\\u0060', '\\u0024' ),
wp_json_encode( $user, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS
| JSON_HEX_QUOT )
); ?>.name}`;
</script>
}}}
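The `str_replace()` step above could be wrapped in a helper along these lines. The name `sketch_json_for_template_literal()` is hypothetical, plain `json_encode()` replaces `wp_json_encode()` so the sketch runs standalone, and returning `null` on failure is an assumption of this sketch.

```php
<?php
/**
 * Hypothetical helper for JSON that will sit inside a template literal.
 */
function sketch_json_for_template_literal( $value ): string {
    $json = json_encode( $value, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT );
    if ( false === $json ) {
        return 'null';
    }
    // In JSON output, ` and $ can only occur inside strings, where a
    // \uXXXX escape is an equivalent spelling of the same character.
    return str_replace( array( '`', '$' ), array( '\\u0060', '\\u0024' ), $json );
}

$user    = array( 'name' => 'Price: ${cost} `quoted`' );
$escaped = sketch_json_for_template_literal( $user );

echo $escaped, "\n";
```

Since the replacements are ordinary JSON escapes, decoding the result gives back the original data unchanged.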
--
Ticket URL: <https://core.trac.wordpress.org/ticket/51159>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform