[wp-trac] [WordPress Trac] #51159: Let's expand our context specific escaping methods for wp_json_encode().
WordPress Trac
noreply at wordpress.org
Thu Aug 27 18:59:19 UTC 2020
#51159: Let's expand our context specific escaping methods for wp_json_encode().
-------------------------+-------------------------------------------------
Reporter: whyisjake | Owner: (none)
Type: enhancement | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Security | Version:
Severity: normal | Resolution:
Keywords: | Focuses: javascript, template, coding-
| standards
-------------------------+-------------------------------------------------
Old description:
> This document is largely sourced from a document written by @mdawaffe.
> Full credit to him for the research and thoughts put forward here. What
> I'd like to do is move this toward some actionable functions and
> developer best practices moving forward.
>
> `wp_json_encode()` is a handy helper for turning PHP into Javascript, and
> it is widely used in different places to serialize different variables.
> Imagine this hypothetical scenario:
>
> {{{
> BAD:
> <pre><?php echo json_encode( $_GET ); ?></pre>
> }}}
>
> `json_encode()` serializes data into a string that can be used as a
> JavaScript literal* (e.g., `null`, `true`, `false`, `1234` (numbers),
> `"strings"`, `[ "arrays" ]`, and `{ "objects": "oh my"}`.)
>
> JSON serialization, though, has nothing to do with HTML, and so does not
> treat characters that are special in HTML (`<`, `>`, `&`, `'`, `"`) in
> any special way: essentially, the code above is as bad as `echo
> $_GET['foo']`.
>
> Securing the above code is as simple as it always is in WordPress. We’re
> echoing data inside an HTML text node, so we use `esc_html()`:
>
> {{{
> OK:
> <pre><?php echo esc_html( json_encode( $_GET ) ); ?></pre>
> }}}
>
> Unfortunately, while secure 😀, this code is not actually correct ☹️. For
> historical reasons, `esc_html()` will not touch HTML entities (`&`):
>
> || Input || `htmlspecialchars()` || `esc_html()` ||
> || `&` || `&` 😀 || `&` 😀 ||
> || `&` || `&` 😀 || `&` ☹️ ||
>
> In the example above, if there are any HTML entities in $_GET, they will
> be echoed verbatim to the page, which means they will appear unescaped to
> the page’s visitor.
>
> === In an HTML Text Node ===
>
> To faithfully represent the contents of a JSON blob in an HTML text node,
> the following code must be used:
>
> {{{
> GOOD:
> <pre><php echo _wp_specialchars(
> wp_json_encode( $value ),
> ENT_NOQUOTES, // Don't need to HTML-escape quotes (output is for
> a text node).
> 'UTF-8', // json_encode() outputs UTF-8 (really just ASCII),
> not the blog's charset.
> true, // Do "re-escape" HTML entities: `&` ->
> `&`
> ); ?><pre>
> }}}
>
> This code is only appropriate for outputting JSON in HTML text nodes.
> There are several other contexts where we would like to output JSON, and
> each of those different contexts requires different treatment.
>
> === As an HTML Attribute Node ===
>
> Though the HTML5 `.dataset` API only accepts string values for `data-*`
> attributes, jQuery will automatically parse `data-*` attribute values
> that are JSON serializations. So, when using jQuery, the following
> pattern is often handy:
>
> {{{
> BAD:
> <div data-foo='<?php echo json_encode( $foo ); ?>'>
> }}}
>
> Handy but, as we should know by now, insecure 😀. We need to HTML-escape
> the output.
>
> Like `esc_html()`, `esc_attr()` also leaves HTML entities untouched, so,
> again, the solution is to “manually” use `_wp_specialchars()`:
>
> {{{
> GOOD:
> <div data-foo='<?php echo _wp_specialchars(
> wp_json_encode( $foo ),
> ENT_QUOTES, // Must HTML-escape quotes (output is for an
> attriibute node).
> 'UTF-8', // json_encode() outputs UTF-8 (really just ASCII),
> not the blog's charset.
> true, // Do "re-escape" HTML entities: `&` ->
> `&`
> ); ?>'>
> }}}
>
> It’s important to note that this code snippet is suitable for whole HTML
> attributes. It is not appropriate for use on part of an HTML attribute.
>
> In general, when we want to output a JSON blob as part of an HTML
> attribute, it’s because we’re trying to use it as a JavaScript literal:
>
> {{{
> BAD:
> <a href="#" onclick="doSomething( <?php echo json_encode( $click_data );
> ?> )">
> }}}
>
> We’ve seen that using json_encode() by itself in this context is not
> secure, but neither is using the above HTML attribute code
> (`_wp_specialchars( json_encode(), … )`). In the `data-foo` case above,
> we’re outputting JSON. In the `onclick` case, we’re outputting a
> JavaScript literal. `_wp_specialchars()` does enough to the JSON blob to
> make it safe for use as an HTML attribute, but it does not do anything to
> make it safe for use within JavaScript.
>
> Despite claiming above that `json_encode()` outputs JavaScript literals,
> it’s more complicated than that.
>
> In a `<script>` Element
>
> The problem is that, in a <script> element, for example, we have to
> consider how the contents are interpreted as HTML first and then as
> JavaScript second.
>
> The following pattern seems helpful. Use `json_encode()` to output a PHP
> string as a JavaScript string literal:
>
> {{{
> BAD:
> <script>
> var foo = <?php echo json_encode( (string) $foo ); ?>;
> </script>
> }}}
>
> There are multiple ways in which this is insecure.
>
> First, for some pages, HTML entities and the characters they represent
> are one and the same in the `<script>` element context. If `$foo` has
> HTML entities in it, problems will happen. For example, for some pages,
> the following two scripts are the same:
>
> {{{
> WAT?
> <script>
> var foo = "Hello"; alert(/LOL/); var foo="LOL";
> </script>
> <script>
> var foo = "Hello"; alert(/LOL/); var foo="LOL";
> </script>
> }}}
>
> Exactly how HTML entities are interpreted in <script> elements depends on
> Content-Type, DOCTYPE, browser, etc. So we need a way to securely use
> JSON in this context that does not depend HTML-escaping
> (`_wp_specialchars()`).
>
> So we can’t depend on HTML-escaping to save us, and we need to make sure
> certain strings are never output. Luckily, there are a couple of other
> widely implemented transformations that can help.
>
> {{{
> GOOD:
> <script>
> var foo = decodeURIComponent( '<?php echo rawurlencode( (string) $foo );
> ?>' );
> </script>
> }}}
>
> If we’re outputting a string, we can URL-encode it in PHP and URL-decode
> it in JavaScript. (We do have to make sure we use the right functions:
> `rawurlencode()` is slightly better than `urlencode()` here, and
> `decodeURIComponent()` is required over JavaScript’s deprecated
> `unescape()`.)
>
> The useful property of URL-encoding that we’re exploiting is that the
> transformed string is guaranteed not to have any characters in it that
> are regarded as special in the HTML context (`<`, `>`, `&`, `'`, `"`), so
> there’s no way to output `"` or an HTML Comment Opener.
>
> For non-scalar data, this can be extended via json_encode():
>
> {{{
> GOOD:
> <script>
> var foo = JSON.parse( decodeURIComponent( '<?php
> echo rawurlencode( wp_json_encode( $foo ) );
> ?>' );
> </script>
> }}}
>
> Rather than using the output of `json_encode()` as a JavaScript literal,
> the code above URL-encodes the whole serialization (braces, quotation
> marks and all), and we URL-decode and parse the resulting string in
> JavaScript to get back the structured data.
>
> This URL-encoding is a bit tedious and results in some ugly looking
> JavaScript. (Ugly JavaScript is better than vulnerable JavaScript!) We
> can instead use JavaScript’s Unicode-escaping:
>
> HTML Special Characters:
> {{{
> <script>
> "<" === "\u003c" // true: < is U+3C
> ">" === "\u003e" // true: < is U+3E
> "&" === "\u0026" // true: & is U+26
> "'" === "\u0027" // true: " is U+27
> '"' === "\u0022" // true: " is U+22
> </script>
> }}}
>
> PHP’s `json_encode()` has an `$options` parameter, which can be used to
> always Unicode-escape these HTML special characters:
>
> ALMOST (PHP 5.3+):
> <script>
> var foo = <?php echo wp_json_encode( $foo, JSON_HEX_TAG | JSON_HEX_AMP |
> JSON_HEX_APOS | JSON_HEX_QUOT ); ?>;
> </script>
>
> These constants are only available as of PHP 5.3.
>
> Also, just replacing those characters isn’t good enough. We also need to
> Unicode-escape ``` and `$` because of their special meanings in
> JavaScript template literals.
>
> GOOD (PHP 5.3+):
>
> {{{
> <script>
> var message = `hello, ${<?php echo str_replace(
> array( '`', '$' ),
> array( '\\u0060', '\\u0024' ),
> wp_json_encode( $user, JSON_HEX_TAG | JSON_HEX_AMP |
> JSON_HEX_APOS | JSON_HEX_QUOT )
> ); ?>.name}`;
> </script>
> }}}
New description:
This document is largely sourced from a document written by @mdawaffe.
Full credit to him for the research and thoughts put forward here. What
I'd like to do is move this toward some actionable functions and developer
best practices moving forward.
`wp_json_encode()` is a handy helper for turning PHP values into JavaScript, and
it is widely used in different places to serialize different variables.
Imagine this hypothetical scenario:
{{{
BAD:
<pre><?php echo json_encode( $_GET ); ?></pre>
}}}
`json_encode()` serializes data into a string that can be used as a
JavaScript literal* (e.g., `null`, `true`, `false`, `1234` (numbers),
`"strings"`, `[ "arrays" ]`, and `{ "objects": "oh my"}`.)
JSON serialization, though, has nothing to do with HTML, and so does not
treat characters that are special in HTML (`<`, `>`, `&`, `'`, `"`) in any
special way: essentially, the code above is as bad as `echo $_GET['foo']`.
Securing the above code is as simple as it always is in WordPress. We’re
echoing data inside an HTML text node, so we use `esc_html()`:
{{{
OK:
<pre><?php echo esc_html( json_encode( $_GET ) ); ?></pre>
}}}
Unfortunately, while secure 😀, this code is not actually correct ☹️. For
historical reasons, `esc_html()` will not touch HTML entities (`&amp;`):
|| Input || `htmlspecialchars()` || `esc_html()` ||
|| `&` || `&amp;` 😀 || `&amp;` 😀 ||
|| `&amp;` || `&amp;amp;` 😀 || `&amp;` ☹️ ||
In the example above, if there are any HTML entities in `$_GET`, they will
be echoed verbatim to the page, which means the browser decodes them and the
page’s visitor sees the decoded character instead of the original text.
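The difference between escaping every `&` and leaving existing entities alone can be modelled in a few lines of JavaScript. This is only a simplified sketch: `escapeText()` and its entity pattern are illustrative names, not WordPress code, and quotes are deliberately left unescaped, as for a text node.

```javascript
// Simplified model, not the WordPress implementation.
// Matches an "&" that does NOT already start a named or numeric entity.
const BARE_AMPERSAND = /&(?![a-zA-Z][a-zA-Z0-9]*;|#[0-9]+;|#[xX][0-9a-fA-F]+;)/g;

function escapeText(text, doubleEncode) {
  // doubleEncode = true:  every "&" is escaped (htmlspecialchars()-style).
  // doubleEncode = false: existing entities are left alone (esc_html()-style).
  const ampersands = doubleEncode ? /&/g : BARE_AMPERSAND;
  return text
    .replace(ampersands, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const input = '{"title":"Fish &amp; Chips"}';

// Re-escaped: a browser decodes this back to the original characters.
const faithful = escapeText(input, true);

// Left untouched: a browser decodes the entity and displays "Fish & Chips".
const lossy = escapeText(input, false);

console.log(faithful);
console.log(lossy);
```

With double encoding switched off, the visitor no longer sees the JSON that was actually serialized, which is the correctness problem described above.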
=== In an HTML Text Node ===
To faithfully represent the contents of a JSON blob in an HTML text node,
the following code must be used:
{{{
GOOD:
<pre><?php echo _wp_specialchars(
wp_json_encode( $value ),
ENT_NOQUOTES, // Don't need to HTML-escape quotes (output is for a text node).
'UTF-8',      // json_encode() outputs UTF-8 (really just ASCII), not the blog's charset.
true          // Do "re-escape" HTML entities: `&amp;` -> `&amp;amp;`
); ?></pre>
}}}
This code is only appropriate for outputting JSON in HTML text nodes.
There are several other contexts where we would like to output JSON, and
each of those different contexts requires different treatment.
=== As an HTML Attribute Node ===
Though the HTML5 `.dataset` API only accepts string values for `data-*`
attributes, jQuery will automatically parse `data-*` attribute values that
are JSON serializations. So, when using jQuery, the following pattern is
often handy:
{{{
BAD:
<div data-foo='<?php echo json_encode( $foo ); ?>'>
}}}
Handy but, as we should know by now, insecure 😀. We need to HTML-escape
the output.
Like `esc_html()`, `esc_attr()` also leaves HTML entities untouched, so,
again, the solution is to “manually” use `_wp_specialchars()`:
{{{
GOOD:
<div data-foo='<?php echo _wp_specialchars(
wp_json_encode( $foo ),
ENT_QUOTES, // Must HTML-escape quotes (output is for an attribute node).
'UTF-8',    // json_encode() outputs UTF-8 (really just ASCII), not the blog's charset.
true        // Do "re-escape" HTML entities: `&amp;` -> `&amp;amp;`
); ?>'>
}}}
It’s important to note that this code snippet is suitable for whole HTML
attributes. It is not appropriate for use on part of an HTML attribute.
In general, when we want to output a JSON blob as part of an HTML
attribute, it’s because we’re trying to use it as a JavaScript literal:
{{{
BAD:
<a href="#" onclick="doSomething( <?php echo json_encode( $click_data ); ?> )">
}}}
We’ve seen that using json_encode() by itself in this context is not
secure, but neither is using the above HTML attribute code
(`_wp_specialchars( json_encode(), … )`). In the `data-foo` case above,
we’re outputting JSON. In the `onclick` case, we’re outputting a
JavaScript literal. `_wp_specialchars()` does enough to the JSON blob to
make it safe for use as an HTML attribute, but it does not do anything to
make it safe for use within JavaScript.
Despite claiming above that `json_encode()` outputs JavaScript literals,
it’s more complicated than that.
=== In a `<script>` Element ===
The problem is that, in a <script> element, for example, we have to
consider how the contents are interpreted as HTML first and then as
JavaScript second.
The following pattern seems helpful. Use `json_encode()` to output a PHP
string as a JavaScript string literal:
{{{
BAD:
<script>
var foo = <?php echo json_encode( (string) $foo ); ?>;
</script>
}}}
There are multiple ways in which this is insecure.
First, for some pages, HTML entities and the characters they represent are
one and the same in the `<script>` element context. If `$foo` has HTML
entities in it, problems will happen. For example, for some pages, the
following two scripts are the same:
{{{
WAT?
<script>
var foo = "Hello&quot;; alert(/LOL/); var foo=&quot;LOL";
</script>
<script>
var foo = "Hello"; alert(/LOL/); var foo="LOL";
</script>
}}}
Exactly how HTML entities are interpreted in <script> elements depends on
Content-Type, DOCTYPE, browser, etc. So we need a way to securely use JSON
in this context that does not depend on HTML-escaping (`_wp_specialchars()`).
So we can’t depend on HTML-escaping to save us, and we need to make sure
certain strings are never output. Luckily, there are a couple of other
widely implemented transformations that can help.
{{{
GOOD:
<script>
var foo = decodeURIComponent( '<?php echo rawurlencode( (string) $foo ); ?>' );
</script>
}}}
If we’re outputting a string, we can URL-encode it in PHP and URL-decode
it in JavaScript. (We do have to make sure we use the right functions:
`rawurlencode()` is slightly better than `urlencode()` here, and
`decodeURIComponent()` is required over JavaScript’s deprecated
`unescape()`.)
The useful property of URL-encoding that we’re exploiting is that the
transformed string is guaranteed not to have any characters in it that are
regarded as special in the HTML context (`<`, `>`, `&`, `'`, `"`), so
there’s no way to output `&quot;` or an HTML Comment Opener.
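That property can be checked from the JavaScript side. The sketch below approximates PHP's `rawurlencode()` with a hypothetical `rawUrlEncode()` helper, assuming the RFC 3986 unreserved set. JavaScript's own `encodeURIComponent()` leaves `!`, `'`, `(`, `)` and `*` alone, so the helper percent-encodes those by hand.

```javascript
// Hypothetical stand-in for PHP's rawurlencode(); assumes RFC 3986 behaviour.
function rawUrlEncode(text) {
  return encodeURIComponent(text).replace(
    /[!'()*]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

// A deliberately hostile value: closing tag, comment opener, quotes, entity.
const hostile = `</script><!-- "quoted" & 'single' &quot;`;

const encoded = rawUrlEncode(hostile);
const decoded = decodeURIComponent(encoded);

// None of the characters that are special in HTML survive the encoding.
const htmlSpecial = /[<>&'"]/;

console.log(encoded);
console.log(htmlSpecial.test(encoded)); // false
console.log(decoded === hostile);       // true
```

Because the single quote is also encoded, the value cannot break out of the single-quoted JavaScript string it is printed into.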
For non-scalar data, this can be extended via json_encode():
{{{
GOOD:
<script>
var foo = JSON.parse( decodeURIComponent( '<?php
echo rawurlencode( wp_json_encode( $foo ) );
?>' ) );
</script>
}}}
Rather than using the output of `json_encode()` as a JavaScript literal,
the code above URL-encodes the whole serialization (braces, quotation
marks and all), and we URL-decode and parse the resulting string in
JavaScript to get back the structured data.
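As a rough end-to-end check of that round trip, the sketch below builds the expression the PHP template would print and then evaluates it. `rawUrlEncode()` and `inlineExpression()` are hypothetical helpers; `JSON.stringify()` stands in for `wp_json_encode()`, whose output may differ in details such as slash escaping.

```javascript
// Hypothetical stand-in for PHP's rawurlencode(); assumes RFC 3986 behaviour.
function rawUrlEncode(text) {
  return encodeURIComponent(text).replace(
    /[!'()*]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

// Builds the JavaScript expression that the template would emit.
function inlineExpression(value) {
  const payload = rawUrlEncode(JSON.stringify(value));
  return "JSON.parse( decodeURIComponent( '" + payload + "' ) )";
}

const data = {
  name: "O'Reilly </script>",
  tags: ["a&b", '"c"'],
  count: 3,
};

const expression = inlineExpression(data);

// Evaluate the generated expression roughly as a script body would run it.
const restored = new Function("return " + expression + ";")();

console.log(expression);
console.log(restored.name);
```

The only quotes left in the generated expression are the two single quotes that delimit the payload, so the structured data arrives intact without any HTML-escaping.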
This URL-encoding is a bit tedious and results in some ugly looking
JavaScript. (Ugly JavaScript is better than vulnerable JavaScript!) We can
instead use JavaScript’s Unicode-escaping:
HTML Special Characters:
{{{
<script>
"<" === "\u003c" // true: < is U+3C
">" === "\u003e" // true: > is U+3E
"&" === "\u0026" // true: & is U+26
"'" === "\u0027" // true: ' is U+27
'"' === "\u0022" // true: " is U+22
</script>
}}}
PHP’s `json_encode()` has an `$options` parameter, which can be used to
always Unicode-escape these HTML special characters:
{{{
ALMOST (PHP 5.3+):
<script>
var foo = <?php echo wp_json_encode( $foo, JSON_HEX_TAG | JSON_HEX_AMP |
JSON_HEX_APOS | JSON_HEX_QUOT ); ?>;
</script>
}}}
These constants are only available as of PHP 5.3.
Also, just replacing those characters isn’t good enough. We also need to
Unicode-escape the backtick character and `$` because of their special meanings in
JavaScript template literals.
{{{
GOOD (PHP 5.3+):
<script>
var message = `hello, ${<?php echo str_replace(
array( '`', '$' ),
array( '\\u0060', '\\u0024' ),
wp_json_encode( $user, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS
| JSON_HEX_QUOT )
); ?>.name}`;
</script>
}}}
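To see why the Unicode-escaped form is equivalent, here is a JavaScript analogue of that PHP expression. It is a sketch under assumptions, not the PHP implementation: `hexEscapedJson()` is a hypothetical name, and it post-processes `JSON.stringify()` output to imitate what the `JSON_HEX_*` flags and the `str_replace()` call produce.

```javascript
// Characters to replace, with their JavaScript/JSON Unicode escapes.
const HEX = {
  "<": "\\u003c",
  ">": "\\u003e",
  "&": "\\u0026",
  "'": "\\u0027",
  "`": "\\u0060",
  "$": "\\u0024",
};

function hexEscapedJson(value) {
  return JSON.stringify(value)
    // Walk escape sequences pairwise so that only an escaped quote (\")
    // inside a string becomes \u0022; the delimiting quotes are kept.
    .replace(/\\(.)/g, (pair, ch) => (ch === '"' ? "\\u0022" : pair))
    // These characters only ever appear inside JSON strings.
    .replace(/[<>&'`$]/g, (ch) => HEX[ch]);
}

const tricky = { msg: "</script> & 'a' \"b\" \\\" `c` ${d}" };
const json = hexEscapedJson(tricky);

console.log(json);
console.log(JSON.parse(json).msg === tricky.msg); // true
```

The escaped output is still valid JSON and a valid JavaScript expression, yet it contains none of the characters that HTML parsing or template literals treat specially.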
--
Comment (by whyisjake):
Ideally, WordPress core would provide a few functions to replace these
laborious methods of escaping JavaScript content for each context.
Something like the following:
* `esc_json()`
* `esc_js_attr()` or maybe `esc_attr( $thing, 'json' )`
* `esc_wp_json_encode()` etc...
--
Ticket URL: <https://core.trac.wordpress.org/ticket/51159#comment:1>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform