[wp-trac] [WordPress Trac] #60295: esc_html() function returns an empty string when the last character of the input string variable is ASCII 145 or 146
WordPress Trac
noreply at wordpress.org
Fri Jan 19 16:23:32 UTC 2024
#60295: esc_html() function returns an empty string when the last character of the
input string variable is ASCII 145 or 146
-------------------------------+------------------------------
Reporter: jani20 | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Formatting | Version:
Severity: normal | Resolution:
Keywords: reporter-feedback | Focuses:
-------------------------------+------------------------------
Comment (by dmsnell):
@TobiasBg I think your example is using U+2019 _right single quotation
mark_. It's hard to see because PHP is probably using UTF-8 by default, so
your string is the byte sequence `"test\xe2\x80\x99"`, which is valid UTF-8.
I'm able to reproduce using the single bytes 0x91 and 0x92 instead:
{{{#!php
<?php
// Both comparisons hold for me: esc_html() returns an empty string.
var_dump( '' === esc_html( "test\x91" ) );
var_dump( '' === esc_html( "test\x92" ) );
}}}
@jani20 these single quotation marks are not actually ASCII, which only
defines bytes 0 to 127, but CP-1252 (Windows-1252), which is the default
character encoding Microsoft used for its products for a long time. I'm
guessing that your blog's charset is set to UTF-8, where these //bytes//
form an invalid string.
{{{#!php
php > iconv( 'utf-8', 'utf-8', "test\x91" );
PHP Notice: iconv(): Detected an illegal character in input string in php
shell code on line 1
Notice: iconv(): Detected an illegal character in input string in php
shell code on line 1
}}}
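To illustrate the point about CP-1252 bytes, here is a minimal sketch that checks whether a string is valid UTF-8 and, if it is not, reinterprets it as Windows-1252. It assumes the `mbstring` extension is loaded and that any invalid input really is CP-1252, which cannot be detected reliably. The helper name `to_utf8_assuming_cp1252` is hypothetical and not part of WordPress.

```php
<?php
/**
 * Hypothetical helper (not a WordPress function): return the input as UTF-8,
 * treating anything that is not already valid UTF-8 as Windows-1252.
 */
function to_utf8_assuming_cp1252( string $bytes ): string {
    // Already valid UTF-8 (this includes plain ASCII): leave untouched.
    if ( mb_check_encoding( $bytes, 'UTF-8' ) ) {
        return $bytes;
    }

    // Assumption: everything else is Windows-1252 (CP-1252).
    return mb_convert_encoding( $bytes, 'UTF-8', 'Windows-1252' );
}

// The lone byte 0x91 is not valid UTF-8.
var_dump( mb_check_encoding( "test\x91", 'UTF-8' ) ); // bool(false)

// Byte 0x92 in CP-1252 is U+2019, encoded in UTF-8 as e2 80 99.
var_dump( to_utf8_assuming_cp1252( "test\x92" ) === "test\xe2\x80\x99" ); // bool(true)
```

Converting at the point where the data enters the site (an import or a paste from an older document) is likely a better fit than converting at output time, since only there is the source encoding actually known.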
Things you might want to check:
- the [https://codex.wordpress.org/Converting_Database_Character_Sets
database character encoding].
- your browser might have a character encoding selection in the Edit
menu, or elsewhere. UTF-8 is what it likely should be. I've seen "Default"
fail for some sites that don't indicate their charset.
- ensure your theme is generating a
[https://codex.wordpress.org/Meta_Tags_in_WordPress META element] with the
right character encoding, or "charset".
These characters may legitimately appear in HTML; when they do, WordPress
[https://html.spec.whatwg.org/#numeric-character-reference-end-state
should be treating them as CP-1252 treats them]. It does this right now if
they appear through numeric character references like `&#x91;` but not if
they come through directly as normal text.
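The remapping described in the linked part of the HTML specification can be sketched as a lookup table. The function name `remap_c1_code_point` is hypothetical, the table lists only a handful of the entries the specification defines, and this is not how WordPress implements it. It assumes PHP 7.2 or later with `mbstring` for `mb_chr()`.

```php
<?php
/**
 * Hypothetical helper: return the UTF-8 character for the code point of a
 * numeric character reference, applying the CP-1252 remapping for a few
 * code points in the 0x80-0x9F range.
 */
function remap_c1_code_point( int $code_point ): string {
    // Partial table only; the HTML specification defines more entries.
    static $cp1252 = array(
        0x80 => 0x20AC, // euro sign
        0x91 => 0x2018, // left single quotation mark
        0x92 => 0x2019, // right single quotation mark
        0x93 => 0x201C, // left double quotation mark
        0x94 => 0x201D, // right double quotation mark
    );

    $char = mb_chr( $cp1252[ $code_point ] ?? $code_point, 'UTF-8' );

    // mb_chr() returns false for invalid code points; fall back to U+FFFD.
    return false === $char ? "\u{FFFD}" : $char;
}

var_dump( remap_c1_code_point( 0x91 ) === "\u{2018}" ); // bool(true)
var_dump( remap_c1_code_point( 0x41 ) === 'A' );        // bool(true)
```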
--
Ticket URL: <https://core.trac.wordpress.org/ticket/60295#comment:2>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform