[wp-trac] [WordPress Trac] #29717: wp_check_invalid_utf8 - pcre tricks and failsafes, +mb_convert_encoding, iconv fix, performance
WordPress Trac
noreply at wordpress.org
Sat Sep 20 19:43:39 UTC 2014
#29717: wp_check_invalid_utf8 - pcre tricks and failsafes, +mb_convert_encoding,
iconv fix, performance
--------------------------------+------------------------------------------
Reporter: askapache | Owner:
Type: enhancement | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Formatting | Version: trunk
Severity: normal | Resolution:
Keywords: has-patch dev- | Focuses: administration, performance
feedback |
--------------------------------+------------------------------------------
Comment (by askapache):
Replying to [comment:2 miqrogroove]:
> Impressive. So the main benefits are 10% faster and more compatibility?
> Are there any systems currently running WordPress that need this patch? A
> more concise, big picture description would help.
>
> Also, I learned in feedback from the 4.0 release that we need to
> specifically test PHP versions less than 5.4.9 and 5.3.19, because they
> exhibit crashes when PCRE is used to perform certain types of alternation
> and backtracking. I found that version 5.2.13 is particularly easy to
> download. It is not necessary to add unit tests for that, but we need to
> see that if someone posts a 10kb or 100kb block of text that it won't
> suddenly crash due to a server bug.
The updates don't actually change the behaviour of this function unless:
1. Your site runs an older PCRE that lacks UTF-8 support, in which case
those 4 functions will now correctly filter and check for invalid UTF-8.
2. You use the `strip` parameter to actually remove invalid UTF-8 in a
plugin or theme, in which case it will now work correctly. That was a bug
fix.
Some folks have PCRE compiled without UTF-8 support, or with it disabled,
so for them the '/u' modifier doesn't work, which results in essentially
this entire check being skipped.
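As a rough illustration of the '/u' problem described above, a build can be probed once for UTF-8 support. This is a minimal sketch, not the code from the patch; the helper name `sketch_pcre_has_utf8` is made up for this example.

```php
<?php
// Hypothetical helper name; not part of WordPress core or the patch.
function sketch_pcre_has_utf8() {
    static $supported = null;
    if ( null === $supported ) {
        // On a PCRE build without UTF-8 support, the /u modifier makes
        // pattern compilation fail: preg_match() emits a warning (silenced
        // here with @) and returns false instead of 1.
        $supported = ( 1 === @preg_match( '/^./u', 'a' ) );
    }
    return $supported;
}
```

If the probe returns false, any check that relies on '/u' cannot tell valid input from invalid input, so a different validation path is needed.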
This is also somewhat of a security issue, given the IDN domain issues and
other UTF-8 exploits. The big picture is to make the function easier to
develop and use; it hadn't been updated for quite a while. This should
make it easier to update/extend/move this function down the road. I think
some people may have wrongly assumed that it was doing more than it was.
It's kind of a strange function: it takes a string as input and either
returns it as is, or returns a blank string in case of invalid UTF-8. But
that's actually really clever; it's much safer and faster that way, just
not so clear.
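The "return it as is, or return a blank string" contract can be sketched as follows. This is only an illustration that assumes PCRE has working '/u' support; `sketch_check_utf8` is a hypothetical name, and the real function has additional parameters and branches.

```php
<?php
// Hypothetical stand-in for the all-or-nothing contract; assumes /u works.
function sketch_check_utf8( $string ) {
    $string = (string) $string;
    if ( '' === $string ) {
        return '';
    }
    // With /u, preg_match() returns 1 on a valid UTF-8 subject and
    // false when the subject contains an invalid byte sequence.
    if ( 1 === @preg_match( '/^./us', $string ) ) {
        return $string; // valid: returned untouched
    }
    return ''; // invalid: rejected outright
}
```

The caller never receives a partially repaired string, which is what makes the behaviour safe, if not obvious from the function name.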
I've noticed several plugins like Disqus and Yoast SEO have started to
build their own incarnations of this function. This update should help
make clear what it is and isn't.
I have tested on PHP 5.2, and I approached this with extreme caution to
avoid causing any issues. IOW, this function will also work on 5.0. The
only reason it wouldn't work on PHP 4.x is that `stripos` wasn't available
as a builtin Zend function until 5.0, but I noticed it's being used in
several places in core anyway. (I am still used to having to code
backwards for 4.x, so I'm happy that's officially over for WP.)
The big change is the 2 new fallbacks to the original preg_match,
including the custom regex, which will be the fallback for those with
absolutely no UTF-8 PCRE capability. It has to be a rarity for that to
ever actually be needed, but that's the only place I can see any
possibility of buffer or memory problems. preg_match isn't as efficient
as a builtin function such as strpos, but it is pretty darn efficient.
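For readers wondering what a regex fallback without '/u' could look like, here is a minimal byte-level sketch following the UTF-8 byte ranges in RFC 3629. It is not the custom regex from the patch, and `sketch_is_utf8_bytes` is a hypothetical name. Note that this kind of repeated alternation is the pattern shape that very old PCRE builds may struggle with on large inputs, so it would need testing against 10kb to 100kb strings on those PHP versions.

```php
<?php
// Hypothetical byte-level validator; does not use the /u modifier.
function sketch_is_utf8_bytes( $string ) {
    // Each alternative starts with a distinct lead-byte range, so the
    // engine never has to try two alternatives at the same position.
    $regex = '/\A(?:
          [\x00-\x7F]                        # ASCII
        | [\xC2-\xDF][\x80-\xBF]             # 2-byte, no overlongs
        | \xE0[\xA0-\xBF][\x80-\xBF]         # 3-byte, no overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # 3-byte, general
        | \xED[\x80-\x9F][\x80-\xBF]         # 3-byte, no surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4-byte, planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # 4-byte, planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4-byte, plane 16
    )*\z/x';
    // false (engine error or limit hit) is treated as "not verified".
    return 1 === @preg_match( $regex, (string) $string );
}
```

Treating a `false` return as a failed check keeps the all-or-nothing behaviour intact when the engine hits a backtrack or recursion limit.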
The other big change is making the `strip` parameter work. Since it isn't
actually being used anywhere in core, it seems to have been forgotten
about a little. With it now working, I will start using it in plugins and
themes to sanitize UTF-8 (because this is super fast). That's actually why
I initially started on this.
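One way stripping (as opposed to rejecting) could work is with mb_convert_encoding, which the ticket title mentions. The sketch below is an assumption about the general approach, not the patch itself; `sketch_strip_invalid_utf8` is a hypothetical name, and exactly which bytes mbstring discards around a broken sequence may differ between PHP versions.

```php
<?php
// Hypothetical helper; illustrates stripping rather than rejecting.
function sketch_strip_invalid_utf8( $string ) {
    $string = (string) $string;
    if ( ! function_exists( 'mb_convert_encoding' ) ) {
        // No mbstring: fall back to all-or-nothing validation.
        return ( 1 === @preg_match( '//u', $string ) ) ? $string : '';
    }
    // By default mbstring substitutes '?' for invalid bytes; 'none'
    // tells it to drop them instead. Restore the setting afterwards
    // because it is global state.
    $previous = mb_substitute_character();
    mb_substitute_character( 'none' );
    $clean = mb_convert_encoding( $string, 'UTF-8', 'UTF-8' );
    mb_substitute_character( $previous );
    return $clean;
}
```

A valid string passes through unchanged, while a string with stray bytes comes back shorter but well-formed.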
--
Ticket URL: <https://core.trac.wordpress.org/ticket/29717#comment:3>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform