[wp-trac] [WordPress Trac] #24661: remove_accents is not removing combining accents
WordPress Trac
noreply at wordpress.org
Mon Oct 24 03:48:43 UTC 2016
#24661: remove_accents is not removing combining accents
----------------------------------+--------------------
Reporter: NumidWasNotAvailable | Owner:
Type: defect (bug) | Status: new
Priority: normal | Milestone: 4.7
Component: Formatting | Version: 1.2.1
Severity: normal | Resolution:
Keywords: has-patch | Focuses:
----------------------------------+--------------------
Comment (by gitlost):
Thanks for looking at this.
`gen_cat_regex_alts.php` and `gen_script_regex_alts.php` (and
`gen_remove_accents_ranges.php` below) were just on my local machine. As a
temporary measure I've uploaded them (after a bit of cleanup) to the
[https://github.com/gitlost/unfc-normalize/tree/master/tools tools
directory of the UNFC Normalizer] plugin, as they use a functions library
developed for it.
I'm not sure they should be part of the build process, seeing as they
depend on infrequently changing Unicode data. However, I will clean them up
further and separate them out so they're suitable for uploading to the trunk
tools directory if required.
Nobody likes the separate file, so I'll upload a new patch with the regexes
just copied and pasted in. The separate file was never about performance; it
was for ease of generation (and to hide their ugliness). Pasting them in,
though, is now another argument against making them part of the build
process.
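For anyone wondering what "generating" the regexes involves: the character classes are derived from Unicode data rather than written by hand. The sketch below is not `gen_cat_regex_alts.php`. It is a hypothetical Python illustration, assuming the goal is a PCRE character class covering every code point in general category Mn (nonspacing marks), built from whatever Unicode version Python's `unicodedata` module ships with. The helper names `category_ranges()` and `pcre_class()` are made up for this example.

```python
import sys
import unicodedata


def category_ranges(category: str) -> list[tuple[int, int]]:
    """Collapse every code point in a general category into inclusive (start, end) runs.

    Hypothetical helper; uses Python's bundled Unicode database, not PCRE's.
    """
    ranges: list[tuple[int, int]] = []
    for cp in range(sys.maxunicode + 1):
        if unicodedata.category(chr(cp)) != category:
            continue
        if ranges and ranges[-1][1] == cp - 1:
            # Consecutive code point: extend the current run.
            ranges[-1] = (ranges[-1][0], cp)
        else:
            # Gap in the sequence: start a new run.
            ranges.append((cp, cp))
    return ranges


def pcre_class(ranges: list[tuple[int, int]]) -> str:
    """Render runs as a PCRE character class using \\x{...} escapes (for use with /u)."""
    parts = []
    for start, end in ranges:
        if start == end:
            parts.append(f"\\x{{{start:04X}}}")
        else:
            parts.append(f"\\x{{{start:04X}}}-\\x{{{end:04X}}}")
    return "[" + "".join(parts) + "]"


if __name__ == "__main__":
    mn = category_ranges("Mn")
    print(f"Unicode {unicodedata.unidata_version}: {len(mn)} Mn runs")
    # Show only the start of the class; the full output is long and ugly.
    print(pcre_class(mn)[:60] + "...")
```

Collapsing consecutive code points into runs is what keeps the pasted-in regex to a manageable (if still ugly) size.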
The tests only cover Unicode 5.0.0 because the PCRE libraries bundled with
PHP 5.2.4 through PHP 7 were built with Unicode data ranging from Unicode
5.0.0 to Unicode 7.0.0, so they give varying results for character
properties depending on the version. I could generate different ranges
depending on the version of PCRE in use when the tests are run, but I
thought that was a bit over the top, so I went for the lowest common
denominator.
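To make the "lowest common denominator" idea concrete, here is a rough Python analogue. Python doesn't bundle PCRE or a Unicode 5.0.0 database. It does ship two databases, though: the current one (`unicodedata`) and a frozen Unicode 3.2.0 one (`unicodedata.ucd_3_2_0`). These stand in for two differently built PCREs. The helpers `stable_members()` and `drifted()` are hypothetical. They keep only the code points that have the category under *both* versions and list the ones whose membership changed.

```python
import sys
import unicodedata

# Two Unicode databases standing in for two PCRE builds with different Unicode data.
OLD = unicodedata.ucd_3_2_0   # frozen Unicode 3.2.0 data
NEW = unicodedata             # whatever version this Python was built with


def stable_members(category: str) -> set[int]:
    """Code points in `category` under both databases (the lowest common denominator)."""
    return {
        cp
        for cp in range(sys.maxunicode + 1)
        if OLD.category(chr(cp)) == category and NEW.category(chr(cp)) == category
    }


def drifted(category: str) -> list[int]:
    """Code points whose membership in `category` differs between the two databases."""
    return [
        cp
        for cp in range(sys.maxunicode + 1)
        if (OLD.category(chr(cp)) == category) != (NEW.category(chr(cp)) == category)
    ]
```

For example, U+1AB0 (COMBINING DOUBLED CIRCUMFLEX ACCENT) was only added in Unicode 7.0, so it is unassigned (`Cn`) in the 3.2.0 data and lands in `drifted("Mn")`. A test built from `stable_members("Mn")` gives the same answer under either database.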
The tests were generated automatically by
`gen_remove_accents_ranges.php`, which, as mentioned above, I've uploaded.
`_wp_can_use_pcre_ucp()` is a direct analogue of the existing
`_wp_can_use_pcre_u()` function, which you might have missed. Again, it's
not really a performance thing; it's more for convenience of testing, as
well as being semantic and usable elsewhere.
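Roughly speaking, `_wp_can_use_pcre_u()` in core caches the result of a one-off `preg_match()` probe. Its optional argument lets a caller (chiefly a test) force the cached answer, or re-run the probe by passing `'reset'`. The sketch below shows that probe-once, cache, override pattern in Python. `can_use_unicode_props()` and its `\p{Mn}` probe are hypothetical and are not the PHP implementation. CPython's built-in `re` module rejects `\p{...}` escapes, so the probe returns False there.

```python
import re
from typing import Optional, Union

# Sentinel meaning "not probed yet", mirroring the PHP function's 'reset' value.
_cached: Union[bool, str] = "reset"


def _probe() -> bool:
    """Return True if the regex engine accepts Unicode property escapes."""
    try:
        re.compile(r"\p{Mn}")
        return True
    except re.error:
        # CPython's re raises "bad escape \p" for property escapes.
        return False


def can_use_unicode_props(set_value: Optional[Union[bool, str]] = None) -> bool:
    """Hypothetical cached capability check.

    set_value=None   -> return the cached result, probing on first use
    set_value=bool   -> force the cached result (handy in tests)
    set_value="reset" -> discard the cache and probe again
    """
    global _cached
    if set_value is not None:
        _cached = set_value
    if _cached == "reset":
        _cached = _probe()
    return bool(_cached)
```

A test can call `can_use_unicode_props(True)` to exercise the property-based code path, then `can_use_unicode_props("reset")` to restore the real answer. That override is what makes such a helper convenient for testing.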
--
Ticket URL: <https://core.trac.wordpress.org/ticket/24661#comment:34>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform