[wp-trac] [WordPress Trac] #24661: remove_accents is not removing combining accents

WordPress Trac noreply at wordpress.org
Mon Oct 24 03:48:43 UTC 2016


#24661: remove_accents is not removing combining accents
----------------------------------+--------------------
 Reporter:  NumidWasNotAvailable  |       Owner:
     Type:  defect (bug)          |      Status:  new
 Priority:  normal                |   Milestone:  4.7
Component:  Formatting            |     Version:  1.2.1
 Severity:  normal                |  Resolution:
 Keywords:  has-patch             |     Focuses:
----------------------------------+--------------------

Comment (by gitlost):

 Thanks for looking at this.

 `gen_cat_regex_alts.php` and `gen_script_regex_alts.php` (and
 `gen_remove_accents_ranges.php` below) were just on my local machine. As a
 temporary measure I've uploaded them (after a bit of a clean up) to the
 [https://github.com/gitlost/unfc-normalize/tree/master/tools tools
 directory of the UNFC Normalizer] plugin, as they use a functions library
 developed for it.

 I'm not sure they should be part of the build process, seeing as they
 depend on infrequently changing Unicode data. However, I will clean them up
 further and separate them out so they're suitable for upload to the trunk
 tools directory if required.

 Nobody likes the separate file, so I'll upload a new patch with the regexes
 just copied and pasted in. It was never a performance thing, more for ease
 of generation (and to hide their ugliness). This now also argues against
 them being part of the build process, though.

 The tests only cover Unicode 5.0.0 because the PCREs bundled with PHP 5.2.4
 to PHP 7 were built with Unicode data ranging from Unicode 5.0.0 to
 Unicode 7.0.0, and so give varying results for character properties
 depending on the PHP version. I could generate different ranges depending
 on the version of PCRE being used when the tests are run, but I thought
 that was a bit OTT so went for the lowest common denominator.
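
 To illustrate the dependency on PCRE's bundled Unicode data, here is a
 minimal sketch, not code from the patch. The helper name
 `example_strip_nonspacing_marks()` is hypothetical. It relies on `\p{Mn}`
 (general category "Nonspacing Mark"), which PCRE resolves against whatever
 Unicode tables it was built with, so a character added to that category
 after the bundled Unicode version would not be matched.

```php
<?php
/**
 * Hypothetical helper (not part of the patch): removes nonspacing
 * combining marks from a UTF-8 string using a PCRE Unicode property.
 */
function example_strip_nonspacing_marks( $text ) {
	// \p{Mn} is looked up in the Unicode tables compiled into PCRE.
	$stripped = preg_replace( '/\p{Mn}+/u', '', $text );

	// preg_replace() returns null on error (e.g. invalid UTF-8 input);
	// fall back to the unmodified text in that case.
	return null === $stripped ? $text : $stripped;
}

// The PCRE version in use hints at which Unicode data is bundled.
echo PCRE_VERSION, "\n";

// "e" followed by U+0301 COMBINING ACUTE ACCENT (UTF-8 bytes CC 81).
echo example_strip_nonspacing_marks( "e\xCC\x81" ), "\n";
```

 U+0301 is a long-standing member of the category, so it should be matched
 by any of the PCRE versions mentioned above; only characters assigned or
 reclassified in later Unicode versions would differ.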

 The tests were generated automatically, by
 `gen_remove_accents_ranges.php`, which as mentioned above I've uploaded.

 `_wp_can_use_pcre_ucp()` is a direct analogue of the existing
 `_wp_can_use_pcre_u()` function, which you might have missed. Again, it's
 not really a performance thing, more for convenience of testing, as well
 as being semantic and usable elsewhere.
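
 As a rough idea of what such a check could look like, here is a sketch
 under stated assumptions; it is not the function from the patch. The name
 `example_can_use_pcre_ucp()` and the probe pattern are hypothetical. It
 follows the same general approach as a one-off, cached `preg_match()`
 probe.

```php
<?php
/**
 * Hypothetical sketch (not the patch's implementation): reports whether
 * the PCRE in use supports Unicode character properties such as \p{Mn}.
 */
function example_can_use_pcre_ucp() {
	static $ucp = null; // Cached so the probe only runs once per request.

	if ( null === $ucp ) {
		// A PCRE built without Unicode property support cannot compile
		// \p{...}; preg_match() then returns false and raises a warning,
		// which is silenced here with @.
		$ucp = ( 1 === @preg_match( '/^\p{Mn}$/u', "\xCC\x81" ) );
	}

	return $ucp;
}

var_dump( example_can_use_pcre_ucp() );
```

 Tests that depend on character properties could then be skipped when the
 check returns false, rather than failing on such builds.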

--
Ticket URL: <https://core.trac.wordpress.org/ticket/24661#comment:34>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

