[wp-trac] [WordPress Trac] #60698: Token Map: Introduce an efficient lookup and translation class for string mappings.

WordPress Trac noreply at wordpress.org
Mon May 13 22:41:15 UTC 2024


#60698: Token Map: Introduce an efficient lookup and translation class for string
mappings.
----------------------------------------------------+---------------------
 Reporter:  dmsnell                                 |       Owner:  (none)
     Type:  feature request                         |      Status:  new
 Priority:  normal                                  |   Milestone:  6.6
Component:  General                                 |     Version:  trunk
 Severity:  normal                                  |  Resolution:
 Keywords:  has-patch needs-unit-tests 2nd-opinion  |     Focuses:
----------------------------------------------------+---------------------

Comment (by dmsnell):

 > So there are three spots where html_entity_decode is swapped and in
 return WordPress is now responsible for maintaining an additional library?

 This isn't how I see the problem, as we wouldn't be doing any of this work
 if we had a way to avoid misparsing HTML. It's my hope that we can push
 fixes into PHP, but for now, we don't have a way to accurate decode HTML
 text, and that is leading to other challenges. This swap of three
 functions is replacing defective code with fixed code.

 We will maintain an additional library, but I do see value in it beyond
 the HTML API, which is why I am proposing it as a separate addition. Its
 use may be limited, but I was exploring Emoji replacement on WordPress.com
 and found some obvious use for the very purpose of the token map: quick
 lookup for low-overhead matching and replacement.

 > Not sure why the diff isn't clear, but it's hard to tell which tests are
 now passing in tests/phpunit/tests/html-api/wpHtmlProcessorHtml5lib.php
 that would have failed before but now are passing.

 The diff is unclear because the WPCS linting rules require changing the
 entire whitespace structure of the array in response to a tiny change. It
 can help to view the diff ignoring whitespace, either through GitHub or
 with `git diff -w`.

 [[Image(https://github.com/WordPress/wordpress-
 develop/assets/5431237/8f63c5c9-aec2-487e-98b0-b2b723f38982)]]

 I'll check on the new skipped test; I remember adding it but I think it's
 unrelated to the change - or possibly it was masked by previously skipped
 tests now being able to run.

 > why is `WP_HTML_Decoder` a class if nearly everything is going to be a
 public static function?

 This is all great stuff to discuss in the Text Decoder ticket #61072 and
 PR. The short answer is that I was under the impression that WordPress
 still avoids namespacing in PHP and this helps avoid bloating the global
 namespace for what amounts to a single main public method and a number of
 helper utilities.

 > There also seems to be a mix of code that is actual code that is
 intended to be used and code that is used to generate the entities map.
 Ideally code that isn't something we intend for either core or extenders
 to be used at run time isn't included at run time.

 Are you talking about the `WP_Token_Map` methods? I do think they are
 useful on their own and think that folks will find them useful. If it were
 up to me //none// of the PHP classes or modules would be loaded eagerly,
 but I've been required to do that, which I believe is for future auto-
 loading efforts.

 If we remove the `precomputed_php_source_table()` function that's not
 incredibly important to me, but we do miss out the opportunity to share
 the value Core gains from the class. It has made testing easier as well as
 updating the "cached" lookup tables because all you need is WordPress.

 I'd be open to explore other approaches you have to offer, especially if
 they retain the ability for others to get the same benefit that Core does
 (e.g. a plugin looking for a set list of words to replace, or checking if
 a stock ticker is known, etc…).

 >  I think the comments should be clearer. Why is phpcs disabled on it? Is
 there any way to avoid adding a new global?

 Can definitely update the comment. It's auto-generated in the sense that
 it comes from `precomputed_php_source_table()`. On this point I did not
 include the code meant to build the table because I didn't see it
 belonging in the runtime. In a strange way I was doing what I think you
 were asking me to do above; I think we're aligned on that notion, maybe we
 haven't fully seen the same way about how and where to fully apply the
 principle.

 `phpcs` is disabled on it because it'd considerably complicate the code
 generation to appease the linter, and then to maintain the code generation
 to update as the rules change. Even simple things like choosing single or
 double quotes has bloated my attempts to match the styling it wants, and
 that ends up confusing me whereas the code as-is is at least focused on
 the task. In addition, the file lights up in some IDEs with styling hints
 that aren't relevant in this file. It's not meant to be read, but only
 written.

 Could we avoid the global? Absolutely? What is your proposal? Mainly I
 wanted this to be fast and as low-overhead as is possible. As a global
 it's available everywhere and increases WordPress' main runtime by a few
 hundred kilobytes. I know we can pass it around as a variable and not mess
 with function calls or additional allocations, and it will be called in
 tight loops so I wanted to avoid anything dynamic that we can.

 This isn't something where I have a lot of experience. The existing
 attempts at enumerating the named character references are globals, and I
 in large part followed their steps because it seemed well-suited to the
 task.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/60698#comment:10>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform


More information about the wp-trac mailing list