[wp-trac] [WordPress Trac] #60698: Token Map: Introduce an efficient lookup and translation class for string mappings.
WordPress Trac
noreply at wordpress.org
Mon May 13 22:41:15 UTC 2024
#60698: Token Map: Introduce an efficient lookup and translation class for string
mappings.
----------------------------------------------------+---------------------
Reporter: dmsnell | Owner: (none)
Type: feature request | Status: new
Priority: normal | Milestone: 6.6
Component: General | Version: trunk
Severity: normal | Resolution:
Keywords: has-patch needs-unit-tests 2nd-opinion | Focuses:
----------------------------------------------------+---------------------
Comment (by dmsnell):
> So there are three spots where html_entity_decode is swapped and in
return WordPress is now responsible for maintaining an additional library?
This isn't how I see the problem, as we wouldn't be doing any of this work
if we had a way to avoid misparsing HTML. It's my hope that we can push
fixes into PHP, but for now, we don't have a way to accurate decode HTML
text, and that is leading to other challenges. This swap of three
functions is replacing defective code with fixed code.
We will maintain an additional library, but I do see value in it beyond
the HTML API, which is why I am proposing it as a separate addition. Its
use may be limited, but I was exploring Emoji replacement on WordPress.com
and found some obvious use for the very purpose of the token map: quick
lookup for low-overhead matching and replacement.
> Not sure why the diff isn't clear, but it's hard to tell which tests are
now passing in tests/phpunit/tests/html-api/wpHtmlProcessorHtml5lib.php
that would have failed before but now are passing.
The diff is unclear because the WPCS linting rules require changing the
entire whitespace structure of the array in response to a tiny change. It
can help to view the diff ignoring whitespace, either through GitHub or
with `git diff -w`.
[[Image(https://github.com/WordPress/wordpress-
develop/assets/5431237/8f63c5c9-aec2-487e-98b0-b2b723f38982)]]
I'll check on the new skipped test; I remember adding it but I think it's
unrelated to the change - or possibly it was masked by previously skipped
tests now being able to run.
> why is `WP_HTML_Decoder` a class if nearly everything is going to be a
public static function?
This is all great stuff to discuss in the Text Decoder ticket #61072 and
PR. The short answer is that I was under the impression that WordPress
still avoids namespacing in PHP and this helps avoid bloating the global
namespace for what amounts to a single main public method and a number of
helper utilities.
> There also seems to be a mix of code that is actual code that is
intended to be used and code that is used to generate the entities map.
Ideally code that isn't something we intend for either core or extenders
to be used at run time isn't included at run time.
Are you talking about the `WP_Token_Map` methods? I do think they are
useful on their own and think that folks will find them useful. If it were
up to me //none// of the PHP classes or modules would be loaded eagerly,
but I've been required to do that, which I believe is for future auto-
loading efforts.
If we remove the `precomputed_php_source_table()` function that's not
incredibly important to me, but we do miss out the opportunity to share
the value Core gains from the class. It has made testing easier as well as
updating the "cached" lookup tables because all you need is WordPress.
I'd be open to explore other approaches you have to offer, especially if
they retain the ability for others to get the same benefit that Core does
(e.g. a plugin looking for a set list of words to replace, or checking if
a stock ticker is known, etc…).
> I think the comments should be clearer. Why is phpcs disabled on it? Is
there any way to avoid adding a new global?
Can definitely update the comment. It's auto-generated in the sense that
it comes from `precomputed_php_source_table()`. On this point I did not
include the code meant to build the table because I didn't see it
belonging in the runtime. In a strange way I was doing what I think you
were asking me to do above; I think we're aligned on that notion, maybe we
haven't fully seen the same way about how and where to fully apply the
principle.
`phpcs` is disabled on it because it'd considerably complicate the code
generation to appease the linter, and then to maintain the code generation
to update as the rules change. Even simple things like choosing single or
double quotes has bloated my attempts to match the styling it wants, and
that ends up confusing me whereas the code as-is is at least focused on
the task. In addition, the file lights up in some IDEs with styling hints
that aren't relevant in this file. It's not meant to be read, but only
written.
Could we avoid the global? Absolutely? What is your proposal? Mainly I
wanted this to be fast and as low-overhead as is possible. As a global
it's available everywhere and increases WordPress' main runtime by a few
hundred kilobytes. I know we can pass it around as a variable and not mess
with function calls or additional allocations, and it will be called in
tight loops so I wanted to avoid anything dynamic that we can.
This isn't something where I have a lot of experience. The existing
attempts at enumerating the named character references are globals, and I
in large part followed their steps because it seemed well-suited to the
task.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/60698#comment:10>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform
More information about the wp-trac
mailing list