[wp-trac] [WordPress Trac] #58517: HTML API: Introduce HTML Processor, a higher-level partner to the Tag Processor

WordPress Trac noreply at wordpress.org
Fri Sep 22 21:52:32 UTC 2023


#58517: HTML API: Introduce HTML Processor, a higher-level partner to the Tag
Processor
----------------------------------------+------------------------------
 Reporter:  dmsnell                     |       Owner:  Bernhard Reiter
     Type:  enhancement                 |      Status:  reopened
 Priority:  normal                      |   Milestone:  6.4
Component:  HTML API                    |     Version:
 Severity:  normal                      |  Resolution:
 Keywords:  has-unit-tests 2nd-opinion  |     Focuses:
----------------------------------------+------------------------------

Comment (by dmsnell):

 > it is impossible to account for all edge cases. This will always be
 inferior to applying the same changes from JS using the browser's DOM. The
 difference being that PHP has no idea what the DOM will look like once the
 browser parses the HTML.

 come on @azaozz 🙃 I thought we'd gotten through this - these comments are
 factually inaccurate. I really do hear you, but repeating this mantra
 doesn't help anyone or provide any actionable steps. it's absolutely
 possible to do all these things because HTML5 standardized everything that
 governs how the browsers and all other parsers should handle malformed
 HTML.

 the entire purpose of the HTML API is to stop treating HTML as a string in
 PHP: it provides a structural/semantic interface for working with markup
 instead. you're arguing against the one thing that's delivering what you've
 been wanting for years.

 that being said I have no problem with your preference for running code in
 the browser vs. on the server, but this isn't the place to hold that
 discussion, is it? "don't provide a safe structural HTML interface on the
 server because //I personally// prefer people do it in the browser"? you're
 advocating that we keep the status quo of treating HTML as a string on the
 server, which means keeping around the very norms that introduce all the
 vulnerabilities and breakages you are speaking against.

 > I believe all "HTML manipulations" should be done using the browser's
 DOM

 I would enjoy reviewing your patch to remove `wp_kses` 😀

 anyway, my point was to try and be clear for the sake of @oglekler - this
 is not the place to hash out a philosophy of doing everything in the
 customer's browser vs. on the server. that cannot be resolved here, plus
 there are already competing CMSs that don't run PHP.

 this is a practical matter about whether we want to eliminate security
 vulnerabilities and content breakage by providing a spec-compliant HTML
 parser in Core. this system eliminates knowingly-naive and faulty string
 replacements and regular expressions that constantly cause grief and
 damage WordPress' reputation. as long as people are processing HTML on the
 server I'd rather have a reliable system than a fundamentally-flawed one.
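
 to illustrate the kind of knowingly-naive string handling described above,
 here is a minimal sketch in Python rather than PHP. it is not the HTML API
 and not WordPress code: the standard library's `html.parser` stands in for
 a tokenizer (it is not a spec-compliant HTML5 parser), and the names
 `SAMPLE`, `naive_src`, and `parsed_src` are hypothetical, chosen only for
 this example.

```python
import re
from html.parser import HTMLParser

# Hypothetical input: a quoted attribute value that contains '>'.
SAMPLE = '<img title="1 > 0" src="a.png">'


def naive_src(html):
    """Regex approach: wrongly assumes '>' never appears inside the tag."""
    match = re.search(r'<img[^>]*\ssrc="([^"]*)"', html)
    return match.group(1) if match else None


class _ImgSrcFinder(HTMLParser):
    """Records the src attribute of the first <img> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        if tag == "img" and self.src is None:
            self.src = dict(attrs).get("src")


def parsed_src(html):
    """Tokenizer approach: attributes are read as structure, not as text."""
    finder = _ImgSrcFinder()
    finder.feed(html)
    finder.close()
    return finder.src
```

 on `SAMPLE` the regular expression finds no `src` at all, because it stops
 at the `>` inside the `title` value, while the tokenizer-based version
 returns `a.png`.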

 can we find a different place to host the philosophical argument rather
 than on every patch that improves the situation?

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/58517#comment:22>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

