[wp-trac] [WordPress Trac] #60283: HTML API: Support all HTML tags in standard

WordPress Trac noreply at wordpress.org
Wed Jan 24 13:39:10 UTC 2024


#60283: HTML API: Support all HTML tags in standard
--------------------------------------+----------------------
 Reporter:  jonsurrell                |       Owner:  dmsnell
     Type:  enhancement               |      Status:  closed
 Priority:  normal                    |   Milestone:  6.5
Component:  HTML API                  |     Version:
 Severity:  normal                    |  Resolution:  fixed
 Keywords:  has-patch has-unit-tests  |     Focuses:
--------------------------------------+----------------------

Comment (by dmsnell):

 hi @afercia 👋

 I'm glad you asked this great question.

 it's hard to start answering without a preamble, which is that "invalid,"
 "non-conforming," "non-normative," "broken," and other words describing
 HTML must be understood as broad and highly context-sensitive terms. for
 the sake of //any// HTML parser there is no such thing as "invalid"
 markup. there //is//, but it doesn't mean anything other than "handle the
 markup as specified and note somewhere that it was invalid, if you want to
 communicate that it's invalid."

 the section of the specification you are quoting is specifically
 discussing //authoring// HTML and yes, nobody //should// be creating a new
 LISTING tag. unfortunately for us and other parsers and browsers, that
 HTML might exist and we might have to read and understand it. there's no
 way to simply ignore a deprecated, obsolete, or non-conforming tag if it
 carries its own semantic rules.

 the XMP element, for example, switches into a separate parsing mode and
 the PARAM element is void. if we didn't include support for these tags
 then we would get off track when parsing any HTML that includes them.
 sadly, this means that every valid parser will have to support or at least
 recognize these elements forever in order to avoid breaking content
 already on the web. In fact, I recently observed that GitHub is sending
 `</xmp>` on their pages and I'm guessing it's specifically to eagerly
 mitigate any attack vector based on the invalid XMP, where if someone
 //were// to inject `<xmp>` onto the page then everything after it would be
 swallowed as plaintext until a closing `</xmp>` is found. without that
 guard, the injection would effectively trash the page from that point
 forward.
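
 to make the failure mode concrete, here is a toy sketch in Python. it is
 not the HTML API and not a real HTML parser: the function name
 `open_elements`, the `honor_legacy_rules` flag, and the tiny tag sets are
 all hypothetical, chosen only to contrast a scanner that knows PARAM is
 void and XMP holds raw text with one that treats them as ordinary tags.

```python
import re

# Hypothetical, heavily simplified rule sets; real parsers track many more tags.
VOID_TAGS = {"param", "br", "img"}  # elements that never have a closing tag
RAW_TEXT_TAGS = {"xmp"}             # elements whose contents are text, not markup

TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def open_elements(html, honor_legacy_rules=True):
    """Return the stack of elements still open after scanning `html`."""
    stack = []
    pos = 0
    while True:
        match = TAG.search(html, pos)
        if match is None:
            return stack
        pos = match.end()
        is_closer = match.group(1) == "/"
        name = match.group(2).lower()

        if is_closer:
            if name in stack:
                # Pop up to and including the matching open element.
                while stack.pop() != name:
                    pass
            continue

        if honor_legacy_rules and name in VOID_TAGS:
            continue  # void elements are never pushed onto the stack

        stack.append(name)

        if honor_legacy_rules and name in RAW_TEXT_TAGS:
            closer = re.compile(r"</" + name + r"\s*>", re.IGNORECASE)
            end = closer.search(html, pos)
            if end is None:
                return stack  # everything after the opener is plain text
            stack.pop()
            pos = end.end()


sample = '<p><param name="x"><xmp><b>literal</xmp>done'
print(open_elements(sample, honor_legacy_rules=True))   # ['p']
print(open_elements(sample, honor_legacy_rules=False))  # ['p', 'param']

injected = "<div><xmp><p>rest of page</p></div>"
print(open_elements(injected, honor_legacy_rules=True))   # ['div', 'xmp']
print(open_elements(injected, honor_legacy_rules=False))  # []
```

 with the legacy rules the scanner never mistakes `<b>` for an element and
 never leaves PARAM open; without them it drifts out of sync with what a
 browser would see. the second input shows the injection case: an unclosed
 `<xmp>` swallows the rest of the page.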

 it's not currently possible to generate tags in the HTML API, though we do
 allow creating invalid attribute names as long as they don't break the
 HTML syntax. these discussions are relevant because the HTML API provides
 great opportunity to enforce whatever rules we want. for example, should
 we disallow setting invalid `tabindex` values? I suspect that in time
 we'll find that there are cases where we want to be strict but usually we will
 want to be permissive because of the risk of over-zealously rejecting
 content that otherwise would behave well in browsers.
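
 the `tabindex` question can be illustrated with another small Python
 sketch. both helper names are hypothetical and neither exists in the HTML
 API; the strict check follows the authoring syntax for an integer
 (optional `-`, then digits), while the permissive one is a simplified
 approximation of how a browser reads an integer attribute, skipping
 leading whitespace and ignoring trailing characters.

```python
import re


def is_valid_integer(value):
    """Strict check: optional '-' followed by one or more ASCII digits."""
    return re.fullmatch(r"-?[0-9]+", value) is not None


def parse_integer_leniently(value):
    """Permissive read: skip leading whitespace, take sign and digits, ignore the rest."""
    match = re.match(r"[ \t\n\f\r]*([+-]?)([0-9]+)", value)
    if match is None:
        return None  # no digits found, so there is no usable value
    number = int(match.group(2))
    return -number if match.group(1) == "-" else number


for candidate in ["0", "-1", " 3px", "abc"]:
    print(repr(candidate), is_valid_integer(candidate), parse_integer_leniently(candidate))
```

 a value like `" 3px"` fails the strict check yet still yields `3` under
 the permissive read, which is the kind of content a strict setter would
 reject even though it may behave fine in practice.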

 for now though, the simple answer is that we //must// support any existing
 or legacy rules for deprecated elements in the spec to avoid mis-parsing
 HTML inputs containing them.

 > Also, since there are commits still referencing this ticket, I guess the
 > ticket should stay open?

 I was hoping that this ticket could close and remain closed. it has a
 broad title and I'd prefer that no new PRs are opened to be associated
 with it. the existing PLAINTEXT PR is almost surely not merging into 6.5.
 this ticket, as broad as it is, is also not comprehensive and so it's not
 being discussed in a way I'd expect a big tracking ticket to work.

 what's your advice? should I rename the ticket? dissociate the already-
 associated PR from it?

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/60283#comment:30>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

