[wp-trac] [WordPress Trac] #60283: HTML API: Support all HTML tags in standard
WordPress Trac
noreply at wordpress.org
Wed Jan 24 13:39:10 UTC 2024
#60283: HTML API: Support all HTML tags in standard
--------------------------------------+----------------------
Reporter: jonsurrell | Owner: dmsnell
Type: enhancement | Status: closed
Priority: normal | Milestone: 6.5
Component: HTML API | Version:
Severity: normal | Resolution: fixed
Keywords: has-patch has-unit-tests | Focuses:
--------------------------------------+----------------------
Comment (by dmsnell):
hi @afercia 👋
I'm glad you asked this great question.
it's hard to start answering without a preamble, which is that "invalid,"
"non-conforming," "non-normative," "broken," and other words describing
HTML must be understood as broad and highly context-sensitive terms. for
the sake of //any// HTML parser there is no such thing as "invalid"
markup. there //is//, but it doesn't mean anything other than "handle the
markup as specified and note somewhere that it was invalid, if you want to
communicate that it's invalid."
the section of the specification you are quoting is specifically
discussing //authoring// HTML and yes, nobody //should// be creating a new
LISTING tag. unfortunately for us and other parsers and browsers, that
HTML might exist and we might have to read and understand it. there's no
way to simply ignore a deprecated, obsolete, or non-conforming tag if it
carries its own semantic rules.
the XMP element, for example, switches into a separate parsing mode and
the PARAM element is void. if we didn't include support for these tags
then we would get off track when parsing any HTML that includes them.
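to make the "off track" part concrete, here is a minimal sketch in Python (not the HTML API itself, and not a spec-compliant tokenizer). the `outline` function, the regex, and the tiny `VOID_TAGS` / `RAW_TEXT_TAGS` sets are all hypothetical names invented for illustration. it records the nesting depth of every element a toy parser opens, once with knowledge of PARAM and XMP and once without.

```python
import re

# Illustrative subsets only; the HTML standard defines the real lists.
VOID_TAGS = frozenset({"param", "br", "img"})
RAW_TEXT_TAGS = frozenset({"xmp"})

# Toy tag matcher: ignores comments, quoted ">" in attributes, etc.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def outline(html, void=VOID_TAGS, raw=RAW_TEXT_TAGS):
    """Return (depth, tag_name) for every element the toy parser opens."""
    opened = []
    stack = []
    pos = 0
    while (m := TAG.search(html, pos)) is not None:
        pos = m.end()
        name = m.group(2).lower()
        if m.group(1):  # closing tag: pop back to the matching opener, if any
            if name in stack:
                while stack.pop() != name:
                    pass
            continue
        opened.append((len(stack), name))
        if name in void:
            continue  # void elements never hold children
        if name in raw:
            # Everything up to the matching closer is text, not markup.
            closer = re.search(r"</%s\s*>" % name, html[pos:], re.I)
            if closer is None:
                break  # the rest of the input is swallowed as text
            pos += closer.end()
            continue
        stack.append(name)
    return opened


# PARAM known to be void: P is a sibling of PARAM.
print(outline('<div><param name="a"><p>'))
# [(0, 'div'), (1, 'param'), (1, 'p')]

# PARAM unknown: P is wrongly nested inside PARAM.
print(outline('<div><param name="a"><p>', void=frozenset()))
# [(0, 'div'), (1, 'param'), (2, 'p')]

# XMP known: the B inside it is text. XMP unknown: a phantom B element.
print(outline('<xmp><b>raw</b></xmp><i>'))
# [(0, 'xmp'), (0, 'i')]
print(outline('<xmp><b>raw</b></xmp><i>', raw=frozenset()))
# [(0, 'xmp'), (1, 'b'), (0, 'i')]
```

in both cases the parser that doesn't know the legacy rules builds a different structure from the same bytes, which is the mis-parse we need to avoid.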
sadly, this means that every valid parser will have to support or at least
recognize these elements forever in order to avoid breaking content
already on the web. In fact, I recently observed that GitHub is sending
`</xmp>` on their pages and I'm guessing it's specifically to eagerly
mitigate any attack vector based on the invalid XMP, where if someone
//were// to inject `<xmp>` onto the page then everything after it would be
swallowed as plaintext until a closing `</xmp>` is found. Without that
guard, the injection would effectively trash the page from that point
forward.
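as a rough illustration of why a stray `</xmp>` works as a guard, the sketch below (Python, with a hypothetical helper `swallowed_by_xmp` and a simplified closer pattern; this is my guess at the motivation, not anything GitHub has documented) returns the portion of a page that would be read as plaintext once an `<xmp>` has been opened.

```python
import re

# Simplified stand-in for the real end-tag matching rules.
XMP_CLOSER = re.compile(r"</xmp\s*>", re.IGNORECASE)


def swallowed_by_xmp(rest_of_page):
    """Return the markup after an opened <xmp> that is treated as plaintext."""
    m = XMP_CLOSER.search(rest_of_page)
    if m is None:
        return rest_of_page  # no closer: everything to the end is text
    return rest_of_page[: m.start()]


# No guard: the whole remainder of the page is swallowed.
print(swallowed_by_xmp("<footer>site footer</footer>"))
# <footer>site footer</footer>

# Guard emitted right after the untrusted content: nothing is swallowed.
print(repr(swallowed_by_xmp("</xmp><footer>site footer</footer>")))
# ''
```

when no XMP is open, the extra `</xmp>` is an unmatched closing tag that the parser ignores, so the guard costs nothing on a page with no injection.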
it's not currently possible to generate tags in the HTML API, though we do
allow creating invalid attribute names as long as they don't break the
HTML syntax. these discussions are relevant because the HTML API provides
a great opportunity to enforce whatever rules we want. for example, should
we disallow setting invalid `tabindex` values? I suspect that in time
we'll find cases where we want to be strict, but usually we will want to
be permissive because of the risk of over-zealously rejecting content
that would otherwise behave well in browsers.
for now though, the simple answer is that we //must// support any existing
or legacy rules for deprecated elements in the spec to avoid mis-parsing
HTML inputs containing them.
> Also, since there are commits still referencing this ticket, I guess the
> ticket should stay open?
I was hoping that this ticket could close and remain closed. it has a
broad title and I'd prefer that no new PRs are opened to be associated
with it. the existing PLAINTEXT PR is almost surely not merging into 6.5.
this ticket, as broad as it is, is also not comprehensive and so it's not
being discussed in a way I'd expect a big tracking ticket to work.
what's your advice? should I rename the ticket? dissociate the already-
associated PR from it?
--
Ticket URL: <https://core.trac.wordpress.org/ticket/60283#comment:30>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform