[wp-trac] [WordPress Trac] #62091: XML API: Produce XML Serialization of HTML (XHTML)
WordPress Trac
noreply at wordpress.org
Sat Sep 21 00:37:41 UTC 2024
#62091: XML API: Produce XML Serialization of HTML (XHTML)
-----------------------------+----------------------------
Reporter: dmsnell | Owner: (none)
Type: feature request | Status: new
Priority: normal | Milestone: Future Release
Component: HTML API | Version: trunk
Severity: normal | Keywords:
Focuses: |
-----------------------------+----------------------------
Even though XML cannot represent all possible HTML documents, and even
though it's dangerous to send XHTML content generally, there are extremely
rare cases where it's useful to directly embed an HTML document into an
existing XML document, if the given document //can// be expressed in XML.
=== What is required to transform when converting HTML to XML? ===
* HTML void elements like `<img>` should adopt the self-closing flag to
become `<img />`
* HTML text should be decoded and then only `<`, `>`, `&`, `"`, and `'`
ought to be re-encoded.
* Namespace transitions should involve changes to the default namespace.
* When entering a foreign element (`SVG` and `MATH`).
* When returning to HTML from a foreign element.
* When entering HTML integration points, such as `FOREIGNOBJECT` and
`ANNOTATION-XML` with the proper attribute.
* Something has to be done about un-representable characters.
* Invalid UTF-8 bytes.
* Unicode non-characters and other disallowed characters.
* HTML documents which cannot be represented in XML should result in
rejection - cannot serialize.
* The HTML doctype declaration probably needs to be removed.
=== Design ===
With the introduction of `WP_HTML_Processor::serialize()` in #62036, an
XML serialization might appear naturally as
`WP_HTML_Processor::serialize_to_xml()`. When parsing as a fragment, the
output may be an XML fragment, while a full parser would produce a valid
XHTML document including the XML declaration.
----
Please share your thoughts if you know of other transformations that need
to occur.
----
XML and HTML are divergent languages. You probably don't want XHTML. It's
dangerous.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/62091>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform
More information about the wp-trac
mailing list