[wp-trac] [WordPress Trac] #62091: XML API: Produce XML Serialization of HTML (XHTML)

WordPress Trac noreply at wordpress.org
Sat Sep 21 00:37:41 UTC 2024


#62091: XML API: Produce XML Serialization of HTML (XHTML)
-----------------------------+----------------------------
 Reporter:  dmsnell          |      Owner:  (none)
     Type:  feature request  |     Status:  new
 Priority:  normal           |  Milestone:  Future Release
Component:  HTML API         |    Version:  trunk
 Severity:  normal           |   Keywords:
  Focuses:                   |
-----------------------------+----------------------------
 Even though XML cannot represent all possible HTML documents, and even
 though it's dangerous to send XHTML content generally, there are extremely
 rare cases where it's useful to directly embed an HTML document into an
 existing XML document, if the given document //can// be expressed in XML.

 === What is required to transform when converting HTML to XML? ===

  * HTML void elements like `<img>` should adopt the self-closing flag to
 become `<img />`
  * HTML text should be decoded and then only `<`, `>`, `&`, `"`, and `'`
 ought to be re-encoded.
  * Namespace transitions should involve changes to the default namespace.
     * When entering a foreign element (`SVG` and `MATH`).
     * When returning to HTML from a foreign element.
     * When entering HTML integration points, such as `FOREIGNOBJECT` and
 `ANNOTATION-XML` with the proper attribute.
  * Something has to be done about un-representable characters.
     * Invalid UTF-8 bytes.
     * Unicode non-characters and other disallowed characters.
  * HTML documents which cannot be represented in XML should result in
 rejection - cannot serialize.
  * The HTML doctype declaration probably needs to be removed.

 === Design ===

 With the introduction of `WP_HTML_Processor::serialize()` in #62036, an
 XML serialization might appear naturally as
 `WP_HTML_Processor::serialize_to_xml()`. When parsing as a fragment, the
 output may be an XML fragment, while a full parser would produce a valid
 XHTML document including the XML declaration.

 ----

 Please share your thoughts if you know of other transformations that need
 to occur.

 ----

 XML and HTML are divergent languages. You probably don't want XHTML. It's
 dangerous.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/62091>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform


More information about the wp-trac mailing list