[wp-trac] [WordPress Trac] #60227: HTML API: Add external test suite

WordPress Trac noreply at wordpress.org
Wed Jan 17 14:11:13 UTC 2024


#60227: HTML API: Add external test suite
--------------------------------------+------------------------------
 Reporter:  jonsurrell                |       Owner:  (none)
     Type:  enhancement               |      Status:  new
 Priority:  normal                    |   Milestone:  Awaiting Review
Component:  HTML API                  |     Version:
 Severity:  normal                    |  Resolution:
 Keywords:  has-patch has-unit-tests  |     Focuses:
--------------------------------------+------------------------------

Comment (by jonsurrell):

 Replying to [comment:8 jorbin]:

 Thanks for the feedback!

 > 1. How do we keep this updated? Could this be set as an SVN external or
 be included in some other way to ensure it stays updated besides a person
 manually checking if there are new commits?

 I thought about externals, but since html5lib-tests is hosted on GitHub, I
 don't know how that's possible with just basic tooling.
 [https://github.blog/2023-01-20-sunsetting-subversion-support/ GitHub used
 to support Subversion but recently sunset it].

 Staying up to date by pulling in changes seems ideal, but it's not
 essential. These are test cases for a standard. The standard may evolve,
 but it shouldn't evolve in a way that invalidates the tests. Commits over
 the last two years have been infrequent and mostly add tests, add parse
 errors (which we ignore at this time), or fix things like test duplication
 and typos.

 In short, I don't think it's problematic if we commit the current state of
 html5lib-tests and never update it. We still get a lot of value from this
 set of tests.

 > 2. Do we have any idea what the long term plans are for this project? I
 see that the majority of the projects from this GitHub organization seem
 to be inactive or abandoned.

 No, I don't know. The abandoned projects were ports of html5lib-python to
 other languages. As far as I can tell, html5lib-python is popular and still
 being maintained, but I'm not familiar enough with the Python ecosystem to
 say more.

 Beautiful Soup is one project using html5lib that rings a bell:
 "[https://www.crummy.com/software/BeautifulSoup/ Beautiful Soup] sits on
 top of popular Python parsers like lxml and html5lib, allowing you to try
 out different parsing strategies or trade speed for flexibility."

 I do know that [https://github.com/servo/html5ever html5ever] is a popular
 HTML5 parser in Rust (part of the [https://servo.org/ servo project]).
 html5ever uses html5lib-tests for testing.

 Again, with a relatively stable standard and stable set of tests, we can
 still get a lot of value from the tests even if they're abandoned today.

 > Additionally, I don't see any sort of code of conduct, do we know if
 this project is one whose values align with WordPress?

 I can't speak to that. As far as I can tell this is a focused software
 project without much of a community. The package is developed to provide a
 specific functionality and not much more.

 > 3. What's the reasoning behind skipping the tests that don't expect
 empty head? This is leading to 500+ skipped tests.

 The HTML API is not yet complete or spec-compliant. One of the
 motivations for adding these tests is to speed up development by
 increasing confidence.

 The head tests specifically are skipped because the HTML API only works in
 "in body" parsing mode. The HTML provided must be parsed in the context of
 the body tag for it to be supported. We skip any tests that expect to find
 anything outside the <body>.
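
 As a rough illustration of that skip rule, here is a minimal Python
 sketch. It is not the PHP from the patch. It assumes the html5lib-tests
 tree-construction `.dat` layout (`#data`, `#errors`, `#document` sections,
 with expected tree lines prefixed by `| ` and indented two spaces per
 level). The names `parse_test`, `tree_nodes`, and
 `expects_content_outside_body` are hypothetical, and the `#errors` section
 is read but deliberately unused.

```python
# Illustrative sketch only; assumes the html5lib-tests ".dat" layout.
SECTION_HEADERS = {
    "#data", "#errors", "#new-errors", "#document",
    "#document-fragment", "#script-on", "#script-off",
}


def parse_test(block):
    """Split one test block into its sections, keyed by section name."""
    sections = {}
    current = None
    for line in block.split("\n"):
        if line in SECTION_HEADERS:
            current = line[1:]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def tree_nodes(document_lines):
    """Yield (depth, text) for each "| "-prefixed line of a #document section."""
    for line in document_lines:
        if not line.startswith("| "):
            continue  # blank line or continuation of a multi-line text node
        body = line[2:]
        stripped = body.lstrip(" ")
        yield (len(body) - len(stripped)) // 2, stripped


def expects_content_outside_body(document_lines):
    """True when the expected tree is anything other than
    html > (empty head, body) with all remaining nodes nested under body."""
    nodes = list(tree_nodes(document_lines))
    if nodes[:3] != [(0, "<html>"), (1, "<head>"), (1, "<body>")]:
        return True
    # Every node after <body> must sit at depth 2 or deeper.
    return any(depth < 2 for depth, _ in nodes[3:])


IN_BODY_EXAMPLE = """#data
<p>One<p>Two
#errors
(1,3): expected-doctype-but-got-start-tag
#document
| <html>
|   <head>
|   <body>
|     <p>
|       "One"
|     <p>
|       "Two"
"""

HEAD_EXAMPLE = """#data
<title>Hi</title>
#errors
#document
| <html>
|   <head>
|     <title>
|       "Hi"
|   <body>
"""

for name, block in (("in body", IN_BODY_EXAMPLE), ("head", HEAD_EXAMPLE)):
    skip = expects_content_outside_body(parse_test(block)["document"])
    print(name, "-> skip" if skip else "-> run")
```

 Under these assumptions the first example would be run and the second
 skipped, since its expected tree puts a <title> inside <head>.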

 > 4. It would be good to get the tests passing and the coding standard
 issues resolved before doing a full review

 I'll get all the code standards cleaned up. I believe I left some failing
 tests that detect existing bugs in the HTML API. Those should be addressed
 soon.

 > 5. nitpick, but the naming feels a bit cumbersome.
 `Tests_HtmlApi_WpHtmlProcessorHtml5lib::test_external_html5lib` just feels
 long.

 Agreed :) I'm not attached to the naming and happy to change it.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/60227#comment:9>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

