[wp-hackers] XHTML Strict Mode

Jamie Talbot wphackers at jamietalbot.com
Thu Aug 12 10:56:00 UTC 2004


Ok,

I've got a preliminary solution, which does a better job of closing tags based
on relationships derived from the XHTML 1.0 Strict DTD.  It's pretty dirty at
the moment, full of comments, echo statements and debug info!  It parsed an 8k
string in about 0.2 seconds though, so I guess time-wise it won't be a problem.

You can find the file at:

http://www.jamietalbot.com/wp-hacks/xvalid.phps

At the moment, there are no warnings: it just goes plowing on, reordering
things as it deems necessary  :D  This will obviously have to change.
Perhaps a set of options controlling just how automatic the process should be?

One of the more debatable things it does is remove entirely any tags that
aren't in the specification.  Obviously this should not be the default
behaviour, as a typo could cause a tag pair to be lost.  This would be a good
candidate for simply warning the user instead.
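Here is a rough Python sketch of "warn instead of remove". It assumes a hypothetical `KNOWN` set that stands in for the element names declared in the XHTML 1.0 Strict DTD; only a handful are listed. It also assumes a hypothetical `remove` option that is off by default. The `check_unknown` name is made up for illustration.

```python
import re

# Hypothetical stand-in for the element names declared in the XHTML 1.0
# Strict DTD; only a handful are listed here.
KNOWN = {"p", "div", "blockquote", "ul", "ol", "li", "a", "b", "i",
         "em", "strong", "br", "img"}

# Any start or end tag; group 1 is the tag name.
TAG = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>")


def check_unknown(html, remove=False):
    """Report tags not in KNOWN; strip them only when remove=True."""
    warnings = []

    def handle(m):
        name = m.group(1).lower()
        if name in KNOWN:
            return m.group(0)
        warnings.append(f"unknown tag <{name}> at offset {m.start()}")
        return "" if remove else m.group(0)

    return TAG.sub(handle, html), warnings
```

With the default, `check_unknown("<p><bb>typo</bb></p>")` leaves the text alone and returns two warnings. Passing `remove=True` yields `<p>typo</p>`.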

The file contains a lot of test cases.  The best way to test it is on a local
server, uncommenting the tests one by one to see how it deals with each.
Viewing the HTML source of the resulting page is the best way I've found to
check the output.

The things it should do properly:

Never put a tag inside a tag that it can't legally be inside.  If there are
errors in this, the fix is simple, as it only involves editing the arrays at
the top of the file.

Correct unclosed empty tags, e.g. <br> or </br> to <br />.
Remove orphan tags (closing tags that were never opened).
Deal with greater-than signs.
Lowercase all the tags it encounters (but not the attributes).
Attempt to merge nested duplicate tags:

e.g. <b><i>help</i><b>me</b></b> to <b><i>help</i>me</b>
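Two of the items above lend themselves to short Python sketches. They are rough illustrations, not the code in xvalid.phps.

- `normalize_empty` rewrites the elements declared EMPTY in XHTML 1.0 Strict (br, hr, img, input, meta, link, area, param, col, base) into the `<br />` form.
- `merge_nested_duplicates` drops a formatting tag that is already open, together with its matching close tag. The candidate tags come from a hypothetical `MERGEABLE` set.

Both function names are made up. Both regular expressions ignore the possibility of `>` inside attribute values, and the merge step assumes the input is already properly nested.

```python
import re

# Elements declared EMPTY in XHTML 1.0 Strict.
EMPTY = ("br", "hr", "img", "input", "meta", "link", "area", "param", "col", "base")
# Matches <br>, </br>, <BR/>, <img src="x"> ... (attributes may not contain '>').
EMPTY_TAG = re.compile(
    r"<\s*/?\s*(" + "|".join(EMPTY) + r")\b([^>]*?)\s*/?\s*>", re.IGNORECASE
)

# A start/end tag (groups 1-3) or a run of text (group 4).
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>|([^<]+)")
# Hypothetical choice of inline tags where a nested duplicate adds nothing.
MERGEABLE = {"b", "i", "em", "strong"}


def normalize_empty(html):
    """Rewrite every empty element into the XHTML '<name ... />' form."""
    return EMPTY_TAG.sub(lambda m: f"<{m.group(1).lower()}{m.group(2)} />", html)


def merge_nested_duplicates(html):
    """Drop a MERGEABLE start tag that is already open, plus its end tag.

    Assumes the input is already properly nested.
    """
    out, stack = [], []  # stack holds (name, dropped) pairs
    for closing, name, attrs, text in TOKEN.findall(html):
        if not name:
            out.append(text)
            continue
        name = name.lower()
        if closing:
            if stack and stack[-1][0] == name:
                _, dropped = stack.pop()
                if not dropped:
                    out.append(f"</{name}>")
            continue
        if attrs.rstrip().endswith("/"):  # self-closed tag: nothing to track
            out.append(f"<{name}{attrs}>")
            continue
        dropped = name in MERGEABLE and any(n == name for n, _ in stack)
        stack.append((name, dropped))
        if not dropped:
            out.append(f"<{name}{attrs}>")
    return "".join(out)
```

On the example above, `merge_nested_duplicates("<b><i>help</i><b>me</b></b>")` returns `<b><i>help</i>me</b>`.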

The things it doesn't do yet:

Remove tag pairs that have no contents, e.g. <p></p>.
Deal with attributes at all.  That would be a nightmare!
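Removing empty pairs is not done yet. One possible approach, sketched here in Python, is to delete empty pairs repeatedly until nothing changes, so that `<p><b></b></p>` collapses completely. The sketch assumes tag names have already been lowercased. The `REMOVABLE` list is a hypothetical choice: elements such as `<a name="...">` or table cells can legitimately be empty, so they are left alone. Treating whitespace-only contents as empty is also an assumption.

```python
import re

# Hypothetical list of elements whose empty pairs are safe to delete; things
# like <a name="..."></a> or empty table cells are deliberately left out.
REMOVABLE = ("p", "b", "i", "em", "strong", "span", "div")
# An empty (or whitespace-only) pair; assumes tag names are already lowercase.
EMPTY_PAIR = re.compile(r"<(" + "|".join(REMOVABLE) + r")(?:\s[^>]*)?>\s*</\1\s*>")


def remove_empty_pairs(html):
    """Delete empty pairs repeatedly, so nested ones like <p><b></b></p> go too."""
    while True:
        shorter = EMPTY_PAIR.sub("", html)
        if shorter == html:
            return html
        html = shorter
```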

The things it will (hopefully!) do later:

Automatically wrap content that isn't a list item inside <ul> or <ol> in <li> tags.
Deal with tags that are self-closed by mistake, e.g. <a href="..." />
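For tags that are self-closed by mistake, one option is to expand any self-closed element into an explicit, empty start/end pair, unless it is declared EMPTY in XHTML 1.0 Strict. This is an assumption, since the post doesn't say what the fix should be. Under this rule, `<a href="..." />` becomes `<a href="..."></a>`. Below is a Python sketch with a hypothetical `fix_self_closed` function. The resulting empty pair might then be flagged to the author rather than left silently.

```python
import re

# Elements declared EMPTY in XHTML 1.0 Strict; these may legitimately self-close.
EMPTY = {"br", "hr", "img", "input", "meta", "link", "area", "param", "col", "base"}

# <name attrs /> ; group 1 is the name, group 2 the attributes (no '>' inside).
SELF_CLOSED = re.compile(r"<([A-Za-z][A-Za-z0-9]*)([^>]*?)\s*/>")


def fix_self_closed(html):
    """Turn <x ... /> into <x ...></x> unless x is an EMPTY element."""
    def expand(m):
        name = m.group(1).lower()
        if name in EMPTY:
            return m.group(0)             # e.g. <br /> is already fine
        return f"<{name}{m.group(2)}></{name}>"
    return SELF_CLOSED.sub(expand, html)
```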

My tests are by no means exhaustive.  Again, comments and bug reports
appreciated!  I expect there will be a fair few of the latter :D

Jamie.

--
http://www.jamietalbot.com/


Quoting Brian Meidell <brian at mindflow.dk>:

>
> Jamie Talbot wrote:
>
> >Basically, closing each tag as late as is legally possible.
> >
> >
> I would go for the same solution.
>
> If markup doesn't just pass with flying colors, it might be a good idea
> to ask the author to confirm the result before publishing it.
>
> /Brian
