[wp-hackers] XHTML Strict Mode

Jamie Talbot wphackers at jamietalbot.com
Thu Aug 12 11:26:17 UTC 2004


Cheers Mark,

Will take a look!

Jamie.

--
http://www.jamietalbot.com/


Quoting Mark Jaquith <mark.wordpress at txfx.net>:

> Funny... not 2 hours ago I was looking around for similar code for
> another project, and I found something that might be quite useful here.
> http://simon.incutio.com/code/php/SafeHtmlChecker.class.php.txt
>
> It handles a lot of different cases, such as a block element used inside
> an inline element, as well as improper attributes.  It was originally
> designed to parse comments (with the obvious security restrictions), but
> it could be modified to do just about anything.
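For illustration only, here is a minimal Python sketch of the kind of check Mark describes: flagging a block element opened inside an inline element. It is not SafeHtmlChecker's code. It uses Python's standard html.parser module, and the BLOCK/INLINE/VOID sets are small hypothetical subsets, not the full DTD lists.

```python
from html.parser import HTMLParser

# Illustrative subsets only (assumption): the XHTML 1.0 Strict DTD
# defines many more block-level and inline elements.
BLOCK = {"p", "div", "ul", "ol", "li", "blockquote", "pre", "table", "h1", "h2", "h3"}
INLINE = {"a", "b", "i", "em", "strong", "span", "code", "q"}
VOID = {"br", "hr", "img"}


class BlockInInlineChecker(HTMLParser):
    """Record every block element that opens while an inline element is open."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open (non-void) elements
        self.problems = []    # (block_tag, enclosing_inline_tag) pairs

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK:
            # Report the innermost open inline element, if there is one.
            for parent in reversed(self.open_tags):
                if parent in INLINE:
                    self.problems.append((tag, parent))
                    break
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; orphan closing tags are ignored.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def find_block_in_inline(markup):
    checker = BlockInInlineChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems
```

For example, find_block_in_inline("<p><b>bold <div>oops</div></b></p>") reports [("div", "b")].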
>
> Jamie Talbot wrote:
>
> >Ok,
> >
> >I've got a preliminary solution, which does a better job of closing tags based
> >on relationships derived from the XHTML 1.0 Strict DTD.  It's pretty dirty at
> >the moment, full of comments, echoes and debug info!  It parsed an 8k string in
> >about 0.2 seconds though, so I guess time-wise it won't be a problem.
> >
> >You can find the file at:
> >
> >http://www.jamietalbot.com/wp-hacks/xvalid.phps
> >
> >At the moment, there are no warnings - it just goes plowing on, reordering
> >things as is deemed necessary  :D  This will obviously have to change.
> >Probably a set of options on just how automatic the process should be?
> >
> >One of the more debatable things it does is remove entirely any tags that
> >aren't in the specification.  Obviously this should not be a default setting,
> >as a typo could cause a tag pair to be lost.  This would be a good candidate
> >for simply warning the user.
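A minimal sketch of such a "warn or strip" option, assuming a hypothetical mode parameter and an abridged whitelist (KNOWN_TAGS) standing in for the full list of XHTML 1.0 Strict elements. In "strip" mode it removes only the unknown tags themselves and keeps the text between them, so a typo such as <blnk> loses markup but not content.

```python
import re

# Hypothetical, abridged whitelist; the real list would come from the DTD.
KNOWN_TAGS = {"p", "div", "a", "b", "i", "em", "strong", "ul", "ol", "li", "br", "blockquote"}

# Opening or closing tag; group 1 is the tag name.
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)\b[^>]*>")


def handle_unknown_tags(markup, mode="warn"):
    """Return (markup, warnings).

    mode="warn" leaves the markup untouched and only reports unknown tags;
    mode="strip" also deletes the unknown tags (their text is kept).
    """
    warnings = []

    def replace(match):
        name = match.group(1).lower()
        if name in KNOWN_TAGS:
            return match.group(0)
        warnings.append(f"unknown tag <{name}> at offset {match.start()}")
        return "" if mode == "strip" else match.group(0)

    return TAG_RE.sub(replace, markup), warnings
```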
> >
> >The file contains a lot of test cases.  The best way to test it is on a local
> >server, uncommenting the tests one by one to see how it deals with each.
> >Looking at the HTML source of the output is the best way I've found.
> >
> >The things it should do properly:
> >
> >Never put a tag inside a tag that it can't legally be inside.  If there are
> >errors in this, the fix is simple, as it only involves editing the arrays at
> >the top of the file.
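The arrays Jamie mentions amount to a containment table: for each element, the set of elements it may directly contain. A rough Python sketch of the idea follows. ALLOWED_CHILDREN and can_contain are hypothetical names, and the table is a hand-abridged subset loosely following XHTML 1.0 Strict, not generated from the DTD.

```python
# Hypothetical, hand-abridged containment table loosely following XHTML 1.0 Strict.
INLINE = {"#text", "a", "b", "i", "em", "strong", "span", "code", "br", "img"}
BLOCK = {"p", "div", "ul", "ol", "blockquote", "pre", "h1", "h2"}
FLOW = INLINE | BLOCK

ALLOWED_CHILDREN = {
    "body": BLOCK,             # Strict: no bare text or inline elements directly in <body>
    "blockquote": BLOCK,       # Strict: the same restriction applies here
    "div": FLOW,
    "li": FLOW,
    "p": INLINE,               # a <p> may never contain block elements
    "h1": INLINE, "h2": INLINE,
    "pre": INLINE - {"img"},   # <pre> excludes images
    "ul": {"li"}, "ol": {"li"},
    "a": INLINE - {"a"},       # links may not be nested
    "b": INLINE, "i": INLINE, "em": INLINE, "strong": INLINE,
    "span": INLINE, "code": INLINE,
}


def can_contain(parent, child):
    """True if child may appear directly inside parent.

    Parents missing from the table (including empty elements such as br)
    are treated as allowing nothing.
    """
    return child in ALLOWED_CHILDREN.get(parent, set())
```

Fixing a nesting error then really is a one-line change to the table rather than to the parsing logic.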
> >
> >Correct unclosed empty tags, e.g. <br> or </br> to <br />
> >Remove orphan tags (closing tags that were never opened).
> >Deal with greater-than signs.
> >Lowercase all the tags it encounters (but not the attributes).
> >Attempt to merge nested duplicate tags:
> >
> >e.g. <b><i>help</i><b>me</b></b> to <b><i>help</i>me</b>
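Several of the fixes above can be sketched as a single pass over the tags with a stack of open elements. This is a hypothetical Python illustration, not the code in xvalid.phps. The tag regex ignores comments, CDATA and ">" inside quoted attribute values; the MERGEABLE set is an assumption about which tags are safe to merge; a merged duplicate's attributes are discarded; and a closing tag also closes any elements still open inside it.

```python
import re

# XHTML 1.0 empty elements; these are always written as <name ... />.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}
# Assumption: formatting tags that gain nothing from being nested in themselves.
MERGEABLE = {"b", "i", "em", "strong"}
# Hypothetical, simplified tag pattern: ignores comments, CDATA and ">" inside quotes.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)((?:\s[^<>]*?)?)\s*/?>")


def tidy(markup):
    out, stack, pos = [], [], 0  # stack holds (tag name, was it emitted?) pairs
    for m in TAG_RE.finditer(markup):
        # Text between tags: escape any stray ">".
        out.append(markup[pos:m.start()].replace(">", "&gt;"))
        pos = m.end()
        closing = m.group(1) == "/"
        name = m.group(2).lower()      # lowercase the tag name only
        attrs = m.group(3).rstrip()    # attributes are left exactly as written
        open_names = [n for n, _ in stack]
        if name in VOID:
            # <br>, <br/> and </br> all become <br />.
            out.append(f"<{name}{'' if closing else attrs} />")
        elif closing:
            if name not in open_names:
                continue               # orphan closing tag: drop it
            # Close everything still open inside the matching element, then it.
            while True:
                top, emitted = stack.pop()
                if emitted:
                    out.append(f"</{top}>")
                if top == name:
                    break
        else:
            # A <b> anywhere inside an open <b> is merged away (not emitted).
            emitted = not (name in MERGEABLE and name in open_names)
            stack.append((name, emitted))
            if emitted:
                out.append(f"<{name}{attrs}>")
    out.append(markup[pos:].replace(">", "&gt;"))
    # Close anything left open at the end of the input.
    out.extend(f"</{n}>" for n, emitted in reversed(stack) if emitted)
    return "".join(out)
```

For Jamie's example, tidy("<b><i>help</i><b>me</b></b>") returns "<b><i>help</i>me</b>".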
> >
> >The things it doesn't do yet:
> >
> >Remove tag pairs that have no contents, e.g. <p></p>.
> >Deal with attributes at all.  That would be a nightmare!
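Removing empty pairs could be done with a repeated regular-expression pass, so that <p><b></b></p> collapses completely. A minimal sketch, assuming lowercase tags (e.g. after the lowercasing step) and treating whitespace-only content as empty. KEEP_EVEN_IF_EMPTY is a hypothetical exemption list: an empty <td></td> is legal, and removing it would break a table.

```python
import re

# Assumption: elements whose empty form is meaningful and must survive.
KEEP_EVEN_IF_EMPTY = {"td", "th", "textarea", "script"}

# An opening tag followed (after optional whitespace) by its own closing tag.
EMPTY_PAIR_RE = re.compile(r"<([a-z][a-z0-9]*)(?:\s[^<>]*)?>\s*</\1\s*>")


def _drop(match):
    return match.group(0) if match.group(1) in KEEP_EVEN_IF_EMPTY else ""


def remove_empty_pairs(markup):
    """Delete empty tag pairs until nothing changes, so nested empties vanish too."""
    while True:
        cleaned = EMPTY_PAIR_RE.sub(_drop, markup)
        if cleaned == markup:
            return cleaned
        markup = cleaned
```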
> >
> >The things it will (hopefully!) do later:
> >
> >Automatically wrap elements other than <li> inside <ul> and <ol> in <li> tags.
> >Deal with tags that are self-closed by mistake, e.g. <a href="..." />
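Both planned fixes can be sketched in Python. The self-closing fix is a regex over the markup. The list fix works on a hypothetical minimal Node tree (tag, children, text) rather than on raw text, and wraps each run of non-<li> children in one new <li>, so that adjacent inline content ends up in the same item. All names and the VOID set are assumptions for illustration.

```python
import re
from dataclasses import dataclass, field

VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}
# Hypothetical pattern for a self-closed tag such as <a href="..." />.
SELF_CLOSED_RE = re.compile(r"<([a-z][a-z0-9]*)(\s[^<>]*?)?\s*/>")


def expand_self_closed(markup):
    """Rewrite <a href="x" /> as <a href="x"></a>; leave <br /> and friends alone."""
    def fix(m):
        name, attrs = m.group(1), m.group(2) or ""
        if name in VOID:
            return m.group(0)
        return f"<{name}{attrs}></{name}>"
    return SELF_CLOSED_RE.sub(fix, markup)


@dataclass
class Node:
    tag: str                                    # "#text" for text nodes
    children: list = field(default_factory=list)
    text: str = ""


def _is_blank(node):
    return node.tag == "#text" and not node.text.strip()


def wrap_list_items(node):
    """Wrap each run of non-<li> children of <ul>/<ol> in a single new <li>."""
    if node.tag in ("ul", "ol"):
        fixed, run = [], []

        def flush():
            if all(_is_blank(n) for n in run):
                fixed.extend(run)               # whitespace between items stays put
            else:
                fixed.append(Node("li", list(run)))
            run.clear()

        for child in node.children:
            if child.tag == "li":
                flush()
                fixed.append(child)
            else:
                run.append(child)
        flush()
        node.children = fixed
    for child in node.children:
        wrap_list_items(child)
    return node
```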
> >
> >My tests are by no means exhaustive.  Again, comments and bug reports
> >appreciated!  I expect there will be a fair few of the latter :D
> >
> >Jamie.
> >
> >--
> >http://www.jamietalbot.com/
> >
> >
> >Quoting Brian Meidell <brian at mindflow.dk>:
> >
> >>Jamie Talbot wrote:
> >>
> >>>Basically, closing each tag as late as is legally possible.
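One way to read "as late as is legally possible": keep every element open until something arrives that it may not contain, then close only as many elements as needed. A minimal Python sketch under that reading. ALLOWED is an abridged, hypothetical containment table, the input is assumed to be already lowercased, and the implicit root element is never emitted.

```python
import re

# Abridged, hypothetical containment table; a real one would come from the DTD.
INLINE = {"#text", "a", "b", "i", "em", "strong", "br"}
BLOCK = {"p", "div", "ul", "ol"}
ALLOWED = {
    "div": INLINE | BLOCK,
    "li": INLINE | BLOCK,
    "p": INLINE,
    "a": INLINE - {"a"},
    "b": INLINE, "i": INLINE, "em": INLINE, "strong": INLINE,
    "ul": {"li"}, "ol": {"li"},
}
EMPTY = {"br"}
# A tag, a run of text, or a lone "<" (treated as text). Assumes lowercase tags.
TOKEN_RE = re.compile(r"<(/?)([a-z][a-z0-9]*)[^>]*>|[^<]+|<")


def close_late(markup, root="div"):
    """Keep each element open until something it may not contain arrives."""
    out, stack = [], [root]            # the root element itself is never emitted
    for m in TOKEN_RE.finditer(markup):
        kind = m.group(2) or "#text"
        if m.group(1):                 # a closing tag
            if kind in stack[1:]:
                # Close down to (and including) the matching open element.
                while True:
                    top = stack.pop()
                    out.append(f"</{top}>")
                    if top == kind:
                        break
            continue                   # a closing tag with no open match is dropped
        # Close open elements only until the new token is legal where it lands.
        while len(stack) > 1 and kind not in ALLOWED.get(stack[-1], set()):
            out.append(f"</{stack.pop()}>")
        out.append(m.group(0))
        if kind != "#text" and kind not in EMPTY:
            stack.append(kind)
    out.extend(f"</{name}>" for name in reversed(stack[1:]))
    return "".join(out)
```

With this table, <p>one<p>two comes out as <p>one</p><p>two</p>, and <ul><li>a<li>b</ul> as <ul><li>a</li><li>b</li></ul>.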
> >>>
> >>I would go for the same solution.
> >>
> >>If markup doesn't just pass with flying colors, it might be a good idea
> >>to ask the author to confirm the result before publishing it.
> >>
> >>/Brian
> >>
> >>_______________________________________________
> >>hackers mailing list
> >>hackers at wordpress.org
> >>http://wordpress.org/mailman/listinfo/hackers_wordpress.org
>



