[wp-hackers] Importing HTML files as pages -- been done?

Beau Lebens beau at dentedreality.com.au
Fri Feb 6 18:32:54 GMT 2009


> I think the DOM parser in PHP 4 was nasty, but the DOM parser for PHP 5
> could be "bent" to work if you go that far. If you're lucky and are sure
> there is some standard element to key from, some grep parser would
> probably be the easiest.

Assuming this was non-core (and thus has the flexibility to rely on
extensions that aren't standard fare), you could use
http://us2.php.net/tidy to get a cleaned/tidied file, which you can
then manipulate as XML. I haven't used the PHP version, but I've used
Tidy before and it's pretty slick. You can sometimes get some strange
nesting of elements if your source HTML is really scrappy, though.
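
Something along these lines might work as a starting point. It is an
untested-in-production sketch, not a finished importer: the helper
names load_html_as_dom() and extract_title_and_body() are made up for
illustration, the Tidy options are just one plausible choice, and the
fallback to DOMDocument::loadHTML() when the tidy extension is missing
is my own assumption about what you would want.

```php
<?php
// Sketch only: load_html_as_dom() and extract_title_and_body() are
// hypothetical helper names, not WordPress or PHP functions.

function load_html_as_dom(string $html): DOMDocument
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings quietly

    if (extension_loaded('tidy')) {
        $config = [
            'output-xhtml'     => true, // ask Tidy for well-formed XHTML
            'numeric-entities' => true, // e.g. &nbsp; becomes &#160;
            'wrap'             => 0,    // leave long lines unwrapped
        ];
        $clean = tidy_repair_string($html, $config, 'utf8');
        if ($clean === false || !$doc->loadXML($clean)) {
            throw new RuntimeException('Could not tidy and parse the HTML.');
        }
    } else {
        // Assumed fallback when tidy is missing: libxml's lenient HTML parser.
        if (!$doc->loadHTML($html)) {
            throw new RuntimeException('Could not parse the HTML.');
        }
    }

    libxml_clear_errors();
    return $doc;
}

function extract_title_and_body(string $html): array
{
    $doc   = load_html_as_dom($html);
    $title = $doc->getElementsByTagName('title')->item(0);
    $body  = $doc->getElementsByTagName('body')->item(0);

    // Serialize each child of <body> so the markup inside it is kept.
    $content = '';
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $content .= $doc->saveXML($child);
        }
    }

    return [
        'title' => $title !== null ? trim($title->textContent) : '',
        'body'  => trim($content),
    ];
}
```

The returned 'title' and 'body' strings could then be handed to
whatever creates the page. Checking the result by eye is still worth
it, since scrappy source HTML can come out with odd nesting.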

