[wp-hackers] Importing HTML files as pages -- been done?

Sun Feb 8 23:23:48 GMT 2009

Dougal:

BTW, I just checked your link and found phpQuery & DomQuery, which both look very cool. Thanks for the link; I'll definitely be reviewing those in the future.

-Mike Schinkel
http://mikeschinkel.com/

----- Original Message -----
From: "Mike Schinkel" <mikeschinkel at newclarity.net>
To: wp-hackers at lists.automattic.com
Sent: Sunday, February 8, 2009 6:11:53 PM GMT -05:00 US/Canada Eastern
Subject: Re: [wp-hackers] Importing HTML files as pages -- been done?

"Dougal Campbell" <dougal at gunters.org> wrote:
> No, a DOM-based approach is definitely better than regex. 
> Regexes for parsing HTML can get *extremely* complicated, 
> and if you start trying to write a regex-based parser from 
> scratch, you'll almost certainly miss some things. 

I agree, in general. In her specific case she said that she'd have enclosing <div>s with unique IDs identifying the content to select. That <div> would be easy to find even with strpos(), and from there a simple loop that tracks nested <div>s to find the matching closing </div> would work. Yes, there are potential issues with that approach (a "<div" inside a comment or script would confuse it, for example), but they would be rare. For a general-purpose tool those limitations wouldn't be acceptable, but for a quick & dirty tool to accomplish a specific conversion it would be sufficient and easy.
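
For illustration, here is a minimal sketch of the strpos()-plus-loop idea described above. It is not code from this thread: the function name extract_div_by_id() and the sample markup are hypothetical, and it assumes the opening tag is written exactly as <div id="..."> with the id as the first attribute in double quotes. It also carries the limitations already mentioned (a "<div" inside a comment or script would throw off the count).

```php
<?php
// Hypothetical helper (an assumption, not from the original thread).
// Returns the inner HTML of <div id="$id">, or null if it cannot be found.
function extract_div_by_id($html, $id) {
    // Assumption: the opening tag starts exactly with <div id="...".
    $open = strpos($html, '<div id="' . $id . '"');
    if ($open === false) {
        return null;
    }

    // The content begins just after the '>' that ends the opening tag.
    $gt = strpos($html, '>', $open);
    if ($gt === false) {
        return null;
    }
    $start = $gt + 1;

    $depth = 1;      // we are inside one open <div>
    $pos   = $start;
    while (true) {
        $next_close = stripos($html, '</div', $pos);
        if ($next_close === false) {
            return null; // unbalanced markup: no matching </div>
        }
        $next_open = stripos($html, '<div', $pos);

        if ($next_open !== false && $next_open < $next_close) {
            // A nested <div> opens before the next close: go one level deeper.
            $depth++;
            $pos = $next_open + 4;   // skip past '<div'
        } else {
            // A </div> comes first: come back up one level.
            $depth--;
            if ($depth === 0) {
                return substr($html, $start, $next_close - $start);
            }
            $pos = $next_close + 5;  // skip past '</div'
        }
    }
}

// Usage with made-up sample markup:
$sample = '<div id="header">x</div>'
        . '<div id="content"><p>Hi</p><div class="inner">nested</div><p>Bye</p></div>'
        . '<div id="footer">f</div>';
echo extract_div_by_id($sample, 'content'), "\n";
```

The depth counter is what lets the loop skip over nested <div>s instead of stopping at the first </div> it sees. A DOM parser avoids these assumptions entirely, which is the trade-off being discussed here.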

Still, for those who feel that only a DOM approach will do, I'll not stand in the way by debating it further. :-)

-Mike Schinkel
http://mikeschinkel.com/