[wp-hackers] Importing HTML files as pages -- been done?
Mike Schinkel
mikeschinkel at newclarity.net
Sun Feb 8 23:11:53 GMT 2009
"Dougal Campbell" <dougal at gunters.org> wrote:
> No, a DOM-based approach is definitely better than regex.
> Regexes for parsing HTML can get *extremely* complicated,
> and if you start trying to write a regex-based parser from
> scratch, you'll almost certainly miss some things.
I agree, in general. In her specific case, she said she'd have enclosing <div>s with unique IDs identifying the content to select. That <div> would be easy to find even with strpos(), and from there a simple loop to find the matching closing </div> would work. Yes, there are potential issues with that approach, but they would be rare. For a general-purpose tool those limitations wouldn't be acceptable, but for a quick-and-dirty tool to accomplish a specific conversion it would be sufficient and easy.
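To make that concrete, here is a rough sketch of the strpos()-plus-loop idea. The function name extract_div_by_id() is made up for illustration. It assumes the attribute is written exactly as id="..." with double quotes, that the <div>s are properly balanced, and that no other tag name starts with "div". It uses nullable type declarations, so it needs PHP 7.1 or later. Treat it as one possible way to do it, not a tested converter.

```php
<?php
// Hypothetical helper (not from any library): returns the inner HTML of the
// <div> whose id attribute equals $id, or null if it is missing or unbalanced.
function extract_div_by_id(string $html, string $id): ?string
{
    // Assumption: the attribute appears literally as id="..." (double quotes).
    $idPos = strpos($html, 'id="' . $id . '"');
    if ($idPos === false) {
        return null;
    }

    // Make sure the tag carrying this id really is a <div>.
    $tagStart = strrpos(substr($html, 0, $idPos), '<');
    if ($tagStart === false || strncasecmp(substr($html, $tagStart, 4), '<div', 4) !== 0) {
        return null;
    }

    // The content starts right after the '>' that ends the opening tag.
    $gt = strpos($html, '>', $idPos);
    if ($gt === false) {
        return null;
    }
    $contentStart = $gt + 1;

    // Simple loop: count nested <div> / </div> pairs until depth returns to 0.
    $depth = 1;
    $pos = $contentStart;
    while (true) {
        $nextOpen  = stripos($html, '<div', $pos);
        $nextClose = stripos($html, '</div', $pos);
        if ($nextClose === false) {
            return null; // ran out of closing tags: markup is unbalanced
        }
        if ($nextOpen !== false && $nextOpen < $nextClose) {
            $depth++;                 // a nested <div> opens first
            $pos = $nextOpen + 4;     // skip past "<div"
        } else {
            $depth--;                 // a </div> closes the current level
            if ($depth === 0) {
                return substr($html, $contentStart, $nextClose - $contentStart);
            }
            $pos = $nextClose + 5;    // skip past "</div"
        }
    }
}
```

Calling extract_div_by_id($html, 'content') on a page whose body is wrapped in <div id="content"> would hand back everything between that tag and its matching </div>, nested <div>s included. It will be fooled by things like a "<div" inside a comment or a script block, which is exactly the kind of rare case I mean above.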
Still, for those who feel that only a DOM approach will do, I won't stand in the way by debating it further. :-)
-Mike Schinkel
http://mikeschinkel.com/