[wp-hackers] pulling a massive HTML site into Wordpress

Dion Hulse (dd32) wordpress at dd32.id.au
Tue Jun 7 01:09:54 UTC 2011

On 6 June 2011 23:02, John Black <immanence7 at gmail.com> wrote:

> On 6 Jun 2011, at 16:50, Dion Hulse (dd32) wrote:
> >> I see there are some plugins to handle 301 redirects. But these tend to be
> >> for a handful of files, not 50,000. Any thoughts on how this would managed?
> >>
> >
> > I'd be storing a meta of their original file location when inserting them,
> > That way you can add a filter later to the 404/canonical handlers to check
> > the url against the meta fields to find the old document, and issue the 301.
> > Or, you could store the meta, retrieve it later to create a massive redirect
> > list, and feed that into .htaccess or similar.

> How would you generate the meta? Some of the more recent HTML files have a
> note of the URL of the file embedded. But a quick check shows that the older
> files (as I say, the archive goes back to 1998) don't.

I wouldn't store the full URL; rather, just the path of that particular page.
I assumed the files you have are in the same structure as the live files? If
so, I'd store (for example) /2008/directory_here/filename.html as the meta.
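
A minimal sketch of deriving that meta value on a localhost copy, assuming
the archive sits under a single local root directory that mirrors the live
site's layout. The function names are hypothetical and this is only the
path-derivation step, not the WordPress import itself:

```python
from pathlib import Path


def legacy_path(archive_root, html_file):
    """Return the site-relative URL path for one file in the local archive."""
    root = Path(archive_root).resolve()
    relative = Path(html_file).resolve().relative_to(root)
    # as_posix() gives forward slashes even on Windows.
    return "/" + relative.as_posix()


def collect_legacy_paths(archive_root):
    """Map each .html file under archive_root to the path to store as meta."""
    root = Path(archive_root)
    return {str(f): legacy_path(root, f) for f in sorted(root.rglob("*.html"))}
```

Each value would then be saved as post meta when the matching post is
inserted, under whatever meta key the import script chooses.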

> I was hoping to do the migration on a localhost install. To get the meta
> would I have to do the migration on the actual server of this organization?

Not at all. Just store only the part of the URL that matters.
Of course, if the files are live on the web in a different format/URL
structure, you would need a way of mapping the live structure to the archive
files you have.
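
For the redirect-list route mentioned earlier in the thread, a rough sketch
of turning the stored paths into Apache mod_alias `Redirect 301` lines. It
assumes you can already export a mapping of old path to new permalink; the
function name and the example.com URL are placeholders, and paths containing
whitespace are rejected rather than handled:

```python
def redirect_lines(mapping):
    """Build 'Redirect 301' lines from a {old_path: new_url} mapping."""
    lines = []
    for old_path, new_url in sorted(mapping.items()):
        if not old_path.startswith("/"):
            raise ValueError(f"old path must start with '/': {old_path!r}")
        if any(ch.isspace() for ch in old_path + new_url):
            raise ValueError(f"whitespace not handled in this sketch: {old_path!r}")
        lines.append(f"Redirect 301 {old_path} {new_url}")
    return lines


# Placeholder data: one old archive path and a made-up new permalink.
example = {"/2008/directory_here/filename.html": "http://example.com/2008/filename/"}
print("\n".join(redirect_lines(example)))
```

The printed lines could be pasted into .htaccess or a server config file.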

> best,
> JB
> _______________________________________________
> wp-hackers mailing list
> wp-hackers at lists.automattic.com
> http://lists.automattic.com/mailman/listinfo/wp-hackers
