[wp-hackers] Blocking SEO robots
David Anderson
david at wordshell.net
Wed Aug 6 09:50:52 UTC 2014
This isn't specifically a WP issue, but I think it will be relevant to
lots of us, trying to maximise our resources...
Issue: I find that a disproportionate amount of server resources is
consumed by a certain subset of crawlers/robots which contribute nothing.
I'd like to just block them. I have in mind the various semi-private
search engines run by SEO companies/backlink-checkers, e.g.
http://en.seokicks.de/, https://ahrefs.com/. These things happily spider
a few thousand pages, including every author, tag and category archive.
Some of them refuse to obey robots.txt; what specifically annoys me is
when they ignore the Crawl-Delay directive. I even came across one that
proudly had a section on its website explaining that robots.txt was a
stupid idea, so they always ignored it!
I'd like to just block such crawlers. So: does anyone know where a
reliable list of the IP addresses used by these services is kept?
Specifically, I want to block the semi-private or obscure crawlers that
do nothing useful for my sites. I don't want to block mainstream search
engines, of course. I've done some Googling, and haven't managed to find
something that makes this distinction.
Or alternatively - anyone think this is a bad idea?
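To make the idea concrete, here is a minimal sketch of how such a list
could be applied once it exists. It is only an illustration: the function
name is_blocked and the BLOCKED_NETWORKS list are hypothetical, and the
ranges shown are reserved documentation addresses, not the real addresses
of any crawler.

```python
import ipaddress

# Placeholder ranges only: these are reserved documentation networks,
# NOT the real addresses of any SEO crawler. A real list would have to
# come from a maintained source, which is exactly what is being asked for.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(remote_addr, networks=BLOCKED_NETWORKS):
    """Return True if remote_addr falls inside any of the given networks."""
    addr = ipaddress.ip_address(remote_addr)
    # An address of a different IP version than the network never matches.
    return any(addr in net for net in networks)


if __name__ == "__main__":
    for candidate in ("192.0.2.17", "203.0.113.5"):
        verdict = "block" if is_blocked(candidate) else "allow"
        print(candidate, verdict)
```

In practice the same check would more likely live in the web server or
firewall configuration than in application code, so that blocked requests
never reach PHP at all.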
Best wishes,
David
--
UpdraftPlus - best WordPress backups - http://updraftplus.com
WordShell - WordPress fast from the CLI - http://wordshell.net