[wp-hackers] Blocking SEO robots
Daniel
malkir at gmail.com
Thu Aug 7 04:28:33 UTC 2014
Almost forgot: the link should point into a subdirectory that robots.txt
disallows, so the only things that hit it are bots ignoring robots.txt.
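
A minimal sketch of the idea in Python, in case it helps. Everything named
here is an assumption for illustration: the /trap/ directory, the one-hour
ban, and the in-memory dict standing in for whatever store a real site would
use. It is not tied to WordPress or to any particular server setup.

```python
import time
from urllib.robotparser import RobotFileParser

TRAP_PREFIX = "/trap/"   # hypothetical directory the hidden link points into
BAN_SECONDS = 3600       # hypothetical ban length (one hour)

# robots.txt tells well-behaved crawlers to stay out of the trap directory.
ROBOTS_TXT = "User-agent: *\nDisallow: /trap/\n"


class TrapBlacklist:
    """Temporarily bans any IP that requests a path under the trap prefix."""

    def __init__(self, ban_seconds=BAN_SECONDS):
        self.ban_seconds = ban_seconds
        self._banned_until = {}  # ip -> timestamp at which the ban expires

    def is_blocked(self, ip, now=None):
        now = time.time() if now is None else now
        expiry = self._banned_until.get(ip)
        if expiry is None:
            return False
        if now >= expiry:
            # Ban has run out; forget the IP.
            del self._banned_until[ip]
            return False
        return True

    def handle(self, ip, path, now=None):
        """Return the HTTP status this request would get: 200 or 403."""
        now = time.time() if now is None else now
        if self.is_blocked(ip, now):
            return 403
        if path.startswith(TRAP_PREFIX):
            # Only something ignoring robots.txt (or digging) ends up here.
            self._banned_until[ip] = now + self.ban_seconds
            return 403
        return 200


def polite_bot_would_fetch(path, user_agent="ExampleBot"):
    """True if a crawler that honours ROBOTS_TXT is allowed to request path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)
```

A crawler that honours robots.txt never requests anything under /trap/, so it
never gets banned; anything that does request it is refused until the ban
expires. In practice the ban check would want to sit as early as possible in
the request path so that blocked requests stay cheap.
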
On Wed, Aug 6, 2014 at 9:26 PM, Daniel <malkir at gmail.com> wrote:
> Set up a trap: a link hidden by CSS on each page that, if hit, gets the IP
> blacklisted for a period of time. No human will ever come across the link
> unless they're digging, and most bots don't render the page before
> deciding which links to follow.
>
>
> On Wed, Aug 6, 2014 at 5:31 AM, Jeremy Clarke <jer at simianuprising.com>
> wrote:
>
>> On Wednesday, August 6, 2014, David Anderson <david at wordshell.net> wrote:
>>
>> > The issue's not about how to write blocklist rules; it's about having a
>> > reliable, maintained, categorised list of bots such that it's easy to
>> > automate the blocklist. Turning the list into .htaccess rules is the easy
>> > bit; what I want to avoid is having to spend long churning through log
>> > files to obtain the source data, because it feels very much like something
>> > there 'ought' to be pre-existing data out there for, given how many watts
>> > the world's servers must be wasting on such bots.
>>
>>
>> The best answer is the htaccess-based blacklists from PerishablePress. I
>> think this is the latest one:
>>
>> http://perishablepress.com/5g-blacklist-2013/
>>
>> He uses a mix of blocked user agents, blocked IPs and blocked requests
>> (e.g. /admin.php, intrusion scans for other software). He's been updating
>> it for years and it's definitely a WP-centric project.
>>
>> In the past some good stuff has been blocked by his lists (Facebook spider
>> blocked because it had an empty user agent, common spiders used by
>> academics were blocked) but that's bound to happen and I'm sure every UA
>> was used by a spammer at some point.
>>
>> I run a ton of sites on my server so I hate the .htaccess format (which is
>> a pain to implement alongside wp+super cache rules). If I used multisite
>> it would be less of a big deal. Either way, know that you can block UAs
>> for all virtual hosts at once if that's relevant.
>>
>> Note that IP blocking is a lot more effective below the web server,
>> because blocking with Apache still uses a ton of resources (though at
>> least no MySQL etc.). On Linux an iptables-based block is much cheaper.
>>
>>
>>
>>
>> --
>> Jeremy Clarke
>> Code and Design • globalvoicesonline.org
>> _______________________________________________
>> wp-hackers mailing list
>> wp-hackers at lists.automattic.com
>> http://lists.automattic.com/mailman/listinfo/wp-hackers
>>
>
>
>
> --
> -Dan
>
--
-Dan