[wp-hackers] Blocking SEO robots
Daniel Fenn
danielx386 at gmail.com
Thu Aug 7 04:31:16 UTC 2014
I like to use a nice tool from http://www.spambotsecurity.com/, though it
may cause issues for some people. The best thing is that it's very fast
and doesn't slow things down the way .htaccess rules can.
Regards,
Daniel Fenn
On Thu, Aug 7, 2014 at 2:28 PM, Daniel <malkir at gmail.com> wrote:
> Almost forgot: the link should be in a subdirectory that robots.txt tells
> crawlers to ignore, so only clients that ignore robots.txt ever hit it.
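>
> A minimal sketch of that kind of trap (the /trap/ path, script name and
> blacklist file here are just illustrative assumptions, not anything from
> an existing plugin):
>
>     # robots.txt -- well-behaved crawlers will never request anything under /trap/
>     User-agent: *
>     Disallow: /trap/
>
>     <!-- hidden link in the page footer; humans never see it -->
>     <a href="/trap/catch.php" style="display:none" rel="nofollow">ignore</a>
>
>     <?php
>     // /trap/catch.php -- anything requesting this URL ignored robots.txt,
>     // so log its IP for the blocking layer (.htaccess, iptables, etc.) to pick up.
>     $ip = $_SERVER['REMOTE_ADDR'];
>     file_put_contents('/var/www/trap-blacklist.txt',
>         time() . ' ' . $ip . "\n", FILE_APPEND | LOCK_EX);
>     header('HTTP/1.1 403 Forbidden');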
>
>
> On Wed, Aug 6, 2014 at 9:26 PM, Daniel <malkir at gmail.com> wrote:
>
>> Set up a trap: a link hidden by CSS on each page, and any IP that requests
>> it gets blacklisted for a period of time. No human will ever come across
>> the link unless they're digging, and no bot actually renders the entire
>> page before deciding which links to follow.
>>
>>
>> On Wed, Aug 6, 2014 at 5:31 AM, Jeremy Clarke <jer at simianuprising.com>
>> wrote:
>>
>>> On Wednesday, August 6, 2014, David Anderson <david at wordshell.net> wrote:
>>>
>>> > The issue's not about how to write blocklist rules; it's about having a
>>> > reliable, maintained, categorised list of bots such that it's easy to
>>> > automate the blocklist. Turning the list into .htaccess rules is the
>>> > easy bit; what I want to avoid is having to spend ages churning through
>>> > log files to obtain the source data, because it feels very much like
>>> > something there 'ought' to be pre-existing data out there for, given
>>> > how many watts the world's servers must be wasting on such bots.
>>>
>>>
>>> The best answer is the htaccess-based blacklists from PerishablePress. I
>>> think this is the latest one:
>>>
>>> http://perishablepress.com/5g-blacklist-2013/
>>>
>>> He uses a mix of blocked user agents, blocked IPs and blocked requests
>>> (e.g. /admin.php and other intrusion scans for software you don't even
>>> run). He's been updating it for years and it's definitely a WP-centric
>>> project.
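>>>
>>> To give a feel for what such a list looks like, here's an illustration of
>>> the three kinds of blocks in .htaccess form -- these are not the actual 5G
>>> rules (grab those from the link above), just placeholder patterns:
>>>
>>>     <IfModule mod_rewrite.c>
>>>      RewriteEngine On
>>>      # blocked user agents
>>>      RewriteCond %{HTTP_USER_AGENT} (libwww-perl|masscan) [NC]
>>>      RewriteRule .* - [F,L]
>>>      # blocked request strings (probes for software you don't run)
>>>      RewriteCond %{REQUEST_URI} (/phpmyadmin|/admin\.php) [NC]
>>>      RewriteRule .* - [F,L]
>>>     </IfModule>
>>>
>>>     # blocked IPs (documentation range used as a placeholder)
>>>     Deny from 192.0.2.0/24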
>>>
>>> In the past some good stuff has been blocked by his lists (the Facebook
>>> spider was blocked because it sent an empty user agent, and common spiders
>>> used by academics were blocked too), but that's bound to happen, and I'm
>>> sure every UA has been used by a spammer at some point.
>>>
>>> I run a ton of sites on my server, so I hate the .htaccess format (which
>>> is a pain to implement alongside WP Super Cache rules). If I used
>>> multisite it would be less of a big deal. Either way, know that you can
>>> block UAs for all virtual hosts if that's relevant.
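>>>
>>> For example, something like this in the main httpd.conf (outside any
>>> <VirtualHost>) is inherited by every site on the box -- the UA strings
>>> are placeholders, and this is Apache 2.2-style access-control syntax:
>>>
>>>     # tag matching user agents with an env var, then deny them everywhere
>>>     BrowserMatchNoCase "libwww-perl|masscan" bad_bot
>>>     <Location />
>>>         Order Allow,Deny
>>>         Allow from all
>>>         Deny from env=bad_bot
>>>     </Location>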
>>>
>>> Note that IP blocking is a lot more effective at the OS level, because
>>> blocking in Apache still uses a ton of resources per request (though at
>>> least no MySQL etc.). On Linux an iptables-based block is much cheaper.
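>>>
>>> A minimal example (the addresses are from the documentation range, purely
>>> illustrative) -- the packet is dropped in the kernel before Apache ever
>>> sees the request:
>>>
>>>     iptables -A INPUT -s 192.0.2.10 -p tcp --dport 80 -j DROP
>>>
>>> For long lists, one ipset hash plus a single iptables rule scales much
>>> better than thousands of individual rules:
>>>
>>>     ipset create blocked-bots hash:ip
>>>     ipset add blocked-bots 192.0.2.10
>>>     iptables -A INPUT -p tcp --dport 80 -m set --match-set blocked-bots src -j DROP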
>>>
>>>
>>>
>>>
>>> --
>>> Jeremy Clarke
>>> Code and Design • globalvoicesonline.org
>>
>>
>>
>> --
>> -Dan
>>
>
>
>
> --
> -Dan