[wp-hackers] "save it silently for later crunching"

Mark Jaquith mark.wordpress at txfx.net
Thu Feb 24 21:07:14 GMT 2005


Owen Winkler wrote:

> The nice thing about storing the spam in the database is it gives you 
> a chance to retrieve the original comment from the trash bin should it 
> turn out not to be spam.
>
> A simple plugin would send a digest of the last X spam titles so that 
> you could retrieve the false positives from the ether, and cull old 
> spams from the database that were never approved.  It should probably 
> also verify incoming comment IPs against ones already marked as spam - 
> say, within the last hour - and hold them over as spam, too.
>
> Yeah, that's a good idea.  I wonder if anyone has done that yet.
>
> /me whistles innocently.
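For illustration, here is a minimal Python sketch of the plugin logic Owen describes. It keeps spam instead of discarding it, holds a new comment if its IP sent spam within the last hour, builds a digest of the last X spam titles, and culls old spam that was never approved. This is not Spam Karma or WordPress code. The Comment and SpamTrash names, the status values, and the in-memory list standing in for the database are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Comment:
    # Hypothetical stand-in for a stored comment row
    ip: str
    title: str
    received: datetime
    status: str = "pending"  # "pending", "spam", or "approved"


class SpamTrash:
    """In-memory sketch of a spam trash bin (assumed, not the WordPress schema)."""

    def __init__(self, hold_window=timedelta(hours=1)):
        self.comments = []
        self.hold_window = hold_window

    def mark_spam(self, comment):
        # Keep the comment instead of discarding it, so it can be recovered later
        comment.status = "spam"
        self.comments.append(comment)

    def approve(self, comment):
        # Rescue a false positive; cull() only removes rows still marked spam
        comment.status = "approved"

    def should_hold(self, comment):
        # Hold a new comment if the same IP sent spam within the hold window
        cutoff = comment.received - self.hold_window
        return any(
            c.status == "spam"
            and c.ip == comment.ip
            and cutoff <= c.received <= comment.received
            for c in self.comments
        )

    def digest(self, x):
        # Titles of the last x spams, newest first, for a review e-mail
        spam = [c for c in self.comments if c.status == "spam"]
        spam.sort(key=lambda c: c.received, reverse=True)
        return [c.title for c in spam[:x]]

    def cull(self, now, max_age):
        # Drop spam older than max_age that was never approved
        self.comments = [
            c for c in self.comments
            if not (c.status == "spam" and now - c.received > max_age)
        ]
```

The hold window and the cull age are parameters here only because the post leaves them open ("say, within the last hour"); any values are a judgment call.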

It wouldn't be too hard to modify Spam Karma to do that.  Spam Karma 
already e-mails you digests... all it has to do now is save the comment 
as spam instead of discarding it.  I'm likely going to have something to 
do with Spam Karma 2.0, so that'll probably make it in.

Then the question is: what do we do with the information?  We could do 
some sort of Bayesian magic on the URIs rather than on the comment text 
(too many spam comments have junk content, so the text would throw 
things off).  We could use this info
to dynamically create a blacklist.  If you have a spam from 
"blackjack-casino.com" and then one from "casino-poker.com," you could 
probably guess that "blackjack-poker.com" is going to be spam as well.
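Here is a rough sketch of the dynamic blacklist idea. It assumes each domain is split into hyphen-separated tokens. Every token from a known spam domain is remembered, and a new domain is flagged when enough of its tokens (all of them, by default) have already appeared in spam. The TokenBlacklist class, the threshold parameter, and the tokenizing rule are illustrative assumptions, not an existing plugin API.

```python
from urllib.parse import urlparse


def domain_tokens(url):
    """Split a URL's host into hyphen-separated tokens, ignoring 'www' and the TLD."""
    if "://" not in url:
        url = "//" + url  # let urlparse treat a bare domain as a host
    host = urlparse(url).hostname or ""
    labels = host.split(".")[:-1]  # drop the top-level domain (naive)
    return {t for label in labels if label != "www" for t in label.split("-") if t}


class TokenBlacklist:
    """Hypothetical token-based blacklist built from domains seen in spam."""

    def __init__(self, threshold=1.0):
        self.spam_tokens = set()
        self.threshold = threshold  # fraction of tokens that must be "spammy"

    def learn_spam(self, url):
        # Remember every token from a domain that appeared in a spam comment
        self.spam_tokens |= domain_tokens(url)

    def score(self, url):
        # Fraction of this domain's tokens already seen in spam domains
        tokens = domain_tokens(url)
        if not tokens:
            return 0.0
        return len(tokens & self.spam_tokens) / len(tokens)

    def is_suspect(self, url):
        return self.score(url) >= self.threshold
```

After learning "blackjack-casino.com" and "casino-poker.com", the known tokens are blackjack, casino and poker, so "blackjack-poker.com" scores 1.0 and is flagged. This is plain set matching, not the Bayesian approach mentioned above. A Bayesian version would also count tokens from legitimate comments, so that a word like "poker" in a ham domain could pull the score back down. The naive TLD handling would also treat "co" in a domain like "example.co.uk" as a token.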

I'm sure once there is enough spam to analyze, we'll start to see 
patterns we never even considered before.  What might be useful is a 
"comment export" plugin that could export your comments (both good ones 
and spam) for analysis by plugin writers.
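As a sketch of what such an export might produce, the Python below writes comments to CSV. A status column separates spam from approved comments. The column set and the function names are hypothetical. A real plugin would read the rows from the blog's comments table rather than from a list of dicts.

```python
import csv
import io

# Hypothetical column set; "status" distinguishes spam from good comments
FIELDS = ["author", "email", "url", "ip", "content", "status"]


def export_comments(comments, out):
    """Write comment dicts as CSV with a header row to the file-like object out."""
    # Missing keys are written as empty strings; extra keys are ignored
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for c in comments:
        writer.writerow(c)


def export_to_string(comments):
    """Convenience wrapper: return the CSV export as a string."""
    buf = io.StringIO()
    export_comments(comments, buf)
    return buf.getvalue()
```

CSV is only one choice. It has the advantage that the file opens in a spreadsheet, and the csv module quotes commas and newlines inside comment text for you.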