[wp-hackers] "save it silently for later crunching"
Mark Jaquith
mark.wordpress at txfx.net
Thu Feb 24 21:07:14 GMT 2005
Owen Winkler wrote:
> The nice thing about storing the spam in the database is it gives you
> a chance to retrieve the original comment from the trash bin should it
> turn out not to be spam.
>
> A simple plugin would send a digest of the last X spam titles so that
> you could retrieve the false positives from the ether, and cull old
> spams from the database that were never approved. It should probably
> also verify incoming comment IPs against ones already marked as spam -
> say, within the last hour - and hold them over as spam, too.
>
> Yeah, that's a good idea. I wonder if anyone has done that yet.
>
> /me whistles innocently.
It wouldn't be too hard to modify Spam Karma to do that. Spam Karma
already e-mails you digests... all it has to do now is save the comment
as spam instead of discarding it. I'm likely going to have something to
do with Spam Karma 2.0, so that'll probably make it in.
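
For what it's worth, here is a rough sketch of the "save it as spam
instead of discarding it" idea, combined with the IP hold-over and the
culling Owen describes. This is not Spam Karma code. Everything in it
(the Comment and CommentStore names, the in-memory list standing in for
the database, the one-hour window, the looks_like_spam flag that some
other filter would supply) is an assumption made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Assumption: "within the last hour" from the quoted message.
HOLD_WINDOW = timedelta(hours=1)


@dataclass
class Comment:
    author_ip: str
    body: str
    received: datetime
    status: str = "pending"  # becomes "approved" or "spam"


@dataclass
class CommentStore:
    """Hypothetical in-memory stand-in for the comments table."""
    comments: list = field(default_factory=list)

    def recent_spam_ips(self, now):
        # IPs of comments marked as spam no more than HOLD_WINDOW ago.
        return {
            c.author_ip
            for c in self.comments
            if c.status == "spam" and now - c.received <= HOLD_WINDOW
        }

    def receive(self, comment, looks_like_spam):
        # Keep every comment; spam is flagged, never thrown away,
        # so a false positive can still be recovered later.
        held = comment.author_ip in self.recent_spam_ips(comment.received)
        comment.status = "spam" if (looks_like_spam or held) else "approved"
        self.comments.append(comment)
        return comment.status

    def cull(self, now, max_age):
        # Drop spam older than max_age that nobody rescued.
        self.comments = [
            c for c in self.comments
            if not (c.status == "spam" and now - c.received > max_age)
        ]
```

A comment from an IP that spammed 30 minutes ago gets held as spam even
if its content looks clean; the same IP three hours later is treated
normally again. Calling cull() with, say, timedelta(days=7) would be the
"cull old spams that were never approved" step.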
Then the question is: what do we do with the information? We could do
some sort of Bayesian magic on the URIs (too many spam comments have
junk content, so analyzing the comment text itself would throw things
off). We could also use this info
to dynamically create a blacklist. If you have a spam from
"blackjack-casino.com" and then one from "casino-poker.com," you could
probably guess that "blackjack-poker.com" is going to be spam as well.
I'm sure once there is enough spam to analyze, we'll start to see
patterns we never even considered before. What might be useful is a
"comment export" plugin that could export your comments (both good ones
and spam) for analysis by plugin writers.
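
To make the dynamic blacklist idea a bit more concrete, here is a
minimal sketch of guessing that "blackjack-poker.com" is spam from the
two domains above. It is one naive way to do it, not an existing plugin:
the function names, the split on hyphens and underscores, and the rule
"flag a domain only if every one of its tokens has been seen in spam"
are all assumptions chosen for illustration. It also just drops the
last dot-separated label as the TLD, which is wrong for things like
".co.uk".

```python
import re
from collections import Counter
from urllib.parse import urlparse


def domain_tokens(uri):
    """Split a URI's host name into word-like tokens (naive sketch)."""
    # urlparse only finds the host if there is a "//" before it.
    host = urlparse(uri if "//" in uri else "//" + uri).hostname or ""
    labels = host.lower().split(".")
    # Assumption: the last label is the TLD; also ignore a leading "www".
    labels = [label for label in labels[:-1] if label != "www"]
    tokens = []
    for label in labels:
        tokens.extend(t for t in re.split(r"[-_]", label) if t)
    return tokens


def build_token_blacklist(spam_uris, min_count=1):
    """Collect tokens seen in at least min_count spam URIs' domains."""
    counts = Counter()
    for uri in spam_uris:
        counts.update(set(domain_tokens(uri)))
    return {token for token, n in counts.items() if n >= min_count}


def looks_spammy(uri, blacklist):
    """True if the domain is made up entirely of blacklisted tokens."""
    tokens = domain_tokens(uri)
    return bool(tokens) and all(t in blacklist for t in tokens)


blacklist = build_token_blacklist(
    ["blackjack-casino.com", "http://casino-poker.com/"]
)
print(looks_spammy("blackjack-poker.com", blacklist))  # True
print(looks_spammy("txfx.net", blacklist))             # False
```

Requiring every token to match is deliberately conservative, so a
legitimate "poker-blog" domain isn't caught just for sharing one word
with a spammer. Raising min_count would make it more conservative still.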