View Full Version : Markov chains for email filtering?


ggambett
01-20-2004, 06:13 AM
Yesterday I was going through the list of email addresses I collected from the mailing list subscription at our site, along with the addresses submitted through the Betty's Beer Bar demo (we ask for an email before starting the game).

Many of these addresses were fake. We do some basic checking on the input before accepting it, like requiring an @, a dot after the @, and a few other things. However, I found many entries like sasdasd@asdasd.com or gsdfsdfg@hotmail.com. I deleted these entries manually.
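For reference, a minimal sketch of the kind of basic check described above. The function name `looks_like_email` and the "exactly one @" rule are assumptions for illustration, not the actual checks used on the site. It shows why keyboard-mash addresses get through: they are syntactically fine.

```python
def looks_like_email(address):
    """Loose syntax check (hypothetical): one '@', a non-empty local part,
    and a dot somewhere in the domain with text on both sides of it."""
    local, sep, domain = address.partition("@")
    if not sep or not local or "@" in domain:
        return False
    # rpartition splits on the LAST dot in the domain
    head, dot, tail = domain.rpartition(".")
    return bool(dot and head and tail)


# Both fake addresses from the post pass a check like this one.
for candidate in ["sasdasd@asdasd.com", "gsdfsdfg@hotmail.com", "no-at-sign.com"]:
    print(candidate, looks_like_email(candidate))
```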

But the point is, for a human being, these fake email addresses are trivial to spot. What would be the best way to automatically identify these addresses? A regexp is far too simple. I'm thinking of a Markov chain model. Another possibility would be to train a neural network, but I think the best option is a Markov chain model. What do you think?
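To make the Markov chain idea concrete, here is a rough sketch of one way it could work: a first-order, character-level model trained on known-good strings, which then scores the part of an address before the @ by its average log-probability per character transition. Everything here is an assumption for illustration: the names `train`, `avg_log_prob` and `local_part`, the tiny `SAMPLE_GOOD` list, the add-one smoothing, and the `alphabet_size` of 40. A usable model would need thousands of real addresses or a name/word list, and a cutoff tuned on data you have labeled by hand.

```python
import math
from collections import defaultdict

START, END = "^", "$"  # markers for the beginning and end of a string


def train(words):
    """Count character-to-character transitions over the sample strings."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in words:
        chars = [START] + list(word.lower()) + [END]
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    return counts


def avg_log_prob(counts, text, alphabet_size=40):
    """Average log-probability per transition, with add-one smoothing.

    alphabet_size is an assumed count of possible next symbols
    (letters, digits, a few punctuation marks, end marker).
    """
    chars = [START] + list(text.lower()) + [END]
    total = 0.0
    for a, b in zip(chars, chars[1:]):
        row = counts.get(a, {})
        seen = sum(row.values())
        total += math.log((row.get(b, 0) + 1) / (seen + alphabet_size))
    return total / (len(chars) - 1)


def local_part(address):
    """Everything before the first '@'."""
    return address.split("@", 1)[0]


# Hypothetical training data, far too small for real use.
SAMPLE_GOOD = [
    "john", "mary", "michael", "sarah", "david", "jennifer", "robert",
    "linda", "james", "susan", "peter", "anna", "daniel", "laura",
    "martin", "helen", "thomas", "karen", "andrew", "nancy",
]

model = train(SAMPLE_GOOD)

# Higher (less negative) scores mean the string looks more like the training data.
for address in ["maria@example.com", "gsdfsdfg@hotmail.com"]:
    print(address, round(avg_log_prob(model, local_part(address)), 3))
```

Averaging per transition keeps long addresses from being penalized just for their length. Pronounceable junk will score closer to real names than consonant mashes do, so a score like this is probably best used to flag entries for review rather than to reject them outright.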

Maybe I have too much free time after graduating :)

princec
01-20-2004, 06:25 AM
1. Keep a blacklist of known crap addresses which you can automatically keep updated by processing bounced mail. Add a few common ones as well like someone@microsoft.com and x@y.com.

2. Check there's a mail server for the domain by looking for MX records for it and such: take a look at http://search.cpan.org/dist/Mail-CheckUser/CheckUser.pm - a common Perl module for doing just this.
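A minimal sketch of the blacklist from point 1, assuming the bounced addresses have already been extracted from the bounce messages by some other step (parsing bounces is not shown). The class name `Blacklist` and its methods are made up for illustration; the two seed addresses are the ones mentioned above.

```python
class Blacklist:
    """In-memory set of addresses that should be rejected (hypothetical helper)."""

    def __init__(self, seed=()):
        self._blocked = {self._normalize(a) for a in seed}

    @staticmethod
    def _normalize(address):
        # Compare addresses case-insensitively and ignore stray whitespace
        return address.strip().lower()

    def add_bounced(self, addresses):
        """Add addresses that were taken from bounced mail."""
        for address in addresses:
            self._blocked.add(self._normalize(address))

    def is_blocked(self, address):
        return self._normalize(address) in self._blocked


blacklist = Blacklist(seed=["someone@microsoft.com", "x@y.com"])
blacklist.add_bounced(["sasdasd@asdasd.com"])
print(blacklist.is_blocked("X@Y.com"))
```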

Cas :)

damocles
01-20-2004, 01:44 PM
Why exactly do you have that email request? Is it to bug people to buy it after X days? I too entered a false email because I don't like the idea of having to give out an email address just to play a demo. (I get enough spam as it is from all the illegal ****ers out there, I don't need legitimate spam too!). I'm sure many others feel the same.

Years ago when I was still relatively new to the web I would give out my email happily. Now I'm bitter and twisted and never give out email addresses. I'm sick to death of spam, and the spammers are always one step ahead of the spam blockers. My email address is now a very valuable commodity to me.

Karukef
01-21-2004, 04:13 AM
If it is possible, make the e-mail field optional. Ask them nicely for it if you wish, but if you force them to fill it in just so that they can press "submit", you will get a lot of fake addresses.

Anyone who decides for themselves to give you their email is much more likely to want to get email from you as well.

I think forcing an e-mail address for a demo program shows very clearly to anyone a bit used to the web that you intend to send "offers, updates and so on".

ggambett
01-21-2004, 04:51 AM
It is optional. You can uncheck "send me updates", the OK button gets enabled, and no connection is made.

Over 20% of the addresses were fake. Anyway, that leaves me with almost 80% valid addresses, which is good.

damocles
01-21-2004, 05:14 AM
Ah, therein lies your problem. The "send me updates" checkbox does not look like a way to avoid entering an email - it looks like you still want the email address but won't send updates.

You should make a "skip" button instead. It will get the message across much more clearly. You need to make it obvious that this email harvesting is ONLY for sending updates, not for anything else - the internet breeds suspicious minds.

princec
01-21-2004, 06:21 AM
I've got thousands of email addresses (and for some obscure reason, passwords) sent to me by my AF customers, that live in my server log files - I of course record every attempt to register the game, which requires a valid email address. The strange thing is, despite the instructions being painfully clear and even written in huge writing, I still get many attempts a day, and with most of the attempts, I get a proper email address that really exists, and what looks very much like the password to their pop3 accounts. Crazy. Nothing I can do about it though.

I wonder whether, because they sent me their email addresses when they didn't have to, I can send them a mail back?

Cas :)