If you want to start researching e-mail spam, and need to start collecting samples, here’s some information to start your own spam trap.
Infosec colleague Gianluca Stringhini was tweeting today, looking for information on starting a spam trap. Having worked for an anti-spam company for over six years, holding a patent on several processes for spam trending analysis, and being the author of the Wikipedia article that defines a spamtrap, I thought I’d write a blog article on spam traps while it was fresh in my mind.
The spam trap
A spam trap is usually an e-mail address that is created not for communication, but rather to lure spam. To avoid inviting legitimate e-mail, the address is typically published only in a location hidden from view, so that an automated e-mail address harvester (used by spammers) can find it, but no sender would be encouraged to send messages to it for any legitimate purpose.
For this purpose, we’re defining spam as unsolicited bulk e-mail, or UBE. We define spammers as those who collect e-mail addresses for the purpose of sending them UBE.
The observer effect
The key to doing good spam research is monitoring the spam in the wild. The process of monitoring an experiment will alter the results, but the key is to minimize this phenomenon. The best way to do this is to keep the trap hidden and secret.
The discovery of a trap can permanently taint your results. You don’t want spammers, or even other researchers, to know that your trap exists or its location because then they can manipulate your research by planting samples in your trap. Additionally, you want to avoid creating any reasons for a legitimate e-mail message to land in your trap.
There is no good way that I know of to determine whether someone has discovered your spam trap, but if your spam trap e-mail address gets subscribed to some legitimate mailing lists, you can pretty much call it compromised.
There are some trends in harvesting and spam delivery that will help us get our e-mail address harvested by spammers and help get us spam samples earlier than the general public.
- Spammers tend to harvest e-mail addresses from WHOIS records of new domain names.
- Spammers tend to harvest e-mail addresses from mailto links on Web sites more readily than addresses published in other formats.
- Spammers tend to send their spam campaigns in alphabetical order by domain and e-mail address.
- Spammers tend to use unsubscribe requests to make verified lists to sell to other spammers.
Tips to start a trap
Register a domain name that starts with the letter “a”, preferably “aa” or “a1”. Make it a senseless word that no one would possibly have any reason to be interested in as a domain name, so that you don’t receive any legitimate offers to buy the domain name at the addresses in your whois record. For the rest of this article, I’ll be using the “example.com” domain, which the Internet standards reserve for examples.
When you register the domain name, use “firstname.lastname@example.org” in the whois contacts. This will allow you to track e-mail being harvested by whois scrapers. Of course you could use a different name, just as long as you remember what you use.
Throw up an under construction page at “http://www.example.com/” containing an e-mail address in a mailto link. The e-mail address should be dynamically generated based on the visitor’s IP address. Put the visitor’s IP address in the address, preferably prefixed with the letter “a”. For example, “email@example.com”. I’d actually recommend encrypting the IP address with Blowfish or something, but you get the point. The key is to be able to track e-mail sent to that e-mail address back to the web server log entry of when it got harvested. In theory this can taint results, because a spammer may first scan your web site from one machine in a probe, before sending in the actual harvester from a different IP address.
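As a rough illustration of the per-visitor address idea, here is a minimal Python sketch. It is not the Blowfish approach mentioned above: Python’s standard library has no Blowfish, so this swaps in a keyed hash (HMAC-SHA256). A keyed hash is one-way, so tracing a message back means recomputing the token for each IP address found in the web server log. The names `SECRET_KEY`, `trap_address_for`, `mailto_link`, and `find_harvester` are made up for this example.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical values for illustration; use your own secret and domain.
SECRET_KEY = b"replace-with-a-long-random-secret"
TRAP_DOMAIN = "example.com"


def trap_address_for(ip, key=SECRET_KEY, domain=TRAP_DOMAIN):
    """Return a per-visitor trap address whose local part starts with 'a'."""
    # Normalise the text form so the same IP always yields the same token.
    canonical = str(ipaddress.ip_address(ip))
    token = hmac.new(key, canonical.encode("ascii"), hashlib.sha256).hexdigest()[:16]
    return f"a{token}@{domain}"


def mailto_link(ip):
    """Build the mailto link to place on the under construction page."""
    address = trap_address_for(ip)
    return f'<a href="mailto:{address}">{address}</a>'


def find_harvester(address, logged_ips):
    """Recompute tokens for IPs seen in the web log to find who was shown this address."""
    wanted = address.lower()
    for ip in logged_ips:
        if trap_address_for(ip) == wanted:
            return ip
    return None
```

Because the token hides the IP address, a spammer looking at the page cannot easily tell that the address encodes anything about the visit.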
Make some Usenet posts containing only the spam trap e-mail address in the headers, or some random text that wouldn’t demand a reply by anyone actually reading the forum. Use a different e-mail address at your domain for each Usenet group you post to. I’d recommend encrypting the group name into the e-mail address, or preferably storing the group name, and the date and time you posted, in a database along with each e-mail address. I’d recommend targeting groups where it would be very odd for people to respond to nonsensical or empty posts.
Find recent spam messages that have unsubscribe forms or unsubscribe links in them, and plug your spam trap e-mail address into them. I’d recommend keeping a copy of each spam message that you do this with and using a unique e-mail address for each one, so again you can track the source of the spam being sent to the trap. Keep the original spam, and the date and time the unsubscribe was submitted, along with each unique e-mail address.
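The bookkeeping described in the last few tips could be kept in a small database. The sketch below uses Python’s built-in `sqlite3` module. The table layout and the function names (`open_registry`, `record_seed`, `lookup_seed`) are assumptions made for this example, not a prescribed schema.

```python
import sqlite3
from datetime import datetime, timezone


def open_registry(path=":memory:"):
    """Open (or create) the registry of seeded trap addresses."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS trap_address (
               address   TEXT PRIMARY KEY,  -- one unique address per seeding
               channel   TEXT NOT NULL,     -- e.g. 'whois', 'web', 'usenet', 'unsubscribe'
               detail    TEXT NOT NULL,     -- group name, page URL, or path to the saved spam
               seeded_at TEXT NOT NULL      -- UTC timestamp in ISO 8601 form
           )"""
    )
    return conn


def record_seed(conn, address, channel, detail, when=None):
    """Record where and when a trap address was exposed."""
    when = when or datetime.now(timezone.utc)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO trap_address VALUES (?, ?, ?, ?)",
            (address.lower(), channel, detail, when.isoformat()),
        )


def lookup_seed(conn, recipient):
    """Return (channel, detail, seeded_at) for a recipient, or None if unknown."""
    return conn.execute(
        "SELECT channel, detail, seeded_at FROM trap_address WHERE address = ?",
        (recipient.lower(),),
    ).fetchone()
```

The primary key on the address means reusing an address for a second seeding raises an error, which helps enforce the one-address-per-source rule.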
Even if you do a good job keeping your spam trap hidden and secret, you’re still likely to end up with some legitimate e-mail in your trap. This can be bad if you’re assuming that any e-mail sent to the trap is spam and are doing some type of automatic filter creation.
Eventually a spammer is going to forge and send spam from your e-mail address or domain, causing auto-replies or bounces to end up in your spam trap. It’s important to tag e-mail messages that look like auto-replies and bounces for investigation, and definitely don’t do any automatic filter creation with this subset of messages.
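One way to tag likely auto-replies and bounces is a handful of header checks. The sketch below uses Python’s standard `email` package. The function name is made up, and the checks are heuristics I am assuming are good enough for a first pass (the Auto-Submitted header, a multipart/report content type, an empty Return-Path, and a mailer-daemon or postmaster sender). Expect both misses and false alarms, and treat a match as a reason to investigate, nothing more.

```python
from email import message_from_string


def looks_like_bounce_or_autoreply(msg):
    """Heuristically flag auto-replies and bounces so they skip automatic filter creation."""
    # Automatic responses should carry an Auto-Submitted header other than "no".
    auto = str(msg.get("Auto-Submitted") or "").strip().lower()
    if auto and auto != "no":
        return True
    # Delivery status notifications are normally sent as multipart/report.
    if msg.get_content_type() == "multipart/report":
        return True
    # Bounces are normally sent with an empty envelope sender.
    if str(msg.get("Return-Path") or "").strip() == "<>":
        return True
    # Many mail servers send bounces from a mailer-daemon or postmaster address.
    sender = str(msg.get("From") or "").lower()
    return "mailer-daemon" in sender or "postmaster" in sender


if __name__ == "__main__":
    sample = message_from_string(
        "Return-Path: <>\n"
        "From: MAILER-DAEMON@mx.example.net\n"
        "Subject: Undelivered Mail Returned to Sender\n"
        "\n"
        "The message could not be delivered.\n"
    )
    print(looks_like_bounce_or_autoreply(sample))
```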
What not to do
No matter how much you’re tempted, do not subscribe your e-mail spam trap addresses to “spam”. It’s not possible, because if one subscribes to it, it’s not spam. What you could do, however, is research unsubscribe practices and the selling of e-mail addresses between e-mail marketing companies.
Don’t take a shortcut by just monitoring e-mail going to a domain that is already receiving a lot of unwanted e-mail. You can’t reliably determine the difference between unwanted e-mail and spam. The users of the domain may have subscribed to many messages that might otherwise look like spam if they hadn’t subscribed years earlier by filling out some survey or entering some sweepstakes. Additionally, you will have had no control over how the spam flow started and have very little data for analysis.
More research fun
- Register different domains that start with different letters in the alphabet, and do the same thing with them to see the interesting sorting trends.
- Try clicking on links or visiting image tracking links sent to some of the addresses to see if it affects traffic to that address.
- Try unsubscribing from spam to see if it affects traffic to that address.
- Try accepting e-mail sent to any address (a wild card username) at your domain to research directory harvesting attacks.
- Try rejecting e-mail sent to half of the invalid addresses at your domain to research directory harvesting attacks.
- Try listing the e-mail addresses in different ways on the under construction Web site to see which are harvested more often. Even test formats like “doug [at] example.com” to see how many spam harvesters are actually pretty smart.
- Do some a/b testing with “http://www.example.com/a/” and “http://www.example.com/b/”, submitting one to Google and not the other, to see whether spammers use search engines.
- There are an endless number of a/b testing scenarios that you can do, but plan some out in advance.
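For experiments such as rejecting half of the invalid addresses, it helps if the split is stable, so that the same address always gets the same treatment. A minimal Python sketch of one way to do that, by hashing the local part into a bucket, is below. The function names and the experiment label are made up for this example.

```python
import hashlib


def bucket(local_part, experiment, buckets=2):
    """Deterministically assign an address to a bucket for a named experiment."""
    material = f"{experiment}:{local_part.lower()}".encode("utf-8")
    digest = hashlib.sha256(material).digest()
    # Use the first 8 bytes of the hash as a large integer, then reduce it.
    return int.from_bytes(digest[:8], "big") % buckets


def should_reject(local_part, known_traps):
    """Reject about half of the invalid addresses; always accept the real traps."""
    if local_part.lower() in known_traps:
        return False
    return bucket(local_part, "dha-reject-half") == 0
```

Including the experiment name in the hash means a second experiment splits the same addresses differently, so the results of one test do not line up with the groups of another.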
In general, have fun, but you’ll need to be patient or start early. It can take a long time to start catching a decent number of e-mail spam samples in your traps.