Spider Traps anyone?
Old 03-09-2008, 09:17 PM Spider Traps anyone?
aschecht's Avatar
Extreme Talker

Posts: 177
Name: A
Location: San Jose, CA
I'm thinking about incorporating something to block bad spiders, offline browsers and other automated content/bandwidth thieves.

Does anyone have experience with an easy-to-implement and maintain system they can recommend?

Thanks,

Andrew
     
Old 03-09-2008, 09:19 PM Re: Spider Traps anyone?
VirtuosiMedia's Avatar
Webmaster Talker

Posts: 683
If you use Apache, look up .htaccess...you'll get a lot of info on it.
 
Old 03-09-2008, 09:47 PM Re: Spider Traps anyone?
ForrestCroce's Avatar
Half Man, Half Amazing

Posts: 3,024
Name: Forrest Croce
Location: Seattle, WA
A spider trap is usually a way to hijack bad bots. Since they like to discover new pages through hyperlinks, a lot of traps create random links or query string args and try to throw the bot into an infinite loop. Others create random email addresses or text for the bots to harvest, wasting their time on junk data.
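
To make the random-link idea concrete, here is a minimal sketch in Python. It is not taken from any particular trap package; the `/trap/` path, the `page` parameter and the function names are all made up for illustration. Every generated page contains only links back into the trap, each with a fresh random query string, so a crawler that ignores robots.txt keeps finding "new" URLs.

```python
import random
import string


def random_token(rng, length=8):
    """Return a random lowercase token used as a fake query-string value."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def trap_links(base_path, count, rng=None):
    """Build `count` links that all lead back into the trap (hypothetical path)."""
    if rng is None:
        rng = random.Random()
    return [f"{base_path}?page={random_token(rng)}" for _ in range(count)]


def trap_page(base_path="/trap/", count=5, rng=None):
    """Render a minimal HTML page containing nothing but fresh trap links."""
    items = "\n".join(
        f'<li><a href="{href}">{href}</a></li>'
        for href in trap_links(base_path, count, rng)
    )
    return f"<html><body><ul>\n{items}\n</ul></body></html>"


if __name__ == "__main__":
    # Seeded only so repeated runs print the same page while experimenting.
    print(trap_page(rng=random.Random(42)))
```

Keep in mind that a trap like this runs on your own server, so every loop the bot makes costs you a request too.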

Your .htaccess file will let you block robots by name / user agent string, or by IP address. Any bot that disguises its user agent to avoid detection won't be deterred. Still, this will get you some of the basics. Sample code:

# Send requests from these user agents somewhere else
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^curl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^OpenWebSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [NC]
RewriteRule ^(.*)$ http://www.robotstxt.org/ [R,L]
 
Old 03-10-2008, 01:16 AM Re: Spider Traps anyone?
ADAM Web Design's Avatar
Canadastaninianite

Posts: 5,945
Name: Adam for web page design, not program
Location: Toronto, Ontario, Canada
First of all, Andrew, you need to talk to IncrediBILL. He is the unquestioned authority on all things bot in the world, IMHO...and if nothing else, he's pretty damned funny a lot of the time.

Second, I usually find a honeypot more effective than anything else. I use one on contact-form-to-email scripts, for example. I have a customized spam checker that returns the exact same HTML output to a spammer as it does to the average user, except that it records the spam attempt in a database without the spammer's knowledge, so I have them dead to rights. The spammer happily continues to "exploit" the system, not realizing that the message is not only not getting through but also being logged.

My version of it goes through 200 common regular expressions that spammers use (e.g. HTML code, "sex", "pen|s", "viagra", etc.) and, if the message reaches a spam score threshold based on an arbitrary formula I created, the spammer gets trapped. I've found it works in 99.5% of cases with no false positives (used to be 99.7%, but some bastard spammer started using the Fax field to spam on a client's site and now I have to create an update to check the Fax field too...I hate bastards that lower my percentages.)
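
For anyone who wants to see the shape of such a checker, here is a rough Python sketch. It is not Adam's code: the three patterns, their weights and the threshold are invented stand-ins for his 200 expressions and his own formula. It scores every submitted field (so a Fax field can't slip through), logs anything over the threshold, and leaves it to the caller to show the normal "thank you" page either way.

```python
import re

# Hypothetical patterns and weights; a real list would be far longer.
PATTERNS = [
    (re.compile(r"<[a-z][^>]*>", re.IGNORECASE), 3),  # opening HTML tags
    (re.compile(r"https?://", re.IGNORECASE), 2),     # embedded links
    (re.compile(r"\bviagra\b", re.IGNORECASE), 5),    # common spam keyword
]
THRESHOLD = 5  # assumed cut-off, not a recommended value


def spam_score(fields):
    """Sum weight * match count over every field of the form."""
    score = 0
    for value in fields.values():
        for pattern, weight in PATTERNS:
            score += weight * len(pattern.findall(value))
    return score


def handle_submission(fields, log):
    """Return True if the message should be delivered.

    Spam is appended to `log` (a list standing in for a database table)
    and silently dropped; the caller renders the same page in both cases.
    """
    score = spam_score(fields)
    if score >= THRESHOLD:
        log.append({"score": score, "fields": dict(fields)})
        return False
    return True
```

Because the response is identical for spam and real mail, the spammer gets no signal that tuning the message would help.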

If you can create your own honeypot, the odds of it ever being exploited aren't very good (as you can see in the example I provided above, 200:1) and you can also customize it however you want.
 
Old 03-10-2008, 08:14 AM Re: Spider Traps anyone?
Capt Quirk's Avatar
Extreme Talker

Posts: 202
Location: Flordidian
Is there a way to set the trap, so that they actually spam the FBI instead of you?
 
Old 03-10-2008, 07:28 PM Re: Spider Traps anyone?
dansgalaxy's Avatar
Eat, Sleep, Code

Latest Blog Post:
Offending facebook ads… so?
Posts: 6,092
Name: Dan
Location: Swindon
I would point out that the FBI would likely track it back to YOU, and you might get in trouble for spamming...
__________________
Personal UK Webhosting
Get 25% of ANY shared package for life ~ Promo: webmaster-talk (only for members!)
 
Old 03-10-2008, 07:48 PM Re: Spider Traps anyone?
Capt Quirk's Avatar
Extreme Talker

Posts: 202
Location: Flordidian
Quote:
Originally Posted by dansgalaxy
I would point out that the FBI would likely track it back to YOU, and you might get in trouble for spamming...
I hadn't thought about that. Better use an alias... just start calling me Dan...
 
Old 03-11-2008, 02:26 AM Re: Spider Traps anyone?
aschecht's Avatar
Extreme Talker

Posts: 177
Name: A
Location: San Jose, CA
My intention for a spider trap doesn't involve avoiding spam. It's just to block offline browsers / web clippers.

My understanding of a spider trap is as follows:
  • The robots.txt file specifies links/files that shouldn't be spidered.
  • Bad spiders will ignore these rules in the robots.txt file.
  • A link is created (perhaps an invisible link) that ordinary users wouldn't see or follow but a spider would.
  • Once a bad spider follows the link, it has triggered the trap, and somehow the .htaccess file is updated to block access to that spider/user agent.
  • One thing I read about this actually did a URL redirect of the spider, pointing it to spampoison.com (which feeds it an infinite amount of garbage e-mail addresses, wasting its harvesting efforts). If you had particularly big cojones, Quirk, I guess you could instead redirect the spider to the FBI. (P.S. FBI, if you're listening - and we all know you are - please note that I'm not advising that anyone send their bad spiders to you.)
A lot of what I've come across looks similar to what Forrest posted - an .htaccess file with many, many lines of known bad spiders to block. It seems like this would be good but insufficient to keep up with the evolving threat. The trap idea appeals. I imagine one wouldn't want to trap a bad spider on one's own site (i.e. infinite loops, etc.) because then it'd be tying up your own server and using your own bandwidth. The idea of dumping them somewhere else that will keep them occupied sounds good.
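
The "somehow the .htaccess file is updated" step could look roughly like the Python sketch below, which a script behind the hidden trap link would call with the visitor's address. Several things here are assumptions rather than anything from the thread: it blocks by IP address instead of user agent, the function names and the file path are made up, and `Deny from` is the Apache 2.2 access-control syntax (Apache 2.4 only honors it through mod_access_compat). The address is parsed first so nothing but a valid IP can ever be written into the file.

```python
import ipaddress


def deny_line(ip):
    """Validate the address and return an Apache 2.2-style deny directive."""
    return f"Deny from {ipaddress.ip_address(ip)}"


def block_visitor(ip, htaccess_path):
    """Append a deny rule for `ip` unless one is already present.

    Returns True if a new rule was written, False if it already existed.
    Raises ValueError if `ip` is not a valid IPv4/IPv6 address.
    """
    line = deny_line(ip)
    try:
        with open(htaccess_path, "r", encoding="utf-8") as handle:
            content = handle.read()
    except FileNotFoundError:
        content = ""
    if line in content.splitlines():
        return False
    with open(htaccess_path, "a", encoding="utf-8") as handle:
        # Make sure the new rule starts on its own line.
        if content and not content.endswith("\n"):
            handle.write("\n")
        handle.write(line + "\n")
    return True
```

A trap script would call something like `block_visitor(remote_addr, "/path/to/.htaccess")`, where both arguments are placeholders. On a busy site you would also want file locking and a way to expire old entries, which this sketch leaves out.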

Adam - I just like the word Honeypot. I also like that it's thought to originate from Winnie the Pooh. Too high tech, Mission Impossible for me though.

IncrediBill seems like an interesting collection of bad IP addresses and stories about pizza vomit. I need to dig in further.

This page seems to have a thorough list of user-agents to exclude in the .htaccess file:
http://jetfar.com/trap-content-scrap...ress-htaccess/

I think this post is too long. Time to cut myself off.

A.
 
Old 03-11-2008, 03:00 PM Re: Spider Traps anyone?
Capt Quirk's Avatar
Extreme Talker

Posts: 202
Location: Flordidian
My cojones aren't in question, it's my technical skills that limit what I do. I just figured that the FBI might actually be motivated to track down the spammers, not me. They know where I am anyways.
 
Old 03-11-2008, 04:34 PM Re: Spider Traps anyone?
dansgalaxy's Avatar
Eat, Sleep, Code

Latest Blog Post:
Offending facebook ads… so?
Posts: 6,092
Name: Dan
Location: Swindon
I doubt the FBI gives a flying fig about petty spammers.

Now try to hack the FBI (and get somewhere), THEN they might get interested...