msnbot/2.0b Misbehaving?
Old 02-13-2009, 12:20 PM Question msnbot/2.0b Misbehaving?
Skilled Talker

Posts: 59
Name: John
Trades: 0
Is anyone else finding that the new beta MSNbot (msnbot/2.0b, or someone spoofing both the user agent and Microsoft IPs) is not only disregarding but intentionally violating the instructions in robots.txt? I've been hit intermittently for the past several weeks. To be honest, the behavior suggests non-traditional reasoning for a bot, so I've been monitoring it fairly closely.

Example robots.txt entry:

##### Begin
User-agent: *
Disallow: /forbidden/
##### End

Example log entries:

65.55.106.115 - [Sometime] "GET /robots.txt "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.115 - [Sametime] "GET /about.php "msnbot/2.0b"
65.55.106.172 - [Few Seconds Later] "GET /forbidden/ "msnbot/2.0b"

Disturbing since "/forbidden/" is not linked from any page on the website, referenced in the sitemap, or enabled for directory listing. In fact, it's a bot trap; only misbehaving bots should go there (or skiddies looking for a goodie). I especially like that the deviant behavior came a few seconds later from a different IP.
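
For anyone who wants to rule out a mistake in their own robots.txt first, here is a minimal sketch using Python's standard `urllib.robotparser`. It feeds the example rules above to the parser and asks whether a given user agent may fetch a path. The helper name `is_allowed` is made up for illustration, and the standard-library parser is only one interpretation of the rules; it says nothing about how msnbot itself parses them.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forbidden/
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch path.

    Hypothetical helper: it only wraps the standard-library parser.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Check the two paths that show up in the example log entries.
    for path in ("/about.php", "/forbidden/"):
        print(path, is_allowed(ROBOTS_TXT, "msnbot/2.0b", path))
```

With these rules, a compliant crawler should treat /about.php as allowed and /forbidden/ as off limits, which is what makes the third log entry suspicious.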
Envision_frodo is offline
Old 02-16-2009, 07:37 PM Re: msnbot/2.0b Misbehaving?
Junior Talker

Posts: 1
Trades: 0
I am working on msnbot/2.0b, and the IP address 65.55.106.172 does belong to msnbot. We are investigating this issue now. From the msnbot logs, we are unable to track down the affected web site. Can you help us by providing the site name so that we can narrow down the problem?
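
Site owners who want to check for themselves whether an address is a spoofed crawler can try a forward-confirmed reverse DNS lookup. The sketch below is not an official procedure: the function name and the `.search.msn.com` suffix are assumptions made for illustration, and the resolver functions are injectable so the logic can be exercised without network access.

```python
import socket
from typing import Callable, Sequence

# Assumption: genuine crawler addresses reverse-resolve to names under this
# domain. Adjust it to whatever the crawler operator documents.
CRAWLER_SUFFIX = ".search.msn.com"


def _reverse(ip: str) -> str:
    """PTR lookup: IP address -> primary host name."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> Sequence[str]:
    """A lookup: host name -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_genuine_crawler(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], Sequence[str]] = _forward,
    suffix: str = CRAWLER_SUFFIX,
) -> bool:
    """Return True if ip passes a forward-confirmed reverse DNS check."""
    try:
        host = reverse(ip)
    except OSError:  # no PTR record or resolver failure
        return False
    if not host.lower().rstrip(".").endswith(suffix):
        return False
    try:
        # The name must resolve back to the same address.
        return ip in forward(host)
    except OSError:
        return False
```

Calling `is_genuine_crawler("65.55.106.172")` performs live DNS lookups, so the result depends on the resolver and on current DNS records.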
sdasan is offline
Old 02-20-2009, 12:08 PM Re: msnbot/2.0b Misbehaving?
Skilled Talker

Posts: 59
Name: John
Trades: 0
Thanks for looking into it...

I sent the site to you along with a few of the actual entries.
Based on some initial googling, it would appear this is not an isolated experience. Let me know if you need anything further.
Envision_frodo is offline
Old 03-29-2009, 08:04 PM Re: msnbot/2.0b Misbehaving?
Junior Talker

Posts: 1
Name: Ale
Trades: 0
I found a lot of requests from msnbot/2.0b in our web server's logs. As this looked a bit suspicious, I captured the network traffic using Wireshark. Judging by a captured request for /robots.txt from msnbot/2.0b, the request seems to be misdirected (i.e., it should have been sent to another IP).

Example of such a request:

GET /robots.txt HTTP/1.1
Accept: */*
Host: www.lumigan.com
User-Agent: msnbot/2.0b (+http://search.msn.com/msnbot.htm)
Connection: Keep-Alive
Cache-Control: no-cache
Pragma: no-cache


But our server has nothing at all to do with the specified host www.lumigan.com, so I guess this behaviour is a bug in the new msnbot.
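
To spot such misdirected requests without opening every capture by hand, something like the following rough sketch could help. It pulls the Host header out of a raw request and compares it against the host names a server actually serves. The function names and `www.example.org` are placeholders, not our real configuration.

```python
from typing import Iterable, Optional

# The request captured above, as raw text.
CAPTURED = (
    "GET /robots.txt HTTP/1.1\r\n"
    "Accept: */*\r\n"
    "Host: www.lumigan.com\r\n"
    "User-Agent: msnbot/2.0b (+http://search.msn.com/msnbot.htm)\r\n"
    "Connection: Keep-Alive\r\n"
    "Cache-Control: no-cache\r\n"
    "Pragma: no-cache\r\n"
    "\r\n"
)


def host_header(raw_request: str) -> Optional[str]:
    """Return the Host header value, lower-cased with any port removed.

    Returns None if the header is missing. IPv6 literals are not handled.
    """
    lines = raw_request.splitlines()
    for line in lines[1:]:  # skip the request line
        if not line.strip():  # a blank line ends the header section
            break
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            return value.strip().lower().split(":")[0]
    return None


def is_misdirected(raw_request: str, served_hosts: Iterable[str]) -> bool:
    """True if the request names a host this server does not serve."""
    host = host_header(raw_request)
    return host is not None and host not in {h.lower() for h in served_hosts}


if __name__ == "__main__":
    # "www.example.org" stands in for the host names the server really serves.
    print(is_misdirected(CAPTURED, ["www.example.org"]))
```

A request with no Host header at all is deliberately not counted as misdirected here; that case would need separate handling.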
antifumo is offline
Old 04-15-2009, 02:35 PM Re: msnbot/2.0b Misbehaving?
Junior Talker

Posts: 1
Trades: 0
Hi,

This is Brett Yount from the Microsoft Webmaster Center team.

Thank you for catching this bug. This issue should be fixed shortly. Thank you for your patience during our msnbot 2.0 beta.
Brett Yount is offline
Old 05-05-2009, 11:31 AM Re: msnbot/2.0b Misbehaving?
Skilled Talker

Posts: 59
Name: John
Trades: 0
Brett -

I'm still having issues with msnbot/2.0b disregarding the robots.txt file. Is there any timeline for the fix?
Envision_frodo is offline
Old 09-03-2009, 03:20 PM msnbot/2.0b Misbehaving? An Update
Skilled Talker

Posts: 59
Name: John
Trades: 0
It would appear that msnbot/2.0b is still having issues, as it continues (with increasing frequency) to stumble into a forbidden/honeypot directory on my server. I'm getting a bit tired of unbanning these MSN IPs. Here are a few of the latest entries:

Code:
65.55.106.209 - - [31/Aug/2009:16:53:15 -0600] "GET /robots.txt HTTP/1.1" 200 294 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.209 - - [31/Aug/2009:16:54:03 -0600] "GET /legitpage1.php HTTP/1.0" 200 14161 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.162 - - [31/Aug/2009:18:20:12 -0600] "GET /robots.txt HTTP/1.1" 200 294 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.162 - - [31/Aug/2009:18:21:03 -0600] "GET /forbidden/ HTTP/1.0" 403 3893 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.187 - - [31/Aug/2009:18:46:27 -0600] "GET /legitdir1/ HTTP/1.0" 200 9835 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"

65.55.51.70 - - [02/Sep/2009:18:18:00 -0600] "GET /robots.txt HTTP/1.1" 200 179 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.207.95 - - [02/Sep/2009:19:06:32 -0600] "GET /robots.txt HTTP/1.1" 200 294 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.207.95 - - [02/Sep/2009:19:07:34 -0600] "GET /legitpage2.php HTTP/1.0" 200 6494 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.51.70 - - [02/Sep/2009:19:27:45 -0600] "GET /robots.txt HTTP/1.1" 200 179 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.138 - - [02/Sep/2009:20:24:36 -0600] "GET /forbidden/ HTTP/1.0" 403 3893 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.207.120 - - [02/Sep/2009:20:33:05 -0600] "GET /legitpage3.phpforbidden/ HTTP/1.0" 404 5514 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
For clarity, these logs have been modded as follows:

forbidden = Location or Honeypot (Should NOT be Spidered)
legitpage# = Legitimate Web Page (Should be Spidered)
legitdir# = Legitimate Directory/Folder (Should be Spidered)

If any other legit bots were having difficulty, I might suspect a mistake on my part; however, thus far only MSN and bad boys from Russia/China have made my naughty lists.
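
For anyone who wants to pull the same kind of entries out of their own logs, here is a minimal sketch. It assumes the Apache combined log format shown above and the placeholder honeypot prefix /forbidden/. The names `LOG_RE`, `Hit` and `honeypot_hits` are invented for this example; this is not the script I actually use for banning.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Combined log format: ip ident user [time] "request" status size "referer" "agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# A few of the entries from the post above.
SAMPLE = [
    '65.55.106.162 - - [31/Aug/2009:18:20:12 -0600] "GET /robots.txt HTTP/1.1" 200 294 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '65.55.106.162 - - [31/Aug/2009:18:21:03 -0600] "GET /forbidden/ HTTP/1.0" 403 3893 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '65.55.106.138 - - [02/Sep/2009:20:24:36 -0600] "GET /forbidden/ HTTP/1.0" 403 3893 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '65.55.207.120 - - [02/Sep/2009:20:33:05 -0600] "GET /legitpage3.phpforbidden/ HTTP/1.0" 404 5514 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
]


class Hit(NamedTuple):
    ip: str
    time: str
    path: str
    status: int
    agent: str


def honeypot_hits(
    lines: Iterable[str],
    prefix: str = "/forbidden/",
    agent_substring: str = "msnbot",
) -> Iterator[Hit]:
    """Yield requests whose path starts with prefix and whose agent matches."""
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m is None:  # skip blank or malformed lines
            continue
        if (m.group("path").startswith(prefix)
                and agent_substring in m.group("agent").lower()):
            yield Hit(m.group("ip"), m.group("time"), m.group("path"),
                      int(m.group("status")), m.group("agent"))


if __name__ == "__main__":
    for hit in honeypot_hits(SAMPLE):
        print(hit.ip, hit.time, hit.path, hit.status)
```

Matching on a path prefix means the /legitpage3.phpforbidden/ request is not flagged, since that path only contains the honeypot name rather than starting with it.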

Side Note:

A techblog (http://www.chewie.co.uk/seosem/msnbo...dex-meta-tags/) suggested that the msnbot/2.0b might be confused as to which site/IP it was spidering, and merely happened across the forbidden directory while trying to spider a different site. In this case the chance for that is incredibly slim as the directory in question was specifically titled to avoid similarity (something similar to: /reallyawesomestuffhere/). That way only the incredibly dense, or naughty, might read and follow the otherwise forbidden reference in my robots.txt file.

It's a shame some people actually like MSN/Live/Bing Search, otherwise I'd just let them ban themselves for good.
Envision_frodo is offline