My .htaccess file doesn't want to work anymore.
05-29-2007, 03:06 PM
|
Re: My .htaccess file doesn't want to work anymore.
|
Posts: 3,024
Name: Forrest Croce
Location: Seattle, WA
|
Quote:
Originally Posted by Michel Samuel
The reality is most people don't have the eye for photography. But unfortunately the cheapness of digital has made it so that they take sooooooo many photos that eventually there is a decent one. Still, like everything else, you either know what you're doing or you don't. It's just such a pain in the backside that so many turkeys appear, sell a handful, ruin the profit margin and then disappear because they can't hack it.
|
The sad part is that there seems to be a steady stream of people wanting to get their feet wet. So when the first one doesn't like the water, someone else is ready to take his or her place. I personally think more people competing on merit is good for just about everybody, but there are a lot of photogs that don't have any talent. Interestingly, there are a lot of web consulting companies that don't know what they're doing, too; seems to be the way of the world.
|
|
|
|
06-02-2007, 04:04 AM
|
Re: My .htaccess file doesn't want to work anymore.
|
Posts: 253
Name: Michel Samuel
|
Quote:
Originally Posted by ForrestCroce
The sad part is that there seems to be a steady stream of people wanting to get their feet wet. So when the first one doesn't like the water, someone else is ready to take his or her place. I personally think more people competing on merit is good for just about everybody, but there are a lot of photogs that don't have any talent. Interestingly, there are a lot of web consulting companies that don't know what they're doing, too; seems to be the way of the world.
|
On one hand it is great because it opens up the door for sooo many people that wouldn't otherwise get the opportunity. But on the other hand it is sad because real talent is rare and very hard to find. So we have people who "think" they have talent playing in industries (not just the film/photo industries), and consumer/client forces tend not to push the lone independent out of business with any semblance of speed.
So they end up being around just long enough to make things interesting for the rest of us. And that is where all this dialogue and conversation went to.
To sum it up...
1) The world has changed and competition is fierce.
2) Intellectual property laws are sometimes ambiguous and impossible to police.
3) The old-fashioned "money-politics" game hasn't vanished and continues to have a huge impact on how many things are being legally interpreted.
4) We are all, in one sense or another, struggling to find ways to both adapt to the new environment and sometimes force the environment to behave like what we are used to.
5) Alexa is evil.  (sorry had to add that in)
|
|
|
|
06-15-2007, 11:36 AM
|
Re: My .htaccess file doesn't want to work anymore.
|
Posts: 253
Name: Michel Samuel
|
Update.
What happened was just what I expected.
Someone in Amazon.com's legal department read the complaint, called my attorney and decided I was more trouble than I am worth.
I speculate he made at least a phone call to Alexa. In the end the site I was talking about was removed from the Alexa database. And I got faxed a tech list of what I need to block in order to stay out of it from now on.
In the end...
No big deal.
|
|
|
|
06-15-2007, 11:11 PM
|
Re: My .htaccess file doesn't want to work anymore.
|
Posts: 8,663
Name: Steven Bradley
Location: Boulder, Colorado
|
Seems like a natural way to end things with Alexa. Of course one day you may want back in and then what?
|
|
|
|
06-16-2007, 01:17 AM
|
Re: My .htaccess file doesn't want to work anymore.
|
Posts: 253
Name: Michel Samuel
|
Quote:
Originally Posted by vangogh
Seems like a natural way to end things with Alexa. Of course one day you may want back in and then what?
|
Personally I don't see the value in being in their database or on their site.
But if I did want to get back into their database, I would just have to unblock their toolbar users, MSN Toolbar users, Amazon.com referral traffic, Pingdom's bot and their IP address, several other referral sites, etc.
All in all I figure about 3 minutes of tech work and they will automatically start recording data again.
|
|
|
|
07-01-2007, 03:52 AM
|
Re: My .htaccess file doesn't want to work anymore.
|
Posts: 253
Name: Michel Samuel
|
UPDATE :
I was just literally AtTaCkEd by 31 Alexa bots!!
And the opening page of my site suddenly appears in the Wayback Machine.
It's a page from 2 years ago and I don't even have a copy of it myself. But here it is, listed as being from March 10th 2007... Amazing!
Let's just recap for a moment everything I have done to make Alexa go away!!
1. Robots.txt Excluded them directly.
2. Geoip block on the entire world save for France and Quebec.
3. .htaccess file denies access to the entire 208.70 IP address range
4. I use the illicit server command for my 403 page and send a 404 error with all my 403 errors.
5. I went legal and the Amazon legal department agreed to remove my site from their databases. Their technical department faxed me the entire setup of who I have to block to keep myself out.
But today 31 bots attacked!
They literally kept coming until 1 bot somehow got through and crawled my site.
You know it is a wack idea...
But I am almost starting to believe in a government conspiracy.
Anyway, here are the IP addresses of the bots.
208.70.28.43
208.70.28.48
208.70.28.49
208.70.28.52
208.70.28.60
208.70.28.61
208.70.28.70
208.70.28.73
208.70.29.90
208.70.29.92
208.70.29.115
208.70.29.131
208.70.29.134
208.70.29.136
208.70.29.140
208.70.29.172
208.70.29.177
208.70.29.180
208.70.29.181
208.70.29.182
208.70.29.190
208.70.29.197
208.70.29.199
208.70.29.200
208.70.29.202
208.70.29.204
208.70.29.208
208.70.31.0
|
|
|
|
07-01-2007, 02:20 PM
|
Re: My .htaccess file doesn't want to work anymore.
|
Posts: 3,024
Name: Forrest Croce
Location: Seattle, WA
|
Here's an unusual question ... but do you have any idea how they got through?
|
|
|
|
07-01-2007, 03:48 PM
|
Re: My .htaccess file doesn't want to work anymore.
|
Posts: 253
Name: Michel Samuel
|
Quote:
Originally Posted by ForrestCroce
Here's an unusual question ... but do you have any idea how they got through?
|
Absolutely no idea.
All I know is they managed to crawl the entire site from a banned IP address. Including a password protected section.
The amazing thing was that the bots were in succession. Both in IP addresses and timestamps. Literally it looks like Alexa just let loose their server and kept going until something got through.
This leaves me to speculate several things.
1. That Alexa has a vast array of bots at their disposal and I don't believe they are all carbon copies of each other. Obviously some are "tougher" than others.
2. It is possible to ignore a robots.txt file... An .htaccess file... A PHP geoip script... and probably anything else one man can conceive.
3. This is not an isolated incident and probably happens more often than we believe.
4. Alexa must have another motive for their actions. No search engine or Internet archive is sooooo important that a company would go to the trouble of making a bot that can hack a site unless it had to.
|
|
|
|
07-01-2007, 09:39 PM
|
Re: My .htaccess file doesn't want to work anymore.
|
Posts: 8,663
Name: Steven Bradley
Location: Boulder, Colorado
|
I find it hard to believe that Alexa is trying to hack their way into your site. I suppose anything is possible, but a more likely scenario is you have something set up wrong on your end. Are you absolutely positive you've blocked ia_archiver correctly? Even if you think you have, check again. Maybe you have, but definitely exhaust that possibility before assuming Alexa is trying to hack your site.
robots.txt doesn't prevent anything. All it's doing is asking bots to crawl or not crawl your site. Most registered bots will comply, but there's nothing about robots.txt that prevents them from not complying.
To block ia_archiver you should have
User-agent: ia_archiver
Disallow: /
in your robots.txt file and it should come before the generic
User-agent: *
since bots will observe the first set of rules they think might apply to them.
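One way to sanity-check rules like these offline is Python's standard `urllib.robotparser`. This is only a sketch under assumptions: the empty `Disallow:` under the generic group, the bot name `SomeOtherBot` and the example.com URL are made-up placeholders, and it shows how one parser reads the file, not how Alexa's crawler actually behaves. Python's parser happens to check named groups before the `*` group whatever order they appear in; other parsers may not.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post: a named group for ia_archiver, then the generic group.
# Assumption: the generic group allows everything (empty Disallow).
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if these robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # SomeOtherBot is a hypothetical crawler name used only for comparison.
    for agent in ("ia_archiver", "SomeOtherBot"):
        print(agent, allowed(agent, "http://example.com/index.html"))
```

Keep in mind this only tells you what a well-behaved bot would do with the file; it can't tell you whether a given bot bothers to read it.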
Strange about the .htaccess file blocking the range of IPs since that should work. Again make sure you set things up correctly. I know how easy it is to make a mistake where you think you've done things right, but haven't.
From what I understand of your call to the legal dept, Alexa agreed to remove your site from their database. That's not the same thing as never attempting to crawl again. They may have assumed you set things up to block them. It's not like someone at Alexa is there and says I think we should crawl this site today. It's all automated and they're following links like every other bot. Also if someone went to Alexa and typed in your URL asking for it to be crawled I'm sure the bot would attempt to crawl your site.
It's really hard to believe though that you have everything set up correctly to keep Alexa from crawling your site and someone on their end is attempting to rewrite the bot to attack your site specifically. It's hard to believe there's any kind of conspiracy going on. I think you might be seeing a conspiracy because you want to see one.
Remember too that you're not exactly in the majority in thinking being crawled by Alexa is a bad thing. Most people want Alexa to crawl their site.
|
|
|
|
07-02-2007, 12:24 AM
|
Re: My .htaccess file doesn't want to work anymore.
|
Posts: 3,024
Name: Forrest Croce
Location: Seattle, WA
|
Quote:
Originally Posted by Michel Samuel
1. That Alexa has a vast array of bots at their disposal and I don't believe they are all carbon copies of each other. Obviously some are "tougher" than others.
|
That's not an unreasonable thing to suspect. I wrote a bot as a learning experience; it taught me a lot about how html is parsed, which helps me code my site more efficiently. It's not a difficult thing to do. I don't actually run my bot, but if I'm able to create one...
Quote:
Originally Posted by Michel Samuel
2. It is possible to ignore a robots.txt file... An .htaccess file... A PHP geoip script... and probably anything else one man can conceive.
|
It takes more work to make a robot obey robots.txt than to ignore it completely. The other stuff, though, isn't a matter of them ignoring your .htaccess file. That runs on your server before it decides what to do with their requests. They ask for pages almost blindly, and the geo-ip and .htaccess decide what to send down from the server. The only way around that would be to use a proxy server based in France to cloak their IP address, but if you see one you recognize as theirs in your logs, that can't be the case.
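To make that concrete, here's a rough Python sketch of the decision a server-side IP block makes. It assumes "the entire 208.70 range" means 208.70.0.0/16, `is_denied` is just a name I made up, and this is an illustration of the idea rather than how Apache implements it internally.

```python
import ipaddress

# Assumption: "the entire 208.70 IP address range" means 208.70.0.0/16.
DENIED_NETWORKS = [ipaddress.ip_network("208.70.0.0/16")]


def is_denied(remote_addr: str) -> bool:
    """Return True if the connecting address falls inside a denied network."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in DENIED_NETWORKS)


if __name__ == "__main__":
    # Two addresses from the list earlier in the thread, plus a
    # documentation-only address (192.0.2.10) for contrast.
    for ip in ("208.70.28.43", "208.70.31.0", "192.0.2.10"):
        print(ip, "403 Forbidden" if is_denied(ip) else "200 OK")
```

The check only looks at the address the connection comes from, so nothing the bot puts in its request changes the outcome.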
|
|
|
|
07-02-2007, 01:14 AM
|
Re: My .htaccess file doesn't want to work anymore.
|
Posts: 253
Name: Michel Samuel
|
Quote:
Originally Posted by vangogh
I find it hard to believe that Alexa is trying to hack their way into your site. I suppose anything is possible, but a more likely scenario is you have something set up wrong on your end. Are you absolutely positive you've blocked ia_archiver correctly? Even if you think you have, check again. Maybe you have, but definitely exhaust that possibility before assuming Alexa is trying to hack your site.
robots.txt doesn't prevent anything. All it's doing is asking bots to crawl or not crawl your site. Most registered bots will comply, but there's nothing about robots.txt that prevents them from not complying.
To block ia_archiver you should have
User-agent: ia_archiver
Disallow: /
in your robots.txt file and it should come before the generic
User-agent: *
since bots will observe the first set of rules they think might apply to them.
Strange about the .htaccess file blocking the range of IPs since that should work. Again make sure you set things up correctly. I know how easy it is to make a mistake where you think you've done things right, but haven't.
From what I understand of your call to the legal dept, Alexa agreed to remove your site from their database. That's not the same thing as never attempting to crawl again. They may have assumed you set things up to block them. It's not like someone at Alexa is there and says I think we should crawl this site today. It's all automated and they're following links like every other bot. Also if someone went to Alexa and typed in your URL asking for it to be crawled I'm sure the bot would attempt to crawl your site.
It's really hard to believe though that you have everything set up correctly to keep Alexa from crawling your site and someone on their end is attempting to rewrite the bot to attack your site specifically. It's hard to believe there's any kind of conspiracy going on. I think you might be seeing a conspiracy because you want to see one.
Remember too that you're not exactly in the majority in thinking being crawled by Alexa is a bad thing. Most people want Alexa to crawl their site.
|
This is a mystery now because I'm set up correctly. I have gone over everything with a fine-tooth comb and had a professional look at my stuff, and my hosting company too.
All are in agreement. I'm good on my end but something else is going on. During the exact time the bots were attacking my site, the host recorded an abnormal amount of bandwidth usage.
Don't forget it took 31 bots to finally crawl my site.
If my stuff wasn't set up tightly the first one would have gotten through and that would have been that.
As for the legal department....
No, I got it in writing that they will stay the H*ll off my property. So I'm going to be making another call today to let them know I'm not amused.
As for the conspiracy thingy...
Sometimes it's darn hard not to be suspicious of one. I agree it is probably nothing at all. But you know we aren't exactly living in the era of "play fair people." So while I believe there is NOT a conspiracy, it wouldn't surprise me if someone in a government office just wanted to make sure I wasn't trying to hide a terrorist site.
In fact nothing surprises me these days.
Quote:
Originally Posted by ForrestCroce
That's not an unreasonable thing to suspect. I wrote a bot as a learning experience; it taught me a lot about how html is parsed, which helps me code my site more efficiently. It's not a difficult thing to do. I don't actually run my bot, but if I'm able to create one...
It takes more work to make a robot obey robots.txt than to ignore it completely. The other stuff, though, isn't a matter of them ignoring your .htaccess file. That runs on your server before it decides what to do with their requests. They ask for pages almost blindly, and the geo-ip and .htaccess decide what to send down from the server. The only way around that would be to use a proxy server based in France to cloak their IP address, but if you see one you recognize as theirs in your logs, that can't be the case.
|
Logically you are right.
But the fact remains that Alexa bot 208.70.31.0
which is recorded as being from...
The Presidio of San Francisco
Address: 116 Sheridan Ave.
City: San Francisco
StateProv: CA
PostalCode: 94129
Country: US
was successful in crawling my site when its 30 predecessors could not.
|
|
|
|
07-03-2007, 03:03 AM
|
Re: My .htaccess file doesn't want to work anymore.
|
Posts: 8,663
Name: Steven Bradley
Location: Boulder, Colorado
|
I suppose it could also be someone else spoofing those IPs. If you have things set up right then you have them set up right, but I can't really think of any reason Alexa would try to hack into your site. That just doesn't make sense.
|
|
|
|
07-04-2007, 12:57 AM
|
Re: My .htaccess file doesn't want to work anymore.
|
Posts: 253
Name: Michel Samuel
|
Quote:
Originally Posted by vangogh
I suppose it could also be someone else spoofing those IPs. If you have things set up right then you have them set up right, but I can't really think of any reason Alexa would try to hack into your site. That just doesn't make sense.
|
(update)
The plot thickens....
1. The bots appear to be attacking regularly, several times per day.
2. The hosting company has now compiled a list of several other sites with similar problems.
3. The pages of my site from the successful crawl are now appearing on archive.
The good news is they appear to be completely shut out this time.
|
|
|
|
07-04-2007, 01:42 AM
|
Re: My .htaccess file doesn't want to work anymore.
|
Posts: 8,663
Name: Steven Bradley
Location: Boulder, Colorado
|
Very strange. Are you going to contact Alexa again?
|
|
|
|
07-04-2007, 08:18 AM
|
Re: My .htaccess file doesn't want to work anymore.
|
Posts: 253
Name: Michel Samuel
|
Quote:
Originally Posted by vangogh
Very strange. Are you going to contact Alexa again?
|
I faxed a very nasty letter to the Amazon legal department and I'm waiting to hear from them.
The hosting company has made some attempts at contact but Alexa hasn't responded yet. (Still kind of early for a response)
|
|
|
| |