06-14-2009, 06:50 AM, Robots
Average Talker
Hello,
My main domain is in public_html/, and it has other subfolders. Some of them are part of the domain and some are not; the ones that are not will become the roots of new domains in the future.

For example, public_html/forums is an ordinary folder that belongs to www.mydomain.com.

But public_html/mynewdomain is not a domain yet. I need to reach it at the www.mydomain.com/mynewdomain URL, but only temporarily, since I don't have a domain for that subfolder yet.

I would like search engines not to access the public_html/mynewdomain subfolder at all, because I don't want this new site to appear in Google as www.mydomain.com/mynewdomain; in the future it should be www.mynewdomain.com.

How can I block search engines from entering this folder? Is it easy to do? And will blocking access to this folder from the main domain cause problems with indexing the site later, once it has its own domain?

Thanks a lot
Posted by pikus
 
 
06-14-2009, 09:05 PM, Re: Robots
Skilled Talker
Name: Larry K
in public_html/robots.txt:

Code:
User-agent: *
Disallow: /mynewdomain/
Disallow: /cgi-bin/
Disallow: /contact-
Disallow: /css/
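This snippet asks search engines not to crawl the named folders, including the /mynewdomain/ folder, or any URL whose path begins with "/contact-". Well-behaved crawlers such as Googlebot honour it, but robots.txt is a request, not access control.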
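A quick way to see what these rules match is Python's standard-library urllib.robotparser, used here only as a stand-in for a well-behaved crawler (each search engine has its own parser). This sketch assumes the file is served as http://www.mydomain.com/robots.txt and checks a few sample paths, all hypothetical, against it:

```python
from urllib.robotparser import RobotFileParser

# The rules from the post above, as they would appear in public_html/robots.txt
# (assumed to be served at http://www.mydomain.com/robots.txt).
RULES = """\
User-agent: *
Disallow: /mynewdomain/
Disallow: /cgi-bin/
Disallow: /contact-
Disallow: /css/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Each Disallow value is compared as a plain prefix of the URL path.
for path in ["/mynewdomain/index.html", "/forums/", "/contact-us.html",
             "/contact.html", "/mynewdomain"]:
    allowed = parser.can_fetch("*", "http://www.mydomain.com" + path)
    print(f"{path:25} {'allowed' if allowed else 'blocked'}")
```

Because the values are prefixes, /contact-us.html is blocked while /contact.html is not, and the bare /mynewdomain path without a trailing slash does not match /mynewdomain/. Servers typically redirect that form to the slash version, but Disallow: /mynewdomain (no trailing slash) would cover both, at the cost of also matching any other path that starts with those characters.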

Once the new domain goes live, place another robots.txt file in the mynewdomain folder (which will then be the new domain's web root) to tell search engines where not to go within the new site.

Search Google for "robots.txt" to learn more.
Posted by L a r r y
 
06-14-2009, 10:57 PM, Re: Robots
Ultra Talker
Location: Boston, MA
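You can also check the visitor's user agent (crawlers identify themselves in the User-Agent header rather than the referrer); if it's a bot, redirect it to another page.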
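A minimal sketch of that idea, written as a plain Python WSGI application. It assumes crawlers can be recognised by substrings in the User-Agent header (BOT_MARKERS is a hypothetical, incomplete list) and redirects them from /mynewdomain/ to the site's front page. The User-Agent is supplied by the client and can be faked, so this complements robots.txt rather than replacing it:

```python
# Hypothetical, incomplete list of substrings found in crawler User-Agent
# strings; a heuristic, not an authoritative bot detector.
BOT_MARKERS = ("bot", "crawl", "spider", "slurp")


def is_bot(user_agent: str) -> bool:
    """Return True if the User-Agent string looks like a crawler."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def app(environ, start_response):
    """Minimal WSGI app: send crawlers away from /mynewdomain/."""
    path = environ.get("PATH_INFO", "/")
    user_agent = environ.get("HTTP_USER_AGENT", "")
    if path.startswith("/mynewdomain/") and is_bot(user_agent):
        # "/" is a hypothetical redirect target (the main site's front page).
        start_response("302 Found", [("Location", "/")])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Hello from " + path.encode("utf-8")]
```

The function can be served by any WSGI server (for example wsgiref.simple_server from the standard library); on shared hosting the same check is more often done in the web server's own configuration.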
Posted by webcosmo
 
06-15-2009, 04:33 AM, Re: Robots
Average Talker
Quote:
Originally Posted by L a r r y
in public_html/robots.txt:
Place another robots.txt file in the mynewdomain folder to direct search engines on where not to go within your new domain once it goes live.

Google search robots.txt for more.
What should I write in this robots.txt?

Quote:
Originally Posted by webcosmo
You can also check the referrer; if it's a bot, redirect it to a page.
How do I do that?
Posted by pikus
 
06-15-2009, 08:32 PM, Re: Robots
Skilled Talker
Name: Larry K
Quote:
Originally Posted by pikus
What should I write in this robots.txt?
Code:
User-agent: *
Disallow: /newdomainfolder/
The first line of your robots.txt names the user-agent. Here it is an asterisk, which matches all robots, because you want the instructions to apply to every one of them, so your first line should be identical to mine.
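The second and following lines begin with "Disallow" and a colon. The field name itself is not case-sensitive, but the path after it is: /Private/ and /private/ are different paths.
In the original standard, Disallow is the only instruction that can follow a User-agent line; some major crawlers, including Google, also understand extensions such as Allow and Sitemap.
Each Disallow line asks robots not to crawl any URL whose path begins with the value you specify.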

You need to specify the name of the folder that houses the root of your new domain, substituting the name of your folder in the place of mine.

Add one Disallow: line for each additional file or folder you don't want crawled and included in Web search results. For me, that includes /cgi-bin/, /css/ or /stylesheets/, and other pages that I don't want appearing in the world's search engines.
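The sketch below uses Python's urllib.robotparser (again only a stand-in for a real crawler) to show that field names are matched regardless of case while paths are not, and that each Disallow line adds one more blocked prefix. /newdomainfolder/ and the other paths are the placeholders from the post; the host name is assumed:

```python
from urllib.robotparser import RobotFileParser

# Field names are deliberately written in mixed case: this parser lowercases
# them, but compares the paths after the colon case-sensitively.
rules = [
    "user-agent: *",
    "DISALLOW: /newdomainfolder/",
    "Disallow: /cgi-bin/",
    "Disallow: /stylesheets/",
]

parser = RobotFileParser()
parser.parse(rules)

base = "http://www.mydomain.com"  # assumed host
print(parser.can_fetch("*", base + "/newdomainfolder/page.html"))  # False
print(parser.can_fetch("*", base + "/NewDomainFolder/page.html"))  # True
print(parser.can_fetch("*", base + "/cgi-bin/script.cgi"))         # False
```

Because the path comparison is case-sensitive, the Disallow value has to match the folder name exactly as it appears in the URL.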
Posted by L a r r y
 
06-16-2009, 05:00 AM, Re: Robots
Average Talker
Yes, thank you.
I understand what the robots.txt in the root folder should contain.

But you said I should put another one in the folder I want to exclude. I don't know what to write in that one, and why do I need another robots.txt in the excluded folder if search engines are already blocked by the robots.txt in the main folder?
Posted by pikus
 
06-21-2009, 03:03 PM, Re: Robots
Skilled Talker
Name: Larry K
Quote:
Originally Posted by pikus
Yes, thank you.
I understand what the robots.txt in the root folder should contain.

But you said I should put another one in the folder I want to exclude. I don't know what to write in that one, and why do I need another robots.txt in the excluded folder if search engines are already blocked by the robots.txt in the main folder?
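The newdomain folder's robots.txt is not needed at this time. It will be needed at the web root of your new domain when the time comes to activate it. Crawlers only read robots.txt from the root of a host, so a copy inside a subfolder is ignored until that folder becomes a domain's root. Sorry for the confusion!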
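Crawlers derive the robots.txt location from the scheme and host of the page they want, so what counts is the URL the file is served at, not which folder it sits in on disk. A small Python helper (robots_url_for is a hypothetical name) makes that explicit for the two placeholder domains in this thread:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL a crawler consults for page_url.

    Only the scheme and network location (host, plus port if any) are
    kept; the page's folder path is discarded.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# While the new site is still a subfolder of the main domain:
print(robots_url_for("http://www.mydomain.com/mynewdomain/index.html"))
# After www.mynewdomain.com is pointed at public_html/mynewdomain:
print(robots_url_for("http://www.mynewdomain.com/index.html"))
```

So a robots.txt placed in public_html/mynewdomain is only fetched once www.mynewdomain.com serves that folder as its root; until then, the main domain's robots.txt is the only one consulted.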
Posted by L a r r y
 
06-21-2009, 04:12 PM, Re: Robots
Average Talker
Thanks a lot! That was a big help!
Posted by pikus
 
06-28-2009, 05:42 AM, Re: Robots
Novice Talker
Name: jacob
Can anyone tell me how to set up robots.txt on the server so that Google can index all the content of my website? Please help, thanks in advance.

Also, how do I block Google from indexing an expired web page with robots.txt? What should I write in the robots.txt file? For example, if my expired page is www.myweb.com/1_10_Lenovo-ideapad-S10.html, what should I type in the robots file?
Posted by jacobsz
 
07-06-2009, 03:41 AM, Re: Robots
Skilled Talker
Name: Larry K
Quote:
Originally Posted by jacobsz
Can anyone tell me how to set up robots.txt on the server so that Google can index all the content of my website? Please help, thanks in advance.

Also, how do I block Google from indexing an expired web page with robots.txt? What should I write in the robots.txt file? For example, if my expired page is www.myweb.com/1_10_Lenovo-ideapad-S10.html, what should I type in the robots file?
A robots.txt that allows full access to everything on the server is:

Code:
User-agent: *
Disallow:

The User-agent line applies to all robots, and an empty Disallow: allows everything. Having a robots.txt at all also keeps 404 errors out of your error log, which otherwise records one whenever a robot requests the missing file. Having no robots.txt has the same effect on crawling, apart from those error-log entries.

The original standard requires at least one Disallow: line in each record, and this example meets that minimum.
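Both the allow-everything file and a per-page rule for the expired-page question can be checked with Python's urllib.robotparser, used here as a stand-in for a compliant crawler. The Disallow: /1_10_Lenovo-ideapad-S10.html line is an illustration built from the URL in the question, not something from the answer above:

```python
from urllib.robotparser import RobotFileParser


def parser_for(text: str) -> RobotFileParser:
    """Parse robots.txt content given as a string."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


# The allow-everything file from the post: an empty Disallow blocks nothing.
allow_all = parser_for("User-agent: *\nDisallow:\n")

# Illustrative variant: one Disallow line naming the expired page's path,
# written relative to the site root.
block_page = parser_for(
    "User-agent: *\n"
    "Disallow: /1_10_Lenovo-ideapad-S10.html\n"
)

site = "http://www.myweb.com"
print(allow_all.can_fetch("*", site + "/1_10_Lenovo-ideapad-S10.html"))   # True
print(block_page.can_fetch("*", site + "/1_10_Lenovo-ideapad-S10.html"))  # False
print(block_page.can_fetch("*", site + "/index.html"))                    # True
```

Disallow only stops compliant crawlers from fetching a URL; a page that is already indexed may still show up in results for a while, so this alone does not guarantee removal.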

See: http://www.robotstxt.org/orig.html
Posted by L a r r y
 