Hi
I am just about to take over responsibility for a fairly large web site which is around three-quarters complete. I think there may be a problem when it comes to this site being spidered by Google, though. The site appears to use a single servlet to handle all page processing, so the URLs for different pages are almost always the same apart from some URL parameters. Here's an example of three different page URLs used by the system:
http://localhost:8080/training/panec...6A0EC8E197DF7D E08D5D36?!INSTANCE!=1&!NODE!=1
http://localhost:8080/training/panec...6A0EC8E197DF7D E08D5D36?!INSTANCE!=2&!NODE!=2&action=list
http://localhost:8080/training/panec...6A0EC8E197DF7D E08D5D36?!INSTANCE!=2&!NODE!=2&action=addbook
I've heard that Google does not like session IDs: it crawls a single site with more than one spider, so relying on sessions could cause incomplete spidering. But surely if the session ID is in the URL, this would make it easier for Google, wouldn't it? I've also heard that Google limits the number of pages it catalogues from any one site that have parameters in their URLs, to prevent the spider from crawling too many dynamically generated pages. Is this true?
Also, I would appreciate any input on whether the above URLs will keep Google from fully spidering the site.
Regards
Tony