Today's Final Jeopardy Question is:
How would you go about finding domain names / websites that DON'T exist?
For example,
www.ftt.com
www.your-domain-name-here.com
These domain names are registered, but there is nothing on them. Does anyone have ideas on how to find a list of websites that do not have anything on them?
Here are my initial thoughts:
-Somehow find a global list of all domain names (!)
-download index.html / root page
-if the response header says "Moved Permanently" (HTTP 301), doesn't exist
-if no index.html / root page, query the whois database to see if it is registered (tests original list); if not registered, abort and move on to next name
-if 404 received or timed out or no index.html / root page, then doesn't exist
-if root exists, scan for "under construction" or "coming soon" (make list of things people put on pages that are under construction)
-if root exists, test if page is smaller than 1000 bytes (javascript redirects?)
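The checks above could be sketched roughly as follows in Python. This is only a minimal sketch under some assumptions: the names classify, fetch_root, PLACEHOLDER_PHRASES and MIN_BYTES are made up for illustration, the verdict strings are arbitrary labels, the whois step is left out, and any status other than 200, 301 or 404 is simply reported as "unclear".

```python
import urllib.error
import urllib.request

# Assumed phrase list; extend it with whatever placeholder pages tend to say.
PLACEHOLDER_PHRASES = ("under construction", "coming soon")
MIN_BYTES = 1000  # size threshold taken from the list above


def classify(status, body):
    """Return a verdict for a root page given its HTTP status and body bytes.

    status is None when the request timed out or the host did not answer.
    """
    if status is None:
        return "no response"
    if status == 301:
        return "permanently moved"
    if status == 404:
        return "no root page"
    if status != 200:
        return "unclear"  # assumption: other statuses are not judged here
    text = body.decode("utf-8", errors="replace").lower()
    if any(phrase in text for phrase in PLACEHOLDER_PHRASES):
        return "placeholder"
    if len(body) < MIN_BYTES:
        return "tiny page"  # possibly just a javascript redirect
    return "has content"


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so a 301 shows up as an HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers):
        return None


def fetch_root(domain, timeout=10):
    """Fetch http://<domain>/ and return (status, body), or (None, b"") on failure."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open("http://" + domain + "/", timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()
    except (urllib.error.URLError, OSError):
        return None, b""
```

Something like classify(*fetch_root("www.ftt.com")) would then give one verdict per name. Keeping the verdict logic separate from the network call means the rules can be tried out on saved pages without hitting any servers.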
Those are my thoughts. Unfortunately, I don't know how to get a list of websites in the first place. I can't crawl Google because...those websites don't have anything on them and nobody will link to them!
Naturally I won't get ALL domain names because there are many millions of them, but it would help if there was a way to grab 1000 at a time or something, like all that match
www.aaa*.com, then www.aab*.com, etc. down the line.
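Lacking a real list, one fallback is to generate candidate names in that aaa, aab, ... order and work through them in chunks of 1000. The Python below is a rough sketch only: candidate_names and batches are hypothetical helper names, it only covers names of one fixed length made of the letters a-z, and it just produces names to test; it says nothing about which of them are actually registered.

```python
from itertools import islice, product
from string import ascii_lowercase


def candidate_names(length, tld="com"):
    """Yield www.<letters>.<tld> for every lowercase combination, in aaa, aab, ... order."""
    for letters in product(ascii_lowercase, repeat=length):
        yield "www." + "".join(letters) + "." + tld


def batches(names, size=1000):
    """Split any iterable of names into lists of at most `size` entries."""
    names = iter(names)
    while True:
        chunk = list(islice(names, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    first = next(batches(candidate_names(3)))
    print(len(first), first[0], first[1])
```

Each batch could then be run through the whois and root page checks. Three letter names give 26^3 = 17576 candidates, and the count grows by a factor of 26 with every extra letter, so this only stays practical for short names.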
My goal is to find unused domain names, or domain names that have not really anything on them.
Any ideas?
John at mccarthy dot net