Quote:
Originally Posted by demoIXI
I tried that and it does not work. Do you have any other ideas?
Well, does it work for the original URL? If it can successfully pull the contents from one URL (e.g., http://www.cnn.com), then it should be equally effective crawling the links harvested from that page.
You know what's probably happening? You are probably harvesting relative URLs. I'm sure of it.
For example, when you point your program at http://www.somewhere.com and harvest links off that page, the URLs are most likely relative (e.g., href="FAQs.html", href="Contact.php", etc.). If you then submit just that value to your function, it will fail because the URL is not fully qualified. You need to add some logic to your code to check whether the URL starts with "http://" (or "https://"). If it doesn't, take the current URL, remove the final document element (e.g., "http://www.somewhere.com/Information/index.php" becomes "http://www.somewhere.com/Information/"), append the relative URL to it ("http://www.somewhere.com/Information/FAQs.html"), and resubmit the result to file_get_contents().
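Here is a rough sketch of that logic in PHP, just to show the idea. The function name qualify_url() is made up for this example, and it assumes the current page URL is itself fully qualified. It only covers plain relative links like "FAQs.html"; hrefs that start with "/" or "../" would need extra handling.

```php
<?php
// Hypothetical helper (not part of PHP): turn a harvested href into a
// fully qualified URL, following the steps described above.
// Assumes $currentUrl is already fully qualified ("http://..." or "https://...").
function qualify_url(string $currentUrl, string $href): string
{
    // Already qualified: use it as-is.
    if (preg_match('#^https?://#i', $href)) {
        return $href;
    }

    // Drop any query string or fragment from the current URL first.
    $base = preg_replace('/[?#].*$/', '', $currentUrl);

    // Position just past "scheme://", so we never cut into the host name.
    $hostStart = strpos($base, '://') + 3;
    $lastSlash = strrpos($base, '/');

    if ($lastSlash < $hostStart) {
        // No path at all, e.g. "http://www.somewhere.com"
        $base .= '/';
    } else {
        // Remove the final document element but keep the trailing slash.
        $base = substr($base, 0, $lastSlash + 1);
    }

    return $base . $href;
}

$pageUrl = 'http://www.somewhere.com/Information/index.php';
echo qualify_url($pageUrl, 'FAQs.html'), "\n";
// prints http://www.somewhere.com/Information/FAQs.html
```

You would then pass the returned value to file_get_contents() instead of the raw href.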
__________________
—Kyrnt