For the 'find a bot' part, the W3C offers a link checker that you can download and experiment with. It's written in Perl, but I'm sure it's a good starting point if you need ideas for doing the same in PHP.
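If it helps to see the crawling idea in code before digging through the Perl, here's a minimal sketch in Python. It is not PHP and not how the W3C checker itself is built. It pulls the `href`s out of each page, keeps only links on the same host, and records any URL that fails to load as broken. The function names (`same_site_links`, `crawl`, `fetch_url`) and the 50-page cap are my own assumptions. `fetch` is a parameter, so you can swap in your own HTTP code or test it without a network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(base_url, html):
    """Absolute, fragment-free http(s) links on base_url's host, in first-seen order."""
    parser = LinkExtractor()
    parser.feed(html)
    host = urlparse(base_url).netloc
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host and absolute not in links:
            links.append(absolute)
    return links


def fetch_url(url):
    """Default fetcher: download a page and decode it as text."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(start_url, fetch=fetch_url, max_pages=50):
    """Breadth-first crawl of one site; returns (pages_found, broken_urls)."""
    queue = deque([start_url])
    visited, broken = [], []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited or url in broken:
            continue
        try:
            html = fetch(url)
        except OSError:  # urllib's HTTPError/URLError are OSError subclasses
            broken.append(url)
            continue
        visited.append(url)
        for link in same_site_links(url, html):
            if link not in visited:
                queue.append(link)
    return visited, broken
```

The list of pages it returns is also what you'd feed to the validator, one URL at a time.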
As for validating the pages, the W3C offers an experimental API for its validator. However, the W3C warns (rightly so) that excessive use will get your site banned, so if you plan to make this a public service, you should host a copy of the validator on your own server. You can download the validator from here.
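Here's a rough Python sketch of calling the validator, assuming you've installed your own copy. The `VALIDATOR_CHECK_URL` address is hypothetical, so point it at wherever your install lives. It relies on the `check?uri=...` query and the `X-W3C-Validator-Status`, `X-W3C-Validator-Errors` and `X-W3C-Validator-Warnings` response headers that the classic Markup Validator documents for its API. Double-check those against the docs for the version you install. The one-second pause between requests is my own choice, to avoid hammering even your own server.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical address of a self-hosted W3C Markup Validator install.
VALIDATOR_CHECK_URL = "http://localhost/w3c-validator/check"


def build_check_url(page_url, validator=VALIDATOR_CHECK_URL):
    """Build the validator query URL for one page (page_url is percent-encoded)."""
    return validator + "?" + urlencode({"uri": page_url})


def parse_validator_headers(headers):
    """Read the summary headers the validator sends back with its report."""
    return {
        "status": headers.get("X-W3C-Validator-Status", "Unknown"),
        "errors": int(headers.get("X-W3C-Validator-Errors", 0)),
        "warnings": int(headers.get("X-W3C-Validator-Warnings", 0)),
    }


def validate_pages(page_urls, validator=VALIDATOR_CHECK_URL, delay_seconds=1.0):
    """Validate each page in turn, pausing between requests."""
    results = {}
    for page in page_urls:
        with urlopen(build_check_url(page, validator), timeout=30) as resp:
            # Only the summary headers are used here; the HTML report body is ignored.
            results[page] = parse_validator_headers(resp.headers)
        time.sleep(delay_seconds)
    return results
```

Reading just the headers keeps you from having to parse the full HTML report when all you want is a pass/fail count per page.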