How do I program a bot?
Old 04-28-2008, 05:17 PM How do I program a bot?
Webmaster Talker

Posts: 522
I need to program a bot. I need it to log in to an account I have on a password-protected site. From there, I need to scrape the HTML that is generated on the screen (it comes from an ASP page). I will then parse the scraped HTML and pull out the information I need, which I will save in my database for later use.

Can someone please help me with this?

For the record, this is all legal, as this is an account that I'm allowed to be in. I just want to automate the process, as it would be very time-consuming to do it manually.

Any help is appreciated.
zincoxide is offline
Old 04-28-2008, 09:43 PM Re: How do I program a bot?
addonchat's Avatar
Skilled Talker

Posts: 97
Name: Chris Duerr
file_get_contents() will allow you to easily fetch a web page.
http://us2.php.net/manual/en/functio...t-contents.php

As for logging in, if it is a form, the easiest approach is to check whether the form accepts its parameters via HTTP GET. If it does, you'd just call file_get_contents() with the URL-encoded username/password parameters appended to the URL, e.g. "/login.php?un=bob&password=12345".
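
A minimal sketch of that GET approach, assuming the form really does accept GET. The helper name buildLoginUrl, the example.com host, and the parameter names "un" and "password" are only illustrations; the real names have to be read from the site's login form.

```php
<?php
// Hypothetical helper: builds a login URL for a form that accepts GET.
// The parameter names "un" and "password" are assumptions taken from the
// example above; check the real form's field names.
function buildLoginUrl(string $base, string $user, string $pass): string
{
    // http_build_query() URL-encodes every value, so characters such as
    // "&" or spaces in a password cannot break the query string.
    $query = http_build_query(['un' => $user, 'password' => $pass], '', '&');
    return $base . '?' . $query;
}

$url = buildLoginUrl('http://example.com/login.php', 'bob', 'p&ss word');
echo $url, "\n";

// The actual fetch is left commented out because example.com is a placeholder:
// $html = file_get_contents($url);
```

Keep in mind that credentials sent in a URL tend to end up in server logs, so this is only sensible when the site offers nothing better.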

I'm not sure whether file_get_contents() supports the standard "http://USERNAME:PASSWORD...." URL syntax for HTTP authentication, but it's worth a try.
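
If the site turns out to use HTTP Basic authentication (a browser password pop-up rather than a login form), one way to avoid relying on the URL syntax is to send the Authorization header explicitly through a stream context. This is a sketch under that assumption; basicAuthOptions and the example.com URL are made up for illustration.

```php
<?php
// Hypothetical helper: stream-context options that send HTTP Basic
// credentials. This only helps when the server asks for Basic auth;
// it does nothing for an ordinary HTML login form.
function basicAuthOptions(string $user, string $pass): array
{
    return [
        'http' => [
            'method' => 'GET',
            // Basic auth is "user:password", base64-encoded.
            'header' => 'Authorization: Basic ' . base64_encode("$user:$pass") . "\r\n",
        ],
    ];
}

$options = basicAuthOptions('bob', '12345');
echo $options['http']['header'];

// Placeholder URL, so the request itself stays commented out:
// $context = stream_context_create($options);
// $html = file_get_contents('http://example.com/rates.asp', false, $context);
```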

Enjoy your parsing
__________________
Chris Duerr
AddonChat Java Chat Software
http://www.addonchat.com/ - Affiliate Program
addonchat is offline
Old 04-29-2008, 03:16 AM Re: How do I program a bot?
solomongaby's Avatar
Webmaster Talker

Posts: 518
Name: Gabe Solomon
Location: Romania
You can try the Snoopy class, which you can find on SourceForge. It can handle a login even when the form uses POST variables.
solomongaby is offline
Old 04-29-2008, 09:19 AM Re: How do I program a bot?
addonchat's Avatar
Skilled Talker

Posts: 97
Name: Chris Duerr
For POST requests, if you prefer to do it yourself, check out:

http://us2.php.net/manual/en/functio...ost-fields.php

and look over:

http://us2.php.net/http
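
As a rough sketch of a do-it-yourself POST, PHP's HTTP stream wrapper can send a form submission through file_get_contents(). The helper formPostOptions, the field names "un" and "password", and the example.com URL are assumptions for illustration, not details of the actual site.

```php
<?php
// Hypothetical helper: options for submitting a form with POST through
// PHP's HTTP stream wrapper.
function formPostOptions(array $fields): array
{
    // Encode the fields the same way a browser encodes a submitted form.
    $body = http_build_query($fields, '', '&');
    return [
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
            'content' => $body,
        ],
    ];
}

// Field names are assumptions; use the names from the real login form.
$options = formPostOptions(['un' => 'bob', 'password' => '12345']);
echo $options['http']['content'], "\n";

// Placeholder URL, so the request itself stays commented out:
// $context = stream_context_create($options);
// $html = file_get_contents('http://example.com/login.asp', false, $context);
```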
addonchat is offline
Old 04-30-2008, 09:55 AM Re: How do I program a bot?
Webmaster Talker

Posts: 522
How do hackers and the like make bots that log in to sites, then? Is there no way to automate the process, or would I have to program something in C that is basically a browser, which parses the page and then inputs the info?

There has got to be a way to do this.

From what you guys are saying, it sounds like I can get into the site, but then how do I set it up so that it will click on the links I need to reach the page I want to parse?
zincoxide is offline
Old 04-30-2008, 10:09 AM Re: How do I program a bot?
tripy's Avatar
Fetchez la vache!

Posts: 1,819
Name: Thierry
Location: In the void
You need to parse every response you get after every request.
Or, if you already know the link to follow, you can go there blindly and discard what the server sends back.
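
To illustrate the first option, here is a small sketch that parses a response and picks out the link to request next, which is the programmatic equivalent of "clicking". The helper findLinkByText and the sample menu markup are invented for the example.

```php
<?php
// Hypothetical helper: returns the href of the first link whose text
// contains $label (case-insensitive), or null when no link matches.
function findLinkByText(string $html, string $label): ?string
{
    $doc = new DOMDocument();
    // Real-world pages are rarely well-formed, so collect parser warnings
    // internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('a') as $a) {
        if (stripos($a->textContent, $label) !== false) {
            return $a->getAttribute('href');
        }
    }
    return null;
}

// Made-up navigation menu standing in for the page returned after login.
$menu = '<html><body><ul><li><a href="/home.asp">Home</a></li>'
      . '<li><a href="/rates.asp?view=all">Current Rates</a></li></ul></body></html>';
echo findLinkByText($menu, 'rates'), "\n";
```

The returned href is often relative, so it still has to be combined with the site's base URL before the next request.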

As for the scraping, there are different approaches, some better than others, but I won't explain them.
This looks a bit too borderline for my taste.

What!?
Did you think there was a "hacking construction kit" and you would just have to make 3 clicks?
Sorry, but no. It takes real programming to do this.
tripy is offline
Old 04-30-2008, 10:29 AM Re: How do I program a bot?
addonchat's Avatar
Skilled Talker

Posts: 97
Name: Chris Duerr
zincoxide -- Programs don't click, people click. Short of parsing the HTML, I've already posted all of the information anyone would need. I suggest you start with the HTTP specification - http://www.w3.org/Protocols/rfc2616/rfc2616.html

Enjoy!
addonchat is offline
Old 04-30-2008, 11:48 AM Re: How do I program a bot?
Ultra Talker

Posts: 408
Do you need this done once, regularly, or continuously? This seems pretty simple to do.
Lucas3677 is offline
Old 05-01-2008, 11:09 AM Re: How do I program a bot?
Webmaster Talker

Posts: 522
I understand that people do the 'clicking'. I guess I'm using the wrong words.

Let me try explaining what I'm doing:

I am trying to log in to a site (I would like this done daily) that lists the companies I deal with and the rates they offer.

The problem is that the site uses sessions (I think) to maintain the logged-in status, so if I copy the URL for the rates page, I get a logged-out message and it asks me to log in. So...

I need to find a way to log in using a bot. Once you log in, the site automatically goes to a main menu. From there, there is a link in the nav menu that I click to get the rates. Then a page comes up with the rates in a table. This is the page that I want to parse.

So... I need a way to keep my session information after logging in so that I can subsequently request the appropriate page, and I don't know how to do that. Maybe it isn't a PHP thing, and I need a different language.
zincoxide is offline
Old 05-01-2008, 01:51 PM Re: How do I program a bot?
VirtuosiMedia's Avatar
Webmaster Talker

Posts: 566
This is probably a no, but does the site you're trying to get info from have an RSS feed that you could subscribe to, instead of having to log in through a bot?
VirtuosiMedia is offline
Old 05-01-2008, 07:02 PM Re: How do I program a bot?
addonchat's Avatar
Skilled Talker

Posts: 97
Name: Chris Duerr
If the site is programmed correctly, there is no need for you to worry about cookies. The login sounds like it is a form. View the source for that form and find out which variable names are used for the login and password and what the form's submit method is (POST or GET), then use the references above to automate the login. After that, your program should be able to retrieve the link you're interested in directly, and you can begin parsing it.

The only thing you'll likely have to contend with is reading the redirect output of the login page to capture the session ID, which should be passed along without cookies being used.
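
In case the site does track the login with a cookie after all (many sites do, and classic ASP sessions usually work that way), the session can still be carried by hand. The sketch below is only an illustration: cookieHeaderFrom is a made-up helper, and the sample headers, including the cookie name, are invented.

```php
<?php
// Hypothetical helper: turns the Set-Cookie lines of a response into a
// single Cookie request header. $headers has the same shape as PHP's
// $http_response_header (a list of raw header lines).
function cookieHeaderFrom(array $headers): string
{
    $pairs = [];
    foreach ($headers as $line) {
        if (stripos($line, 'Set-Cookie:') === 0) {
            // Keep only "name=value"; drop attributes such as path or expires.
            $cookie = trim(substr($line, strlen('Set-Cookie:')));
            $pairs[] = trim(explode(';', $cookie, 2)[0]);
        }
    }
    return $pairs ? 'Cookie: ' . implode('; ', $pairs) . "\r\n" : '';
}

// Invented response headers, standing in for what a login request returns.
$sample = [
    'HTTP/1.1 302 Found',
    'Location: /menu.asp',
    'Set-Cookie: ASPSESSIONIDABCD=XYZ123; path=/',
];
echo cookieHeaderFrom($sample);
```

After a file_get_contents() call on the login URL, the response headers are available in $http_response_header, and the string returned here can be passed as the 'header' option of the stream context used for the next request.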
addonchat is offline
Old 05-02-2008, 09:55 AM Re: How do I program a bot?
Webmaster Talker

Posts: 522
Okay... I get it now. Thank you.

Unfortunately, they don't have an RSS feed. I've been trying for two years to get them to add one, but every time I ask they say "R-what?". It's very frustrating. This company claims to be ahead of the competition when it comes to technology, but they don't know what the heck they are doing.

Yesterday I actually found a page (well, a file) that I can wget without needing to log in. It has all the information I need in it.

Now I just have to figure out how to parse HTML accurately.

Thanks for all your help!
zincoxide is offline
Old 05-02-2008, 10:10 AM Re: How do I program a bot?
nickohrn's Avatar
Weightlifting CS Student

Posts: 505
Name: Nick Ohrn
For parsing HTML, you have a couple of options, zincoxide. The easiest one to use is plain XML document parsing. You can use PHP5's DOMDocument class to parse a web page as XML, as long as you know the page is going to be valid XHTML. If you have doubts about the validity of the markup you're going to be retrieving, you'll have to go another route.
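
A short sketch of the DOMDocument route for a rates table. It uses loadHTML() rather than the XML loader, since loadHTML() is more forgiving of markup that is not valid XHTML. The helper tableRows and the sample table are made up; the real page's structure will differ.

```php
<?php
// Hypothetical helper: returns every table row in the page as an array
// of trimmed cell texts (both <th> and <td> cells).
function tableRows(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $node) {
            // Skip whitespace text nodes; keep only real cells.
            if ($node instanceof DOMElement && in_array($node->nodeName, ['td', 'th'], true)) {
                $cells[] = trim($node->textContent);
            }
        }
        if ($cells) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Invented markup standing in for the rates page.
$page = '<table>'
      . '<tr><th>Company</th><th>Rate</th></tr>'
      . '<tr><td>Acme</td><td> 4.25% </td></tr>'
      . '<tr><td>Globex</td><td>3.90%</td></tr>'
      . '</table>';
print_r(tableRows($page));
```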

For screen scraping, regular expressions are a really popular technique. This article is a great starting point for that. Good luck with your venture.
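
A minimal regular-expression sketch, assuming each row of interest has exactly two cells (company and rate). The helper scrapeRates and the sample markup are invented, and a pattern like this is brittle: it needs adjusting whenever the page layout changes.

```php
<?php
// Hypothetical helper: pulls company/rate pairs out of two-cell table rows.
function scrapeRates(string $html): array
{
    // "s" lets "." span newlines, "i" ignores tag case, and ".*?" keeps
    // each capture as short as possible.
    $pattern = '~<tr[^>]*>\s*<td[^>]*>(.*?)</td>\s*<td[^>]*>(.*?)</td>\s*</tr>~si';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $rates = [];
    foreach ($matches as $m) {
        // strip_tags() removes nested markup such as <b> inside a cell.
        $rates[trim(strip_tags($m[1]))] = trim(strip_tags($m[2]));
    }
    return $rates;
}

// Invented markup with mixed tag case and stray whitespace.
$page = "<table>\n<tr><td><b>Acme</b></td><td>4.25%</td></tr>\n"
      . "<TR><TD>Globex</TD>\n<TD> 3.90% </TD></TR>\n</table>";
print_r(scrapeRates($page));
```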
nickohrn is offline