Any lessons from the AOL data?
Old 08-11-2006, 04:26 AM
Junior Talker

Posts: 118
What I find interesting is the keywords used to find particular sites. Have a look at this URL; it makes use of the AOL blunder :thumbup:

http://www.askthebrain.com/aol/
imported_Koz is offline
 
Old 08-11-2006, 11:32 AM
Junior Talker

Posts: 4
Quote:
Originally Posted by august View Post
my queries seem to be taking 1 to 3 seconds max, in over 15 million entries
Can you please give some info on how you managed to index this dataset?

What sort of queries are you running that are flying along so quickly?

Are you running COUNT queries?

Thanks in advance for any help

Rgds

RC
Red Cardinal is offline
 
Old 08-11-2006, 04:11 PM
SEO Champ

Posts: 440
I forget where it was (seochat maybe), but I read a case study where a site dropped from number one to number two for a few days. They went from 30,000 hits/day to 12,000.
The AOL data is pretty interesting. I'd love to see a web interface to mine it.
Bookworm-SEO is offline
 
Old 08-11-2006, 05:11 PM
Extreme Talker

Posts: 240
Queries on about 3.5million rows for me are taking about 45-60 seconds!

But then I just shunted it into MS Access without any indexes. Maybe I'll import it into the MySQL server I've got running locally and see if it's any better.
imported_mattd is offline
 
Old 08-11-2006, 09:45 PM
Super Talker

Posts: 117
How are you guys loading these files?

I'm just trying to load 1/10 size pieces and I'm having trouble. (on a 900MB RAM machine)
Notepad and Wordpad can't handle them.
OpenOffice takes about 10 minutes and then locks up when the progress bar is full. (I'd have to use Base, since Calc only loads the first 65K lines and there are ~2M in just one of the pieces)
DOS Type and Find work, but I was looking for something a little more powerful.
gopher292 is offline
 
Old 08-11-2006, 10:03 PM
Junior Talker

Posts: 1
Anyone care to post data beyond the first 10 results? I'd be curious to see how #11 compares, seeing as how it's at the top of the second page.

Does it beat out spots #9 and #10?
NuiLoa is offline
 
Old 08-12-2006, 04:47 AM
$100 - $999 Monthly

Posts: 328
Quote:
Originally Posted by gopher292 View Post
How are you guys loading these files?

I'm just trying to load 1/10 size pieces and I'm having trouble. (on a 900MB RAM machine)
Notepad and Wordpad can't handle them.
OpenOffice takes about 10 minutes and then locks up when the progress bar is full. (I'd have to use Base, since Calc only loads the first 65K lines and there are ~2M in just one of the pieces)
DOS Type and Find work, but I was looking for something a little more powerful.

Whaaa. What would you do with the data once you have it in an editor? You can't get any figures out of that; you really need to load it into a database.
I tried Access 2000 first, but that's really slow and I could only import 4 out of 10 files.
Now I have all 10 in MySQL 5 and that works quite ok..
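
If the goal is just click counts and not ad hoc queries, the files can also be streamed line by line, so nothing has to fit in RAM at once. A minimal Python sketch, assuming the tab-separated layout with one header line and the five columns named in the LOAD DATA statement elsewhere in this thread (anonid, query, querytime, itemrank, clickurl), and assuming that searches without a click leave itemrank empty. The function name is made up for illustration.

```python
import csv
from collections import Counter


def count_clicks_by_rank(lines):
    """Tally clicks per result rank from an iterable of tab-separated lines.

    Assumed layout (not verified against the real files): one header line,
    then anonid, query, querytime, itemrank, clickurl. Rows without a click
    are assumed to have an empty or missing itemrank field.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header line
    clicks = Counter()
    records = 0
    for row in reader:
        if not row:
            continue  # ignore blank lines
        records += 1
        if len(row) >= 4 and row[3].strip():
            clicks[int(row[3])] += 1
    return records, clicks
```

A file object iterates line by line, so passing in something like `open(path, newline="", encoding="latin-1")` should keep memory use flat regardless of file size; the encoding is a guess.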
SEO Portal is offline
 
Old 08-12-2006, 04:48 AM
$100 - $999 Monthly

Posts: 328
Quote:
Originally Posted by NuiLoa View Post
Anyone care to post data beyond the first 10 results? I'd be curious to see how #11 compares, seeing as how it's at the top of the second page.

Does it beat out spots #9 and #10?
No, it doesn't, not by a long shot. People just don't click through to the 2nd page.

Here, this is still from the sample I used previously:

Code:
Rank	Clicks	Share of clicks
1	3275637	42,25%
2	925507	11,94%
3	656290	8,47%
4	468703	6,05%
5	377877	4,87%
6	309669	3,99%
7	262140	3,38%
8	230981	2,98%
9	229909	2,97%
10	218748	2,82%
11	50615	0,65%
12	43030	0,56%
13	40246	0,52%
14	37455	0,48%
15	36177	0,47%
16	29800	0,38%
17	27769	0,36%
18	26025	0,34%
19	25738	0,33%
20	24573	0,32%
21	22868	0,29%
22	21961	0,28%
23	21738	0,28%
24	21018	0,27%
25	20935	0,27%
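
Adding up the rows makes the drop between pages easier to see. A rough sketch in Python, using the click counts exactly as posted; the total of 7,752,953 clicks is an assumption taken from the sample size mentioned in this thread, and `page_share` is just an illustrative helper.

```python
# Clicks by rank, copied from the table above (ranks 1 to 25).
CLICKS = [
    3275637, 925507, 656290, 468703, 377877,
    309669, 262140, 230981, 229909, 218748,
    50615, 43030, 40246, 37455, 36177,
    29800, 27769, 26025, 25738, 24573,
    22868, 21961, 21738, 21018, 20935,
]

# Assumption: the percentages are shares of the 7,752,953 clicks in the sample.
TOTAL_CLICKS = 7752953


def page_share(clicks, first_rank, last_rank, total):
    """Return the fraction of all clicks that went to ranks first_rank..last_rank."""
    return sum(clicks[first_rank - 1:last_rank]) / total


page_one = page_share(CLICKS, 1, 10, TOTAL_CLICKS)
ranks_11_to_25 = page_share(CLICKS, 11, 25, TOTAL_CLICKS)
print(f"ranks 1-10:  {page_one:.1%}")
print(f"ranks 11-25: {ranks_11_to_25:.1%}")
```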
SEO Portal is offline
 
Old 08-12-2006, 07:13 AM
Junior Talker

Posts: 1
Quote:
Originally Posted by NuiLoa View Post
Anyone care to post data beyond the first 10 results? I'd be curious to see how #11 compares, seeing as how it's at the top of the second page.

Does it beat out spots #9 and #10?
#11 does not beat out spots #9 and #10; it turns out page 1 is the almighty god of search results. However, what you suspected can be observed with later pages. I don't know if that's useful, though, because all pages after the first and second get almost no hits in comparison to the first page.

I have posted some figures on my web page at u500k.erinye.com including clicks by rank for all 500 ranks in the data set. There are other interesting observations to be made, too, if you're willing to look at more than what's SEO relevant.
DannyTheFool is offline
 
Old 08-12-2006, 09:19 AM
Junior Talker

Posts: 23
http://seoblackhat.com/2006/08/11/to...-msn/#comments
Slam is offline
 
Old 08-12-2006, 10:42 AM
Junior Talker

Posts: 27
I'd like to see the top click getters along with their Alexa, PR, backlink count from Google/Yahoo, and total pages indexed from Google/Yahoo.

I know that's a lot to ask!

With proper indexing and temp tables, these queries should not exceed a second in execution. Get rid of the "www." from the URLs in the db.
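
One way to read the "www." advice: www.example.com and example.com are the same site, so stripping the prefix lets their clicks be grouped together. A hedged Python sketch, assuming the click URLs are stored with a scheme such as http:// (the function name is hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit


def strip_www(url):
    """Return url with a leading "www." removed from its host name, if present."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.lower().startswith("www."):
        host = host[4:]
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


print(strip_www("http://www.example.com"))
print(strip_www("http://example.com"))
```

URLs stored without a scheme would pass through unchanged with this version, so they would need separate handling.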

Ty
__________________
FREE » Award-winning eBay tools for Google Desktop and Windows Vista Sidebar
tkroll is offline
 
Old 08-12-2006, 12:41 PM
Junior Talker

Posts: 4
Code:
Rank #	Clickthroughs	%	Delta #n-1	Delta #1
Total	19434540	100%		
				
1	8220278	42.30%	n/a	n/a
2	2316738	11.92%	-71.82%	-71.82%
3	1640751	8.44%	-29.18%	-80.04%
4	1171642	6.03%	-28.59%	-85.75%
5	943667	4.86%	-19.46%	-88.52%
6	774718	3.99%	-17.90%	-90.58%
7	655914	3.37%	-15.34%	-92.02%
8	579206	2.98%	-11.69%	-92.95%
9	549196	2.83%	-5.18%	-93.32%
10	577325	2.97%	5.12%	-92.98%
				
11	127688	0.66%	-77.88%	-98.45%
12	108555	0.56%	-14.98%	-98.68%
13	101802	0.52%	-6.22%	-98.76%
14	94221	0.48%	-7.45%	-98.85%
15	91020	0.47%	-3.40%	-98.89%
16	75006	0.39%	-17.59%	-99.09%
17	70054	0.36%	-6.60%	-99.15%
18	65832	0.34%	-6.03%	-99.20%
19	62141	0.32%	-5.61%	-99.24%
20	58384	0.30%	-6.05%	-99.29%
				
21	55471	0.29%	-4.99%	-99.33%
				
31	23041	0.12%	-58.46%	-99.72%
				
41	14024	0.07%	-39.13%	-99.83%
Hope that displays properly.
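
For anyone checking the numbers: the two delta columns look like plain percentage changes, one against the previous row and one against rank #1. A small Python sketch that recomputes them from the clickthrough counts above (the helper names are invented for illustration, and this is a reconstruction, not necessarily how the table was produced):

```python
# Clickthroughs for ranks 1 to 11, copied from the table above.
CLICKTHROUGHS = [
    8220278, 2316738, 1640751, 1171642, 943667,
    774718, 655914, 579206, 549196, 577325, 127688,
]


def percent_change(new, old):
    """Percentage change from old to new, e.g. -50.0 when new is half of old."""
    return (new - old) / old * 100.0


def delta_rows(counts):
    """Yield (rank, delta_vs_previous_rank, delta_vs_rank_1) for ranks 2 and up."""
    first = counts[0]
    for index in range(1, len(counts)):
        yield (
            index + 1,
            percent_change(counts[index], counts[index - 1]),
            percent_change(counts[index], first),
        )


for rank, vs_previous, vs_first in delta_rows(CLICKTHROUGHS):
    print(f"{rank}\t{vs_previous:+.2f}%\t{vs_first:+.2f}%")
```

For the rows after the gaps (31 and 41), the posted Delta #n-1 figures match a comparison with the previous listed row (21 and 31) rather than with ranks 30 and 40.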

I have some charts and the above table over on redcardinal.ie/seo/12-08-2006/clickthrough-analysis-of-aol-datatgz/. (Can't post URLs because I'm too new.)

For anyone trying to load the files:
1. Create the table without indices:

create database aol;
use aol;
-- no secondary indexes yet; they get added after the bulk load
create table searches (
    id int not null primary key auto_increment,
    anonid int,
    query varchar(200),
    querytime timestamp,
    itemrank int,
    clickurl varchar(200)
);

2. Use the MySQL LOAD DATA command:
LOAD DATA INFILE '/path/AOL-user-ct-collection/user-ct-test-collection-[file number].txt' INTO TABLE searches IGNORE 1 LINES (anonid, query, querytime, itemrank, clickurl);

I created a small PHP script and the data loaded in a couple of minutes.

Indexing will take hours on the full dataset (36m rows). Indexing itemrank doesn't take too long; query and clickurl are going to take a while, though.
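
The load-first, index-later order can be tried out on a tiny scale before committing hours to the real thing. The sketch below uses Python's built-in sqlite3 module instead of MySQL, purely so it runs anywhere without a server; the rows are made-up placeholders, not AOL data, and the index name is arbitrary.

```python
import sqlite3

# Made-up sample rows for illustration only; they are NOT taken from the AOL files.
SAMPLE_ROWS = [
    (1, "example query a", "2006-03-01 10:00:00", 1, "http://a.example"),
    (1, "example query a", "2006-03-01 10:00:09", 2, "http://b.example"),
    (2, "example query b", "2006-03-02 11:30:00", None, None),  # search with no click
    (3, "example query c", "2006-03-03 12:45:00", 1, "http://a.example"),
]


def build_database(rows):
    """Create the table, bulk-insert the rows, and only then add the index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE searches (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               anonid INTEGER,
               query TEXT,
               querytime TEXT,
               itemrank INTEGER,
               clickurl TEXT
           )"""
    )
    conn.executemany(
        "INSERT INTO searches (anonid, query, querytime, itemrank, clickurl) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    # Building the index after the bulk load avoids updating it on every insert.
    conn.execute("CREATE INDEX idx_searches_itemrank ON searches (itemrank)")
    conn.commit()
    return conn


def clicks_by_rank(conn):
    """Return [(itemrank, clicks), ...] for rows that recorded a click."""
    return conn.execute(
        "SELECT itemrank, COUNT(*) FROM searches "
        "WHERE itemrank IS NOT NULL GROUP BY itemrank ORDER BY itemrank"
    ).fetchall()


conn = build_database(SAMPLE_ROWS)
print(clicks_by_rank(conn))
```

On MySQL the same pattern applies. Depending on how the empty fields were imported, searches without a click may show up with itemrank 0 rather than NULL, so the WHERE clause might need adjusting.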
Red Cardinal is offline
 
Old 08-12-2006, 12:44 PM
Junior Talker

Posts: 4
Quote:
Originally Posted by tkroll View Post
I'd like to see the top click getters along with their Alexa, PR, backlink count from Google/Yahoo, and total pages indexed from Google/Yahoo.

With proper indexing and temp tables, these queries should not exceed a second in execution.

Ty
Hi

Just noticed your post above mine. Can you give us some SQL to achieve this pls?

It's taken me a couple of days just to get to what I've posted below, and I too would like to be able to interrogate the data a bit more than I have to date.

Thanks
Red Cardinal is offline
 
Old 08-13-2006, 03:56 AM
Junior Talker

Posts: 64
Does anyone have a version of this in bite-sized chunks that can be opened with Excel?

Thanks,
Dave.
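
Splitting is easy enough to script. A minimal Python sketch that cuts one of the tab-separated pieces into files small enough for a spreadsheet limited to 65,536 rows, repeating the header line in each chunk; the function names, the output naming scheme, and the latin-1 encoding are all assumptions.

```python
import itertools


def split_lines(lines, rows_per_chunk):
    """Yield lists of lines: the header followed by at most rows_per_chunk data rows."""
    iterator = iter(lines)
    header = next(iterator, None)
    if header is None:
        return  # empty input, nothing to split
    while True:
        chunk = list(itertools.islice(iterator, rows_per_chunk))
        if not chunk:
            break
        yield [header] + chunk


def write_chunks(source_path, prefix, rows_per_chunk=65000):
    """Write prefix-0001.txt, prefix-0002.txt, ... and return how many were written."""
    count = 0
    with open(source_path, encoding="latin-1", newline="") as source:
        for count, chunk in enumerate(split_lines(source, rows_per_chunk), start=1):
            name = f"{prefix}-{count:04d}.txt"
            with open(name, "w", encoding="latin-1", newline="") as out:
                out.writelines(chunk)
    return count
```

Calling `write_chunks("piece.txt", "piece-part")` (both names are placeholders) would produce piece-part-0001.txt, piece-part-0002.txt and so on, each with the header plus up to 65,000 rows.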
__________________
Hawaii SEO
Hawaii SEO is offline
 
Old 08-13-2006, 04:58 AM
Junior Talker

Posts: 12
Quote:
Originally Posted by gopher292 View Post
How are you guys loading these files?

I'm just trying to load 1/10 size pieces and I'm having trouble. (on a 900MB RAM machine)
Notepad and Wordpad can't handle them.
OpenOffice takes about 10 minutes and then locks up when the progress bar is full. (I'd have to use Base, since Calc only loads the first 65K lines and there are ~2M in just one of the pieces)
DOS Type and Find work, but I was looking for something a little more powerful.
I kept them divided into 10 files, then used a MySQL management tool to convert the tab files into SQL INSERT INTO statements. Then I zipped it up, put it on the server, unzipped it there, and imported it into MySQL.

Quote:
Originally Posted by mattd View Post
Queries on about 3.5million rows for me are taking about 45-60 seconds!

But then I just shunted it into MS Access without any indexes. Maybe I'll import it into the MySQL server I've got running locally and see if it's any better.
I made indexes on all of the fields to make sure it runs quickly; the longest it has taken is about 20 seconds, for the most complex queries.
__________________
1 Hour PHP <- Hampster Program PHP In One Hour Or Less Or They Die

Money Tip Of The Day
august is offline
 
Old 08-13-2006, 04:58 AM
Junior Talker

Posts: 12
Quote:
Originally Posted by Hawaii SEO View Post
Does anyone have a version of this in bite-sized chunks that can be opened with Excel?

Thanks,
Dave.
How small are you talking? What would you be able to deduce from such a small sample?
august is offline
 
Old 08-13-2006, 11:16 AM
Junior Talker

Posts: 27
www.aolsearchdatabase.com
KevinJB is offline
 
Old 08-13-2006, 01:19 PM
evening earner

Posts: 245
Quote:
Originally Posted by KevinJB View Post
Thanks, but is there anywhere I can get, say, 1m rows for Excel or Access?

Also, about those URL searches: remember that AOL will be the error page for AOL users, and maybe it's like the MSN error page.
Vi5 is offline
 
Old 08-13-2006, 01:23 PM
Junior Talker

Posts: 1
portal, what query did you use to filter out the first 7,752,953 clicks? And did you mean clicks or records? You do realize that not all searches resulted in clicks, right?

-Michael
mvandemar is offline
 
Old 08-13-2006, 03:35 PM
$100 - $999 Monthly

Posts: 328
Quote:
Originally Posted by mvandemar View Post
portal, what query did you use to filter out the first 7,752,953 clicks? And did you mean clicks or records? You do realize that not all searches resulted in clicks, right?

-Michael
I didn't use any query;

I simply loaded only the first 4 txt files into the database. Since I tried it in Access first, the database reached its maximum size after 7,752,953 clicks (14.4 million queries or so).
So yes, I know not every search resulted in a click. My sample was 7,752,953 clicks and over 14 million searches (records).

Right now I'm running some queries in MySQL on the whole dataset...
SEO Portal is offline