Dear All,
I am working on a text retrieval system. In this system, a document is represented by features (i.e., words and phrases that convey the essence of the document), and the similarity between a query and a document is determined by counting the features they have in common. Currently, the features of all documents are held in memory at retrieval time. However, this will become impossible as the number of documents grows dramatically.
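To make the setup concrete, the scheme described above could be sketched roughly as follows in Python. This is only an illustration: the names `doc_features`, `score` and `search` and the sample documents are hypothetical, and the real system may differ in its details.

```python
# Hypothetical in-memory store: document id -> set of feature strings.
# This is the structure that grows with the number of documents.
doc_features = {
    "doc1": {"text retrieval", "index", "query"},
    "doc2": {"database", "index", "disk"},
    "doc3": {"memory", "cache"},
}

def score(query_features, features):
    """Similarity = number of features shared by the query and the document."""
    return len(query_features & features)

def search(query_features, store):
    """Score every document and return (doc_id, score) pairs, best first."""
    scored = [(doc_id, score(query_features, feats)) for doc_id, feats in store.items()]
    hits = [(doc_id, s) for doc_id, s in scored if s > 0]
    # Sort by descending score, then by document id for a stable order.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))

results = search({"index", "query"}, doc_features)
print(results)
```

Note that this version scans every document on every query, so both memory use and query time grow with the size of the collection.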
I have been thinking about this problem for a long time. The only solution I can come up with is to store the features in a database. The problem with that is the loss of efficiency: retrieval response time will increase, because accessing the hard disk, where the database is located, is slower than accessing memory. I have no experience with this and no clear idea how to approach it. Are there better technologies available?
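For reference, the database idea might look something like the sketch below, which uses Python's built-in `sqlite3` module. It assumes one row per (feature, document) pair with an index on the feature column, so a query only reads the rows for its own features instead of loading everything. The table name `postings`, the function names and the sample data are hypothetical, and this is not a claim about how any production search engine works.

```python
import sqlite3

def build_index(conn, docs):
    """Store one (feature, doc_id) row per feature of each document."""
    conn.execute("CREATE TABLE IF NOT EXISTS postings (feature TEXT, doc_id TEXT)")
    # The index lets the database find the rows for a feature without a full scan.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_feature ON postings (feature)")
    conn.executemany(
        "INSERT INTO postings (feature, doc_id) VALUES (?, ?)",
        [(feature, doc_id) for doc_id, feats in docs.items() for feature in feats],
    )
    conn.commit()

def search(conn, query_features):
    """Return (doc_id, match_count) rows for documents sharing any query feature."""
    feats = list(query_features)
    if not feats:
        return []
    placeholders = ",".join("?" * len(feats))
    sql = (
        "SELECT doc_id, COUNT(*) AS matches FROM postings "
        f"WHERE feature IN ({placeholders}) "
        "GROUP BY doc_id ORDER BY matches DESC, doc_id"
    )
    return conn.execute(sql, feats).fetchall()

# ":memory:" keeps this demo self-contained; a file path such as
# "features.db" (hypothetical) would keep the data on disk instead.
conn = sqlite3.connect(":memory:")
build_index(conn, {
    "doc1": {"text retrieval", "index", "query"},
    "doc2": {"database", "index", "disk"},
    "doc3": {"memory", "cache"},
})
results = search(conn, {"index", "query"})
print(results)
```

With this layout only the rows for the query's features are read, so memory use no longer depends on the total number of documents. How much slower it is than the in-memory version would have to be measured.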
Sometimes I wonder how Google solves this problem.
Please help. Thanks!