darren hobbs: distributed lucene


Interesting article by Mark Harwood here regarding distributed Lucene indexes. Using distributed indexes is how Google achieves its scalability, I believe, but they are a fairly special case. If scalability in the sense of concurrent users is the issue, I tend to favour multiple identical boxes with a load balancer and an RPC frontend. This can be as simple as a servlet, or you can use SOAP, XML-RPC, etc. (possibly RMI, although I've never tried that across a load balancer). Doing things this way is probably a lot simpler to manage than splitting your indexes across boxes, and it means that even if your queries are asymmetric (i.e. 85% of the queries are for the same thing), the load can be fairly balanced. Reliability comes for free as well: if a box dies, just stop sending requests there. Given Lucene's performance (it has been used to index collections of more than 10 million documents), it's pretty unlikely that your dataset will get so large that sheer size starts to affect your query times. Unless, of course, you are Google :)
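To make the balancing argument concrete, here is a minimal sketch in Python of round-robin dispatch across identical boxes. It is not how any particular load balancer is implemented, and every name in it (`RoundRobinBalancer`, `pick`, `mark_down`, the `box-a` style hostnames) is made up for illustration. The point is only that the choice of box never looks at the query, so a skewed query mix still spreads evenly, and that a dead box is handled by leaving it out of the rotation.

```python
from collections import Counter
from itertools import count


class RoundRobinBalancer:
    """Rotate requests across identical search boxes, skipping dead ones.

    Hypothetical helper for illustration; a real deployment would use an
    off-the-shelf load balancer in front of the RPC frontend.
    """

    def __init__(self, boxes):
        self.boxes = list(boxes)   # every box holds a full copy of the index
        self.down = set()          # boxes we have stopped sending requests to
        self._ticket = count()     # monotonically increasing request counter

    def mark_down(self, box):
        self.down.add(box)

    def mark_up(self, box):
        self.down.discard(box)

    def pick(self):
        """Return the next healthy box in rotation."""
        healthy = [b for b in self.boxes if b not in self.down]
        if not healthy:
            raise RuntimeError("no search boxes available")
        return healthy[next(self._ticket) % len(healthy)]


def simulate(balancer, queries):
    """Count how many queries each box receives."""
    load = Counter()
    for _query in queries:
        # The query text is deliberately ignored when choosing a box.
        load[balancer.pick()] += 1
    return load


# 85% of the queries are for the same thing.
queries = ["lucene"] * 85 + ["query-%d" % i for i in range(15)]

balancer = RoundRobinBalancer(["box-a", "box-b", "box-c", "box-d"])
balanced = simulate(balancer, queries)

# One box dies: just stop sending requests there.
balancer.mark_down("box-c")
after_failure = simulate(balancer, queries)
```

With four healthy boxes, each one receives 25 of the 100 queries despite the skew. After `box-c` is marked down, it receives nothing and the remaining three share the load to within one request of each other. With an index split across boxes, by contrast, the 85 identical queries would all hit whichever box holds the matching partition.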
