Implementing a Simple Search Engine with Compass [Repost]

Source: Baidu Wenku    Editor: 神马文学网    Time: 2024/04/27 16:16:17
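Compass is a first-class open-source Java search engine framework that brings search-engine semantics to your application. Compass is built on the top-tier Lucene search engine and integrates with popular frameworks such as Hibernate and Spring. It gives your application search over its data model and keeps that search in sync with changes in the data source. It also adds two features: transaction management and fast update optimization. The goal of Compass is to make integrating a search engine into a Java application simple: write less code, find data faster.
Below, we walk through one application scenario step by step to show how to build a search engine with Compass: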
1. Here we have an Article table that we want to search with Compass.
The Article table is defined as follows:
CREATE TABLE `article` (
`ArticleID` bigint(20) NOT NULL,
`PersonInfoID` bigint(20) default NULL,
`ArticleTitle` varchar(200) default NULL,
`PublishDate` datetime default NULL,
`Summary` text,
`Content` longtext,
`KeyList` text,
PRIMARY KEY (`ArticleID`),
KEY `PersonInfoArticle_FK` (`PersonInfoID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
We want Compass to provide full-text search over the ArticleTitle, Summary, Content and KeyList columns. Let's get started.
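Full-text search over these columns relies on an inverted index, which maps each term to the records that contain it. The following hypothetical stdlib sketch illustrates the idea only. It does not show how Compass or Lucene lay out their index. `InvertedIndexSketch`, `add` and `find` are invented names, and plain whitespace splitting stands in for a real analyzer, so the sketch would not segment Chinese text.

```java
import java.util.*;

/** Hypothetical sketch: a tiny inverted index over the four searchable Article columns. */
class InvertedIndexSketch {
    // term -> IDs of the articles whose indexed columns contain that term
    private final Map<String, Set<Long>> postings = new HashMap<>();

    /** Indexes the searchable columns of one article; null columns are skipped. */
    void add(long articleId, String title, String summary, String content, String keyList) {
        for (String field : new String[] {title, summary, content, keyList}) {
            if (field == null) continue;
            // Naive lower-case whitespace tokenizer; a real analyzer would do much more
            for (String token : field.toLowerCase().split("\\s+")) {
                if (token.isEmpty()) continue;
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(articleId);
            }
        }
    }

    /** Returns the IDs of the articles containing the term, in ascending order. */
    Set<Long> find(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }
}
```

A query then becomes a map lookup rather than a scan of every row, which is what lets a search engine answer quickly.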
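2. First, download a Compass distribution from http://www.opensymphony.com/compass/download.action. We downloaded the "With Dependencies" package of Version 1.0.0, which saves you from having to track down the libraries Compass depends on.
3. Unzip Compass 1.0 into a suitable directory. Our working directory is d:\develop\compass1.0.
4. We did this work in Eclipse, so we recommend that you install Eclipse 3.2 as well.
5. First, we created a Java project named mycompass in Eclipse.
6. Next, we created a lib directory in the project to hold the Compass libraries and every other library the project needs, and we added them to the project's build path. You can find all of these files in the lib directory of the Compass installation.
Here is our list of library files:
7. Create the POJO class for the Article table.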
CODE:
package com.darkhe.sample.mycompass;
// Generated 2006-8-2 10:57:06 by Hibernate Tools 3.2.0.beta6a
import java.util.Date;
/**
* Article generated by hbm2java
*/
public class Article implements java.io.Serializable {
// Fields
private long articleId;
private Long personInfoId;
private String articleTitle;
private Date publishDate;
private String summary;
private String content;
private String keyList;
// Constructors
/** default constructor */
public Article() {
}
/** minimal constructor */
public Article(long articleId) {
this.articleId = articleId;
}
/** full constructor */
public Article(long articleId, Long personInfoId, String articleTitle,
Date publishDate, String summary, String content, String keyList) {
this.articleId = articleId;
this.personInfoId = personInfoId;
this.articleTitle = articleTitle;
this.publishDate = publishDate;
this.summary = summary;
this.content = content;
this.keyList = keyList;
}
// Property accessors
public long getArticleId() {
return this.articleId;
}
public void setArticleId(long articleId) {
this.articleId = articleId;
}
public Long getPersonInfoId() {
return this.personInfoId;
}
public void setPersonInfoId(Long personInfoId) {
this.personInfoId = personInfoId;
}
public String getArticleTitle() {
return this.articleTitle;
}
public void setArticleTitle(String articleTitle) {
this.articleTitle = articleTitle;
}
public Date getPublishDate() {
return this.publishDate;
}
public void setPublishDate(Date publishDate) {
this.publishDate = publishDate;
}
public String getSummary() {
return this.summary;
}
public void setSummary(String summary) {
this.summary = summary;
}
public String getContent() {
return this.content;
}
public void setContent(String content) {
this.content = content;
}
public String getKeyList() {
return this.keyList;
}
public void setKeyList(String keyList) {
this.keyList = keyList;
}
}
8. Create the Hibernate mapping file, Article.hbm.xml, which maps the POJO to the database table
CODE:

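<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC "-//Hibernate/Hibernate Mapping DTD 3.0//EN"
"http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<hibernate-mapping>
<class name="com.darkhe.sample.mycompass.Article" table="article">
<id name="articleId" type="long">
<column name="ArticleID" />
<generator class="assigned" />
</id>
<property name="personInfoId" type="java.lang.Long">
<column name="PersonInfoID" />
</property>
<property name="articleTitle" type="string">
<column name="ArticleTitle" length="200" />
</property>
<property name="publishDate" type="timestamp">
<column name="PublishDate" />
</property>
<property name="summary" type="string">
<column name="Summary" />
</property>
<property name="content" type="string">
<column name="Content" />
</property>
<property name="keyList" type="string">
<column name="KeyList" />
</property>
</class>
</hibernate-mapping>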
9. Now we configure Compass, starting with its main configuration file, mycompass.cfg.xml
CODE:
xmlns="http://www.opensymphony.com/compass/schema/core-config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.opensymphony.com/compass/schema/core-config
http://www.opensymphony.com/compass/schema/compass-core-config.xsd">
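The configuration above uses a Chinese word-segmentation (analyzer) library that we picked ourselves. You can use one of the analyzers that ship with Compass instead. The Compass manual lists the analyzers Compass provides.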
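Chinese text has no spaces between words, so an analyzer has to segment it against a dictionary. The name MMAnalyzer suggests a maximum-matching segmenter. The following is a hypothetical plain-Java sketch of forward maximum matching in general, not MMAnalyzer's actual code. At each position it takes the longest dictionary word of up to `maxWordLen` characters and falls back to a single character when nothing matches. `ForwardMaxMatch` and `segment` are invented names.

```java
import java.util.*;

/** Hypothetical sketch of forward maximum matching (FMM) word segmentation. */
class ForwardMaxMatch {
    /**
     * Splits text left to right, always taking the longest dictionary word that
     * starts at the current position; unmatched characters become one-char tokens.
     * Works on UTF-16 chars, which is fine for common (BMP) Chinese characters.
     */
    static List<String> segment(String text, Set<String> dict, int maxWordLen) {
        List<String> words = new ArrayList<>();
        int window = Math.max(1, maxWordLen); // guard against a zero or negative length
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(text.length(), pos + window);
            // Shrink the candidate until it is a dictionary word or a single character
            while (end > pos + 1 && !dict.contains(text.substring(pos, end))) {
                end--;
            }
            words.add(text.substring(pos, end));
            pos = end;
        }
        return words;
    }
}
```

In this sketch, adding a term such as 大头人 to the dictionary keeps it as one token instead of three single characters. That is also why a custom dictionary matters for query terms.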
10. Next comes mycompass.cmd.xml
CODE:
<!DOCTYPE compass-core-meta-data PUBLIC
"-//Compass/Compass Core Meta Data DTD 1.0//EN"
"http://www.opensymphony.com/compass/dtd/compass-core-meta-data.dtd">

Mycompass Meta Data
http://com/darkhe/sample/mycompass

Article alias
http://com/darkhe/sample/mycompass/alias/Article
Article

ArticleTitle
http://com/darkhe/sample/mycompass/alias/ArticleTitle
ArticleTitle

PublishDate
http://com/darkhe/sample/mycompass/alias/PublishDate
date

Summary
http://com/darkhe/sample/mycompass/alias/Summary
Summary

Content
http://com/darkhe/sample/mycompass/alias/Content
Content

KeyList
http://com/darkhe/sample/mycompass/alias/KeyList
KeyList
11. Then mycompass.cpm.xml
CODE:
<!DOCTYPE compass-core-mapping PUBLIC
"-//Compass/Compass Core Mapping DTD 1.0//EN"
"http://www.opensymphony.com/compass/dtd/compass-core-mapping.dtd">

${mycompass.ArticleTitle}
${mycompass.PublishDate}
${mycompass.Summary}
${mycompass.Content}
${mycompass.KeyList}
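In mycompass.cpm.xml, each ${mycompass.X} reference points at a meta-data entry declared in mycompass.cmd.xml. The mapped property's value is indexed under that meta-data name. The plain-Java sketch below shows the general idea of flattening an object into such named values. `Doc`, `toResource` and the mapping table are hypothetical, and the sketch does not describe how Compass performs the mapping internally.

```java
import java.util.*;
import java.util.function.Function;

/** Hypothetical sketch: flattening an object into a "resource" of named values. */
class ResourceMappingSketch {
    /** Stand-in for the Article POJO with just two mapped properties (hypothetical). */
    static class Doc {
        final String title;
        final String summary;
        Doc(String title, String summary) {
            this.title = title;
            this.summary = summary;
        }
    }

    /**
     * Reads each mapped property and stores it under its meta-data name,
     * in mapping order; null values are skipped rather than indexed.
     */
    static Map<String, String> toResource(Doc doc, Map<String, Function<Doc, Object>> mapping) {
        Map<String, String> resource = new LinkedHashMap<>();
        for (Map.Entry<String, Function<Doc, Object>> e : mapping.entrySet()) {
            Object value = e.getValue().apply(doc);
            if (value != null) {
                resource.put(e.getKey(), String.valueOf(value));
            }
        }
        return resource;
    }
}
```

Here the keys ArticleTitle and Summary play the role of the meta-data names defined in mycompass.cmd.xml.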
12. log4j.properties
CODE:
log4j.rootLogger=WARN, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p %c - %m%n
log4j.logger.org.compass=INFO
13. jdbc.properties
CODE:
# Properties file with JDBC-related settings.
# Applied by PropertyPlaceholderConfigurer from "applicationContext-*.xml".
# Targeted at system administrators, to avoid touching the context XML files.
jdbc.driverClassName=com.mysql.jdbc.Driver
#jdbc.driverClassName=org.hsqldb.jdbcDriver
#jdbc.url=jdbc:hsqldb:hsql://localhost:9001
jdbc.url=jdbc:mysql://localhost:3306/testdb
jdbc.username=test
jdbc.password=test
# Property that determines the Hibernate dialect
# (only applied with "applicationContext-hibernate.xml")
#hibernate.dialect=org.hibernate.dialect.HSQLDialect
hibernate.dialect=org.hibernate.dialect.MySQLDialect
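The comment at the top of jdbc.properties says PropertyPlaceholderConfigurer applies these values. That class fills ${...} placeholders such as ${jdbc.url} in the Spring XML. The substitution step can be sketched with java.util.Properties and a regular expression. `PlaceholderSketch` is hypothetical and much simpler than Spring's class. It handles only flat ${key} references and leaves unknown keys untouched.

```java
import java.io.*;
import java.util.*;
import java.util.regex.*;

/** Hypothetical sketch of ${key} placeholder substitution from a .properties source. */
class PlaceholderSketch {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Loads key=value pairs, e.g. the contents of jdbc.properties. */
    static Properties load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Replaces every ${key} with its value; placeholders with unknown keys are kept as-is. */
    static String resolve(String text, Properties props) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.getProperty(m.group(1));
            String replacement = value != null ? value : m.group(0);
            // quoteReplacement stops '$' or '\' in values from being treated as group refs
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Keeping connection settings in a separate properties file means an administrator can change the database without editing the Spring XML, as the comment in jdbc.properties intends.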
14. Finally, applicationContext-hibernate.xml. This file configures how Compass is integrated with Spring and Hibernate.
CODE:
class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
classpath:jdbc.properties

class="org.springframework.jdbc.datasource.DriverManagerDataSource">
${jdbc.driverClassName}
${jdbc.url}
${jdbc.username}
${jdbc.password}

class="org.springframework.orm.hibernate3.LocalSessionFactoryBean">
com/darkhe/sample/mycompass/Article.hbm.xml
${hibernate.dialect}
false
true
class="org.springframework.orm.hibernate3.support.IdTransferringMergeEventListener" />

classpath:mycompass.cmd.xml
classpath:mycompass.cpm.xml
classpath:mycompass.cfg.xml

class="org.compass.spring.device.hibernate.SpringHibernate3GpsDevice">
hibernateDevice

init-method="start" destroy-method="stop">
class="org.compass.spring.device.SpringSyncTransactionGpsDeviceWrapper">
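The introduction says Compass keeps search in sync with changes in the data source. In the configuration above, the Hibernate GPS device (SpringHibernate3GpsDevice, wrapped by SpringSyncTransactionGpsDeviceWrapper) is the piece that connects Compass to Hibernate. The general idea reduces to an observer pattern, sketched below. `Store`, `Index` and `ChangeListener` are hypothetical classes. The sketch does not show how Compass actually hooks into Hibernate or how it handles transactions.

```java
import java.util.*;

/** Hypothetical sketch of "mirroring": keeping an index in sync with data changes. */
class MirrorSketch {
    /** Callback fired after each data change (hypothetical interface). */
    interface ChangeListener {
        void onSaved(long id, String text);
        void onDeleted(long id);
    }

    /** A toy index that keeps the latest text for each ID. */
    static class Index implements ChangeListener {
        final Map<Long, String> docs = new HashMap<>();
        public void onSaved(long id, String text) { docs.put(id, text); } // insert or update
        public void onDeleted(long id) { docs.remove(id); }
    }

    /** A toy data store that notifies every listener after each change. */
    static class Store {
        private final Map<Long, String> rows = new HashMap<>();
        private final List<ChangeListener> listeners = new ArrayList<>();
        void addListener(ChangeListener l) { listeners.add(l); }
        void save(long id, String text) {
            rows.put(id, text);
            for (ChangeListener l : listeners) l.onSaved(id, text);
        }
        void delete(long id) {
            rows.remove(id);
            for (ChangeListener l : listeners) l.onDeleted(id);
        }
    }
}
```

With mirroring in place, a full re-index is only needed once, for example to index rows that existed before the application started listening.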
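15. Note that, given the configuration above, all of these configuration files must be placed at the root of the classpath.
16. Create a utility class that initializes the Spring container.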
CODE:
/*
* @(#) IOC.java 2006-2-1 0:08:23
*
* Copyright (c) 2005-2006
*/
package com.darkhe.sample.mycompass;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;
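/**
* Lazily creates and caches the Spring ApplicationContext built from
* applicationContext-hibernate.xml on the classpath.
*
* @version 1.0 2006-2-1
* @author darkhe
*/
public class IOC {
private static ApplicationContext context = null;
private IOC() {
super();
}
/**
* Returns the shared ApplicationContext, creating it on first use.
* Synchronized so that concurrent callers cannot build two contexts.
*
* @return the initialized Spring ApplicationContext
*/
public static synchronized ApplicationContext getContext() {
if (context == null) {
String[] xmlfilenames = { "applicationContext-hibernate.xml" };
context = new ClassPathXmlApplicationContext(xmlfilenames);
}
return context;
}
/**
* Looks up a bean defined in the Spring configuration.
*
* @param name the bean id, e.g. "compass" or "compassGps"
* @return the bean instance
*/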
public static Object getBean(String name) {
return getContext().getBean(name);
}
}
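IOC builds the context lazily and caches it. The initialization-on-demand holder idiom achieves the same thing without explicit locking. The sketch below uses a stand-in object instead of a real Spring context. `LazyContextSketch`, `Holder` and `createContext` are hypothetical names.

```java
/** Hypothetical sketch: lazy, thread-safe singleton via the initialization-on-demand holder idiom. */
class LazyContextSketch {
    static int creations = 0; // counts how often the expensive object was built

    /** Stand-in for building the expensive Spring ApplicationContext. */
    static Object createContext() {
        creations++;
        return new Object();
    }

    // The JVM initializes Holder (and so calls createContext) only on first access,
    // and class initialization runs exactly once, even when several threads race.
    private static class Holder {
        static final Object CONTEXT = createContext();
    }

    static Object getContext() {
        return Holder.CONTEXT;
    }
}
```

In IOC, `CONTEXT` would hold `new ClassPathXmlApplicationContext(...)`. There is one trade-off. If building the context throws, the holder class fails to initialize, and later calls get a `NoClassDefFoundError` rather than a retry.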
17. Create the indexing program, which builds the index from the data in the database.
CODE:
/*
* Copyright (c) 2005-2006
* ChongQing Man-Month Technology Development Co., Ltd
*
* ---------------------------------------------------------------------------------
* @(#) Indexer.java, 2006-8-1 09:01:14 PM
* ---------------------------------------------------------------------------------
*/
package com.darkhe.sample.mycompass;
import java.io.FileNotFoundException;
import org.compass.gps.CompassGps;
import org.springframework.context.ApplicationContext;
/**
* Builds the Compass index from the Article table through the compassGps bean.
*
* @author darkhe
*/
public class Indexer {
/**
* @param args
* @throws FileNotFoundException
*/
public static void main(String[] args) throws FileNotFoundException {
// Load the custom dictionary for the Chinese analyzer
DictionaryUtils.loadCustomDictionary();
ApplicationContext context = IOC.getContext();
// Get the compassGps object that Spring has already configured and initialized
CompassGps compassGps = (CompassGps) context.getBean("compassGps");
// Call index() to build the index
compassGps.index();
}
}
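Roughly speaking, compassGps.index() rebuilds the index from everything its configured devices can read, which here means the Article rows loaded through Hibernate. A common way to structure a full rebuild is to read the rows in batches into a fresh index and publish it only when it is complete, so searches never see a half-built index. The sketch below shows that pattern. It does not claim that Compass 1.0 uses this exact strategy. `FullReindexSketch` and its methods are hypothetical, and lower-casing stands in for real analysis.

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of a full re-index: build a complete new index, then swap it in. */
class FullReindexSketch {
    // The snapshot that searches read from (row ID -> indexed text); replaced as a whole
    private final AtomicReference<Map<Long, String>> live =
            new AtomicReference<Map<Long, String>>(Collections.<Long, String>emptyMap());

    /** Rebuilds the index from every row of the table, reading IDs in pages of pageSize. */
    void reindex(Map<Long, String> table, int pageSize) {
        List<Long> ids = new ArrayList<>(table.keySet());
        Map<Long, String> fresh = new HashMap<>();
        int step = Math.max(1, pageSize);
        for (int start = 0; start < ids.size(); start += step) {
            // Each page stands in for one batch fetched from the database
            for (Long id : ids.subList(start, Math.min(ids.size(), start + step))) {
                fresh.put(id, table.get(id).toLowerCase());
            }
        }
        // Searches see either the old snapshot or the complete new one, never a partial build
        live.set(Collections.unmodifiableMap(fresh));
    }

    /** Returns, in ascending order, the IDs whose indexed text contains the term. */
    List<Long> search(String term) {
        List<Long> hits = new ArrayList<>();
        for (Map.Entry<Long, String> e : live.get().entrySet()) {
            if (e.getValue().contains(term.toLowerCase())) hits.add(e.getKey());
        }
        Collections.sort(hits);
        return hits;
    }
}
```

Rows deleted from the table since the last rebuild disappear from the new snapshot, because it is built from scratch rather than patched.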
18. Create the search program to verify that Compass works.
CODE:
/*
* Copyright (c) 2005-2006
* ChongQing Man-Month Technology Development Co., Ltd
*
* ---------------------------------------------------------------------------------
* @(#) Searcher.java, 2006-8-1 09:36:29 PM
* ---------------------------------------------------------------------------------
*/
package com.darkhe.sample.mycompass;
import java.io.FileNotFoundException;
import org.compass.core.Compass;
import org.compass.core.CompassCallbackWithoutResult;
import org.compass.core.CompassException;
import org.compass.core.CompassHits;
import org.compass.core.CompassSession;
import org.compass.core.CompassTemplate;
import org.compass.core.Resource;
import org.springframework.context.ApplicationContext;
/**
* Runs a sample query against the Compass index and prints every hit.
*
* @author darkhe
*/
public class Searcher {
/**
* @param args
* @throws FileNotFoundException
*/
public static void main(String[] args) throws FileNotFoundException {
// Load the custom dictionary for the Chinese analyzer
DictionaryUtils.loadCustomDictionary();
ApplicationContext context = IOC.getContext();
Compass compass = (Compass) context.getBean("compass");
CompassTemplate template = new CompassTemplate(compass);
template.execute(new CompassCallbackWithoutResult() {
protected void doInCompassWithoutResult(CompassSession session)
throws CompassException {
// Query the index for the sample term "大头人"
CompassHits hits = session.find("大头人");
System.out.println("Found [" + hits.getLength()
+ "] hits for [大头人] query");
System.out
.println("======================================================");
for (int i = 0; i < hits.getLength(); i++) {
print(hits, i);
}
hits.close();
}
});
}
public static void print(CompassHits hits, int hitNumber) {
Object value = hits.data(hitNumber);
Resource resource = hits.resource(hitNumber);
System.out.println("ALIAS [" + resource.getAlias() + "] SCORE ["
+ hits.score(hitNumber) + "]");
System.out.println(":::: " + value);
System.out.println("");
}
}
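The SCORE value printed by Searcher is a relevance score computed by Lucene. The hypothetical TF-IDF ranker below roughly shows how term frequency and term rarity feed such a score. It borrows the shape of Lucene's classic sqrt(tf) and 1 + ln(N / (df + 1)) factors but leaves out field norms, boosts and query normalization, so it will not reproduce Lucene's numbers. `TfIdfSketch` and `rank` are invented names.

```java
import java.util.*;

/** Hypothetical sketch: ranking tokenized documents for one query term with TF-IDF. */
class TfIdfSketch {
    /**
     * Scores each document containing the term as sqrt(tf) * (1 + ln(N / (df + 1))),
     * without norms or boosts. Returns matching document indexes, best score first.
     */
    static List<Integer> rank(List<List<String>> docs, String term) {
        int df = 0; // number of documents containing the term
        for (List<String> doc : docs) {
            if (doc.contains(term)) df++;
        }
        final double idf = 1 + Math.log((double) docs.size() / (df + 1));
        final Map<Integer, Double> scores = new HashMap<>();
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            int tf = Collections.frequency(docs.get(i), term);
            if (tf > 0) {
                scores.put(i, Math.sqrt(tf) * idf);
                hits.add(i);
            }
        }
        // Higher score first; ties keep document order because the sort is stable
        hits.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return hits;
    }
}
```

Rare terms earn a larger idf, and repeated occurrences help with diminishing returns because of the square root.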
19. The utility class DictionaryUtils loads the custom dictionary for the Chinese word-segmentation library we chose.
CODE:
/**
* Copyright (c) 2005-2006 ChongQing Man-Month Technology Development Co., Ltd
*
* ------------------------------------------------------------------------------
* @(#) DictionaryUtils.java, 2006-8-2 04:55:22 PM
* ------------------------------------------------------------------------------
*/
package com.darkhe.sample.mycompass;
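import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import jeasy.analysis.MMAnalyzer;
/**
* Loads our custom dictionary file (dict.txt, resolved against the working
* directory) into the jeasy MMAnalyzer Chinese segmenter, at most once.
*
* @author darkhe
* @version 1.0.0
*/
public class DictionaryUtils {
// Whether the custom dictionary has already been loaded
private static boolean isInit = false;
// Synchronized so that concurrent callers cannot load the dictionary twice
public static synchronized void loadCustomDictionary() throws FileNotFoundException {
if (isInit == false) {
// Add our own dictionary
FileReader fr = new FileReader(new File("dict.txt"));
try {
MMAnalyzer.addDictionary(fr);
} finally {
// Close the file once addDictionary returns (assumes it reads the words eagerly)
try {
fr.close();
} catch (IOException ignored) {
// nothing useful to do if closing fails
}
}
isInit = true;
}
}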
}
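DictionaryUtils reads dict.txt through FileReader, which decodes with the platform default charset. A UTF-8 word list can therefore come out garbled on a machine whose default is GBK. The stdlib sketch below is a load-once loader that fixes the charset explicitly. `DictionarySketch` is hypothetical. It assumes one word per line in UTF-8, since the dict.txt format expected by jeasy MMAnalyzer is not given here, and it returns a plain Set rather than feeding MMAnalyzer.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

/** Hypothetical sketch: load a one-word-per-line dictionary once, with an explicit charset. */
class DictionarySketch {
    private static Set<String> words = null; // cached after the first successful load

    /** Reads the words on the first call; later calls ignore their argument and return the cache. */
    static synchronized Set<String> load(InputStream in) {
        if (words == null) {
            Set<String> loaded = new HashSet<>();
            // Decode explicitly as UTF-8; FileReader uses the platform default charset,
            // which is UTF-8 by default only since Java 18
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String word = line.trim();
                    if (!word.isEmpty()) loaded.add(word); // skip blank lines
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            words = Collections.unmodifiableSet(loaded);
        }
        return words;
    }
}
```

try-with-resources closes the reader even if reading fails, which also avoids the file-handle leak of an unclosed FileReader.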
20. Run Indexer and then Searcher. The console output is as follows:
Found [1] hits for [大头人] query
================================================
ALIAS [Article] SCORE [0.3988277]
::::com.darkhe.sample.mycompass.Article@bla4e2
The exact results depend on the data in your table.
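The `::::` line prints com.darkhe.sample.mycompass.Article@... because Article does not override toString(). Java therefore falls back to Object.toString(), which prints the class name, '@' and the hash code in hexadecimal. The sketch below contrasts that default with an override. `PlainArticle` and `ReadableArticle` are hypothetical stand-ins, not classes from this project.

```java
/** Hypothetical sketch: why a hit prints as Article@..., and a readable alternative. */
class ToStringSketch {
    /** Like the original Article: no toString(), so Object.toString() is used. */
    static class PlainArticle {
        long articleId = 1L;
    }

    /** Same kind of data plus a toString() that shows the fields a reader cares about. */
    static class ReadableArticle {
        final long articleId;
        final String articleTitle;
        ReadableArticle(long articleId, String articleTitle) {
            this.articleId = articleId;
            this.articleTitle = articleTitle;
        }
        @Override
        public String toString() {
            return "Article[articleId=" + articleId + ", articleTitle=" + articleTitle + "]";
        }
    }
}
```

Adding a similar toString() to Article would make the search output show the matching title directly.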
21. With that, we have built a simple search engine of our own with Compass.