Building a Simple Search Engine with Compass [Repost] - 刘文涛 - BlogJava

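Compass is a first-class open-source Java search engine framework that brings search engine semantics to your application stack declaratively. Built on top of the Lucene search engine, Compass integrates with popular frameworks such as Hibernate and Spring, gives your application's data model search capability, and keeps the index synchronized with changes in the data source. It also adds two features of its own: transaction management and fast-update optimization. Compass aims to make it simple to integrate search into a Java application: write less code, find data more easily.

  The following scenario walks through, step by step, how to build a search engine with Compass:

1. We have an Article table and want to make it searchable with Compass.

The Article table is defined as follows: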

CREATE TABLE `article` (
  `ArticleID` bigint(20) NOT NULL,
  `PersonInfoID` bigint(20) default NULL,
  `ArticleTitle` varchar(200) default NULL,
  `PublishDate` datetime default NULL,
  `Summary` text,
  `Content` longtext,
  `KeyList` text,
  PRIMARY KEY (`ArticleID`),
  KEY `PersonInfoArticle_FK` (`PersonInfoID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

We want Compass to provide full-text search over ArticleTitle, Summary, Content, and KeyList. Let's get started.

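2. First, download a Compass release from http://www.opensymphony.com/compass/download.action. We downloaded Version 1.0.0 in the "With Dependencies" package, which saves the trouble of hunting down the dependent libraries.

3. Unpack compass1.0 into a suitable directory; our working directory is d:\develop\compass1.0

4. We did this work in Eclipse, so we suggest you install Eclipse 3.2 as well.

5. In Eclipse, create a Java project named mycompass.

6. In the project directory, create a lib directory to hold Compass and all the other libraries this project needs, and add them to the project's build path. All of these files can be found in the lib directory of the Compass installation.

Here is our list of library files:

7. Create the POJO class for the Article table.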

CODE:

package com.darkhe.sample.mycompass;

// Generated 2006-8-2 10:57:06 by Hibernate Tools 3.2.0.beta6a

import java.util.Date;

/**
  * Article generated by hbm2java
  */
public class Article implements java.io.Serializable {

  // Fields  

  private long articleId;

  private Long personInfoId;

  private String articleTitle;

  private Date publishDate;

  private String summary;

  private String content;

  private String keyList;

  // Constructors

  /** default constructor */
  public Article() {
  }

  /** minimal constructor */
  public Article(long articleId) {
  this.articleId = articleId;
  }

  /** full constructor */
  public Article(long articleId, Long personInfoId, String articleTitle,
  Date publishDate, String summary, String content, String keyList) {
  this.articleId = articleId;
  this.personInfoId = personInfoId;
  this.articleTitle = articleTitle;
  this.publishDate = publishDate;
  this.summary = summary;
  this.content = content;
  this.keyList = keyList;
  }

  // Property accessors
  public long getArticleId() {
  return this.articleId;
  }

  public void setArticleId(long articleId) {
  this.articleId = articleId;
  }

  public Long getPersonInfoId() {
  return this.personInfoId;
  }

  public void setPersonInfoId(Long personInfoId) {
  this.personInfoId = personInfoId;
  }

  public String getArticleTitle() {
  return this.articleTitle;
  }

  public void setArticleTitle(String articleTitle) {
  this.articleTitle = articleTitle;
  }

  public Date getPublishDate() {
  return this.publishDate;
  }

  public void setPublishDate(Date publishDate) {
  this.publishDate = publishDate;
  }

  public String getSummary() {
  return this.summary;
  }

  public void setSummary(String summary) {
  this.summary = summary;
  }

  public String getContent() {
  return this.content;
  }

  public void setContent(String content) {
  this.content = content;
  }

  public String getKeyList() {
  return this.keyList;
  }

  public void setKeyList(String keyList) {
  this.keyList = keyList;
  }

}



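8. Create the Hibernate mapping file that maps the POJO to the database table (step 14 refers to it as com/darkhe/sample/mycompass/Article.hbm.xml).

CODE:

"http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">

[The remaining XML markup of the mapping file was lost in extraction.]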


9. Now configure Compass, starting with its system configuration file, mycompass.cfg.xml

CODE:

xmlns="http://www.opensymphony.com/compass/schema/core-config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.opensymphony.com/compass/schema/core-config
      http://www.opensymphony.com/compass/schema/compass-core-config.xsd">

[The remaining XML markup of the configuration file was lost in extraction.]


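The configuration above uses a Chinese word-segmentation library of our own choosing; you can use one of the analyzers that ship with Compass instead. See the Compass manual for the analyzers it provides.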


10. Next is mycompass.cmd.xml

CODE:

  "-//Compass/Compass Core Meta Data DTD 1.0//EN"
  "http://www.opensymphony.com/compass/dtd/compass-core-meta-data.dtd">

[The XML element tags were lost in extraction; the surviving text of each entry is listed below.]

    Mycompass Meta Data
    http://com/darkhe/sample/mycompass

    Article alias
    http://com/darkhe/sample/mycompass/alias/Article
    Article

    ArticleTitle
    http://com/darkhe/sample/mycompass/alias/ArticleTitle
    ArticleTitle

    PublishDate
    http://com/darkhe/sample/mycompass/alias/PublishDate
    date

    Summary
    http://com/darkhe/sample/mycompass/alias/Summary
    Summary

    Content
    http://com/darkhe/sample/mycompass/alias/Content
    Content

    KeyList
    http://com/darkhe/sample/mycompass/alias/KeyList
    KeyList



11. Then mycompass.cpm.xml

CODE:

  "-//Compass/Compass Core Mapping DTD 1.0//EN"
  "http://www.opensymphony.com/compass/dtd/compass-core-mapping.dtd">

[The XML element tags were lost in extraction; the surviving meta-data references are listed below.]

  ${mycompass.ArticleTitle}
  ${mycompass.PublishDate}
  ${mycompass.Summary}
  ${mycompass.Content}
  ${mycompass.KeyList}


12. log4j.properties

CODE:

log4j.rootLogger=WARN, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p %c - %m%n
log4j.logger.org.compass=INFO



13. jdbc.properties

CODE:

# Properties file with JDBC-related settings.
# Applied by PropertyPlaceholderConfigurer from "applicationContext-*.xml".
# Targeted at system administrators, to avoid touching the context XML files.
jdbc.driverClassName=com.mysql.jdbc.Driver
#jdbc.driverClassName=org.hsqldb.jdbcDriver
#jdbc.url=jdbc:hsqldb:hsql://localhost:9001
jdbc.url=jdbc:mysql://localhost:3306/testdb
jdbc.username=test
jdbc.password=test
# Property that determines the Hibernate dialect
# (only applied with "applicationContext-hibernate.xml")
#hibernate.dialect=org.hibernate.dialect.HSQLDialect
hibernate.dialect=org.hibernate.dialect.MySQLDialect

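The header comment in jdbc.properties says these values are applied by PropertyPlaceholderConfigurer, and the Spring file in step 14 refers to them as ${jdbc.url}, ${jdbc.username}, and so on. To make that substitution concrete, here is a minimal sketch of the idea in plain Java. It is an illustration only, not Spring's implementation; the class name PlaceholderSketch and its resolve method are hypothetical.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: a hypothetical helper, not Spring's PropertyPlaceholderConfigurer.
class PlaceholderSketch {
    // Matches ${key} and captures the key.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${key} in text with the value of key in props.
    // Keys that are not defined are left untouched.
    static String resolve(String text, Properties props) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.getProperty(m.group(1));
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in the value from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With the properties above loaded into a java.util.Properties object, resolve("${jdbc.url}", props) would return jdbc:mysql://localhost:3306/testdb.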


14. Finally, applicationContext-hibernate.xml, which gathers the configuration that ties Compass together with Spring and Hibernate.

CODE:

[The XML element tags were lost in extraction; the surviving attributes and values are listed below in their original order.]

class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
  classpath:jdbc.properties

class="org.springframework.jdbc.datasource.DriverManagerDataSource">
  ${jdbc.driverClassName}
  ${jdbc.url}
  ${jdbc.username}
  ${jdbc.password}

class="org.springframework.orm.hibernate3.LocalSessionFactoryBean">
  com/darkhe/sample/mycompass/Article.hbm.xml
  ${hibernate.dialect}
  false
  true
  class="org.springframework.orm.hibernate3.support.IdTransferringMergeEventListener" />

  classpath:mycompass.cmd.xml
  classpath:mycompass.cpm.xml
  classpath:mycompass.cfg.xml

class="org.compass.spring.device.hibernate.SpringHibernate3GpsDevice">
  hibernateDevice

init-method="start" destroy-method="stop">
  class="org.compass.spring.device.SpringSyncTransactionGpsDeviceWrapper">


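15. Note that, given the configuration above, all of these configuration files should be placed at the root of the classpath.
16. Create a utility class that initializes the Spring container.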

CODE:

/*
 * @(#) IOC.java 2006-2-1 0:08:23
 *
 * Copyright (c) 2005-2006
 */
package com.darkhe.sample.mycompass;

import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

/**
 * Lazily creates and holds the Spring ApplicationContext for this sample.
 *
 * @version 1.0 2006-2-1
 * @author darkhe
 */
public class IOC {
  private static ApplicationContext context = null;

  private static boolean isInit = false;

  private IOC() {
    super();
  }

  // Builds the context from applicationContext-hibernate.xml on first use.
  private static void init() {
    if (!isInit) {
      String[] xmlfilenames = { "applicationContext-hibernate.xml" };

      context = new ClassPathXmlApplicationContext(xmlfilenames);

      isInit = true;
    }
  }

  /**
   * @return the shared ApplicationContext, initializing it if necessary
   */
  public static ApplicationContext getContext() {
    if (context == null || !isInit) {
      init();
    }
    return context;
  }

  /**
   * @param name the bean name
   * @return the bean registered under that name
   */
  public static Object getBean(String name) {
    return getContext().getBean(name);
  }

}


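The IOC class guards initialization with a boolean flag. That is fine for the single-threaded main methods used in this article, but it could build the context twice if several threads called getContext() at the same moment. One possible alternative is the initialization-on-demand holder idiom, sketched below. To keep the sketch runnable without Spring on the classpath, a plain Map stands in for the ApplicationContext; the class LazyContext, its initCount counter, and the placeholder bean value are all hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for IOC: a Map plays the role of the Spring ApplicationContext.
class LazyContext {
    // Counts how many times the "context" has been built (for demonstration only).
    static int initCount = 0;

    private LazyContext() {
    }

    // The JVM initializes Holder exactly once, on first access, in a thread-safe way.
    private static class Holder {
        static final Map<String, Object> CONTEXT = build();
    }

    // In the real class this is where the ClassPathXmlApplicationContext would be created.
    private static Map<String, Object> build() {
        initCount++;
        Map<String, Object> beans = new HashMap<>();
        beans.put("compass", "compass-bean-placeholder");
        return Collections.unmodifiableMap(beans);
    }

    // Returns the bean registered under name, or null if there is none.
    static Object getBean(String name) {
        return Holder.CONTEXT.get(name);
    }
}
```

The first call to getBean triggers the one-time build; later calls reuse the same instance without any explicit flag or locking.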

17. Create the indexing program, which builds the index from the data in the database.

CODE:

/*
 * Copyright (c) 2005-2006
 * ChongQing Man-Month Technology Development Co. ,Ltd
 *
 * ---------------------------------------------------------------------------------
 * @(#) Indexer.java, 2006-8-1 09:01:14 PM
 * ---------------------------------------------------------------------------------
 */
package com.darkhe.sample.mycompass;

import java.io.FileNotFoundException;

import org.compass.gps.CompassGps;
import org.springframework.context.ApplicationContext;

/**
 * @author darkhe
 *
 */
public class Indexer {

  /**
   * @param args
   * @throws FileNotFoundException
   */
  public static void main(String[] args) throws FileNotFoundException {

    // Load the custom dictionary
    DictionaryUtils.loadCustomDictionary();

    ApplicationContext context = IOC.getContext();

    // Get the compassGps object that Spring has already configured and initialized
    CompassGps compassGps = (CompassGps) context.getBean("compassGps");
    // Call index() to build the index
    compassGps.index();

  }

}



18. Create the search program to verify the Compass setup.

CODE:

/*
 * Copyright (c) 2005-2006
 * ChongQing Man-Month Technology Development Co. ,Ltd
 *
 * ---------------------------------------------------------------------------------
 * @(#) Searcher.java, 2006-8-1 09:36:29 PM
 * ---------------------------------------------------------------------------------
 */

package com.darkhe.sample.mycompass;

import java.io.FileNotFoundException;

import org.compass.core.Compass;
import org.compass.core.CompassCallbackWithoutResult;
import org.compass.core.CompassException;
import org.compass.core.CompassHits;
import org.compass.core.CompassSession;
import org.compass.core.CompassTemplate;
import org.compass.core.Resource;
import org.springframework.context.ApplicationContext;

/**
 * @author darkhe
 *
 */
public class Searcher {

  /**
   * @param args
   * @throws FileNotFoundException
   */
  public static void main(String[] args) throws FileNotFoundException {

    // Load the custom dictionary
    DictionaryUtils.loadCustomDictionary();

    ApplicationContext context = IOC.getContext();

    Compass compass = (Compass) context.getBean("compass");

    CompassTemplate template = new CompassTemplate(compass);

    // Run the query inside a template callback and print every hit
    template.execute(new CompassCallbackWithoutResult() {
      protected void doInCompassWithoutResult(CompassSession session)
          throws CompassException {
        CompassHits hits = session.find("大头人");

        System.out.println("Found [" + hits.getLength()
            + "] hits for [大头人] query");
        System.out
            .println("======================================================");
        for (int i = 0; i < hits.getLength(); i++) {
          print(hits, i);
        }

        hits.close();
      }
    });

  }

  // Prints the alias, score, and mapped object of one hit
  public static void print(CompassHits hits, int hitNumber) {
    Object value = hits.data(hitNumber);
    Resource resource = hits.resource(hitNumber);
    System.out.println("ALIAS [" + resource.getAlias() + "] SCORE ["
        + hits.score(hitNumber) + "]");
    System.out.println(":::: " + value);
    System.out.println("");
  }
}




19. The utility class DictionaryUtils loads the custom dictionary for the Chinese word-segmentation library we chose.

CODE:

/*
 * Copyright (c) 2005-2006 ChongQing Man-Month Technology Development Co. ,Ltd
 *
 * ------------------------------------------------------------------------------
 * @(#) DictionaryUtils.java, 2006-8-2 04:55:22 PM
 * ------------------------------------------------------------------------------
 */
package com.darkhe.sample.mycompass;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;

import jeasy.analysis.MMAnalyzer;

/**
 *
 * @author darkhe
 * @version 1.0.0
 */
public class DictionaryUtils {
  // Static variables
  private static boolean isInit = false;

  // Static initialization

  // Static methods
  public static void loadCustomDictionary() throws FileNotFoundException {

    if (!isInit) {

      // Add our own dictionary (dict.txt is resolved against the working directory)
      FileReader fr = new FileReader(new File("dict.txt"));
      MMAnalyzer.addDictionary(fr);

      //System.out.println("Added our own dictionary");

      isInit = true;
    }
  }
}

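One thing to watch in DictionaryUtils: FileReader decodes the file with the platform default charset, which can garble a Chinese dictionary on a machine whose default differs from the file's encoding. The sketch below shows one way to open the file with an explicit charset. It assumes that dict.txt is UTF-8 with one word per line and that Java 7 or later is available; neither is stated in the original, and the class DictionaryFiles is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original sample.
class DictionaryFiles {

    // Opens the dictionary with an explicit charset instead of the platform default.
    // The caller is responsible for closing the returned Reader.
    static Reader open(Path dictFile) throws IOException {
        return Files.newBufferedReader(dictFile, StandardCharsets.UTF_8);
    }

    // Reads the non-blank entries (assumed: one word per line) and closes the file afterwards.
    static List<String> readWords(Path dictFile) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(dictFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }
}
```

If MMAnalyzer.addDictionary accepts any java.io.Reader (an assumption; the original only passes it a FileReader), the Reader returned by open could be handed to it in place of the FileReader. readWords is only there to check what the file contains.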

20. Running Indexer and then Searcher prints the following to the console:

Found [1] hits for [大头人] query
================================================
ALIAS [Article] SCORE [0.3988277]
:::: com.darkhe.sample.mycompass.Article@bla4e2


Your actual results will differ depending on the contents of your table.
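The last line of the output shows a class name followed by @ and a hash value because the Article class does not override toString(), so Java falls back to Object.toString(). If you would like each hit to print something readable, an override along the following lines would do. This is a minimal sketch on a trimmed-down stand-in class; ArticleView and its output format are hypothetical, and only two of Article's fields are shown.

```java
// Hypothetical, trimmed-down stand-in for the Article POJO; only two fields are shown.
class ArticleView {
    private final long articleId;
    private final String articleTitle;

    ArticleView(long articleId, String articleTitle) {
        this.articleId = articleId;
        this.articleTitle = articleTitle;
    }

    // Without this override, printing the object yields ClassName@hexHashCode.
    @Override
    public String toString() {
        return "Article[id=" + articleId + ", title=" + articleTitle + "]";
    }
}
```

Adding an equivalent toString() to Article itself would change the ":::: " line printed by Searcher.print, since that method concatenates the hit object into a string.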

21. And with that, we have a simple implementation of our own search engine built with Compass.