Google Groups: macromedia.coldfusion.cfml_general_discussion

Verity Indexing and Searching
mkane1  June 3, 5:42 AM
Newsgroup: macromedia.coldfusion.cfml_general_discussion
From: "mkane1"
Date: Thu, 2 Jun 2005 21:42:03 +0000 (UTC)
Local time: Friday, June 3, 2005, 5:42 AM
Subject: Verity Indexing and Searching
I have used a process for searching ColdFusion powered sites for years. It's
quite simple, really: I create a set of Verity collections pertinent to the
site, schedule an indexing template as appropriate, and use cfsearch. Most of
the sites I've dealt with so far have not wanted much "full text" indexing, so
most of the indexes are based on query data, and only some on small sets of
documents. I have not used the K2 server, aside from some experiments when it
came out. But now I need to provide a combination of searches, and one will be
against some 5 gigabytes of PDF files.
I have not been able to successfully cfindex this collection, using the
cfindex tag or the ColdFusion administrator. After an hour or two, it times
out. I can probably get an initial index built, but am concerned about future
updates. When the content providers add documents to this repository, the index
will of course need to be updated. I have had performance problems with Verity
collections in the past, which I resolved by having the indexing template
actually delete the collection and re-create it every time. No problem with
small indexes, but it's gonna be a problem with this new app.
So I've been looking into alternatives. The server I am developing the app on
is running CFMX 6.1 on Windows 2000 Server. The production server is the same,
but both will be upgraded to CFMX7 in a few weeks or so.
The problem right now is building an index, not in searching, so using the K2
server seems irrelevant to that point. Is that incorrect?
Could the Verity spider offer any help? I find the documentation on the Verity
spider to be quite frustrating. I can't find an example of a command to run it
that seems pertinent to my needs: I don't want to spider a web site, I want to
index documents. In the "Attach Code" box, I am excerpting from the livedocs.
Any suggestions? Thanks!
--- Excerpt from Livedocs --
At its most basic level, a Verity Spider command consists of the following:
vspider -initialize -collection coll [options]
Where -initialize is -start or -refresh (when starting points have changed),
and -collection is required to provide a target for the Verity Spider, and
[options] can be a near-limitless combination of the options described later in
this chapter.
For example:
c:\cfusionmx\lib\_nti40\bin\vspider -common c:\cfusionmx\lib\common
-collection c:\new -start http://localhost -indinclude *
--- End of livedocs, my comments below. ---
Assume I don't need to path the command itself, and want to index all the
documents in e:\somedocs\docs, and the collection is named Core_Docs.
What would my command be? I do not understand what "c:\new" and
"http://localhost" mean in the sample command.
mpwoodward *TMM*  June 3, 8:26 AM
From: mpwoodward *TMM*
Date: Thu, 2 Jun 2005 19:26:31 -0500
Local time: Friday, June 3, 2005, 8:26 AM
Subject: Re: Verity Indexing and Searching
On 2005-06-02 16:42:03 -0500, "mkane1" said:
> I have used a process for searching ColdFusion powered sites for
> years. It's quite simple, really: I create a set of Verity collections
> pertinent to the site, schedule an indexing template as appropriate,
> and use cfsearch. Most of the sites I've dealt with so far have not
> wanted much "full text" indexing, so most of the indexes are based on
> query data, and only some on small sets of documents. I have not used
> the K2 server, aside from some experiments when it came out. But now I
> need to provide a combination of searches, and one will be against some
> 5 gigabytes of PDF files.
What's the total number of PDF files you're indexing?  The Verity that
ships with CF does have limits (125K documents for standard, 250K for
enterprise).
>  I have not been able to successfully cfindex this collection, using
> the cfindex tag or the ColdFusion administrator. After an hour or two,
> it times out. I can probably get an initial index built, but am
> concerned about future updates. When the content providers add
> documents to this repository, the index will of course need to be
> updated. I have had performance problems with Verity collections in the
> past, which I resolved by having the indexing template actually delete
> the collection and re-create it every time. No problem with small
> indexes, but it's gonna be a problem with this new app.
Would it be an option to split these out into multiple Verity collections?
>  So I've been looking into alternatives. The server I am developing the
> app on is running CFMX 6.1 on Windows 2000 Server. The production
> server is the same, but both will be upgraded to CFMX7 in a few weeks
> or so.
Just so you know, CFMX 7 uses Verity K2 by default (and, as far as I
understand, K2 exclusively).  There is no longer a distinction between
the two Verity technologies.
>  The problem right now is building an index, not in searching, so using
> the K2 server seems irrelevant to that point. Is that incorrect?
>  Could the Verity spider offer any help?
Not really--you use Verity to index documents like Office documents and
PDFs, while the spider is used to spider web pages (e.g. HTML and CFML
pages) as they would be rendered by the browser.
>  I find the documentation on the Verity spider to be quite frustrating.
> I can't find an example of a command to run it that seems pertinent to
> my needs: I don't want to spider a web site, I want to index documents.
Which is a job for Verity, not the spider.
Bottom line is it's going to take a huge amount of time to index that
volume of documents so you're going to have to plan accordingly.
First, check to see what the total number of documents is and see if
you're at the upper limits of CF's Verity.  Second, split the documents
up into numerous smaller, more manageable collections which will help
things.
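As a rough illustration of those two steps, here is a minimal Python sketch that counts the PDF files under a directory and groups them into batches, one batch per collection. It is only a planning aid under stated assumptions: the function names, the "Core_Docs_NN" naming scheme, and the batch size are hypothetical, the path is the one from the question, and it does not call Verity or cfindex at all.

```python
import os
from typing import Dict, List

# Document limit quoted in this thread for the standard edition.
STANDARD_LIMIT = 125_000


def find_pdfs(root: str) -> List[str]:
    """Return the sorted paths of all .pdf files under root (case-insensitive)."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".pdf"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def plan_collections(paths: List[str], max_per_collection: int,
                     prefix: str = "Core_Docs") -> Dict[str, List[str]]:
    """Group paths into batches of at most max_per_collection files.

    The collection names (prefix_01, prefix_02, ...) are hypothetical.
    """
    if max_per_collection < 1:
        raise ValueError("max_per_collection must be at least 1")
    plan: Dict[str, List[str]] = {}
    for start in range(0, len(paths), max_per_collection):
        name = f"{prefix}_{start // max_per_collection + 1:02d}"
        plan[name] = paths[start:start + max_per_collection]
    return plan


if __name__ == "__main__":
    # Path taken from the question; adjust for your own server.
    pdfs = find_pdfs(r"e:\somedocs\docs")
    print(f"{len(pdfs)} PDF files found (quoted standard limit: {STANDARD_LIMIT})")
    # 5000 files per collection is an arbitrary batch size for illustration.
    for name, batch in plan_collections(pdfs, 5000).items():
        print(f"{name}: {len(batch)} files")
```

Each batch could then be indexed into its own collection on its own schedule, so that an update only has to touch the collection that received new documents.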
Hope that gives you some ideas.
Matt
--
Matt Woodward
mpwoodw...@gmail.com
Team Macromedia - ColdFusion