[Nutch-dev] MD5 in fetchlist / fetcher
Michael Ji
Fri, 19 Aug 2005 20:09:27 -0700
hi there,

I dumped the contents of segment/fetchlist and segment/fetcher.

My question is: why isn't the MD5 signature of the page content saved in the fetchlist?

In my mind, it would save CPU time if we could see that a page is unchanged, because we could then skip the parsing process.

From my view, if we have the MD5 in the fetchlist, we can do the comparison directly in memory. If we have the MD5 only in fetcher, we need to look it up in a local file in order to compare it with the MD5 of the newly fetched page content.

Did I miss some important points, or is my dump wrong?

thanks,

Michael Ji

----------------fetchlist--------------------
fetch: true
page: Version: 4
URL: http://www.sina.com/
ID: d6a83e9c17e05d5602709a63c241bf68
Next fetch: Sun Aug 21 20:15:06 CDT 2005
Retries since fetch: 0
Retry interval: 30 days
Num outlinks: 0
Score: 1.0
NextScore: 1.0
anchors: 0

----------------fetcher--------------------
fetch: true
page: Version: 4
URL: http://www.sina.com/
ID: d6a83e9c17e05d5602709a63c241bf68
Next fetch: Sun Aug 21 20:15:06 CDT 2005
Retries since fetch: 0
Retry interval: 30 days
Num outlinks: 0
Score: 1.0
NextScore: 1.0
anchors: 0
Fetch Result:
MD5Hash: 56eae3c2556cb10a00e7346738dcb318
ProtocolStatus: success(1), lastModified=0
FetchDate: Sun Aug 14 20:15:13 CDT 2005

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
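To make the idea in the message concrete, here is a minimal sketch of the "skip parsing when the MD5 is unchanged" check. It is not Nutch code: the class UnchangedPageCheck, the method needsParsing, and the in-memory map standing in for a hash carried in the fetchlist are all hypothetical names chosen for illustration. Only java.security.MessageDigest is a real JDK API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Nutch.
class UnchangedPageCheck {
    // URL -> hex MD5 of the content seen at the previous fetch.
    // Assumption: this map stands in for a hash stored alongside the fetchlist entry.
    private final Map<String, String> previousHashes = new HashMap<>();

    // Computes the MD5 of the raw page bytes as a lowercase hex string.
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Returns true when the content differs from the previous fetch (or the URL
    // is new), and records the new hash for the next round.
    boolean needsParsing(String url, byte[] content) {
        String newHash = md5Hex(content);
        String oldHash = previousHashes.put(url, newHash);
        return !newHash.equals(oldHash);
    }

    public static void main(String[] args) {
        UnchangedPageCheck check = new UnchangedPageCheck();
        String url = "http://www.sina.com/";
        byte[] v1 = "<html>version 1</html>".getBytes(StandardCharsets.UTF_8);
        byte[] v2 = "<html>version 2</html>".getBytes(StandardCharsets.UTF_8);

        System.out.println(check.needsParsing(url, v1)); // true: first time seen
        System.out.println(check.needsParsing(url, v1)); // false: unchanged, parsing could be skipped
        System.out.println(check.needsParsing(url, v2)); // true: content changed
    }
}
```

Note that the page still has to be downloaded and hashed before the comparison can be made, so a check like this would save only the parsing cost, not the fetch itself.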