[Nutch-dev] MD5 in fetchlist / fetcher

Michael Ji
Fri, 19 Aug 2005 20:09:27 -0700
hi there,

I dumped the contents of segment/fetchlist and segment/fetcher.

My question is: why isn't the MD5 signature of the page content saved in the fetchlist? I think it would save CPU time when we see that a page is unchanged, because then we can skip the parsing process. As I see it, if the MD5 were in the fetchlist, we could do the comparison directly in memory. With the MD5 only in the fetcher output, we have to look it up in a local file in order to compare it with the MD5 of the newly fetched page content.

Did I miss some important point, or is my dump wrong?

thanks,
Michael Ji

----------------fetchlist--------------------
fetch: true
page: Version: 4
URL: http://www.sina.com/
ID: d6a83e9c17e05d5602709a63c241bf68
Next fetch: Sun Aug 21 20:15:06 CDT 2005
Retries since fetch: 0
Retry interval: 30 days
Num outlinks: 0
Score: 1.0
NextScore: 1.0
anchors: 0

----------------fetcher--------------------
fetch: true
page: Version: 4
URL: http://www.sina.com/
ID: d6a83e9c17e05d5602709a63c241bf68
Next fetch: Sun Aug 21 20:15:06 CDT 2005
Retries since fetch: 0
Retry interval: 30 days
Num outlinks: 0
Score: 1.0
NextScore: 1.0
anchors: 0
Fetch Result:
MD5Hash: 56eae3c2556cb10a00e7346738dcb318
ProtocolStatus: success(1), lastModified=0
FetchDate: Sun Aug 14 20:15:13 CDT 2005

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
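The in-memory check Michael proposes (hash the newly fetched bytes, compare with the previous hash, skip parsing on a match) might look roughly like the sketch below. It is a hypothetical illustration, not Nutch code: the class `Md5SkipParse` and the methods `md5Hex` and `needsParse` are invented names. It also assumes the previous MD5 is already in memory as a hex string like the `MD5Hash` field in the fetcher dump, which, as the dump shows, the fetchlist does not actually carry.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: none of these names come from Nutch's API.
public class Md5SkipParse {

    /** Hex-encoded MD5 of the raw page bytes (32 lowercase hex chars). */
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // %02x on a Byte prints its unsigned two-digit hex value
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Returns true if the fetched page must be parsed. Assumes (hypothetically)
     * that previousMd5 was carried in memory, e.g. in the fetchlist entry;
     * null means no earlier hash is known, so the page is always parsed.
     */
    static boolean needsParse(String previousMd5, byte[] newContent) {
        if (previousMd5 == null) {
            return true;
        }
        return !previousMd5.equalsIgnoreCase(md5Hex(newContent));
    }

    public static void main(String[] args) {
        byte[] page = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        String stored = md5Hex(page);
        System.out.println(needsParse(stored, page));    // false: unchanged, skip parsing
        byte[] changed = "<html>hello world</html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(needsParse(stored, changed)); // true: content changed
        System.out.println(needsParse(null, page));      // true: no previous hash
    }
}
```

The comparison itself is cheap either way; the point of the proposal is where `previousMd5` comes from. Having it in the fetchlist avoids a lookup in a separate file for every fetched page.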