A quick tour of Hadoop's Reporter object

Hadoop provides the Reporter object as a way for mappers and reducers to report their progress, to let the cluster know they are still alive, and to tally information for the coder. Each mapper and reducer gets a Reporter object, because it is part of the parameter signature of both:
public void map(LongWritable key, Text value, OutputCollector output, Reporter reporter) throws IOException {
and
public void reduce(Text key, Iterator values, OutputCollector output, Reporter reporter) throws IOException {
The reporter variable (or whatever you choose to call it) has three methods useful to the MapReduce programmer:
progress(): This method simply phones home to the TaskTracker running the task (and, through it, the JobTracker), letting it know that the mapper or reducer is still working and has not died or zombified. This is particularly useful when an individual map or reduce operation is likely to last for more than a minute or so. By default, mappers or reducers that neither check in nor finish within ten minutes (the mapred.task.timeout setting) are killed by the cluster.
This method's usage is particularly simple:
// Start working on new data, let cluster know it'll take a while
reporter.progress();
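For work that runs long inside a single call, one common pattern is to check in every so many items rather than once at the start. The following is a minimal sketch, not code from a real job: Heartbeat is a hypothetical stand-in for the one Reporter method used here so that the example compiles without Hadoop on the classpath, and LongRunningWork, processAll, expensiveStep, and the interval value are illustrative names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Reporter.progress(), so this sketch compiles
// without Hadoop. In a real mapper you would call reporter.progress() directly.
interface Heartbeat {
    void progress();
}

class LongRunningWork {
    // Processes every item and checks in once per `interval` items.
    // Returns the number of check-ins that were made.
    static int processAll(List<String> items, int interval, Heartbeat heartbeat) {
        int checkIns = 0;
        int seen = 0;
        for (String item : items) {
            expensiveStep(item);
            seen++;
            if (seen % interval == 0) {
                heartbeat.progress();   // tell the cluster we are still alive
                checkIns++;
            }
        }
        return checkIns;
    }

    // Placeholder for slow per-item work (assumption: the real work goes here).
    static void expensiveStep(String item) {
        item.hashCode();
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d", "e");
        int n = processAll(items, 2, () -> System.out.println("still working"));
        System.out.println("check-ins: " + n);
    }
}
```

Checking in on a fixed item count keeps the call cheap and predictable; if per-item cost varies widely, a time-based interval may be a better fit.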
setStatus(String status): This method lets the cluster know the mapper or reducer is alive as well as providing a message to the user. For instance, if you were coding up the canonical word-frequency program and wanted to let the user know which word your mapper was working on at any particular time, this could look like (ignoring the enormous performance hit such verbosity would create):
public void map(LongWritable key, Text value, OutputCollector output, Reporter reporter) throws IOException {
    StringTokenizer st = new StringTokenizer(value.toString());
    while (st.hasMoreTokens()) {
        String s = st.nextToken();
        reporter.setStatus("Current word: " + s);
        output.collect(new Text(s), new IntWritable(1));
    }
}
The status message then appears next to the task in the cluster's web interface.
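The performance hit mentioned above can be reduced by forwarding a status message only once per time window. This is a sketch under stated assumptions: ThrottledStatus is a hypothetical helper, not part of Hadoop, and the sink is a plain Consumer so the example runs without Hadoop. In a real mapper the sink could be reporter::setStatus.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: forwards a status message at most once per interval.
class ThrottledStatus {
    private final Consumer<String> sink;     // e.g. reporter::setStatus in a real mapper
    private final long minIntervalMillis;
    private long lastSentMillis;
    private boolean sentBefore = false;

    ThrottledStatus(Consumer<String> sink, long minIntervalMillis) {
        this.sink = sink;
        this.minIntervalMillis = minIntervalMillis;
    }

    // Forwards the message unless one was sent less than minIntervalMillis ago.
    // Returns true if the message was forwarded.
    boolean update(String status, long nowMillis) {
        if (sentBefore && nowMillis - lastSentMillis < minIntervalMillis) {
            return false;                    // too soon, drop this update
        }
        sink.accept(status);
        lastSentMillis = nowMillis;
        sentBefore = true;
        return true;
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        ThrottledStatus status = new ThrottledStatus(sent::add, 1000);
        for (String word : new String[] {"the", "quick", "brown", "fox"}) {
            status.update("Current word: " + word, System.currentTimeMillis());
        }
        System.out.println(sent.size() + " status message(s) forwarded");
    }
}
```

Passing the current time in as a parameter is a design choice made here so that the throttling logic can be exercised without waiting on a real clock.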
incrCounter(Enum key, long amount): This final method provides a way to tally the number of occurrences of whatever data the coder needs to track. For instance, if you wished to evaluate the performance of your map operations, you could divide them into three groups: those that take less than a minute to complete, those that take one to 10 minutes to complete, and those that take longer. An enum to represent this delineation could look like:
public enum MapDuration { LessThanAMinute, OneTo10Minutes, MoreThan10Minutes };
Then, along with some appropriate timing code, it's easy to count how many map operations fall into each group:
if (timeInSecs < 60)
    reporter.incrCounter(MapDuration.LessThanAMinute, 1);
else if (timeInSecs < 60 * 10)
    reporter.incrCounter(MapDuration.OneTo10Minutes, 1);
else
    reporter.incrCounter(MapDuration.MoreThan10Minutes, 1);
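The "appropriate timing code" is left to the reader above; one way it might look is sketched here. MapTiming, classify, and tally are hypothetical names, and the EnumMap is only a local stand-in for reporter.incrCounter so that the example runs without Hadoop. The bucket boundaries mirror the if/else chain above.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

class MapTiming {
    enum MapDuration { LessThanAMinute, OneTo10Minutes, MoreThan10Minutes }

    // Maps an elapsed time in seconds to its bucket, using the same
    // boundaries as the if/else chain: under 60 s, under 600 s, everything else.
    static MapDuration classify(long timeInSecs) {
        if (timeInSecs < 60) {
            return MapDuration.LessThanAMinute;
        } else if (timeInSecs < 60 * 10) {
            return MapDuration.OneTo10Minutes;
        }
        return MapDuration.MoreThan10Minutes;
    }

    // Local tally standing in for reporter.incrCounter(bucket, 1).
    static Map<MapDuration, Long> tally(long[] durationsInSecs) {
        Map<MapDuration, Long> counts = new EnumMap<>(MapDuration.class);
        for (long d : durationsInSecs) {
            counts.merge(classify(d), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // ... the work being timed would go here ...
        long timeInSecs = TimeUnit.NANOSECONDS.toSeconds(System.nanoTime() - start);
        System.out.println("bucket: " + classify(timeInSecs));
    }
}
```

In a real mapper, the value returned by classify would be passed straight to reporter.incrCounter with an amount of 1.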
These counters are then tallied and reported at the end of the job:
07/11/10 23:40:46 INFO mapred.JobClient: Counters: 13
07/11/10 23:40:46 INFO mapred.JobClient:   Job Counters
07/11/10 23:40:46 INFO mapred.JobClient:     Launched map tasks=764
07/11/10 23:40:46 INFO mapred.JobClient:     Launched reduce tasks=72
07/11/10 23:40:46 INFO mapred.JobClient:     Data-local map tasks=739
07/11/10 23:40:46 INFO mapred.JobClient:   Map-Reduce Framework
07/11/10 23:40:46 INFO mapred.JobClient:     Map input records=8904519
07/11/10 23:40:46 INFO mapred.JobClient:     Map output records=76065408
07/11/10 23:40:46 INFO mapred.JobClient:     Map input bytes=447405751
07/11/10 23:40:46 INFO mapred.JobClient:     Map output bytes=738042659
07/11/10 23:40:46 INFO mapred.JobClient:     Reduce input groups=1969948
07/11/10 23:40:46 INFO mapred.JobClient:     Reduce input records=76065408
07/11/10 23:40:46 INFO mapred.JobClient:     Reduce output records=1969948
07/11/10 23:40:46 INFO mapred.JobClient:   playingWithReporter.PlayingWithReporterDriver$PWRMapper$MapDuration
07/11/10 23:40:46 INFO mapred.JobClient:     LessThanAMinute=44276257
07/11/10 23:40:46 INFO mapred.JobClient:     OneTo10Minutes=27332173
07/11/10 23:40:46 INFO mapred.JobClient:     MoreThan10Minutes=4456978