MapReduce_wordcount

January 22, 2024. Source: MapReduce

Test data:

[hadoop@h201 mapreduce]$ more counttext.txt
hello mama
hello baba
hello word
cai wen wei
mama baba jiejie gege
gege jiejie didi
meimei jiejie
didi mama
ayi shushu
ayi mama
hello mama
hello baba
hello word
cai wen wei
mama baba jiejie gege
gege jiejie didi
meimei jiejie
didi mama
ayi shushu
ayi mama
hello mama
hello baba
hello word
cai wen wei
mama baba jiejie gege
gege jiejie didi
meimei jiejie
didi mama
ayi shushu
ayi mama
hello mama
hello baba
hello word
cai wen wei
mama baba jiejie gege
gege jiejie didi
meimei jiejie
didi mama
ayi shushu
ayi mama
hello mama
hello baba
hello word
cai wen wei
mama baba jiejie gege
gege jiejie didi
meimei jiejie
didi mama
ayi shushu
ayi mama

vim WordCount2.java

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// No package declaration: the class files are jarred from the working
// directory and the job is launched as plain "WordCount2".
public class WordCount2 {
    // Fixed paths, used only by the commented-out alternative in main().
    private static final String INPUT_PATH = "hdfs://h201:9000/user/hadoop/counttext.txt";
    private static final String OUTPUT_PATH = "hdfs://h201:9000/user/hadoop/output";

    // Mapper: emits (word, 1) for every whitespace-separated token on a line.
    public static class WordCount2Mapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String[] words = value.toString().split("\\s+");
            for (String str : words) {
                if (str.isEmpty()) {
                    continue; // skip the empty token produced by blank or indented lines
                }
                word.set(str);
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the values received for each word.
    public static class WordCount2Reducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable val : values) {
                // Add the value instead of counting entries, so the result
                // stays correct if a combiner pre-aggregates the pairs.
                total += val.get();
            }
            context.write(key, new IntWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "mapred.jar" is the deprecated name of "mapreduce.job.jar"; Hadoop logs a notice about it.
        conf.set("mapred.jar", "wc1.jar");
        // new Job(conf, name) is deprecated in Hadoop 2.x in favor of Job.getInstance(conf, name).
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount2.class);
        job.setMapperClass(WordCount2Mapper.class);
        job.setReducerClass(WordCount2Reducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // args[0] is the input path, args[1] the output directory (which must not exist yet).
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Alternative with the fixed paths above (addInputPaths accepts multiple paths):
        //FileInputFormat.addInputPath(job, new Path(INPUT_PATH));
        //FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
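
To make the data flow of the job easier to follow, here is a minimal plain-Java sketch of the same three stages (map, shuffle/group, reduce) run in memory, without Hadoop. The class LocalWordCount and its method names are hypothetical and exist only for illustration; the real job spreads these steps across map and reduce tasks.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of the word count data flow (no Hadoop involved).
class LocalWordCount {
    // Map stage: emit a (word, 1) pair for every token on the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<Map.Entry<String, Integer>>();
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<String, Integer>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle stage: group all values by key; a TreeMap keeps the keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<String, List<Integer>>();
        for (Map.Entry<String, Integer> pair : pairs) {
            List<Integer> values = groups.get(pair.getKey());
            if (values == null) {
                values = new ArrayList<Integer>();
                groups.put(pair.getKey(), values);
            }
            values.add(pair.getValue());
        }
        return groups;
    }

    // Reduce stage: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Run the three stages in sequence over a list of input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<Map.Entry<String, Integer>>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello mama", "hello baba", "hello word");
        System.out.println(wordCount(lines)); // prints {baba=1, hello=3, mama=1, word=1}
    }
}
```

The sorted map stands in for the sort that the framework performs before the reduce phase, which is why the job output is also listed in key order.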

[hadoop@h201 mapreduce]$ /usr/jdk1.7.0_25/bin/javac WordCount2.java
Note: WordCount2.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
[hadoop@h201 mapreduce]$ ls
counttext.txt WordCount2.class WordCount2.java WordCount2$WordCount2Mapper.class WordCount2$WordCount2Reducer.class
[hadoop@h201 mapreduce]$ /usr/jdk1.7.0_25/bin/jar cvf wc1.jar WordCount2*class
added manifest
adding: WordCount2.class(in = 1531) (out= 815)(deflated 46%)
adding: WordCount2$WordCount2Mapper.class(in = 1831) (out= 783)(deflated 57%)
adding: WordCount2$WordCount2Reducer.class(in = 1623) (out= 670)(deflated 58%)
[hadoop@h201 mapreduce]$ ls
counttext.txt wc1.jar WordCount2.class WordCount2.java WordCount2$WordCount2Mapper.class WordCount2$WordCount2Reducer.class
[hadoop@h201 mapreduce]$ hadoop jar wc1.jar WordCount2 hdfs://h201:9000/user/hadoop/counttext.txt hdfs://h201:9000/user/hadoop/output
18/03/09 23:33:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/03/09 23:33:39 INFO client.RMProxy: Connecting to ResourceManager at h201/192.168.121.132:8032
18/03/09 23:33:55 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/03/09 23:34:05 INFO input.FileInputFormat: Total input paths to process : 1
18/03/09 23:34:06 INFO mapreduce.JobSubmitter: number of splits:1
18/03/09 23:34:06 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/03/09 23:34:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1516635595760_0001
18/03/09 23:34:21 INFO impl.YarnClientImpl: Submitted application application_1516635595760_0001
18/03/09 23:34:21 INFO mapreduce.Job: The url to track the job: http://h201:8088/proxy/application_1516635595760_0001/
18/03/09 23:34:21 INFO mapreduce.Job: Running job: job_1516635595760_0001
18/03/09 23:35:32 INFO mapreduce.Job: Job job_1516635595760_0001 running in uber mode : false
18/03/09 23:35:32 INFO mapreduce.Job: map 0% reduce 0%
18/03/09 23:36:33 INFO mapreduce.Job: map 100% reduce 0%
18/03/09 23:36:45 INFO mapreduce.Job: map 100% reduce 100%
18/03/09 23:36:47 INFO mapreduce.Job: Job job_1516635595760_0001 completed successfully
18/03/09 23:36:47 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=1366
                FILE: Number of bytes written=221143
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=747
                HDFS: Number of bytes written=101
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=55286
                Total time spent by all reduces in occupied slots (ms)=8704
                Total time spent by all map tasks (ms)=55286
                Total time spent by all reduce tasks (ms)=8704
                Total vcore-seconds taken by all map tasks=55286
                Total vcore-seconds taken by all reduce tasks=8704
                Total megabyte-seconds taken by all map tasks=56612864
                Total megabyte-seconds taken by all reduce tasks=8912896
        Map-Reduce Framework
                Map input records=50
                Map output records=120
                Map output bytes=1120
                Map output materialized bytes=1366
                Input split bytes=107
                Combine input records=0
                Combine output records=0
                Reduce input groups=13
                Reduce shuffle bytes=1366
                Reduce input records=120
                Reduce output records=13
                Spilled Records=240
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=1264
                CPU time spent (ms)=4210
                Physical memory (bytes) snapshot=223772672
                Virtual memory (bytes) snapshot=2148155392
                Total committed heap usage (bytes)=136712192
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=640
        File Output Format Counters
                Bytes Written=101
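
The counters show Combine input records=0, so no combiner ran and all 120 map output records were shuffled to the reducer. As a rough sketch of what a combiner would change, the plain-Java example below (no Hadoop; CombinerSketch and its methods are hypothetical names) pre-aggregates the output of each map task and then compares a reducer that sums its values with one that merely counts them. In a real job a combiner is registered with job.setCombinerClass(...), and it is only safe when the reducer adds the values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of combiner behaviour (no Hadoop involved).
class CombinerSketch {
    // Combiner: collapse one map task's (word, 1) pairs into (word, partialSum).
    static Map<String, Integer> combine(List<String> words) {
        Map<String, Integer> partial = new TreeMap<String, Integer>();
        for (String w : words) {
            Integer seen = partial.get(w);
            partial.put(w, seen == null ? 1 : seen + 1);
        }
        return partial;
    }

    // Summing reducer: adds the values, so partial sums are handled correctly.
    static int reduceBySum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Counting reducer: counts how many values arrived, right only if every value is 1.
    static int reduceByCount(List<Integer> values) {
        return values.size();
    }

    public static void main(String[] args) {
        // Two made-up map tasks that each saw "hello" more than once.
        Map<String, Integer> task1 = combine(Arrays.asList("hello", "mama", "hello", "baba", "hello"));
        Map<String, Integer> task2 = combine(Arrays.asList("hello", "word", "hello"));

        List<Integer> helloValues = new ArrayList<Integer>();
        helloValues.add(task1.get("hello")); // partial sum 3
        helloValues.add(task2.get("hello")); // partial sum 2

        System.out.println(reduceBySum(helloValues));   // prints 5
        System.out.println(reduceByCount(helloValues)); // prints 2
    }
}
```

With the combiner in place the reducer receives two records for "hello" instead of five, which is the saving a combiner is meant to provide; the counting variant then reports 2 instead of 5.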
[hadoop@h201 mapreduce]$ hadoop fs -lsr /user/hadoop/output
lsr: DEPRECATED: Please use 'ls -R' instead.
18/03/09 23:37:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-rw-r--r--   2 hadoop supergroup          0 2018-03-09 23:36 /user/hadoop/output/_SUCCESS
-rw-r--r--   2 hadoop supergroup        101 2018-03-09 23:36 /user/hadoop/output/part-r-00000
[hadoop@h201 mapreduce]$ hadoop fs -cat /user/hadoop/output/part-r-00000
18/03/09 23:39:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ayi     10
baba    10
cai     5
didi    10
gege    10
hello   15
jiejie  15
mama    20
meimei  5
shushu  5
wei     5
wen     5
word    5
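
As a quick sanity check, the result can be cross-checked against the job counters. The sketch below is plain Java with a hypothetical class name, OutputCheck, and the output lines hard-coded. It assumes the key and value in part-r-00000 are separated by a single tab, which is the default for text output. It then adds up the counts and the file size so they can be compared with Reduce output records=13, Map output records=120 and Bytes Written=101.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical cross-check of the job output against the reported counters.
class OutputCheck {
    // Lines of part-r-00000, assuming a tab between word and count.
    static final String[] PART_LINES = {
        "ayi\t10", "baba\t10", "cai\t5", "didi\t10", "gege\t10",
        "hello\t15", "jiejie\t15", "mama\t20", "meimei\t5",
        "shushu\t5", "wei\t5", "wen\t5", "word\t5"
    };

    // Sum of all word counts; should equal the number of (word, 1) pairs emitted by the mapper.
    static int totalCount(String[] lines) {
        int total = 0;
        for (String line : lines) {
            total += Integer.parseInt(line.split("\t")[1]);
        }
        return total;
    }

    // Size of the file in bytes: each line plus one trailing newline.
    static int totalBytes(String[] lines) {
        int bytes = 0;
        for (String line : lines) {
            bytes += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(PART_LINES.length);      // prints 13  (Reduce output records)
        System.out.println(totalCount(PART_LINES)); // prints 120 (Map output records)
        System.out.println(totalBytes(PART_LINES)); // prints 101 (Bytes Written)
    }
}
```

All three numbers agree with the counters printed by the job, so every token in counttext.txt was counted exactly once.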

    Original author: MapReduce
    Original URL: https://www.cnblogs.com/jieran/p/8537012.html