hadoop:如何运行自带wordcount

hadoop:如何运行自带wordcount 

转载自:http://www.aboutyun.com/thread-7713-1-1.html

1.找到examples例子
我们需要找打这个例子的位置:首先需要找到你的hadoop文件夹,然后依照下面路径:
/hadoop/share/hadoop/mapreduce会看到如下图:

  1. hadoop-mapreduce-examples-2.2.0.jar

《hadoop:如何运行自带wordcount》 

第二步:
我们需要需要做一下运行需要的工作,比如输入输出路径,上传什么文件等。
1.先在HDFS创建几个数据目录:

  1. hadoop fs -mkdir -p /data/wordcount
  2. hadoop fs -mkdir -p /output/

《hadoop:如何运行自带wordcount》 

2.目录/data/wordcount用来存放Hadoop自带的WordCount例子的数据文件,运行这个MapReduce任务的结果输出到/output/wordcount目录中。
首先新建文件inputWord:

  1. vi /usr/inputWord

新建完毕,查看内容:

  1. cat /usr/inputWord

《hadoop:如何运行自带wordcount》 

将本地文件上传到HDFS中:

  1. hadoop fs -put /usr/inputWord /data/wordcount/

可以查看上传后的文件情况,执行如下命令:

  1. hadoop fs -ls /data/wordcount

可以看到上传到HDFS中的文件。

《hadoop:如何运行自带wordcount》 

通过命令

  1. hadoop fs -text /data/wordcount/inputWord

看到如下内容:
《hadoop:如何运行自带wordcount》 
下面,运行WordCount例子,执行如下命令:

  1. hadoop jar /usr/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /data/wordcount /output/wordcount

《hadoop:如何运行自带wordcount》 
可以看到控制台输出程序运行的信息:

aboutyun@master:~$ hadoop jar /usr/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /data/wordcount /output/wordcount

14/05/14 10:33:33 INFO client.RMProxy: Connecting to ResourceManager at master/172.16.77.15:8032

14/05/14 10:33:34 INFO input.FileInputFormat: Total input paths to process : 1

14/05/14 10:33:34 INFO mapreduce.JobSubmitter: number of splits:1

14/05/14 10:33:34 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name

14/05/14 10:33:34 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar

14/05/14 10:33:34 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class

14/05/14 10:33:34 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class

14/05/14 10:33:34 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class

14/05/14 10:33:34 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name

14/05/14 10:33:34 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class

14/05/14 10:33:34 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir

14/05/14 10:33:34 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir

14/05/14 10:33:34 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps

14/05/14 10:33:34 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class

14/05/14 10:33:34 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir

14/05/14 10:33:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1400084979891_0004

14/05/14 10:33:36 INFO impl.YarnClientImpl: Submitted application application_1400084979891_0004 to ResourceManager at master/172.16.77.15:8032

14/05/14 10:33:36 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1400084979891_0004/

14/05/14 10:33:36 INFO mapreduce.Job: Running job: job_1400084979891_0004

14/05/14 10:33:45 INFO mapreduce.Job: Job job_1400084979891_0004 running in uber mode : false

14/05/14 10:33:45 INFO mapreduce.Job:  map 0% reduce 0%

14/05/14 10:34:10 INFO mapreduce.Job:  map 100% reduce 0%

14/05/14 10:34:19 INFO mapreduce.Job:  map 100% reduce 100%

14/05/14 10:34:19 INFO mapreduce.Job: Job job_1400084979891_0004 completed successfully

14/05/14 10:34:20 INFO mapreduce.Job: Counters: 43

        File System Counters

                FILE: Number of bytes read=81

                FILE: Number of bytes written=158693

                FILE: Number of read operations=0

                FILE: Number of large read operations=0

                FILE: Number of write operations=0

                HDFS: Number of bytes read=175

                HDFS: Number of bytes written=51

                HDFS: Number of read operations=6

                HDFS: Number of large read operations=0

                HDFS: Number of write operations=2

        Job Counters 

                Launched map tasks=1

                Launched reduce tasks=1

                Data-local map tasks=1

                Total time spent by all maps in occupied slots (ms)=23099

                Total time spent by all reduces in occupied slots (ms)=6768

        Map-Reduce Framework

                Map input records=5

                Map output records=10

                Map output bytes=106

                Map output materialized bytes=81

                Input split bytes=108

                Combine input records=10

                Combine output records=6

                Reduce input groups=6

                Reduce shuffle bytes=81

                Reduce input records=6

                Reduce output records=6

                Spilled Records=12

                Shuffled Maps =1

                Failed Shuffles=0

                Merged Map outputs=1

                GC time elapsed (ms)=377

                CPU time spent (ms)=11190

                Physical memory (bytes) snapshot=284524544

                Virtual memory (bytes) snapshot=2000748544

                Total committed heap usage (bytes)=136450048

        Shuffle Errors

                BAD_ID=0

                CONNECTION=0

                IO_ERROR=0

                WRONG_LENGTH=0

                WRONG_MAP=0

                WRONG_REDUCE=0

        File Input Format Counters 

                Bytes Read=67

        File Output Format Counters 

                Bytes Written=51

查看结果,执行如下命令:

  1. hadoop fs -text /output/wordcount/part-r-00000

结果数据示例如下:

  1. aboutyun@master:~$ hadoop fs -text /output/wordcount/part-r-00000
  2. aboutyun        2
  3. first        1
  4. hello        3
  5. master        1
  6. slave        2
  7. what        1

《hadoop:如何运行自带wordcount》 
登录到Web控制台,访问链接http://master:8088/可以看到任务记录情况。

 

    原文作者:MapReduce
    原文地址: https://www.cnblogs.com/xiaomila-study/p/4973662.html
    本文转自网络文章,转载此文章仅为分享知识,如有侵权,请联系博主进行删除。
点赞