Running a Python WordCount with Hadoop Streaming

Write the map function
wordcount_mapper.py

```python
#!/usr/bin/env python

# ---------------------------------------------------------------
# This mapper reads lines of text from stdin and outputs <word, 1>
# for every whitespace-separated word.
# ---------------------------------------------------------------

import sys

for line in sys.stdin:
    line = line.strip()
    keys = line.split()          # split on any run of whitespace
    for key in keys:
        value = 1
        # {0} and {1} are replaced by the 0th and 1st arguments of format()
        print('{0}\t{1}'.format(key, value))
```
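You can check the mapper's output format without a cluster by wrapping the same logic in a function and calling it on a string. This is a minimal sketch. `map_line` and `format_pairs` are hypothetical helper names and are not part of the original script.

```python
def map_line(line):
    """Return the (word, 1) pairs the mapper would emit for one input line."""
    # str.split() with no argument splits on any run of whitespace,
    # the same as keys = line.split() in wordcount_mapper.py
    return [(word, 1) for word in line.strip().split()]


def format_pairs(pairs):
    """Render pairs in the tab-separated form the mapper writes to stdout."""
    return "\n".join("{0}\t{1}".format(k, v) for k, v in pairs)


if __name__ == "__main__":
    print(format_pairs(map_line("hello world hello")))
```

A repeated word is emitted once per occurrence. Summing the counts is left entirely to the reducer.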

Write the reduce function
wordcount_reducer.py

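```python
#!/usr/bin/env python

# ---------------------------------------------------------------
# This reducer reads sorted "<word>\t<count>" lines from stdin and
# outputs <word, total-count>. It relies on Hadoop having sorted the
# mapper output by key, so all lines for one word arrive together.
# ---------------------------------------------------------------
import sys

last_key      = None
running_total = 0

# -----------------------------------
# Read the input in a loop and accumulate counts
# -----------------------------------
for input_line in sys.stdin:
    input_line = input_line.strip()
    if not input_line:
        continue                 # skip blank lines
    this_key, value = input_line.split("\t", 1)
    value = int(value)

    if last_key == this_key:
        running_total += value   # same word: add value to running total
    else:
        if last_key is not None:
            # a new word started: emit the finished total for the previous one
            print("{0}\t{1}".format(last_key, running_total))
        running_total = value    # reset values for the new word
        last_key = this_key

# emit the total for the last word (skipped if the input was empty,
# which previously raised a NameError on this_key)
if last_key is not None:
    print("{0}\t{1}".format(last_key, running_total))
```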
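The reducer works only because Hadoop sorts the map output by key before the reduce phase. Equal words therefore reach the reducer as one contiguous run. The sketch below mimics map, shuffle/sort and reduce in a single Python process to make that dependency visible. It swaps the manual `last_key` / `running_total` bookkeeping for `itertools.groupby`, which relies on the same contiguity. `simulate_wordcount` is a hypothetical name, and this is an illustration rather than how Hadoop runs the job.

```python
from itertools import groupby
from operator import itemgetter


def simulate_wordcount(lines):
    """Mimic map -> shuffle/sort -> reduce for WordCount in one process."""
    # Map phase: emit (word, 1) for every whitespace-separated token
    mapped = [(word, 1) for line in lines for word in line.split()]
    # Shuffle/sort phase: sort by key so equal words become adjacent
    mapped.sort(key=itemgetter(0))
    # Reduce phase: groupby only merges adjacent equal keys, which is
    # exactly why the reducer script needs sorted input
    return [(word, sum(count for _, count in group))
            for word, group in groupby(mapped, key=itemgetter(0))]


if __name__ == "__main__":
    for word, total in simulate_wordcount(["hello world", "hello hadoop"]):
        print("{0}\t{1}".format(word, total))
```

You can check the real scripts the same way before submitting a job: `cat input.txt | ./wordcount_mapper.py | sort | ./wordcount_reducer.py`. Here `sort` stands in for Hadoop's shuffle.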


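If you are running Hadoop on YARN, you need to download the streaming jar separately ([reference link](http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-streaming/2.7.3)). Also put some text files into the input directory beforehand.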

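Use absolute paths for the streaming jar and for the mapper and reducer scripts. Make the scripts executable (`chmod +x`), because Hadoop runs them directly through their shebang line. The output directory must not already exist, otherwise the job is rejected.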
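A wrong path or a missing execute bit usually shows up only after tasks start failing, so it can help to check the scripts locally first. This is a sketch that assumes Python 3 on the submitting machine. `check_script` is a hypothetical helper and is not part of Hadoop. It inspects only the local filesystem. On a multi-node cluster, the scripts must also exist at the same path on every worker node, or be shipped to the nodes (for example with the `-files` generic option).

```python
import os


def check_script(path):
    """Return a list of problems that would stop Hadoop from running a script."""
    problems = []
    if not os.path.isabs(path):
        problems.append("not an absolute path: " + path)
    if not os.path.isfile(path):
        problems.append("file not found: " + path)
    elif not os.access(path, os.X_OK):
        # the script is executed directly via its shebang line,
        # so it needs the execute bit (chmod +x)
        problems.append("not executable: " + path)
    return problems


if __name__ == "__main__":
    for script in ("/usr/local/yarn/hadoop-2.7.3/wordcount/wordcount_mapper.py",
                   "/usr/local/yarn/hadoop-2.7.3/wordcount/wordcount_reducer.py"):
        for problem in check_script(script):
            print(problem)
```

An empty result means the script passed the local checks. It does not verify that the script itself runs correctly.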

```shell
hadoop jar /Download/hadoop-streaming-2.7.3.jar \
  -input /hello \
  -output /output \
  -mapper /usr/local/yarn/hadoop-2.7.3/wordcount/wordcount_mapper.py \
  -reducer /usr/local/yarn/hadoop-2.7.3/wordcount/wordcount_reducer.py
```

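Once the job finishes, the results are in the /output directory on HDFS, with one `part-*` file per reducer. For example: `hdfs dfs -cat /output/part-*`.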
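The output is plain `word<TAB>count` text, so it is easy to post-process. This is a minimal sketch that assumes the result fits in memory. `parse_counts` and `top_words` are hypothetical names and not part of the original tutorial.

```python
def parse_counts(lines):
    """Turn reducer output lines ("word<TAB>count") into a dict."""
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        word, count = line.split("\t", 1)
        counts[word] = int(count)
    return counts


def top_words(counts, n):
    """Return the n most frequent (word, count) pairs, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    sample = ["hadoop\t1", "hello\t2", "world\t1"]
    print(top_words(parse_counts(sample), 2))
```

To use it on real job output, pipe `hdfs dfs -cat /output/part-*` into a script that passes `sys.stdin` to `parse_counts`.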
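Original author: 苟雨
Original article: https://www.jianshu.com/p/e3fba578d1a8
This article was reposted from the web solely to share knowledge. If it infringes any rights, please contact the blogger to have it removed.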