Writing the map function
wordcount_mapper.py
```python
#!/usr/bin/env python
# ---------------------------------------------------------------
# This mapper reads lines of text and outputs <word, 1>
#
# ---------------------------------------------------------------
import sys

for line in sys.stdin:
    line = line.strip()
    keys = line.split()
    for key in keys:
        value = 1
        # {0} and {1} are replaced by the 0th and 1st arguments of format()
        print('{0}\t{1}'.format(key, value))
```
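To check the mapper's logic without a cluster, the same splitting can be written as a plain Python function. This is a minimal sketch; the function name `map_words` is hypothetical and not part of the original script.

```python
def map_words(lines):
    """Yield (word, 1) for every whitespace-separated token, like wordcount_mapper.py."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1


if __name__ == "__main__":
    # Hypothetical sample input, printed in the same word<TAB>1 format as the mapper
    sample = ["hello world", "hello hadoop"]
    for word, count in map_words(sample):
        print("{0}\t{1}".format(word, count))
```

The real script can be checked the same way from a shell, e.g. `echo "hello world hello" | python wordcount_mapper.py`.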
Writing the reduce function
wordcount_reducer.py
```python
#!/usr/bin/env python
# ---------------------------------------------------------------
# This reducer reads sorted <word, count> lines from the mapper
# and outputs <word, total-count>
# ---------------------------------------------------------------
import sys

last_key = None
running_total = 0

# -----------------------------------
# Read the input in a loop and accumulate the count for each key
# -----------------------------------
for input_line in sys.stdin:
    input_line = input_line.strip()
    if not input_line:
        continue  # skip blank lines
    this_key, value = input_line.split("\t", 1)
    value = int(value)
    if last_key == this_key:
        running_total += value  # same key: add value to running total
    else:
        if last_key is not None:
            # key changed: emit the total for the previous key
            print("{0}\t{1}".format(last_key, running_total))
        running_total = value  # reset for the new key
        last_key = this_key

# emit the last key (guarded so that empty input does not raise NameError)
if last_key is not None:
    print("{0}\t{1}".format(last_key, running_total))
```
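The reducer only works because Hadoop sorts the mapper output by key before the reduce phase, so all lines for the same word arrive consecutively. The sketch below imitates that shuffle/sort step locally with `sorted` ordering and then sums adjacent keys with `itertools.groupby`; the name `local_wordcount` is hypothetical, and `groupby` stands in for the hand-written `last_key` loop rather than reproducing it line by line.

```python
from itertools import groupby
from operator import itemgetter


def local_wordcount(lines):
    """Simulate a streaming word count: map, sort by key (the shuffle), then reduce."""
    # Map phase: emit (word, 1) pairs, as wordcount_mapper.py does
    pairs = [(word, 1) for line in lines for word in line.split()]
    # Shuffle/sort phase: Hadoop sorts mapper output by key before reducing
    pairs.sort(key=itemgetter(0))
    # Reduce phase: adjacent pairs with the same key are summed
    return [(word, sum(count for _, count in group))
            for word, group in groupby(pairs, key=itemgetter(0))]


if __name__ == "__main__":
    for word, total in local_wordcount(["hello world", "hello hadoop"]):
        print("{0}\t{1}".format(word, total))
```

The same chain can be run with the real scripts before submitting a job: `cat input.txt | python wordcount_mapper.py | sort | python wordcount_reducer.py`, where `input.txt` is any local test file. Without the `sort`, the reducer would print a word once per run of consecutive occurrences instead of once in total.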
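If you are running Hadoop on YARN, you need to download the hadoop-streaming jar separately ([reference](http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-streaming/2.7.3)). Prepare some input files in advance and put them in the input directory on HDFS (`/hello` in the command below).
Pass the streaming jar, the mapper, and the reducer as absolute paths, and make sure both scripts are executable (`chmod +x`). The output directory must not already exist; Hadoop refuses to start the job if it does.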
```bash
hadoop jar /Download/hadoop-streaming-2.7.3.jar \
    -input /hello \
    -output /output \
    -mapper /usr/local/yarn/hadoop-2.7.3/wordcount/wordcount_mapper.py \
    -reducer /usr/local/yarn/hadoop-2.7.3/wordcount/wordcount_reducer.py
```
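The path rules above can be checked locally before the job is submitted. This is a minimal sketch: the helper `build_streaming_cmd` is hypothetical, it only verifies that the jar and scripts are given as absolute paths and that the scripts are executable, and then builds the argument list. It does not contact HDFS, so it cannot tell whether the output directory already exists.

```python
import os


def build_streaming_cmd(jar, input_path, output_path, mapper, reducer):
    """Return the argv list for a `hadoop jar` streaming job after basic local checks."""
    # The jar, mapper and reducer are local files referenced by absolute path
    for path in (jar, mapper, reducer):
        if not os.path.isabs(path):
            raise ValueError("not an absolute path: " + path)
    # The scripts are run directly as commands, so they need the execute bit
    for script in (mapper, reducer):
        if not os.access(script, os.X_OK):
            raise PermissionError("not executable: " + script)
    return ["hadoop", "jar", jar,
            "-input", input_path,
            "-output", output_path,
            "-mapper", mapper,
            "-reducer", reducer]
```

With the paths from the command above, `subprocess.run(build_streaming_cmd(...), check=True)` would then submit the same job; using an argument list instead of a shell string avoids any quoting issues with the paths.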
When the job finishes, the word counts are in /output.
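The reducer writes one `word<TAB>count` line per word into part files (named `part-*`) under /output. Assuming those files are read with `hdfs dfs -cat`, a small parser can load the result into a dictionary for further processing; `parse_wordcount` and `fetch_wordcount` are hypothetical helpers, and the second one needs the `hdfs` command on the PATH.

```python
import subprocess


def parse_wordcount(text):
    """Turn reducer output lines of the form 'word<TAB>count' into a dict."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        word, count = line.split("\t", 1)
        counts[word] = int(count)
    return counts


def fetch_wordcount(output_dir="/output"):
    """Read every part file of the job output from HDFS and parse it."""
    # The glob is passed unexpanded; hdfs dfs expands it against HDFS itself
    text = subprocess.run(["hdfs", "dfs", "-cat", output_dir + "/part-*"],
                          check=True, capture_output=True, text=True).stdout
    return parse_wordcount(text)
```

Merging all part files into one dict is safe because each key is sent to exactly one reducer, so a word never appears in two part files.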