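Getting Started with Spark: Word Count

This article walks through developing word count, the "hello world" of Spark.

Spark's supported development languages include Scala, Java, and Python. The word count program below is written in Java. Java has supported lambda expressions since version 1.8, which removes much of the boilerplate that Spark's functional-style API would otherwise require. For details on using lambda expressions, see the article "Functional Programming (1): lambda, FunctionalInterface, Method Reference".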
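
As a rough illustration of why lambdas matter here, the sketch below writes the same function twice in plain Java: once as an anonymous inner class (the pre-1.8 style) and once as a lambda. It does not use Spark, and the names `LambdaDemo`, `LENGTH_ANONYMOUS`, `LENGTH_LAMBDA`, and `mapAll` are invented for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class LambdaDemo {
    // Pre-Java 8 style: an anonymous inner class implementing the functional interface.
    static final Function<String, Integer> LENGTH_ANONYMOUS = new Function<String, Integer>() {
        @Override
        public Integer apply(String s) {
            return s.length();
        }
    };

    // Java 8 style: the same behavior written as a lambda expression.
    static final Function<String, Integer> LENGTH_LAMBDA = s -> s.length();

    // Applies the given function to every word and collects the results.
    static List<Integer> mapAll(List<String> words, Function<String, Integer> f) {
        return words.stream().map(f).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hello", "darkness", "my");
        System.out.println(mapAll(words, LENGTH_ANONYMOUS)); // prints [5, 8, 2]
        System.out.println(mapAll(words, LENGTH_LAMBDA));    // prints [5, 8, 2]
    }
}
```

Both versions behave identically; the lambda simply states the function body without the surrounding class and method declaration.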

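Development tools: IDEA, Maven, JDK 1.8

1. In IDEA, create a new Maven project (set the Project SDK to Java 1.8 or later). The project in this example is named count.
2. Open the project's pom.xml and add the following below the version tag. Because this word count program only uses JavaRDD, the spark-core dependency is all that Spark itself requires. The snippet also pins paranamer to version 2.8.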

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <dependencies>
        <dependency>
            <groupId>com.thoughtworks.paranamer</groupId>
            <artifactId>paranamer</artifactId>
            <version>2.8</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.4.0</version>
        </dependency>
    </dependencies>

3. Write the word count code. Here e:/word_count.txt is the text file whose words are counted, and the program runs in local mode.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.Arrays;
import java.util.List;

public class WordCount {
    public static void main(String[] args) {
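        // "local" runs Spark in this JVM with a single worker thread.
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Read the input file as an RDD with one element per line.
        JavaRDD<String> textFile = sc.textFile("e:/word_count.txt");
        JavaPairRDD<String, Integer> counts = textFile
                // Split each line on single spaces and flatten into an RDD of words.
                .flatMap(s -> Arrays.asList(s.split(" ")).iterator())
                // Pair each word with an initial count of 1.
                .mapToPair(word -> new Tuple2<>(word, 1))
                // Sum the counts for each distinct word.
                .reduceByKey((a, b) -> a + b);
        // Bring the results back to the driver and print them.
        List<Tuple2<String, Integer>> countList = counts.collect();
        countList.forEach(System.out::println);
        // Release the resources held by the Spark context.
        sc.close();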
    }
}
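
To make the three transformations concrete without a Spark installation, here is a rough plain-Java analogy of the same pipeline. It is only a sketch: `LocalWordCount` and `count` are names invented for this example, and the loop runs in a single JVM with none of Spark's partitioning or shuffling.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LocalWordCount {
    // Counts words across all lines, splitting on single spaces like the Spark job.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // Analogous to flatMap: one line becomes many words.
            for (String word : line.split(" ")) {
                // Analogous to mapToPair + reduceByKey: emit (word, 1),
                // then combine values that share a key with (a, b) -> a + b.
                counts.merge(word, 1, (a, b) -> a + b);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "People talking without speaking,",
                "People hearing without listening,");
        count(lines).forEach((word, n) -> System.out.println("(" + word + "," + n + ")"));
    }
}
```

As in the Spark version, punctuation stays attached to the word, so `speaking,` is counted as its own key.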

4. Partial output:

(touched,1)
(voices,1)
(forming.,1)
(Because,1)
(it,1)
(its,2)
(writing,1)
(People,3)
(old,1)
(naked,1)
(Hear,1)
(Take,1)
(arms,1)
(fell,,1)
(cobblestone,,1)
(neon,2)
(you,,1)
...
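
Entries such as (fell,,1) and (forming.,1) keep their punctuation because the program splits on single spaces only, and words are counted case-sensitively. If cleaner tokens are wanted, one possible approach is to lowercase each line and split on anything that is not a letter or an apostrophe. The sketch below shows this in plain Java; `TokenNormalizer` and `tokenize` are hypothetical names, and the regular expression is an assumption about what should count as a word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TokenNormalizer {
    // Lowercases the line and splits on runs of characters that are not
    // a-z, a straight apostrophe, or a typographic apostrophe (U+2019).
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        for (String t : line.toLowerCase(Locale.ROOT).split("[^a-z'\u2019]+")) {
            // split() can yield an empty first element when the line
            // starts with a delimiter, so skip empty tokens.
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [but, my, words, like, silent, raindrops, fell]
        System.out.println(tokenize("But my words like silent raindrops fell,"));
    }
}
```

In the Spark job, the flatMap step could then be written as `.flatMap(s -> TokenNormalizer.tokenize(s).iterator())`, assuming this helper class is on the classpath.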

The test input is the lyrics of "The Sound of Silence", the theme song of the film The Graduate:

Hello darkness, my old friend,
I’ve come to talk with you again,
Because a vision softly creeping,
Left its seeds while I was sleeping,
And the vision that was planted in my brain
Still remains
Within the sound of silence.
In restless dreams I walk alone
Narrow streets of cobblestone,
‘Neath the halo of a street lamp,
I turned my collar to the cold and damp
When my eyes were stabbed by the flash of a neon light
That split the night
And touched the sound of silence.
And in the naked light I saw
Ten thousand people, maybe more.
People talking without speaking,
People hearing without listening,
People writing songs that voices never share
And no one dared
Disturb the sound of silence.
“Fools” said I,”You do not know
Silence like a cancer grows.
Hear my words that I might teach you,
Take my arms that I might reach you.”
But my words like silent raindrops fell,
And echoed
In the wells of silence
And the people bowed and prayed
To the neon god they made.
And the sign flashed out its warning,
In the words that it was forming.
And the signs said, ‘The words of the prophets are written on the subway walls
And tenement halls.
And whisper’d in the sounds of silence.

Reference book:
Learning Spark: Lightning-Fast Big Data Analysis (Chinese edition: Spark快速大数据分析)

    Original author: mumu_cola
    Original URL: https://www.jianshu.com/p/682e58262510