Accessing Hive from Spark in IntelliJ

After HDFS and Hive are installed, start the Hive metastore with bin/hive --service metastore. By default it listens on port 9083.

Create a new Scala project in IntelliJ.

Add the Spark and Hive dependencies:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.major.version}</artifactId>
    <version>${spark.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_${scala.major.version}</artifactId>
    <version>${spark.version}</version>
</dependency>
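
The ${scala.major.version} and ${spark.version} placeholders are Maven properties. A minimal sketch of the corresponding <properties> block follows; the version numbers are illustrative assumptions, so match them to the Scala and Spark builds your cluster actually runs.

<properties>
  <!-- Illustrative versions (assumptions); use the versions your cluster runs. -->
  <scala.major.version>2.11</scala.major.version>
  <spark.version>2.3.0</spark.version>
</properties>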

Copy hive-site.xml into the project's resources directory and point the metastore URI at the metastore service:

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://gpu1:9083</value>
  <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
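
Alternatively, the metastore URI can be set in code instead of shipping hive-site.xml. This is a minimal sketch assuming the same gpu1:9083 metastore as above; hive.metastore.uris is the same configuration key shown in the XML.

import org.apache.spark.sql.SparkSession

// Sketch: point the session at the remote metastore without hive-site.xml.
// The key and value mirror the hive.metastore.uris property above.
val spark = SparkSession.builder()
  .appName("HiveTest")
  .config("hive.metastore.uris", "thrift://gpu1:9083")
  .enableHiveSupport()
  .getOrCreate()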

The Spark code follows:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object HiveTest {
  def main(args: Array[String]): Unit = {
    // Run as the "root" HDFS user so the job has permission on the warehouse dirs.
    System.setProperty("HADOOP_USER_NAME", "root")
    val conf = new SparkConf().setAppName("HiveTest")
      .setMaster("spark://gpu1:7077")        // standalone cluster master
      .set("spark.cores.max", "1")           // cap the cores this app may use
      .setJars(List("target/sparkTest.jar")) // ship the compiled job jar to the executors

    // enableHiveSupport() wires the session to the metastore from hive-site.xml.
    val hive = SparkSession.builder().config(conf).enableHiveSupport().getOrCreate()
    hive.sql("show tables").show()
    val u_data = hive.sql("select * from ratings")
    // Count ratings per user; collect() pulls the (small) result to the driver.
    u_data.groupBy("userid").count().collect().foreach(println)
    hive.stop()
  }
}

Note: before running, build the project into a jar, since the code references that file in setJars so it can be shipped to the executors.
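
For quick local debugging you can skip the jar step entirely by running against a local master; nothing is shipped to a cluster, so setJars is unnecessary. A sketch under that assumption (Hive support still picks up hive-site.xml from resources, or the programmatic config shown earlier):

import org.apache.spark.sql.SparkSession

// Sketch: local-mode variant for debugging; no cluster, no jar shipping.
val spark = SparkSession.builder()
  .appName("HiveTest")
  .master("local[*]")
  .enableHiveSupport()
  .getOrCreate()
spark.sql("show tables").show()
spark.stop()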

    Original author: 匠人的OP
    Original source: https://zhuanlan.zhihu.com/p/50214060