After installing HDFS and Hive, start the metastore service with bin/hive --service metastore. By default it listens on port 9083.
Create a new Scala project in IntelliJ.
Add the Spark and Hive dependencies to the POM:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.major.version}</artifactId>
    <version>${spark.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_${scala.major.version}</artifactId>
    <version>${spark.version}</version>
</dependency>
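The ${scala.major.version} and ${spark.version} placeholders need matching definitions in the POM's <properties> section. A minimal sketch; the version numbers here are illustrative, so substitute the ones your cluster actually runs:
<properties>
    <!-- illustrative versions; match these to your cluster -->
    <scala.major.version>2.11</scala.major.version>
    <spark.version>2.2.0</spark.version>
</properties>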
Copy hive-site.xml into the resources directory and set the metastore URI:
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://gpu1:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
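Alternatively, if you would rather not ship hive-site.xml with the project, the same property can usually be set programmatically when building the session. A minimal sketch, assuming the metastore address above:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("HiveTest")
  // point the session's Hive client at the remote metastore, replacing hive-site.xml
  .config("hive.metastore.uris", "thrift://gpu1:9083")
  .enableHiveSupport()
  .getOrCreate()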
The Spark code follows:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object HiveTest {
  def main(args: Array[String]): Unit = {
    // Access HDFS as root so the job can read the Hive warehouse directory
    System.setProperty("HADOOP_USER_NAME", "root")
    val conf = new SparkConf()
      .setAppName("HiveTest")
      .setMaster("spark://gpu1:7077")
      .set("spark.cores.max", "1")
      // Ship the application jar to the executors; build it before running
      .setJars(List("target/sparkTest.jar"))
    val spark = SparkSession.builder().config(conf).enableHiveSupport().getOrCreate()
    // Verify the metastore connection by listing tables
    spark.sql("show tables").show()
    // Count the number of ratings per user
    val ratings = spark.sql("select * from ratings")
    ratings.groupBy("userid").count().collect().foreach(println)
    spark.stop()
  }
}
Note: build the project into a jar before running (for example with mvn package), and make sure the path passed to setJars in the code points to that jar.
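If you want to persist the aggregation rather than just print it, the result can be written back to Hive through the same session. A minimal sketch; ratings_per_user is a hypothetical target table name:
// count ratings per user and save the result as a new Hive table
val counts = spark.sql("select * from ratings").groupBy("userid").count()
counts.write.mode("overwrite").saveAsTable("ratings_per_user")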