Spark: NullPointerException When Accessing Hive (Windows)

Background

This was my first attempt at writing Spark code that accesses Hive; an error was thrown while the Hive instance was being initialized. Spark could read HDFS files normally, and RDD-based Spark development also worked normally.

The error message is as follows:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/05/03 19:44:08 INFO SparkContext: Running Spark version 2.1.1
18/05/03 19:44:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/05/03 19:44:08 INFO SecurityManager: Changing view acls to: zhou.pengbo
18/05/03 19:44:08 INFO SecurityManager: Changing modify acls to: zhou.pengbo
18/05/03 19:44:08 INFO SecurityManager: Changing view acls groups to: 
18/05/03 19:44:08 INFO SecurityManager: Changing modify acls groups to: 
18/05/03 19:44:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(zhou.pengbo); groups with view permissions: Set(); users  with modify permissions: Set(zhou.pengbo); groups with modify permissions: Set()
18/05/03 19:44:09 INFO Utils: Successfully started service 'sparkDriver' on port 50189.
18/05/03 19:44:09 INFO SparkEnv: Registering MapOutputTracker
18/05/03 19:44:09 INFO SparkEnv: Registering BlockManagerMaster
18/05/03 19:44:09 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/05/03 19:44:09 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/05/03 19:44:09 INFO DiskBlockManager: Created local directory at C:\Users\zhou.pengbo\AppData\Local\Temp\blockmgr-f382f8ce-3e2b-4afd-ae90-db414b57329b
18/05/03 19:44:09 INFO MemoryStore: MemoryStore started with capacity 899.7 MB
18/05/03 19:44:09 INFO SparkEnv: Registering OutputCommitCoordinator
18/05/03 19:44:10 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/05/03 19:44:10 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.20.2.243:4040
18/05/03 19:44:10 INFO Executor: Starting executor ID driver on host localhost
18/05/03 19:44:10 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 50198.
18/05/03 19:44:10 INFO NettyBlockTransferService: Server created on 10.20.2.243:50198
18/05/03 19:44:10 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/05/03 19:44:10 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.20.2.243, 50198, None)
18/05/03 19:44:10 INFO BlockManagerMasterEndpoint: Registering block manager 10.20.2.243:50198 with 899.7 MB RAM, BlockManagerId(driver, 10.20.2.243, 50198, None)
18/05/03 19:44:10 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.20.2.243, 50198, None)
18/05/03 19:44:10 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.20.2.243, 50198, None)
18/05/03 19:44:10 INFO SharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('/user/hive/warehouse').
18/05/03 19:44:10 INFO SharedState: Warehouse path is '/user/hive/warehouse'.
18/05/03 19:44:11 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/05/03 19:44:11 INFO deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
18/05/03 19:44:11 INFO deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
18/05/03 19:44:11 INFO deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
18/05/03 19:44:11 INFO deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
18/05/03 19:44:11 INFO deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
18/05/03 19:44:11 INFO deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
18/05/03 19:44:11 INFO deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
18/05/03 19:44:11 INFO deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
18/05/03 19:44:11 INFO metastore: Trying to connect to metastore with URI thrift://yq-hadoop19:9083
18/05/03 19:44:11 WARN Hive: Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
    at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
    at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
    at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:358)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:262)
    at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:66)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166)
    at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
    at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
    at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
    at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
    at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
    at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
    at AddIndexPlatformFields$.main(AddIndexPlatformFields.scala:17)
    at AddIndexPlatformFields.main(AddIndexPlatformFields.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
    at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
    at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234)
    ... 42 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
    ... 48 more
Caused by: java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:404)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:678)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:661)
    at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:83)
    at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:52)
    at org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.getGroups(JniBasedUnixGroupsMappingWithFallback.java:50)
    at org.apache.hadoop.security.Groups.getGroups(Groups.java:89)
    at org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1352)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:436)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:236)
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
    ... 53 more
18/05/03 19:44:11 INFO metastore: Trying to connect to metastore with URI thrift://yq-hadoop19:9083
Exception in thread "main" java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
    at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
    at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
    at AddIndexPlatformFields$.main(AddIndexPlatformFields.scala:17)
    at AddIndexPlatformFields.main(AddIndexPlatformFields.scala)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
    ... 12 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
    at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)
    at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
    at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
    at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
    at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
    ... 17 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166)
    ... 25 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:358)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:262)
    at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:66)
    ... 30 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
    at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)
    ... 38 more
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
    at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
    ... 39 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
    ... 45 more
Caused by: java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:404)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:678)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:661)
    at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:83)
    at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:52)
    at org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.getGroups(JniBasedUnixGroupsMappingWithFallback.java:50)
    at org.apache.hadoop.security.Groups.getGroups(Groups.java:89)
    at org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1352)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:436)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:236)
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
    ... 50 more
18/05/03 19:44:11 INFO SparkContext: Invoking stop() from shutdown hook
18/05/03 19:44:11 INFO SparkUI: Stopped Spark web UI at http://10.20.2.243:4040
18/05/03 19:44:11 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/05/03 19:44:11 INFO MemoryStore: MemoryStore cleared
18/05/03 19:44:11 INFO BlockManager: BlockManager stopped
18/05/03 19:44:11 INFO BlockManagerMaster: BlockManagerMaster stopped
18/05/03 19:44:11 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/05/03 19:44:11 INFO SparkContext: Successfully stopped SparkContext
18/05/03 19:44:11 INFO ShutdownHookManager: Shutdown hook called
18/05/03 19:44:11 INFO ShutdownHookManager: Deleting directory C:\Users\zhou.pengbo\AppData\Local\Temp\spark-6a20aec2-3765-4d72-aa68-97431c86225b
The sample code is as follows:
import org.apache.spark.sql.SparkSession

/**
  * Created by zhou.pengbo on 2018/05/03.
  */

object Test {

  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession
      .builder()
      .appName("test")
      .master("local")
      .enableHiveSupport()
      .getOrCreate() // with Hive support enabled, this initializes the Hive metastore client; the NPE above is thrown here

    // Smoke test: read a few rows from an existing Hive table
    spark.sql("SELECT * FROM data_tmp.app_bbs_similar_user LIMIT 10").show(2)
  }
}
The build.sbt file is as follows:
name := "sbt_dw_app"

version := "0.1"

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.1"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.1.1"
In addition, hive-site.xml has already been placed in the Resources Root directory.

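For context, a minimal hive-site.xml sketch consistent with the log output above; the metastore URI (thrift://yq-hadoop19:9083) and warehouse directory (/user/hive/warehouse) are taken from that log, and any real file will carry cluster-specific values:

<configuration>
  <!-- Remote Hive metastore that Spark connects to (value taken from the log above) -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://yq-hadoop19:9083</value>
  </property>
  <!-- Warehouse location; Spark copies this into spark.sql.warehouse.dir (see the SharedState log lines) -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>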

Cause

The winutils.exe file is missing.
PS: Spark depends on some of Hadoop's libraries, and on Windows Hadoop uses the winutils.exe binary when initializing the Hive context. As the stack trace shows, Hadoop's Shell utility shells out to look up the current user's groups (ShellBasedUnixGroupsMapping.getUnixGroups); when winutils.exe cannot be located, the command handed to ProcessBuilder contains a null element, and ProcessBuilder.start throws the NullPointerException above.

Solution

Download the winutils.exe file.
This does not mean we have to install Hadoop; we can simply download the required winutils.exe to any location on disk, for example C:\winutils\bin\winutils.exe, and then set HADOOP_HOME=C:\winutils (HADOOP_HOME must point at the directory containing bin, not at the exe itself). The same path can also be supplied from code, as sketched below.
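If modifying system environment variables is inconvenient, Hadoop's Shell class also honors the hadoop.home.dir JVM system property before falling back to HADOOP_HOME. A minimal sketch, assuming the binary was placed at C:\winutils\bin\winutils.exe:

import org.apache.spark.sql.SparkSession

object HiveOnWindowsSketch {

  def main(args: Array[String]): Unit = {
    // Assumption: winutils.exe lives at C:\winutils\bin\winutils.exe.
    // hadoop.home.dir must name the directory that contains bin, and it must
    // be set before any Hadoop class is loaded.
    System.setProperty("hadoop.home.dir", "C:\\winutils")

    val spark: SparkSession = SparkSession
      .builder()
      .appName("test")
      .master("local")
      .enableHiveSupport()
      .getOrCreate()

    // Quick check that the metastore is now reachable
    spark.sql("SHOW DATABASES").show()
  }
}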

Attached download link for winutils.exe: winutils.exe

Notes

A: When Spark processes Hive data, the job must run in yarn-client mode (local also works); it must not run in yarn-cluster mode, otherwise the exception above is thrown (see the spark-submit sketch after this list).
B: After the winutils.exe environment variable is configured, IDEA must be restarted, otherwise the setting does not take effect (a real pitfall).
C: The missing file does not affect normal reading and writing of HDFS files, access to HBase, or RDD programming.
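For note A, a hedged spark-submit sketch; the main class comes from the stack trace above, and the jar path follows sbt's default output for the build.sbt shown earlier, but both are assumptions about this particular project:

spark-submit \
  --master yarn \
  --deploy-mode client \
  --class AddIndexPlatformFields \
  target/scala-2.11/sbt_dw_app_2.11-0.1.jar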

    Original author: 步闲
    Original source: https://www.jianshu.com/p/202c6d6d51b0