1. In Spark SQL, a join condition must not contain expressions with indeterminate values, such as case when, coalesce, etc. (see the sketch below).
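A minimal sketch of the failure mode and a common workaround; the tables t1/t2 and their columns are hypothetical:

-- May be rejected: the join condition wraps a column in coalesce
SELECT a.id, b.name
FROM t1 a
JOIN t2 b
  ON coalesce(a.user_id, '-1') = b.user_id;

-- Workaround: evaluate the expression in a subquery first,
-- then join on the resulting plain column
SELECT a.id, b.name
FROM (SELECT id, coalesce(user_id, '-1') AS user_id FROM t1) a
JOIN t2 b
  ON a.user_id = b.user_id;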
2. In Spark SQL, table aliases must be unique. Hive tolerates duplicate aliases, but for the sake of good practice they should be kept unique anyway.
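A hypothetical illustration (the table orders is made up): Hive tolerates reusing an alias such as t at different nesting levels, while giving each level its own alias keeps the query unambiguous in both engines:

-- Relies on alias reuse: t names two different derived tables
SELECT t.id
FROM (
  SELECT t.id FROM (SELECT id FROM orders) t
) t;

-- Safer: a unique alias at every level
SELECT t2.id
FROM (
  SELECT t1.id FROM (SELECT id FROM orders) t1
) t2;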
3. Starting spark-sql fails with MetaException(message:Version information not found ...). The log shows:
WARN metadata.Hive: Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
... (intermediate stack trace omitted) ...
Caused by: java.lang.reflect.InvocationTargetException
Caused by: MetaException(message:Version information not found in metastore. )
Because of this version-information issue, spark-sql could not access Hive's metastore. Reviewing the environment and the configuration done so far: Hive's hive-site.xml had been copied into spark/conf, and $HIVE_HOME had been set in spark-env.sh. Hive had earlier been upgraded from 1.2.2 to 2.1.1 and still worked normally afterwards, so the problem should not be Hive itself; it could only lie in the compatibility between Spark and Hive.
After searching (via Baidu) and verifying, the cause was narrowed down to the following configuration item in hive-site.xml. It defaults to true, i.e. the Hive metastore schema version is verified. After changing it to false, spark-sql started normally and the error was gone.
<property>
  <name>hive.metastore.schema.verification</name>
  <value>true</value>
  <description>
    Enforce metastore schema version consistency.
    True: Verify that version information stored in is compatible with one from Hive jars. Also disable automatic schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures proper metastore schema migration. (Default)
    False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
  </description>
</property>
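With the fix applied, the entry in hive-site.xml becomes:

<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
</property>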
Note also a line in the log after spark-sql starts: the Hive client version is 1.2.1. So the Spark build in use is indeed compiled against Hive 1.2.1, while the Hive installed in this environment is 2.1.1; it really is a compatibility problem.
17/09/13 10:06:02 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is hdfs://master:9000/hive/warehouse
spark-sql>
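An alternative to switching off verification, sketched here under the assumption that the Spark release in use is 2.2 or later (whose spark.sql.hive.metastore.version option supports Hive metastore 2.1.x), is to point Spark at the installed metastore version in spark-defaults.conf; the jar path below is a placeholder:

# spark-defaults.conf (sketch; the classpath must contain the Hive 2.1.1
# jars and their Hadoop dependencies, adjusted to the local installation)
spark.sql.hive.metastore.version   2.1.1
spark.sql.hive.metastore.jars      /usr/local/hive/lib/*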
4.