scala - Cannot access Hive tables through HiveContext from an Eclipse Maven project

Tags: scala hadoop apache-spark hive hdfs

This question already has answers here:

The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: -wx------ (4 answers)

Closed 4 years ago.

I am trying to access Hive tables from an Eclipse Maven project with the Scala nature.

I tried to use a HiveContext to fetch the Hive database details as shown below, but I run into the error that follows. I can execute the same code from the spark-shell CLI, but not from the Scala IDE in Eclipse with the Maven dependencies added.

Below is my code:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive._

object readHiveTable {
  def main(args: Array[String]){
    val conf = new SparkConf().setAppName("Read Hive Table").setMaster("local")
    conf.set("spark.ui.port","4041")
    val sc = new SparkContext(conf)
    //val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val hc = new HiveContext(sc)
    hc.setConf("hive.metastore.uris","thrift://127.0.0.1:9083")
    hc.sql("use default")
    val a = hc.sql("show tables")
    a.show
  }
}

Below is the error I get in the console window:
18/02/04 19:58:15 INFO SparkUI: Started SparkUI at http://192.168.0.10:4041
18/02/04 19:58:15 INFO Executor: Starting executor ID driver on host localhost
18/02/04 19:58:15 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36099.
18/02/04 19:58:15 INFO NettyBlockTransferService: Server created on 36099
18/02/04 19:58:15 INFO BlockManagerMaster: Trying to register BlockManager
18/02/04 19:58:15 INFO BlockManagerMasterEndpoint: Registering block manager localhost:36099 with 744.4 MB RAM, BlockManagerId(driver, localhost, 36099)
18/02/04 19:58:15 INFO BlockManagerMaster: Registered BlockManager
18/02/04 19:58:17 INFO HiveContext: Initializing execution hive, version 1.2.1
18/02/04 19:58:17 INFO ClientWrapper: Inspected Hadoop version: 2.2.0
18/02/04 19:58:17 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.2.0
18/02/04 19:58:17 INFO deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
18/02/04 19:58:17 INFO deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
18/02/04 19:58:17 INFO deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
18/02/04 19:58:17 INFO deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
18/02/04 19:58:17 INFO deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
18/02/04 19:58:17 INFO deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
18/02/04 19:58:17 INFO deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
18/02/04 19:58:17 INFO deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
18/02/04 19:58:17 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/02/04 19:58:17 INFO ObjectStore: ObjectStore, initialize called
18/02/04 19:58:17 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/02/04 19:58:17 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/02/04 19:58:28 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/02/04 19:58:30 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/02/04 19:58:30 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/02/04 19:58:38 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/02/04 19:58:38 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/02/04 19:58:39 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/02/04 19:58:39 INFO ObjectStore: Initialized ObjectStore
18/02/04 19:58:40 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/02/04 19:58:40 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
18/02/04 19:58:41 INFO HiveMetaStore: Added admin role in metastore
18/02/04 19:58:41 INFO HiveMetaStore: Added public role in metastore
18/02/04 19:58:41 INFO HiveMetaStore: No user is added in admin role, since config is empty
18/02/04 19:58:41 INFO HiveMetaStore: 0: get_all_databases
18/02/04 19:58:41 INFO audit: ugi=chaithu   ip=unknown-ip-addr  cmd=get_all_databases   
18/02/04 19:58:41 INFO HiveMetaStore: 0: get_functions: db=default pat=*
18/02/04 19:58:41 INFO audit: ugi=chaithu   ip=unknown-ip-addr  cmd=get_functions: db=default pat=* 
18/02/04 19:58:41 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
    at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:194)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
    at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
    at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
    at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462)
    at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461)
    at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
    at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:330)
    at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
    at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
    at com.CITIGenesis.readHiveTable$.main(readHiveTable.scala:13)
    at com.CITIGenesis.readHiveTable.main(readHiveTable.scala)
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
    at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
    at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
    ... 12 more
18/02/04 19:58:43 INFO SparkContext: Invoking stop() from shutdown hook
18/02/04 19:58:43 INFO SparkUI: Stopped Spark web UI at http://192.168.0.10:4041
18/02/04 19:58:43 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/02/04 19:58:43 INFO MemoryStore: MemoryStore cleared
18/02/04 19:58:43 INFO BlockManager: BlockManager stopped
18/02/04 19:58:43 INFO BlockManagerMaster: BlockManagerMaster stopped
18/02/04 19:58:43 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/02/04 19:58:43 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
18/02/04 19:58:43 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
18/02/04 19:58:43 INFO SparkContext: Successfully stopped SparkContext
18/02/04 19:58:43 INFO ShutdownHookManager: Shutdown hook called
18/02/04 19:58:43 INFO ShutdownHookManager: Deleting directory /tmp/spark-0ec5892a-1d53-4721-b770-d16e8757865d
18/02/04 19:58:43 INFO ShutdownHookManager: Deleting directory /tmp/spark-0ca97c02-57c7-400b-b552-44f6d7813da5

HDFS directories:
chaithu@localhost:~$ hadoop fs -ls /tmp
Found 3 items
d---------   - hdfs   supergroup          0 2018-02-04 14:15 /tmp/.cloudera_health_monitoring_canary_files
drwxrwxrwx   - hdfs   supergroup          0 2018-01-31 11:42 /tmp/hive
drwxrwxrwt   - mapred hadoop              0 2018-01-31 11:25 /tmp/logs
chaithu@localhost:~$ hadoop fs -ls /user/
Found 6 items
drwxrwxrwx   - chaithu supergroup          0 2018-02-04 19:34 /user/chaithu
drwxrwxrwx   - mapred  hadoop              0 2018-01-31 11:25 /user/history
drwxrwxr-t   - hive    hive                0 2018-01-31 11:31 /user/hive
drwxrwxr-x   - hue     hue                 0 2018-01-31 11:38 /user/hue
drwxrwxr-x   - oozie   oozie               0 2018-01-31 11:34 /user/oozie
drwxr-x--x   - spark   spark               0 2018-01-31 22:39 /user/spark

Best answer

for Hadoop version 2.2.0

Assuming that is in fact the Spark version, you should use a SparkSession with enableHiveSupport(); the spark.sql method will then behave just as it does in the spark-shell.

HiveContext and SQLContext are only kept for backwards compatibility; new Spark code should not use them.
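
A minimal sketch of that rewrite, assuming Spark 2.x artifacts (spark-core, spark-sql and spark-hive) are on the Maven classpath; the metastore URI and UI port are simply carried over from the code in the question, and the object name is a placeholder:

import org.apache.spark.sql.SparkSession

object ReadHiveTable {
  def main(args: Array[String]): Unit = {
    // In Spark 2.x a single SparkSession replaces SparkContext + SQLContext/HiveContext.
    val spark = SparkSession.builder()
      .appName("Read Hive Table")
      .master("local[*]")
      .config("spark.ui.port", "4041")
      // Point the session at the remote metastore instead of letting it
      // fall back to a local Derby database.
      .config("hive.metastore.uris", "thrift://127.0.0.1:9083")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("use default")
    spark.sql("show tables").show()
  }
}

spark.sql returns a DataFrame, so show works exactly as in the original HiveContext version.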
underlying DB is DERBY

To me, this line means one of two things:

  • Hive is using its default metastore configuration, or
  • Spark did not connect to your metastore and has created a local Derby database instead. This would match the Failed to get database default warning.

In the latter case, check the /tmp folder on the local filesystem (a small permission-check sketch follows the link below).

See here for the various solutions on how to connect to the metastore:

How to connect to a Hive metastore programmatically in SparkSQL?
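
As a rough illustration of that local-filesystem check (this sketch is not part of the original answer): when the program is launched from the IDE without HADOOP_CONF_DIR on the classpath, fs.defaultFS falls back to file:///, so the scratch directory being rejected is the local /tmp/hive rather than the one shown in the HDFS listing above. Widening its permissions is the fix discussed in the duplicate linked at the top; the path and mode here are assumptions to adapt to your setup:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.FsPermission

object CheckScratchDir {
  def main(args: Array[String]): Unit = {
    // Without HADOOP_CONF_DIR on the classpath this resolves to file:///,
    // i.e. the local filesystem, not the cluster's HDFS.
    val fs = FileSystem.get(new Configuration())
    val scratch = new Path("/tmp/hive")

    if (fs.exists(scratch)) {
      println(s"Permissions on ${fs.getUri}${scratch}: " +
        fs.getFileStatus(scratch).getPermission)
      // Hive requires the root scratch dir to be writable by everyone (733 at minimum).
      fs.setPermission(scratch, new FsPermission(Integer.parseInt("733", 8).toShort))
    } else {
      println(s"${scratch} does not exist on ${fs.getUri}")
    }
  }
}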

This question on Stack Overflow: https://stackoverflow.com/questions/48609188/
