hadoop - Install Hive with Apache Spark

Tags: hadoop apache-spark hive pyspark apache-spark-sql

I am trying to execute the following query in Spark:

from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")

But it results in this error:

File "<stdin>", line 1, in <module>

File "/home/hduser/Software/spark/python/pyspark/sql/context.py", line 502, in sql
return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
 File "/home/hduser/Software/spark/python/pyspark/sql/context.py", line 610, in _ssql_ctx
"build/sbt assembly", e)
Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JError(u'Trying to call a package.',))

I am trying to run a Hive query on Spark. Is it necessary to build Spark with Hive? I already have Spark and Hive installed separately on my system. Is there a way to run Hive queries on Spark without rebuilding Spark against my existing setup?

Thanks in advance.

The full log file follows:

16/01/07 02:50:24 DEBUG PythonGatewayServer: Started PythonGatewayServer on port 53473
16/01/07 02:50:24 DEBUG PythonGatewayServer: Communicating GatewayServer port to Python driver at 127.0.0.1:48570
16/01/07 02:50:24 INFO SparkContext: Running Spark version 1.4.1
16/01/07 02:50:24 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(value=[Rate of successful kerberos logins and latency (milliseconds)], about=, valueName=Time, type=DEFAULT, always=false, sampleName=Ops)
16/01/07 02:50:24 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(value=[Rate of failed kerberos logins and latency (milliseconds)], about=, valueName=Time, type=DEFAULT, always=false, sampleName=Ops)
16/01/07 02:50:24 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(value=[GetGroups], about=, valueName=Time, type=DEFAULT, always=false, sampleName=Ops)
16/01/07 02:50:24 DEBUG MetricsSystemImpl: UgiMetrics, User and group related metrics
16/01/07 02:50:24 DEBUG Shell: Failed to detect a valid hadoop home directory
java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.
    at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:303)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:328)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
    at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:610)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:272)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:260)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:790)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:760)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:633)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2162)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2162)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2162)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:301)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:214)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)
16/01/07 02:50:24 DEBUG Shell: setsid exited with exit code 0
16/01/07 02:50:24 DEBUG KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
16/01/07 02:50:24 DEBUG Groups:  Creating new Groups object
16/01/07 02:50:24 DEBUG NativeCodeLoader: Trying to load the custom-built native-hadoop library...
16/01/07 02:50:24 DEBUG NativeCodeLoader: Loaded the native-hadoop library
16/01/07 02:50:24 DEBUG JniBasedUnixGroupsMapping: Using JniBasedUnixGroupsMapping for Group resolution
16/01/07 02:50:24 DEBUG JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMapping
16/01/07 02:50:24 DEBUG Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
16/01/07 02:50:24 DEBUG UserGroupInformation: hadoop login
16/01/07 02:50:24 DEBUG UserGroupInformation: hadoop login commit
16/01/07 02:50:24 DEBUG UserGroupInformation: using local user:UnixPrincipal: hduser
16/01/07 02:50:24 DEBUG UserGroupInformation: Using user: "UnixPrincipal: hduser" with name hduser
16/01/07 02:50:24 DEBUG UserGroupInformation: User entry: "hduser"
16/01/07 02:50:24 DEBUG UserGroupInformation: UGI loginUser:hduser (auth:SIMPLE)
16/01/07 02:50:24 WARN SparkConf: 
SPARK_CLASSPATH was detected (set to '/home/hduser/mysql-connector-java-5.1.36-bin.jar').
This is deprecated in Spark 1.0+.

Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath

16/01/07 02:50:24 WARN SparkConf: Setting 'spark.executor.extraClassPath' to '/home/hduser/mysql-connector-java-5.1.36-bin.jar' as a work-around.
16/01/07 02:50:24 WARN SparkConf: Setting 'spark.driver.extraClassPath' to '/home/hduser/mysql-connector-java-5.1.36-bin.jar' as a work-around.
16/01/07 02:50:24 WARN Utils: Your hostname, desktop1 resolves to a loopback address: 127.0.1.1; using 192.168.1.101 instead (on interface wlan0)
16/01/07 02:50:24 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/01/07 02:50:25 INFO SecurityManager: Changing view acls to: hduser
16/01/07 02:50:25 INFO SecurityManager: Changing modify acls to: hduser
16/01/07 02:50:25 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hduser); users with modify permissions: Set(hduser)
16/01/07 02:50:25 DEBUG SSLOptions: No SSL protocol specified
16/01/07 02:50:25 DEBUG SSLOptions: No SSL protocol specified
16/01/07 02:50:25 DEBUG SSLOptions: No SSL protocol specified
16/01/07 02:50:25 DEBUG SecurityManager: SSLConfiguration for file server: SSLOptions{enabled=false, keyStore=None, keyStorePassword=None, trustStore=None, trustStorePassword=None, protocol=None, enabledAlgorithms=Set()}
16/01/07 02:50:25 DEBUG SecurityManager: SSLConfiguration for Akka: SSLOptions{enabled=false, keyStore=None, keyStorePassword=None, trustStore=None, trustStorePassword=None, protocol=None, enabledAlgorithms=Set()}
16/01/07 02:50:25 DEBUG AkkaUtils: In createActorSystem, requireCookie is: off
16/01/07 02:50:25 INFO Slf4jLogger: Slf4jLogger started
16/01/07 02:50:25 INFO Remoting: Starting remoting
16/01/07 02:50:26 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.101:36696]
16/01/07 02:50:26 INFO Utils: Successfully started service 'sparkDriver' on port 36696.
16/01/07 02:50:26 DEBUG SparkEnv: Using serializer: class org.apache.spark.serializer.JavaSerializer
16/01/07 02:50:26 INFO SparkEnv: Registering MapOutputTracker
16/01/07 02:50:26 INFO SparkEnv: Registering BlockManagerMaster
16/01/07 02:50:26 INFO DiskBlockManager: Created local directory at /tmp/spark-10be872e-6114-4f74-9546-7ea87fd03425/blockmgr-adcc8ff0-29d5-4168-904b-38f822d38186
16/01/07 02:50:26 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
16/01/07 02:50:26 INFO HttpFileServer: HTTP File server directory is /tmp/spark-10be872e-6114-4f74-9546-7ea87fd03425/httpd-1112dc78-2447-4bb9-86f6-3c2c725b0951
16/01/07 02:50:26 INFO HttpServer: Starting HTTP Server
16/01/07 02:50:26 DEBUG HttpServer: HttpServer is not using security
16/01/07 02:50:26 INFO Utils: Successfully started service 'HTTP file server' on port 42190.
16/01/07 02:50:26 DEBUG HttpFileServer: HTTP file server started at: http://192.168.1.101:42190
16/01/07 02:50:26 INFO SparkEnv: Registering OutputCommitCoordinator
16/01/07 02:50:26 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/01/07 02:50:26 INFO SparkUI: Started SparkUI at http://192.168.1.101:4040
16/01/07 02:50:26 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received message AkkaMessage(ExpireDeadHosts,false) from Actor[akka://sparkDriver/deadLetters]
16/01/07 02:50:26 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message: AkkaMessage(ExpireDeadHosts,false)
16/01/07 02:50:26 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled message (5.176845 ms) AkkaMessage(ExpireDeadHosts,false) from Actor[akka://sparkDriver/deadLetters]
16/01/07 02:50:26 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received message AkkaMessage(TaskSchedulerIsSet,false) from Actor[akka://sparkDriver/deadLetters]
16/01/07 02:50:26 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message: AkkaMessage(TaskSchedulerIsSet,false)
16/01/07 02:50:26 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled message (0.232258 ms) AkkaMessage(TaskSchedulerIsSet,false) from Actor[akka://sparkDriver/deadLetters]
16/01/07 02:50:26 INFO Executor: Starting executor ID driver on host localhost
16/01/07 02:50:26 DEBUG InternalLoggerFactory: Using SLF4J as the default logging framework
16/01/07 02:50:26 DEBUG PlatformDependent0: java.nio.Buffer.address: available
16/01/07 02:50:26 DEBUG PlatformDependent0: sun.misc.Unsafe.theUnsafe: available
16/01/07 02:50:26 DEBUG PlatformDependent0: sun.misc.Unsafe.copyMemory: available
16/01/07 02:50:26 DEBUG PlatformDependent0: java.nio.Bits.unaligned: true
16/01/07 02:50:26 DEBUG PlatformDependent: UID: 1001
16/01/07 02:50:26 DEBUG PlatformDependent: Java version: 7
16/01/07 02:50:26 DEBUG PlatformDependent: -Dio.netty.noUnsafe: false
16/01/07 02:50:26 DEBUG PlatformDependent: sun.misc.Unsafe: available
16/01/07 02:50:26 DEBUG PlatformDependent: -Dio.netty.noJavassist: false
16/01/07 02:50:26 DEBUG PlatformDependent: Javassist: unavailable
16/01/07 02:50:26 DEBUG PlatformDependent: You don't have Javassist in your class path or you don't have enough permission to load dynamically generated classes.  Please check the configuration for better performance.
16/01/07 02:50:26 DEBUG PlatformDependent: -Dio.netty.tmpdir: /tmp (java.io.tmpdir)
16/01/07 02:50:26 DEBUG PlatformDependent: -Dio.netty.bitMode: 64 (sun.arch.data.model)
16/01/07 02:50:26 DEBUG PlatformDependent: -Dio.netty.noPreferDirect: false
16/01/07 02:50:26 DEBUG MultithreadEventLoopGroup: -Dio.netty.eventLoopThreads: 8
16/01/07 02:50:26 DEBUG NioEventLoop: -Dio.netty.noKeySetOptimization: false
16/01/07 02:50:26 DEBUG NioEventLoop: -Dio.netty.selectorAutoRebuildThreshold: 512
16/01/07 02:50:26 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.numHeapArenas: 4
16/01/07 02:50:26 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.numDirectArenas: 4
16/01/07 02:50:26 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.pageSize: 8192
16/01/07 02:50:26 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.maxOrder: 11
16/01/07 02:50:26 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.chunkSize: 16777216
16/01/07 02:50:26 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.tinyCacheSize: 512
16/01/07 02:50:26 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.smallCacheSize: 256
16/01/07 02:50:26 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.normalCacheSize: 64
16/01/07 02:50:26 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.maxCachedBufferCapacity: 32768
16/01/07 02:50:26 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.cacheTrimInterval: 8192
16/01/07 02:50:26 DEBUG ThreadLocalRandom: -Dio.netty.initialSeedUniquifier: 0x09852a335e6ac767 (took 0 ms)
16/01/07 02:50:26 DEBUG ByteBufUtil: -Dio.netty.allocator.type: unpooled
16/01/07 02:50:26 DEBUG ByteBufUtil: -Dio.netty.threadLocalDirectBufferSize: 65536
16/01/07 02:50:26 DEBUG NetUtil: Loopback interface: lo (lo, 0:0:0:0:0:0:0:1%1)
16/01/07 02:50:26 DEBUG NetUtil: /proc/sys/net/core/somaxconn: 128
16/01/07 02:50:26 DEBUG TransportServer: Shuffle server started on port :40229
16/01/07 02:50:26 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40229.
16/01/07 02:50:26 INFO NettyBlockTransferService: Server created on 40229
16/01/07 02:50:26 INFO BlockManagerMaster: Trying to register BlockManager
16/01/07 02:50:26 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received message AkkaMessage(RegisterBlockManager(BlockManagerId(driver, localhost, 40229),278302556,AkkaRpcEndpointRef(Actor[akka://sparkDriver/user/BlockManagerEndpoint1#-2004000522])),true) from Actor[akka://sparkDriver/temp/$a]
16/01/07 02:50:26 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message: AkkaMessage(RegisterBlockManager(BlockManagerId(driver, localhost, 40229),278302556,AkkaRpcEndpointRef(Actor[akka://sparkDriver/user/BlockManagerEndpoint1#-2004000522])),true)
16/01/07 02:50:26 INFO BlockManagerMasterEndpoint: Registering block manager localhost:40229 with 265.4 MB RAM, BlockManagerId(driver, localhost, 40229)
16/01/07 02:50:26 INFO BlockManagerMaster: Registered BlockManager
16/01/07 02:50:26 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled message (10.023007 ms) AkkaMessage(RegisterBlockManager(BlockManagerId(driver, localhost, 40229),278302556,AkkaRpcEndpointRef(Actor[akka://sparkDriver/user/BlockManagerEndpoint1#-2004000522])),true) from Actor[akka://sparkDriver/temp/$a]
16/01/07 02:50:43 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received message AkkaMessage(Heartbeat(driver,[Lscala.Tuple2;@25ad7569,BlockManagerId(driver, localhost, 40229)),true) from Actor[akka://sparkDriver/temp/$b]
16/01/07 02:50:43 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message: AkkaMessage(Heartbeat(driver,[Lscala.Tuple2;@25ad7569,BlockManagerId(driver, localhost, 40229)),true)
16/01/07 02:50:43 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled message (2.348068 ms) AkkaMessage(Heartbeat(driver,[Lscala.Tuple2;@25ad7569,BlockManagerId(driver, localhost, 40229)),true) from Actor[akka://sparkDriver/temp/$b]
16/01/07 02:50:43 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received message AkkaMessage(BlockManagerHeartbeat(BlockManagerId(driver, localhost, 40229)),true) from Actor[akka://sparkDriver/temp/$c]
16/01/07 02:50:43 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message: AkkaMessage(BlockManagerHeartbeat(BlockManagerId(driver, localhost, 40229)),true)
16/01/07 02:50:43 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled message (6.460111 ms) AkkaMessage(BlockManagerHeartbeat(BlockManagerId(driver, localhost, 40229)),true) from Actor[akka://sparkDriver/temp/$c]
16/01/07 02:50:53 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received message AkkaMessage(Heartbeat(driver,[Lscala.Tuple2;@1bd046bb,BlockManagerId(driver, localhost, 40229)),true) from Actor[akka://sparkDriver/temp/$d]
16/01/07 02:50:53 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message: AkkaMessage(Heartbeat(driver,[Lscala.Tuple2;@1bd046bb,BlockManagerId(driver, localhost, 40229)),true)
16/01/07 02:50:53 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled message (1.269133 ms) AkkaMessage(Heartbeat(driver,[Lscala.Tuple2;@1bd046bb,BlockManagerId(driver, localhost, 40229)),true) from Actor[akka://sparkDriver/temp/$d]
16/01/07 02:50:53 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received message AkkaMessage(BlockManagerHeartbeat(BlockManagerId(driver, localhost, 40229)),true) from Actor[akka://sparkDriver/temp/$e]
16/01/07 02:50:53 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message: AkkaMessage(BlockManagerHeartbeat(BlockManagerId(driver, localhost, 40229)),true)
16/01/07 02:50:53 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled message (1.066398 ms) AkkaMessage(BlockManagerHeartbeat(BlockManagerId(driver, localhost, 40229)),true) from Actor[akka://sparkDriver/temp/$e]
16/01/07 02:51:03 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received message AkkaMessage(Heartbeat(driver,[Lscala.Tuple2;@47c2b6e3,BlockManagerId(driver, localhost, 40229)),true) from Actor[akka://sparkDriver/temp/$f]
16/01/07 02:51:03 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message: AkkaMessage(Heartbeat(driver,[Lscala.Tuple2;@47c2b6e3,BlockManagerId(driver, localhost, 40229)),true)
16/01/07 02:51:03 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled message (0.933259 ms) AkkaMessage(Heartbeat(driver,[Lscala.Tuple2;@47c2b6e3,BlockManagerId(driver, localhost, 40229)),true) from Actor[akka://sparkDriver/temp/$f]
16/01/07 02:51:03 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received message AkkaMessage(BlockManagerHeartbeat(BlockManagerId(driver, localhost, 40229)),true) from Actor[akka://sparkDriver/temp/$g]
16/01/07 02:51:03 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message: AkkaMessage(BlockManagerHeartbeat(BlockManagerId(driver, localhost, 40229)),true)
16/01/07 02:51:03 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled message (1.05242 ms) AkkaMessage(BlockManagerHeartbeat(BlockManagerId(driver, localhost, 40229)),true) from Actor[akka://sparkDriver/temp/$g]
16/01/07 02:51:13 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received message AkkaMessage(Heartbeat(driver,[Lscala.Tuple2;@407515e3,BlockManagerId(driver, localhost, 40229)),true) from Actor[akka://sparkDriver/temp/$h]
16/01/07 02:51:13 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message: AkkaMessage(Heartbeat(driver,[Lscala.Tuple2;@407515e3,BlockManagerId(driver, localhost, 40229)),true)
16/01/07 02:51:13 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled message (1.032808 ms) AkkaMessage(Heartbeat(driver,[Lscala.Tuple2;@407515e3,BlockManagerId(driver, localhost, 40229)),true) from Actor[akka://sparkDriver/temp/$h]
16/01/07 02:51:13 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received message AkkaMessage(BlockManagerHeartbeat(BlockManagerId(driver, localhost, 40229)),true) from Actor[akka://sparkDriver/temp/$i]
16/01/07 02:51:13 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message: AkkaMessage(BlockManagerHeartbeat(BlockManagerId(driver, localhost, 40229)),true)
16/01/07 02:51:13 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled message (1.713425 ms) AkkaMessage(BlockManagerHeartbeat(BlockManagerId(driver, localhost, 40229)),true) from Actor[akka://sparkDriver/temp/$i]

Best Answer

Building Spark with Hive support locally is quite simple, but Spark does not support Hive out of the box (because Hive pulls in a large number of dependencies). This is the command line I typically use to build Spark from source with Hive support included:

./make-distribution.sh --name spark-hive-1.5.2 --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver

Obviously I am building Spark 1.5.2 here, but the above should work for any version. The --name argument just lets me name the final distribution.
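
Since the asker already has a standalone Hive installation, the rebuilt distribution still has to be pointed at that existing metastore. A minimal sketch of the usual approach, assuming (hypothetically) that Hive's configuration lives under /usr/local/hive/conf and the new build was unpacked to ~/spark-hive:

# Hypothetical paths; adjust to wherever Hive and the rebuilt Spark actually live.
# Spark picks up the existing Hive metastore settings when hive-site.xml is on its conf path.
cp /usr/local/hive/conf/hive-site.xml ~/spark-hive/conf/

# The MySQL metastore driver that the log shows in SPARK_CLASSPATH is better passed
# the way the deprecation warning in the log suggests:
~/spark-hive/bin/pyspark --driver-class-path /home/hduser/mysql-connector-java-5.1.36-bin.jar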

See also: https://spark.apache.org/docs/latest/building-spark.html#building-with-hive-and-jdbc-support
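
For reference, that page also documents an equivalent build through Maven profiles; roughly along these lines (the Hadoop profile and version below simply mirror the make-distribution.sh invocation above):

./build/mvn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests clean package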

Note that the builds you can download from spark.apache.org do not support Hive (as described at the URL above).
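
As a rough smoke test of whether a given build has Hive support, one can submit the snippet from the question against it. The script name and the ~/spark-hive path below are hypothetical placeholders for wherever the rebuilt distribution was unpacked:

cat > /tmp/hive_smoke_test.py <<'EOF'
from pyspark import SparkContext
from pyspark.sql import HiveContext

# The same statement that failed in the question; it only succeeds on a Hive-enabled build.
sc = SparkContext(appName="hive-smoke-test")
sqlContext = HiveContext(sc)
sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
print(sqlContext.sql("SHOW TABLES").collect())
sc.stop()
EOF

~/spark-hive/bin/spark-submit /tmp/hive_smoke_test.py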

About "hadoop - Install Hive with Apache Spark": we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34636046/
