hadoop - [SPARK]: java.lang.IllegalArgumentException: java.net.UnknownHostException: plumber

Tags: hadoop apache-spark

I built a Spark Streaming application that reads data from a socket, runs a computation, and writes the results to HDFS. The application runs on Hadoop cluster A, but the HDFS it writes to belongs to Hadoop cluster B. Here is my code:

import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapred.TextOutputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Durations, StreamingContext}

if (args.length < 2) {
  System.out.println("Usage: StreamingWriteHdfs hostname port")
  System.exit(-1)
}

val conf = new SparkConf()
conf.setAppName("StreamingWriteHdfs")

val ssc = new StreamingContext(conf, Durations.seconds(10))
ssc.checkpoint("/tmp")

val hostname: String = args(0)
val port: Int = Integer.parseInt(args(1))

// Word count over the socket stream, in 10-second batches
val lines = ssc.socketTextStream(hostname, port)
val words = lines.flatMap(_.split(" "))
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)

wordCounts.print()

// Write each batch to HDFS on cluster B
wordCounts.saveAsHadoopFiles("hdfs://plumber/tmp/test/streaming",
  "out",
  classOf[Text],
  classOf[IntWritable],
  classOf[TextOutputFormat[Text, IntWritable]])

ssc.start()
ssc.awaitTermination()

When I run this application on cluster A, it throws:
java.lang.IllegalArgumentException: java.net.UnknownHostException:plumber
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: plumber

The fs.defaultFS of Hadoop cluster B is hdfs://plumber.
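A defaultFS with no port, such as hdfs://plumber, often names an HA nameservice that is resolved through hdfs-site.xml rather than DNS. If that is the case here, cluster B's clients presumably carry entries along these lines (the NameNode hostnames below are hypothetical, not from the question):

```xml
<!-- Hypothetical hdfs-site.xml entries defining the "plumber" nameservice
     on cluster B; the namenode host names are made up for illustration. -->
<property>
  <name>dfs.nameservices</name>
  <value>plumber</value>
</property>
<property>
  <name>dfs.ha.namenodes.plumber</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.plumber.nn1</name>
  <value>nn1.cluster-b.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.plumber.nn2</name>
  <value>nn2.cluster-b.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.plumber</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

If cluster A's client configuration lacks such entries, the HDFS client falls back to treating "plumber" as a plain hostname, which would explain the UnknownHostException above.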

Can anyone help me? Thanks.

Accepted answer

I think you need to add the NameNode's RPC port to the URI, e.g.

"hdfs://plumber:8020/tmp/test/streaming".

Regarding hadoop - [SPARK]: java.lang.IllegalArgumentException: java.net.UnknownHostException: plumber, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47406052/
