我构建了一个Spark Streaming应用程序,该应用程序从套接字读取数据并进行计算,然后将结果写入hdfs。但是应用程序在A hadoop群集中运行,而hdfs在B hadoop群集中运行。下面是我的代码:
if (args.length < 2) {
System.out.println("Usage: StreamingWriteHdfs hostname port")
System.exit(-1)
}
val conf = new SparkConf()
conf.setAppName("StreamingWriteHdfs")
val ssc = new StreamingContext(conf, Durations.seconds(10))
ssc.checkpoint("/tmp")
val hostname: String = args(0)
val port :Int = Integer.parseInt(args(1))
val lines = ssc.socketTextStream(hostname, port)
val words = lines.flatMap(_.split(" "))
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
wordCounts.print()
//TODO write to hdfs
wordCounts.saveAsHadoopFiles("hdfs://plumber/tmp/test/streaming",
"out",
classOf[Text],
classOf[IntWritable],
classOf[TextOutputFormat[Text, IntWritable]])
ssc.start()
ssc.awaitTermination()
在A集群中运行此应用程序时,请执行以下操作:
java.lang.IllegalArgumentException: java.net.UnknownHostException:plumber
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: plumber
B hadoop群集的fs.defaultFS是 hdfs:// plumber 。
有人可以帮助我!谢谢。
最佳答案
我认为您需要修改主机名,例如
"hdfs://plumber:8020/tmp/test/streaming".
关于hadoop - [SPARK]:java.lang.IllegalArgumentException:java.net.UnknownHostException:水管工,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/47406052/