apache-spark - spark-submit error: Name or service not known

Tags: apache-spark pyspark spark-dataframe apache-spark-mllib

I am running pyspark code on an Amazon machine.

Code in the pyspark shell:
a=open("test.txt")
s=sc.parallelize(a)
print(s.count())

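Because of some issues, I cannot use sc.textFile("test.txt") directly.

Code in the Python file:
from pyspark import SparkContext

sc = SparkContext()
# Read the local file on the driver and distribute its lines as an RDD.
with open("test.txt") as f:
    s = sc.parallelize(f)
    print(s.count())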

When I try spark-submit test.py, I get the error "Name or service not known":
ubuntu@10-0-0-32:~/Deepak/projects$ spark-submit test1.py
16/06/12 03:44:53 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/06/12 03:44:59 ERROR : 10-0-0-32: 10-0-0-32: Name or service not known
java.net.UnknownHostException: 10-0-0-32: 10-0-0-32: Name or service not known
    at java.net.InetAddress.getLocalHost(InetAddress.java:1496)
    at tachyon.util.network.NetworkAddressUtils.getLocalIpAddress(NetworkAddressUtils.java:355)
    at tachyon.util.network.NetworkAddressUtils.getLocalHostName(NetworkAddressUtils.java:320)
    at tachyon.conf.TachyonConf.<init>(TachyonConf.java:122)
    at tachyon.conf.TachyonConf.<init>(TachyonConf.java:111)
    at tachyon.Version.<clinit>(Version.java:27)
    at tachyon.Constants.<clinit>(Constants.java:328)
    at tachyon.hadoop.AbstractTFS.<clinit>(AbstractTFS.java:63)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at java.lang.Class.newInstance(Class.java:383)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2364)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2375)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1362)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1340)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:491)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:214)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: 10-0-0-32: Name or service not known
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:922)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1316)
    at java.net.InetAddress.getLocalHost(InetAddress.java:1492)
    ... 40 more
16/06/12 03:44:59 ERROR SparkContext: Error initializing SparkContext.
java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider tachyon.hadoop.TFS could not be instantiated
    at java.util.ServiceLoader.fail(ServiceLoader.java:224)
    at java.util.ServiceLoader.access$100(ServiceLoader.java:181)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:377)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2364)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2375)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1362)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1340)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:491)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:214)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ExceptionInInitializerError
    at tachyon.Constants.<clinit>(Constants.java:328)
    at tachyon.hadoop.AbstractTFS.<clinit>(AbstractTFS.java:63)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at java.lang.Class.newInstance(Class.java:383)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
    ... 27 more
Caused by: java.lang.RuntimeException: java.net.UnknownHostException: 10-0-0-32: 10-0-0-32: Name or service not known
    at org.spark-project.guava.base.Throwables.propagate(Throwables.java:160)
    at tachyon.util.network.NetworkAddressUtils.getLocalIpAddress(NetworkAddressUtils.java:398)
    at tachyon.util.network.NetworkAddressUtils.getLocalHostName(NetworkAddressUtils.java:320)
    at tachyon.conf.TachyonConf.<init>(TachyonConf.java:122)
    at tachyon.conf.TachyonConf.<init>(TachyonConf.java:111)
    at tachyon.Version.<clinit>(Version.java:27)
    ... 35 more
Caused by: java.net.UnknownHostException: 10-0-0-32: 10-0-0-32: Name or service not known
    at java.net.InetAddress.getLocalHost(InetAddress.java:1496)
    at tachyon.util.network.NetworkAddressUtils.getLocalIpAddress(NetworkAddressUtils.java:355)
    ... 39 more
Caused by: java.net.UnknownHostException: 10-0-0-32: Name or service not known
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:922)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1316)
    at java.net.InetAddress.getLocalHost(InetAddress.java:1492)
    ... 40 more
16/06/12 03:44:59 WARN MetricsSystem: Stopping a MetricsSystem that is not running
Traceback (most recent call last):
File "/home/ubuntu/Deepak/projects/test1.py", line 2, in <module>
sc = SparkContext("local", "test1", pyFiles=['test1.py'])
File "/home/ubuntu/spark-1.6.0-bin-hadoop2.4/python/lib/pyspark.zip/pyspark/context.py", line 115, in __init__
File "/home/ubuntu/spark-1.6.0-bin-hadoop2.4/python/lib/pyspark.zip/pyspark/context.py", line 172, in _do_init
File "/home/ubuntu/spark-1.6.0-bin-hadoop2.4/python/lib/pyspark.zip/pyspark/context.py", line 235, in _initialize_context
File "/home/ubuntu/spark-1.6.0-bin-hadoop2.4/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 1064, in __call__
File "/home/ubuntu/spark-1.6.0-bin-hadoop2.4/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider tachyon.hadoop.TFS could not be instantiated
    at java.util.ServiceLoader.fail(ServiceLoader.java:224)
    at java.util.ServiceLoader.access$100(ServiceLoader.java:181)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:377)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2364)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2375)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1362)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1340)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:491)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:214)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ExceptionInInitializerError
    at tachyon.Constants.<clinit>(Constants.java:328)
    at tachyon.hadoop.AbstractTFS.<clinit>(AbstractTFS.java:63)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at java.lang.Class.newInstance(Class.java:383)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
    ... 27 more
Caused by: java.lang.RuntimeException: java.net.UnknownHostException: 10-0-0-32: 10-0-0-32: Name or service not known
    at org.spark-project.guava.base.Throwables.propagate(Throwables.java:160)
    at tachyon.util.network.NetworkAddressUtils.getLocalIpAddress(NetworkAddressUtils.java:398)
    at tachyon.util.network.NetworkAddressUtils.getLocalHostName(NetworkAddressUtils.java:320)
    at tachyon.conf.TachyonConf.<init>(TachyonConf.java:122)
    at tachyon.conf.TachyonConf.<init>(TachyonConf.java:111)
    at tachyon.Version.<clinit>(Version.java:27)
    ... 35 more
Caused by: java.net.UnknownHostException: 10-0-0-32: 10-0-0-32: Name or service not known
    at java.net.InetAddress.getLocalHost(InetAddress.java:1496)
    at tachyon.util.network.NetworkAddressUtils.getLocalIpAddress(NetworkAddressUtils.java:355)
    ... 39 more
Caused by: java.net.UnknownHostException: 10-0-0-32: Name or service not known
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:922)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1316)
    at java.net.InetAddress.getLocalHost(InetAddress.java:1492)
    ... 40 more

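Best answer

Add the hostname to the /etc/hosts file.

Previously I had it like this:

IP ubuntu(username) alias_name

I changed it to:

IP hostname alias_name

The confusing part is that, because I am using an Amazon machine, my IP and my hostname look the same.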
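
To confirm the fix before rerunning spark-submit, it can help to check whether the machine's own hostname resolves, since that lookup is what fails in java.net.InetAddress.getLocalHost in the trace above. The following is only a rough sketch: the helper names resolve_hostname and hosts_entry are made up for illustration, and the address 10.0.0.32 is a guess based on the hostname 10-0-0-32 shown in the prompt, so substitute the instance's real private IP.

```python
import socket


def resolve_hostname(name):
    """Return (True, ip) if `name` resolves to an IPv4 address, else (False, reason)."""
    try:
        return True, socket.gethostbyname(name)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, str(exc)


def hosts_entry(ip, hostname, alias=None):
    """Build one /etc/hosts line: the IP, then the hostname, then an optional alias."""
    fields = [ip, hostname]
    if alias:
        fields.append(alias)
    return " ".join(fields)


if __name__ == "__main__":
    name = socket.gethostname()
    ok, detail = resolve_hostname(name)
    if ok:
        print("%s resolves to %s" % (name, detail))
    else:
        print("%s does not resolve (%s)" % (name, detail))
        # 10.0.0.32 is a placeholder assumption; use the machine's actual IP.
        print("candidate /etc/hosts line: " + hosts_entry("10.0.0.32", name))
```

If the script reports that the hostname does not resolve, the printed line shows the "IP hostname" shape described in the answer; editing /etc/hosts itself normally requires root.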

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/37770697/
