hadoop - spark-shell error: falling back to uploading libraries under SPARK_HOME

Tags: hadoop apache-spark pyspark apache-spark-sql amazon-emr

I am trying to connect a spark-shell on Amazon Hadoop (EMR), but I always get the following warning and do not know how to fix it or which configuration is missing.

spark.yarn.jars, spark.yarn.archive

spark-shell --jars /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/08/12 07:47:26 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
16/08/12 07:47:28 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
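This "falling back to uploading libraries under SPARK_HOME" message is only a warning: when neither spark.yarn.jars nor spark.yarn.archive is set, Spark zips its local jars and uploads them to the cluster on every launch, which is slow but harmless. One way to silence it is to stage the jars on HDFS once and point spark.yarn.jars at them; a sketch (the HDFS path is an assumption, and /usr/lib/spark/jars is where EMR typically installs the Spark jars):

# Hypothetical one-time staging of the Spark jars on HDFS
hdfs dfs -mkdir -p /user/spark/jars
hdfs dfs -put /usr/lib/spark/jars/*.jar /user/spark/jars/
# Then point spark.yarn.jars at the staged copies when launching the shell
spark-shell --conf spark.yarn.jars=hdfs:///user/spark/jars/*.jar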

Thanks!!!

Error 1

I am trying to run a SQL query, something very simple:

val sqlDF = spark.sql("SELECT col1 FROM tabl1 limit 10")
sqlDF.show()

WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
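As an aside, this "Initial job has not accepted any resources" warning usually means YARN has no free containers to give the shell, not that the SQL itself is wrong. One thing worth trying is to request fewer and smaller executors explicitly when starting the shell; the numbers below are placeholders to adapt to the cluster:

spark-shell --jars /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar \
  --num-executors 2 --executor-memory 1g --executor-cores 1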

Error 2

Then I tried to run a Scala script, a simple example taken from: https://blogs.aws.amazon.com/bigdata/post/Tx2D93GZRHU3TES/Using-Spark-SQL-for-ETL

import org.apache.hadoop.io.Text
import org.apache.hadoop.dynamodb.DynamoDBItemWritable
import com.amazonaws.services.dynamodbv2.model.AttributeValue
import org.apache.hadoop.dynamodb.read.DynamoDBInputFormat
import org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat
import org.apache.hadoop.mapred.JobConf
import org.apache.hadoop.io.LongWritable
import java.util.HashMap


// Configure the DynamoDB connector through a Hadoop JobConf
var ddbConf = new JobConf(sc.hadoopConfiguration)
ddbConf.set("dynamodb.output.tableName", "tableDynamoDB")
ddbConf.set("dynamodb.throughput.write.percent", "0.5")
ddbConf.set("mapred.input.format.class", "org.apache.hadoop.dynamodb.read.DynamoDBInputFormat")
ddbConf.set("mapred.output.format.class", "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat")


// Pull a single value to write out
var genreRatingsCount = sqlContext.sql("SELECT col1 FROM table1 LIMIT 1")

// Reshape each row into a (Text, DynamoDBItemWritable) pair
var ddbInsertFormattedRDD = genreRatingsCount.map(a => {
var ddbMap = new HashMap[String, AttributeValue]()

var col1 = new AttributeValue()
col1.setS(a.get(0).toString)
ddbMap.put("col1", col1)

var item = new DynamoDBItemWritable()
item.setItem(ddbMap)

(new Text(""), item)
}
)

// Write the (key, item) pairs to DynamoDB through the configured JobConf
ddbInsertFormattedRDD.saveAsHadoopDataset(ddbConf)

scala.reflect.internal.Symbols$CyclicReference: illegal cyclic reference involving object InterfaceAudience
  at scala.reflect.internal.Symbols$Symbol$$anonfun$info$3.apply(Symbols.scala:1502)
  at scala.reflect.internal.Symbols$Symbol$$anonfun$info$3.apply(Symbols.scala:1500)
  at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
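One thing to be aware of: the blog post above was written for Spark 1.x. On Spark 2.x, DataFrame.map needs an Encoder for (Text, DynamoDBItemWritable), which Spark does not provide, so a common rework is to drop to the RDD API explicitly before mapping. A minimal sketch reusing the names above (whether this alone resolves the CyclicReference error depends on the cluster's classpath):

// Convert to an RDD of Rows first, so no Encoder for the Hadoop
// Writable pair is needed (Spark 2.x API)
val ddbInsertFormattedRDD = genreRatingsCount.rdd.map { row =>
  val ddbMap = new HashMap[String, AttributeValue]()

  val col1 = new AttributeValue()
  col1.setS(row.get(0).toString)
  ddbMap.put("col1", col1)

  val item = new DynamoDBItemWritable()
  item.setItem(ddbMap)

  (new Text(""), item)
}

ddbInsertFormattedRDD.saveAsHadoopDataset(ddbConf)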

Best Answer

It looks like the Spark UI has not started properly; try starting the spark shell and check whether the Spark UI at localhost:4040 is running correctly.
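Note that the log in the question shows port 4040 was already in use, so the UI actually came up on 4041. A quick way to check from the node running the shell (host and port taken from that log):

# Print the HTTP status of the fallback Spark UI port reported in the log
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:4041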

For hadoop - spark-shell error: falling back to uploading libraries under SPARK_HOME, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38912706/
