r - How do I read a JSON/CSV file in SparkR?

Tags: r hadoop apache-spark sparkr

I have deployed Spark (spark-1.4.1-bin-hadoop2.6) in local mode, and I am reading an input JSON file from HDFS. However, the SparkR DataFrame method read.df cannot load the data from HDFS.

1) Error message from read.df

data <- read.df("/data/sample.json") # input from HDFS

15/09/01 18:19:38 ERROR r.RBackendHandler: loadDF on org.apache.spark.sql.api.r.SQLUtils failed
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:142)
        at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:74)
        at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:36)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.NoSuchElementException: key not found: path
        at scala.collection.MapLike$class.default(MapLike.scala:228)
        at org.apache.spark.sql.sources.CaseInsensitiveMap.default(ddl.scala:467)
        at scala.collection.MapLike$class.apply(MapLike.scala:141)
        at org.apache.spark.sql.sources.CaseInsensitiveMap.apply(ddl.scala:467)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:273)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
        at org.apache.spark.sql.api.r.SQLUtils$.loadDF(SQLUtils.scala:147)
        at org.apache.spark.sql.api.r.SQLUtils.loadDF(SQLUtils.scala)
        ... 25 more
Error: returnStatus == 0 is not TRUE

Thanks in advance.

Best answer

data <- read.json("/data/sample.json")
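The "key not found: path" error in the question most likely comes from positional argument matching. This assumes the Spark 1.4.x signature read.df(sqlContext, path = NULL, source = NULL, ...). Called with a single string, that string binds to sqlContext, path stays NULL, and no "path" option reaches the JVM-side loader.

The sketch below uses a hypothetical pure-R function, mock_read_df_options, which imitates only that option-building step and never calls Spark. It shows the difference between the two call shapes. On 1.4.x the real call would pass the context first, for example read.df(sqlContext, "/data/sample.json", source = "json"), where sqlContext <- sparkRSQL.init(sc).

```r
# Hypothetical stand-in (not part of SparkR) that mimics how the Spark 1.4.x
# read.df is assumed to build its data-source options before calling the JVM.
# It never contacts Spark; it only returns the options list it would send.
mock_read_df_options <- function(sqlContext, path = NULL, source = NULL, ...) {
  opts <- list(...)            # extra data-source options, e.g. header = "true"
  if (!is.null(path)) {
    opts[["path"]] <- path     # "path" is only set when a path is supplied
  }
  opts
}

# Called like in the question: the string binds to 'sqlContext',
# so 'path' stays NULL and no "path" key is produced.
wrong <- mock_read_df_options("/data/sample.json")
print("path" %in% names(wrong))   # [1] FALSE

# Called with a context first (a placeholder string here), 'path' is populated.
right <- mock_read_df_options("sqlContext-placeholder", "/data/sample.json",
                              source = "json")
print(right[["path"]])            # [1] "/data/sample.json"
```

Because R matches unnamed arguments by position, naming the argument explicitly (path = "/data/sample.json") also avoids this class of mistake.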

Regarding "r - How do I read a JSON/CSV file in SparkR?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/32352439/
