json - How to read a JSON file with a specific format using Spark Scala?

Tags: json scala apache-spark

I am trying to read a JSON file that looks like this:

[ 
{"IFAM":"EQR","KTM":1430006400000,"COL":21,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} 
,{"MLrate":"31","Nrout":"0","up":null,"Crate":"2"} 
,{"MLrate":"30","Nrout":"5","up":null,"Crate":"2"} 
,{"MLrate":"34","Nrout":"0","up":null,"Crate":"4"} 
,{"MLrate":"33","Nrout":"0","up":null,"Crate":"2"} 
,{"MLrate":"30","Nrout":"8","up":null,"Crate":"2"} 
]} 
,{"IFAM":"EQR","KTM":1430006400000,"COL":22,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} 
,{"MLrate":"30","Nrout":"0","up":null,"Crate":"0"} 
,{"MLrate":"35","Nrout":"1","up":null,"Crate":"5"} 
,{"MLrate":"30","Nrout":"6","up":null,"Crate":"2"} 
,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} 
,{"MLrate":"38","Nrout":"8","up":null,"Crate":"1"} 
]} 
,...
] 

I tried the following command:

    val df = sqlContext.read.json("namefile") 
    df.show() 

But this does not work: my columns are not recognized...

Best answer

If you want to use read.json, you need a single JSON document per line. If your file contains a valid JSON array of documents, it simply will not work as expected. For example, taking your sample data, the input file should be formatted like this:

{"IFAM":"EQR","KTM":1430006400000,"COL":21,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}, {"MLrate":"31","Nrout":"0","up":null,"Crate":"2"}, {"MLrate":"30","Nrout":"5","up":null,"Crate":"2"} ,{"MLrate":"34","Nrout":"0","up":null,"Crate":"4"} ,{"MLrate":"33","Nrout":"0","up":null,"Crate":"2"} ,{"MLrate":"30","Nrout":"8","up":null,"Crate":"2"} ]}
{"IFAM":"EQR","KTM":1430006400000,"COL":22,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"0"} ,{"MLrate":"35","Nrout":"1","up":null,"Crate":"5"} ,{"MLrate":"30","Nrout":"6","up":null,"Crate":"2"} ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} ,{"MLrate":"38","Nrout":"8","up":null,"Crate":"1"} ]}

If you use read.json on the structure above, you will see that it is parsed as expected:

scala> sqlContext.read.json("namefile").printSchema
root
 |-- COL: long (nullable = true)
 |-- DATA: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Crate: string (nullable = true)
 |    |    |-- MLrate: string (nullable = true)
 |    |    |-- Nrout: string (nullable = true)
 |    |    |-- up: string (nullable = true)
 |-- IFAM: string (nullable = true)
 |-- KTM: long (nullable = true)
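
If reformatting the source file into one document per line is not practical, newer Spark releases (2.2 and later) provide a multiLine read option that can parse a file whose top-level value is a JSON array spanning several lines. A minimal sketch, assuming Spark 2.x and its SparkSession entry point (the app name is illustrative):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("read-json-array").getOrCreate()

    // multiLine tells Spark to treat the whole file as one JSON value,
    // so a top-level array yields one row per array element.
    val df = spark.read.option("multiLine", true).json("namefile")
    df.printSchema()
    df.show()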

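Either way, once the data is loaded, the nested DATA array can be flattened so the inner struct fields become ordinary columns. A sketch using explode from org.apache.spark.sql.functions, assuming the schema printed above and the sqlContext from the question:

    import org.apache.spark.sql.functions.explode

    val df = sqlContext.read.json("namefile")

    // explode produces one row per element of DATA; the struct fields
    // can then be selected with dot notation.
    val flat = df
      .select(df("IFAM"), df("KTM"), df("COL"), explode(df("DATA")).as("d"))
      .select("IFAM", "KTM", "COL", "d.MLrate", "d.Nrout", "d.Crate")

    flat.show()
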
Regarding "json - How to read a JSON file with a specific format using Spark Scala?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31702167/
