I have a JSON file like this:
{
  "employeeDetails": {
    "name": "xxxx",
    "num": "415"
  },
  "work": [
    {
      "monthYear": "01/2007",
      "workdate": "1|2|3|....|31",
      "workhours": "8|8|8....|8"
    },
    {
      "monthYear": "02/2007",
      "workdate": "1|2|3|....|31",
      "workhours": "8|8|8....|8"
    }
  ]
}
I need to extract the work dates and work hours from this JSON data.
I am using Spark 2.1.1.
I tried this:
val spark = SparkSession.builder().appName("SQL-JSON").master("local[4]").getOrCreate()
val df = spark.read.json(spark.sparkContext.wholeTextFiles("sample22.json").values)
// df.show()
// df.printSchema()
//val gatewayMessageContent = df.select("employeeDetails")
//gatewayMessageContent.printSchema()
val sensorMessagesContent = df.select("work")
sensorMessagesContent.printSchema()
// I am following an article online; it shows something like this, but it is not working for me.
val flattened = df.select( $"root", explode($"work").as("work_flat"))
I get an exception like this:
Error:(22, 31) value $ is not a member of StringContext
val flattened = df.select($"root", explode($"work").as("work_flat"))
^
Error:(22, 48) value $ is not a member of StringContext
val flattened = df.select($"root", explode($"work").as("work_flat"))
^
In that example, the author has a top-level element ("name"). In my case there is no such top-level element wrapping "work", so it doesn't work for me.
I am new to Spark.
Best answer
You should use Spark's withColumn function, like this (note the imports, which the snippet needs to compile):
import org.apache.spark.sql.functions.struct
import spark.implicits._

val flattened = df.withColumn("workDate", struct($"work.workdate"))
  .withColumn("workHours", struct($"work.workhours"))
flattened.show(false)
You should get the following output:
+---------------+--------------------------------------------------------------------------+--------------------------------------------+----------------------------------------+
|employeeDetails|work |workDate |workHours |
+---------------+--------------------------------------------------------------------------+--------------------------------------------+----------------------------------------+
|[xxxx,415] |[[01/2007,1|2|3|....|31,8|8|8....|8], [02/2007,1|2|3|....|31,8|8|8....|8]]|[WrappedArray(1|2|3|....|31, 1|2|3|....|31)]|[WrappedArray(8|8|8....|8, 8|8|8....|8)]|
+---------------+--------------------------------------------------------------------------+--------------------------------------------+----------------------------------------+
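Since the goal is to get the work dates and work hours out of each month's entry, an explode-based variant may be closer to what you want: it turns the "work" array into one row per month, with workdate and workhours as plain string columns. This is a sketch assuming the input is the multi-line JSON file shown in the question, read via wholeTextFiles:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

val spark = SparkSession.builder().appName("SQL-JSON").master("local[4]").getOrCreate()
import spark.implicits._

// wholeTextFiles is needed because the JSON spans multiple lines
val df = spark.read.json(spark.sparkContext.wholeTextFiles("sample22.json").values)

// one row per element of the "work" array, then pull out the struct fields
val perMonth = df.select(explode($"work").as("w"))
  .select($"w.monthYear", $"w.workdate", $"w.workhours")
perMonth.show(false)
```

From there you can split the pipe-delimited strings further (e.g. with the split function) if you need one value per day.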
I am assuming you already have a dataframe with the following schema:
root
|-- work: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- monthYear: string (nullable = true)
| | |-- workdate: string (nullable = true)
| | |-- workhours: string (nullable = true)
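As for the `value $ is not a member of StringContext` error in the question: the `$"columnName"` syntax only compiles after importing the session's implicits. A minimal sketch of the setup:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("SQL-JSON").master("local[4]").getOrCreate()
import spark.implicits._  // enables the $"columnName" column syntax

val df = spark.read.json(spark.sparkContext.wholeTextFiles("sample22.json").values)
df.select($"work")  // compiles now that the implicits are in scope
```

Alternatively, use col("work") from org.apache.spark.sql.functions instead of $"work"; it does the same thing without requiring the implicits import.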
Regarding "scala - Spark Dataframe - How to access a JSON structure", the original question is on Stack Overflow: https://stackoverflow.com/questions/44825021/