java - IndexOutOfBounds error in Zeppelin

Tags: java scala apache-spark apache-zeppelin

I'm running into a problem in Zeppelin: whenever I try to run a SQL query against the temporary table (DataFrame) I created, I get an IndexOutOfBounds error.

Here is my code:

import org.apache.commons.io.IOUtils
import java.net.URL
import java.nio.charset.Charset
import org.apache.spark.sql.SparkSession
//import sqlContext._

val realdata = sc.textFile("/root/application.txt")

case class testClass(date: String, time: String, level: String, unknown1: String, unknownConsumer: String, unknownConsumer2: String, vloer: String, tegel: String, msg: String, sensor1: String, sensor2: String, sensor3: String, sensor4: String, sensor5: String, sensor6: String, sensor7: String, sensor8: String, batchsize: String, troepje1: String, troepje2: String)

val mapData = realdata
  .filter(line => line.contains("data") && line.contains("INFO"))
  .map(s => s.split(" ").toList)
  .map(s => testClass(
    s(0),
    s(1).split(",")(0),
    s(1).split(",")(1),
    s(3),
    s(4),
    s(5),
    s(6),
    s(7),
    s(8),
    s(15),
    s(16),
    s(17),
    s(18),
    s(19),
    s(20),
    s(21),
    s(22),
    "",
    "",
    ""
  ))
  .toDF
//mapData.count()
//mapData.printSchema()
mapData.registerTempTable("temp_carefloor")

Then in the next notebook I try something simple, like:

%sql
select * from temp_carefloor limit 10

I get the following error:

java.lang.IndexOutOfBoundsException: 18
    at scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:65)
    at scala.collection.immutable.List.apply(List.scala:84)
    at $line128330188484.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$3.apply(<console>:84)
    at $line128330188484.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$3.apply(<console>:72)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:232)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:748)

Now I'm sure it has something to do with how my data comes out of the parsing. But I just can't figure out what I'm doing wrong, and it's giving me a real headache. I really hope someone can help me.

Edit: here is an excerpt of the data I'm trying to extract the useful fields from.

2016-03-10 07:18:58,985 INFO [http-nio-8080-exec-1] n.t.f.c.FloorUpdateController [FloorUpdateController.java:67] Floor 12FR received update from tile: 12G0, data = [false, false, false, false, true, false, false, false]
2016-03-10 07:18:58,992 INFO [http-nio-8080-exec-7] n.t.f.c.FloorUpdateController [FloorUpdateController.java:67] Floor 12FR received update from tile: 12G0, data = [false, false, false, false, false, false, false, false]
2016-03-10 07:18:59,907 INFO [http-nio-8080-exec-4] n.t.f.c.FloorUpdateController [FloorUpdateController.java:67] Floor 12FR received update from tile: 12G0, data = [false, false, false, false, false, false, false, false]
2016-03-10 07:19:10,418 INFO [http-nio-8080-exec-9] n.t.f.c.FloorUpdateController [FloorUpdateController.java:67] Floor 12FR received update from tile: 12G0, data = [true, true, false, false, false, false, false, false]

You can view the full flat file here: http://upload.grecom.nl/uploads/jeffrey/application.txt

Best answer

As we discussed in the comments, the problem is in how the data is split: you cannot split this data on " " (a single space) alone.

One solution is to split the data with a regex like this: "data = |tile: |\\[|\\]| |," (note that the square brackets must be escaped, since they are special characters in a regex).

You have to include every delimiter in the regex, even substrings that you don't want to end up inside the extracted fields, as I did with "data = ".
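For illustration, here is a minimal sketch (plain Scala, no Spark required) of what that regex does to one of the sample log lines above. The `.filter(_.nonEmpty)` step and the resulting token indices are my own additions for clarity, not part of the original answer: adjacent delimiters produce empty strings, so you would normally drop them before indexing into the list.

```scala
// One sample line from the log excerpt in the question
val line = "2016-03-10 07:18:58,985 INFO [http-nio-8080-exec-1] " +
  "n.t.f.c.FloorUpdateController [FloorUpdateController.java:67] " +
  "Floor 12FR received update from tile: 12G0, " +
  "data = [false, false, false, false, true, false, false, false]"

// Split on every delimiter at once: "data = ", "tile: ", "[", "]", space, comma.
// Brackets are escaped because they are regex metacharacters.
val parts = line.split("data = |tile: |\\[|\\]| |,").filter(_.nonEmpty)

// Inspect the tokens and their indices before wiring them into the case class
parts.zipWithIndex.foreach { case (token, i) => println(s"$i -> $token") }
```

With this line, the date lands at index 0, the eight boolean sensor values at indices 13 through 20, so the `s(...)` indices in the mapping would need to be adjusted accordingly.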

Hope this helps. Best regards.

Regarding "java - IndexOutOfBounds error in Zeppelin", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45190916/
