scala - "Grows beyond 64 KB" error when using "when ... otherwise"

Tags: scala apache-spark compiler-errors apache-spark-sql

When I run this Spark code in Scala:

df.withColumn(x, when(col(x).isin(values:_*),col(x)).otherwise(lit(null).cast(StringType)))

I get this error:

     java.lang.RuntimeException: Compiling "GeneratedClass": Code of method
 "apply(Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/catalyst/expressions/UnsafeRow;"
 of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
 grows beyond 64 KB
        at org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:361)
        at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:234)

df: a Spark Dataset
x: a StringType column; every row looks like "US,Washington,Seattle"
values: Array[String]

Best answer

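This is a known issue related to bytecode growth: the JVM limits the bytecode of a single method to 64 KB, and the Java code that Spark generates for a query can exceed that limit. A common workaround is to add a checkpoint, that is, save the DataFrame and read it back in.
For more details, see: Apache Spark Codegen Stage grows beyond 64 KB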
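
The checkpoint workaround could look roughly like the sketch below. It is only an illustration under assumptions: the object name `CheckpointSketch`, the helpers `keepOnly`, `saveAndReload` and `applyWithCheckpoints`, the column name `location`, and the checkpoint interval are all hypothetical, and the question does not say whether the expression is applied to one column or to many in a loop. The sketch assumes the latter, since that is a case where cutting the query plan every few columns keeps each generated projection small.

```scala
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}
import org.apache.spark.sql.types.StringType

object CheckpointSketch {

  // The expression from the question: keep the value of column `x` when it
  // is one of `values`, otherwise replace it with a null string.
  def keepOnly(df: DataFrame, x: String, values: Array[String]): DataFrame =
    df.withColumn(x, when(col(x).isin(values: _*), col(x)).otherwise(lit(null).cast(StringType)))

  // "Save the DataFrame and read it back": write to Parquet, then reload.
  // The reloaded DataFrame has a fresh, short query plan.
  def saveAndReload(spark: SparkSession, df: DataFrame, path: String): DataFrame = {
    df.write.mode("overwrite").parquet(path)
    spark.read.parquet(path)
  }

  // Assumed scenario: the expression is applied to many columns in a loop.
  // Every `every` columns the plan is truncated with Dataset.checkpoint(),
  // which is eager by default and needs a checkpoint directory to be set.
  def applyWithCheckpoints(df: DataFrame, columns: Seq[String],
                           values: Array[String], every: Int): DataFrame =
    columns.zipWithIndex.foldLeft(df) { case (acc, (c, i)) =>
      val next = keepOnly(acc, c, values)
      if ((i + 1) % every == 0) next.checkpoint() else next
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("checkpoint-sketch")
      .getOrCreate()
    import spark.implicits._

    // A temporary directory stands in for a real checkpoint location.
    spark.sparkContext.setCheckpointDir(Files.createTempDirectory("spark-ckpt").toString)

    val df = Seq("US,Washington,Seattle", "US,Oregon,Portland").toDF("location")
    val values = Array("US,Washington,Seattle")

    val result = applyWithCheckpoints(df, Seq("location"), values, every = 1)
    result.show(truncate = false)

    spark.stop()
  }
}
```

`saveAndReload` and `checkpoint()` serve the same purpose here; the first gives explicit control over where the intermediate data is stored, while the second relies on the directory passed to `setCheckpointDir`.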

Regarding scala - "Grows beyond 64 KB" error when using "when ... otherwise", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63093710/
